5/27/2012

05-27-12 - Prefer Push-Restore to Push-Pop

It's a standard paradigm to use a Stack for a subsystem which parallels the execution stack.

Some common examples : profilers that push/pop scopes, logging scopes such as setting tab depth or subsystem, mutex lock scopers, exception handler stacks, etc. Basically the model is at the start of some function you push something on your parallel stack, and pop on exit.

I've only recently realized that this is bad programming. The problem is it's redundant with the execution stack, and any time you have two separate subsystems which must be exactly in sync to work correctly, you have bad code. In particular, it's (unnecessarily) fragile.

To be concrete lets consider a "stack" that's just an integer counting the depth for something. You might naively maintain it like :


Func1()
{
  s_depth ++; // push

  ... other stuff ..
  Func2()

  s_depth --; // pop
}

Now, sophomoric programmers may already be thinking you could use a "scoper" class to do the push/pop for you; as long as you use the scoper that ensures that you do push/pops in pairs, and it eliminates one type of bug, which is accidentally returning without popping.

But really the scoper is just putting a band-aid on the problem. There is a whole other class of bugs that you cannot fix that way - what if Func2() is written wrong and has a mismatched push/pop ? It will fuck up the stack to you. The entire mechanism is fragile because it is vulnerable to inheriting mistakes in any child code.

Now, sophomoric programmers are thinking "of course it inherits mistakes from calling broken code", they think that's just the way code is. They think the way you make working programs is by finding every single bug in every possible execution path and fixing them all. They think fragile code that has to be used "just so" is fine, because they will ensure that the code is used in exactly the right way.

This is very wrong, and it's why so many programs suck so bad. It comes from a great arrogance that is very common in programmers from beginners to experts. It's the (untrue) belief that you can write bug free code. It's the belief that fragile systems without fail-safes and inherent self-protection are okay because you are so great you will use them correctly.

It becomes obvious if we add the invariant checks that test program correctness :


Func1()
{
  int depthBefore = s_depth;
  s_depth ++; // push

  ... other stuff ..
  Func2()

  s_depth --; // pop
  ASSERT( s_depth == depthBefore );
}

The assert checks that after the pop, the stack should be back to where it was before the push.

But if that's the requirement, why not just do that directly? It's much more robust, it's not sensitive to unmatched push/pops in the children. So instead of Push/Pop we should just do Push/Restore :


Func1()
{
  int depthBefore = s_depth++; // push

  ... other stuff ..
  Func2()

  s_depth = depthBefore; // restore
}

A few quick notes on this : 1. Obviously for super-robust code you should run both methods simultaneously and check that they match; in case of mismatch, perhaps prompt the user (footnote *1*), or just prefer the more robust, and 2. it's always a good idea to write out the asserts that check the invariants you believe to be true in the code. Often you will find that the assert is simpler than the code it was checking. 3. the code is more robust if you just go ahead and make the invariant true. eg. you often see something like :


int Func3(int x)
{
  ASSERT( x == 1 || x == 3 );
  int y = x * 4 + 12;
  ASSERT( y == 16 || y == 24 );
  return y;
}

well if you know the parameters and you know what the answer should be, then just fucking return that. If you require a simple invariant, then just fucking make it true. Don't assume it's true without testing it. Asserting is slightly better, but it's still fragile. Just make it true.

Anyhoo, our s_depth example is so trivial let's do a slightly more complex one, a profiler :


struct ProfBlock { tick_t start; tickt_t end; }
vector<ProfBlock> s_profile_vec;

void Profile_Push()
{
  s_profile_vec.push_back();
  s_profile_vec.back().start = tick();
}

void Profile_Pop()
{
  s_profile_vec.back().end = tick();
  // [start,end] for this block is now set, as is stack depth, store it somewhere permanent
  s_profile_vec.pop_back();
}

Func1()
{
  Profile_Push();

  ... other stuff ..
  Func2()

  Profile_Pop();
}

and as noted in the comment, I assume you actually do something with the block once it's recorded, but the details of that are not necessary here.

Anyhoo, this is very standard, and it's sort of okay, but it's fragile. The issue is in Profile_Pop() - you are making a pure leap of faith that back() is actually the element you should be popping.

(in particular, a very common source of bugs or fragility in this type of code is if the Profiler can be enabled & disabled; even if you always use a Push/Pop scoper class to avoid unmatched pairs, people can enable/disable anywhere and that creates a mismatch (assuming you don't continue to do the push/pops when disabled)).

A better way is Push/Retore :


int Profile_Push()
{
  int i = s_profile_vec.size();
  s_profile_vec.push_back();
  s_profile_vec.back().start = tick();
  return i;
}

void Profile_Restore(int i)
{
  s_profile_vec[i].end = tick();
  // [start,end] for this block is now set, as is stack depth, store it somewhere permanent
  s_profile_vec.resize(i);
}

Func1()
{
  int p = Profile_Push();

  ... other stuff ..
  Func2()

  Profile_Restore(p);
}

If you like you can assert in Restore that it's equivalent to a Pop. But crucially, if you are doing the redundant method and asserting, the fallback behavior should be Restore. (in practice for me, once I realized that Push-Restore is what I really want, that opened the door for me to allow Pushes without Pops, so I don't check that Restore is equal to a Pop, I let them be unequal).

What is the fundamental principle that makes this so much more robust that doing push/pop? It's elimination of parallel states that must be kept in sync. That's one of the greatest causes of bugs and should be done whenever possible; I don't do it enough, it's so tempting sometimes and you don't really see the necessity of it (because we all have Programmer Arrogance that makes us think we can keep it sync correctly).

eg. you constantly see code like :


struct Blah
{
  int m_x;

  int m_y;

  // NOTE : m_y must = m_x * 2;
};

At least this example code has a note about the redundant relationship that must be kept in sync, which is better than lots of real production code, but it's still just a bug waiting to happen. The redundant state should be eliminated and any queries for m_y should just go directly to the single variable that is authoritative.

That's obvious, but the stack is the exact same situation. The thing is, the program is maintaining its own stack, the execution stack, so any other stack that mirrors the execution stack is redundant state. You should just use the execution stack, which is what Push/Restore does for you.

3 comments:

castano said...

On a similar note, instead of having parallel data structures like:

std::vector stack;

struct Scope {
Scope(...) {
stack.push(Data(...));
}
~Scope() {
stack.pop();
}
};

or if you prefer:

struct Scope {
Scope(...) {
restore = stack.size();
stack.push(Data(...));
}
~Scope() {
stack.resize(restore);
}
int restore;
};

Often a better approach is to simply use the program stack:

struct Scope {
Scope * next;
Data data;
};

Scope * top = NULL;

Scope::Scope(...) : data(...) {
next = top;
top = this;
}
Scope::~Scope() {
top = next;
}

Jonathan said...

For this particular example, castano has the right idea, but I'm not buying your argument against a scope class either.

If you were using such a technique you would write it such that the scope class is the only API available. A potential bad implementer of Func2 can't subvert that without active malice on their part.

cbloom said...

@ Jonathan -

It doesn't require active malice to break.

I've seen it many times with enables/disables.

Things like profilers, memory trackers, etc. usually get enabled & disabled.

You wind up having incredibly fragile rules, like "make sure you don't put a profile scope around the key-processing function because there's a key that can enable/disable the profiler".

old rants