For each i, find the next entry j (j > i) such that the value is lower (A[j] < A[i]).
Fill out B[i] = j for all i.
For array size N this can be done in O(N).
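To pin down exactly what B[] should contain, here is a brute-force reference; it is a sketch for checking results, not the O(N) method. The names `next_lower_bruteforce`, `check_next_lower` and `kNone` are made up for illustration, and using -1 for "no lower value to the right" is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical marker for "no lower value exists to the right".
constexpr int kNone = -1;

// Reference definition only: for each i, scan forward for the first j > i
// with A[j] < A[i]. Worst case O(N^2), so only useful for checking.
std::vector<int> next_lower_bruteforce(const std::vector<int>& A)
{
    const int n = static_cast<int>(A.size());
    std::vector<int> B(A.size(), kNone);
    for (int i = 0; i < n; i++)
    {
        for (int j = i + 1; j < n; j++)
        {
            if (A[j] < A[i]) { B[i] = j; break; }
        }
    }
    return B;
}

// Asserts that a candidate B[] (e.g. from a faster method) matches the definition.
void check_next_lower(const std::vector<int>& A, const std::vector<int>& B)
{
    assert(B == next_lower_bruteforce(A));
}
```

For A = 3,5,4,1,2 this gives B = 3,2,3,none,none.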
Here's how :
I'll call this algorithm "stack of fences". Walk the array A[] from start to finish in one pass.
At i, if the next entry (A[i+1]) is lower than the current (A[i]) then you have the ordering you want immediately and you just assign B[i] = i+1.
If not, then you have a "fence", a value A[i] which is seeking a lower value. You don't go looking for it immediately; instead you just record the current fence (fence_pos = i, fence_val = A[i]) and move on via i++.
At each position you visit while you have a fence, you check whether the current A[i] < fence_val. If so, you set B[fence_pos] = i ; you have found the successor to that fence.
If you have a fence and find another value which needs to be a fence (because it's lower than its successor) you push the previous fence on a stack, and set the current one as the active fence. Then when you find a value that satisfies the new fence, you pop off the fence stack and also check that fence to see if it was satisfied as well. This stack can be stored in place in the B[] array, because the B[] is not yet filled out for positions that are fences.
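Before the in-place trick, the fence stack may be easier to see with an explicit stack. The following is a simplified sketch, not the exact scheme described above: it pushes every position (even ones that are satisfied immediately by their successor), keeps the stack in a separate std::vector instead of in B[], and uses -1 as an assumed "none" marker. The function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

std::vector<int> next_lower_explicit_stack(const std::vector<int>& A)
{
    const int n = static_cast<int>(A.size());
    std::vector<int> B(A.size(), -1);  // -1 = "none" (assumed marker)
    std::vector<int> fences;           // positions still seeking a lower value

    for (int i = 0; i < n; i++)
    {
        // A[i] satisfies every fence on top of the stack that is higher than it.
        while (!fences.empty() && A[i] < A[fences.back()])
        {
            B[fences.back()] = i;
            fences.pop_back();
        }
        // Fence values on the stack never decrease from bottom to top,
        // which is why checking only the top is enough.
        assert(fences.empty() || A[fences.back()] <= A[i]);
        fences.push_back(i);  // position i now seeks a lower value itself
    }
    // Anything still on the stack never found a lower value; B stays -1 there.
    return B;
}
```

Each position is pushed once and popped at most once, so the total work is still linear in N.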
The pseudocode is :
fence_val = fence_pos = -1   // -1 means "none" ; assumes all values in A[] are >= 0

for(int i=1;i<size;i++)
{
    int i_prev = i-1;
    int prev = A[i_prev];
    int cur = A[i];

    if ( cur >= prev )
    {
        // cur is not lower than prev, so prev becomes a fence ;
        // push the old fence on the stack (stored in B[]) :
        B[i_prev] = fence_pos;
        fence_pos = i_prev;
        fence_val = prev;
    }
    else
    {
        // descending, cur is good :
        B[i_prev] = i;

        // pop every fence that cur satisfies :
        while( cur < fence_val )
        {
            prev_fence = B[fence_pos];
            B[fence_pos] = i;
            fence_pos = prev_fence;
            if ( fence_pos == -1 )
            {
                fence_val = -1;
                break;
            }
            fence_val = A[fence_pos];
        }
    }
}
This is useful in string matching, as we will see forthwith.
5 comments:
I'm not sure this is O(N). What if the input is an array starting with even integers from 2..M and ending with odd integers from 1..M-1?
It's pretty trivial to prove.
Every iteration writes to B[i] at least once
each B[i] can be written to at most twice (once for a fence and once for its final correct value)
there are N elements of B[]
therefore # of operations is >= N and <= 2N
QED
The full 2N time is taken by
2,4,6,8..M,M-1,M-3,M-5,...1
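The counting argument can be checked empirically. This is a sketch that runs the in-place fence loop and counts writes to B[]; the function names are hypothetical, -1 is assumed as "none", and "operations" is taken to mean writes to B[], which is one possible reading of the proof.

```cpp
#include <cassert>
#include <vector>

// Runs the in-place fence-stack loop and returns how many times B[] is written.
int count_B_writes(const std::vector<int>& A)
{
    const int n = static_cast<int>(A.size());
    std::vector<int> B(A.size(), -1);
    int writes = 0;
    int fence_pos = -1;  // -1 = no active fence (assumed marker)

    for (int i = 1; i < n; i++)
    {
        if (A[i] >= A[i - 1])
        {
            B[i - 1] = fence_pos; writes++;  // first write: push as a fence
            fence_pos = i - 1;
        }
        else
        {
            B[i - 1] = i; writes++;          // satisfied immediately
            while (fence_pos != -1 && A[i] < A[fence_pos])
            {
                const int prev_fence = B[fence_pos];
                B[fence_pos] = i; writes++;  // second write: final value
                fence_pos = prev_fence;
            }
        }
    }
    assert(writes <= 2 * n);  // the bound from the proof
    return writes;
}

// Builds 2,4,...,M,M-1,M-3,...,1 (M assumed even).
std::vector<int> evens_then_odds(int M)
{
    std::vector<int> A;
    for (int v = 2; v <= M; v += 2) A.push_back(v);
    for (int v = M - 1; v >= 1; v -= 2) A.push_back(v);
    return A;
}
```

Under this way of counting, M = 8 (so N = 8) produces 10 writes, and a run like 1,2,3,4,5,6,7,0 produces 13; both stay within the 2N = 16 bound.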
I wish I could find the version of this algorithm that works for windowed ranges, not just monotonic conditions.
Thanks for all your posts, they've been very helpful. I implemented this algorithm, and discovered that it was incomplete. I'm sure you handled this in your local implementation years ago, but I wanted to document it for others who find this helpful as well.
The problem happens when the stack is not empty at the end of the loop. In this case, the stack holds all the values "i" that don't yet have any "j" such that "j > i && A[j] < A[i]". There are no more "j" values to consider, so this condition will never become true.
In your notation, "B[i] = j", so we want the invariant on exit to be that "B[i] > i && A[B[i]] < A[i]".
Since the stack is stored in the B[] array in the order encountered, and since i increments, these entries for B[i] do satisfy "B[i] > i". The fact that they are still on the stack means "!(B[i] > i && A[B[i]] < A[i])". Since "B[i] > i", this proves "A[B[i]] >= A[i]". If the "A[i]" array has no duplicates, as in the suffix array use case, this becomes a strict inequality: "A[B[i]] > A[i]".
Again, in the suffix array case, this means the array that was supposed to only hold pointers to longest *past* matches also has some pointers to *future* matches.
There are two fixes I can think of. The most obvious is, after exiting the loop, just pop the stack until it is empty, setting B[i] to "none" as you go.
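A minimal sketch of that first fix, assuming -1 is the "none" marker and that the stack is kept in B[] as in the post. The function name `next_lower_in_place` is hypothetical and this is not the code from the original implementation.

```cpp
#include <cassert>
#include <vector>

constexpr int kNone = -1;  // assumed marker for "no lower value to the right"

std::vector<int> next_lower_in_place(const std::vector<int>& A)
{
    const int n = static_cast<int>(A.size());
    std::vector<int> B(A.size(), kNone);
    int fence_pos = kNone;

    for (int i = 1; i < n; i++)
    {
        if (A[i] >= A[i - 1])
        {
            B[i - 1] = fence_pos;  // push: B temporarily links to the older fence
            fence_pos = i - 1;
        }
        else
        {
            B[i - 1] = i;
            while (fence_pos != kNone && A[i] < A[fence_pos])
            {
                const int prev_fence = B[fence_pos];
                B[fence_pos] = i;
                fence_pos = prev_fence;
            }
        }
    }

    // The fix: fences still on the stack never found a lower value.
    // Pop them all, overwriting the stale stack links with kNone.
    while (fence_pos != kNone)
    {
        const int prev_fence = B[fence_pos];
        B[fence_pos] = kNone;
        fence_pos = prev_fence;
    }

    // Exit invariant: every entry is either none or a valid forward pointer.
    for (int i = 0; i < n; i++)
        assert(B[i] == kNone || (B[i] > i && A[B[i]] < A[i]));
    return B;
}
```

Without the final loop, an input such as 1,2,3 would leave a leftover stack link in B[1] instead of "none".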
Another solution is to have a sentinel token A[end] at the end of the list that satisfies A[end] < A[i] for all i != end. If "end" is the same as "none", these two solutions are effectively equivalent. The first solution is more obvious, the second solution uses less code and avoids special case handling.
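The sentinel variant might look roughly like this. It is a sketch under assumptions: the input is copied so a sentinel can be appended, the sentinel is the minimum int (so the real values must all be greater than that), and "none" is reported as end, the index of the sentinel. The function name is hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// A is taken by value because the sentinel is appended to the copy.
std::vector<int> next_lower_with_sentinel(std::vector<int> A)
{
    const int n = static_cast<int>(A.size());  // n is also "end"
    const int sentinel = std::numeric_limits<int>::min();
    for (int v : A)
    {
        assert(v > sentinel);  // the sentinel must be strictly lowest
    }
    A.push_back(sentinel);     // A[end] < A[i] for all i != end

    std::vector<int> B(A.size(), n);
    int fence_pos = -1;        // -1 = no active fence
    for (int i = 1; i <= n; i++)  // note: the loop visits the sentinel too
    {
        if (A[i] >= A[i - 1])
        {
            B[i - 1] = fence_pos;
            fence_pos = i - 1;
        }
        else
        {
            B[i - 1] = i;
            while (fence_pos != -1 && A[i] < A[fence_pos])
            {
                const int prev_fence = B[fence_pos];
                B[fence_pos] = i;
                fence_pos = prev_fence;
            }
        }
    }
    B.pop_back();  // drop the sentinel's own slot
    return B;
}
```

The sentinel pops every remaining fence on the last iteration, so no cleanup loop is needed; entries equal to end (here A.size()) play the role of "none".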
Yep, you're totally right, I failed to mention that.
Both your solutions are good.
Lots of stuff in suffix trees & suffix arrays works neatly if you have a sentinel token at the end, unfortunately we don't get a byte that's > 255 ;) So in practice the code gets a lot uglier with lots of special case handling for the end-of-string case that would've been handled very neatly with a sentinel.
I use the first option of manually bubbling back a null entry at the end. It's in the String Match Test code that I released, in MakeNextLowerPosArray.