The good way to do the standard Zip-style Hash -> Linked List for LZ77 parsing.
There are two tables : the hash entry point table, which gives you the head of the linked list, and the link table, which is a circular buffer of ints which contain the next position where that hash occured.
That is :
hashTable[ hash ] contains the last (most recent preceding) position that hash occured
chainTable[ pos & (window_size-1) ] contains the last position of the hash at pos before pos
To walk the table you do :
i = hashTable[ hash ];
while ( i in window )
i = chainTable[ i & (window_size-1) ]
To update the table you do :
head = hashTable[ hash ];
hashTable[hash] = pos;
chainTable[ pos & (window_size-1) ] = head;
And now for some minor details that are a bit subtle. We're going to go through "Hash1" from StringMatchTest
which I know I still haven't posted.
int64 Test_Hash1(MATCH_TEST_ARGS)
{
uint8 * buf = (uint8 *)charbuf;
const int window_mask = window_size-1;
vector
for small files or small windows, you can only get good per-byte speed if you make hash size proportional with
that MIN. (what you can't see is outside the Test_ I also made window_size be no bigger than the smallest
power of 2 that encloses file size).
<
int> chain; // circular buffer on window_size
chain.resize(window_size);
int * pchain = chain.data();
const int hash_size = MIN(window_size,1<<20);
const int hash_mask = hash_size-1;
vector
As noted previously, for large hashes you can get a big win by using a malloc that gives you zeros. I don't do
it here for fairness to the other tests. I do make sure that my initialization value is zero so you can switch
to VirtualAlloc/calloc.
<
int> hashTable; // points to pos of most recent occurance of this hash
hashTable.resize(hash_size);
int * phashTable = hashTable.data();
memset(phashTable,0,hash_size*sizeof(int));
int64 total_ml = 0;
// fiddle the pointers so that position 0 counts as being out of the window
int pos = window_size+1;
buf -= pos;
ASSERT( (char *)&buf[pos] == charbuf );
size += pos;
I don't want to do two checks in the inner loop for whether a position is a null link
vs. out of the window. So I make the "null" value of the linked list (zero) be out of the window.
for(;pos<(size-TAIL_SPACE_NEEDED);)
{
MATCH_CHECK_TIME_LIMIT();
// grab 4 bytes (@ could slide here)
uint32 cur4 = *((uint32 *)&buf[pos]);
//ASSERT( cur4 == *((uint32 *)&buf[pos]) );
On PC's it's fastest just to grab the unaligned dword. On PowerPC it's faster to slide bytes through
the dword. Note that endian-ness changes the value, so you may need to tweak the hash function differently
for the two endian cases.
// hash them
uint32 hash = hashfour(cur4) & hash_mask;
int hashHead = phashTable[hash];
int nextHashIndex = hashHead;
int bestml = 0;
int windowStart = pos-window_size;
ASSERT( windowStart >= 0 );
#ifdef MAX_STEPS
int steps = 0;
#endif
Start the walk. Not interesting.
while( nextHashIndex >= windowStart )
{
uint32 vs4 = *((uint32 *)&buf[nextHashIndex]);
int hi = nextHashIndex&window_mask;
if ( vs4 == cur4 )
{
int ml = matchlenbetter(&buf[pos],&buf[nextHashIndex],bestml,&buf[size]);
if ( ml != 0 )
{
ASSERT( ml > bestml );
bestml = ml;
// meh bleh extra check cuz matchlenbetter can actually go past end
if ( pos+bestml >= size )
{
bestml = size - pos;
break;
}
}
}
#ifdef MAX_STEPS
if ( ++steps > MAX_STEPS )
break;
#endif
nextHashIndex = pchain[hi];
}
This is the simple walk of the chain of links. Min match len is 4 here which is particularly fast, but other lens
can be done similarly.
"MAX_STEPS" is the optimal "amortize" (walk limit) which hurts compression in an unbounded way but is necessary for speed.
"matchlenbetter" is a little trick ; it's best to check the character at "bestml" first because it is the most likely to differ. If that char doesn't match, we know we can't beat the previous match, so we can stop immediately. After that I check the chars in [4,bestml) to ensure that we really do match >= bestml (the first 4 are already checked) and lastly the characters after, to extend the match.
The remainder just updates the hash and is not interesting :
ASSERT( bestml == 0 || bestml >= 4 );
total_ml += bestml;
// add self :
// (this also implicitly removes the node at the end of the sliding window)
phashTable[hash] = pos;
int ci = pos & window_mask;
pchain[ci] = hashHead;
if ( greedy && bestml > 0 )
{
int end = pos+bestml;
pos++;
ASSERT( end <= size );
for(;pos
<
end;pos++)
{
uint32 cur4 = *((uint32 *)&buf[pos]);
// hash them
uint32 hash = hashfour(cur4) & hash_mask;
int hashHead = phashTable[hash];
phashTable[hash] = pos;
int ci = pos & window_mask;
pchain[ci] = hashHead;
}
pos = end;
}
else
{
pos++;
}
}
return total_ml;
}
Note that for non-greedy parsing you can help the O(N^2) in some cases by setting bestml to lastml-1. This helps enormously in practice because of the heuristic of matchlenbetter but does not eliminate the O(N^2) in theory. (the reason it helps in practice is because typically when you find a very long match, then the next byte will not have any match longer than lastml-1).
(but hash1 is really just not the right choice for non-greedy parsing; SuffixArray or SuffixTrie are vastly superior)
Nice trick
ReplyDelete