SuffixArray3 : suffix array string matcher which uses a min/max tree to find allowed offsets.
The min/max tree is a binary hierarchy ; at level L there are (size>>L) entries, and each entry covers a range of size (1 << L). Construction is O(N) because N/2+N/4+N/8 ... = N
The min/max tree method is generally slightly slower than the elegant "chain of fences" approach used for SuffixArray2, but it's close. The big advantage is the min/max tree can also be used for windowed matching, which is not easy to integrate in SA2.
First check that it satisfies the O(N) goal on the stress tests :
0 = stress_all_as
1 = stress_many_matches
2 = stress_search_limit
3 = stress_sliding_follow
4 = stress_suffix_forward
5 = twobooks
Yep. Then check optimal parse, large window vs. the good options :
0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin
The good Suffix Trie is clearly the best, but we're in the ballpark.
Now optimal parse, 17 bit (128k) window :
totals:
Test_MMC2 : DNF
Test_LzFind2 : 506.954418 , 1355.992865
Test_SuffixArray3 : 506.954418 , 514.931740
Test_MMC1 : 13.095260 , 1298.507490
Test_LzFind1 : 12.674177 , 226.796123
Test_Hash1 : 503.319301 , 1094.570022
Finally greedy parse, 17 bit window :
totals:
Test_MMC2 : 0.663028 , 110.373098
Test_LzFind2 : DNF
Test_SuffixArray3 : 0.663036 , 236.896551
Test_MMC1 : 0.663028 , 222.626069
Test_LzFind1 : 0.662929 , 216.912409
Test_Hash1 : 0.662718 , 62.385071
average match length :
And once more for clarity :
Greedy parse, 16 bit window , just the good candidates :
totals:
Test_SuffixArray3 : 0.630772 , 239.280605
Test_LzFind1 : 0.630688 , 181.430093
Test_MMC2 : 0.630765 , 88.413339
Test_Hash1 : 0.630246 , 51.980073
It should be noted that LzFind1 is approximate, and Hash1 is even more approximate. Though looking at the match length chart you certainly can't see it.
0 comments:
Post a Comment