string_match_stress_tests.7z (60,832 bytes)
Consists of :
paper1_twice
stress_all_as
stress_many_matches
stress_search_limit
stress_sliding_follow
stress_suffix_forward
An optimal parse matcher (matching at every position in each file against all previous bytes within that file) should get these average
match lengths :
(min match length of 4, and no matches searched for in the last 8 bytes of each file)
paper1_twice : 13294.229727
stress_all_as : 21119.499148
stress_many_matches : 32.757760
stress_search_limit : 823.341331
stress_sliding_follow : 199.576550
stress_suffix_forward : 5199.164464
total ml : 2896554306
total bytes : 483870
Previous post on the same test set : 09-27-11 - String Match Stress Test
And these were used in the String Match Test post series , though there I used "twobooks" instead of "paper1_twice".
These stress tests are designed to make imperfect string matchers show their flaws. Correct implementations of Suffix Array or Suffix Tree searchers should find this total match length without ever going into bad N^2 slowdowns (their speed should be roughly constant). Other matchers like hash-link, LzFind (hash-btree) and MMC will either find lower total match length (due to an "amortize" step limit) or will fall into bad N^2 (or worse!) slowdowns.
No comments:
Post a Comment