string_match_stress_tests.7z (60,832 bytes)
Consists of :
paper1_twice stress_all_as stress_many_matches stress_search_limit stress_sliding_follow stress_suffix_forward
An optimal parse matcher (matching at every position in each file against all previous bytes within that file) should get these average match lengths : (min match length of 4, and no matches searched for in the last 8 bytes of each file)
paper1_twice : 13294.229727 stress_all_as : 21119.499148 stress_many_matches : 32.757760 stress_search_limit : 823.341331 stress_sliding_follow : 199.576550 stress_suffix_forward : 5199.164464 total ml : 2896554306 total bytes : 483870
Previous post on the same test set : 09-27-11 - String Match Stress Test
And these were used in the String Match Test post series , though there I used "twobooks" instead of "paper1_twice".
These stress tests are designed to make imperfect string matchers show their flaws. Correct implementations of Suffix Array or Suffix Tree searchers should find this total match length without ever going into bad N^2 slowdowns (their speed should be roughly constant). Other matchers like hash-link, LzFind (hash-btree) and MMC will either find lower total match length (due to an "amortize" step limit) or will fall into bad N^2 (or worse!) slowdowns.
No comments:
Post a Comment