Chart of clocks per byte, on each file of a test set :
On my "lztest" data set :
0 = almost_incompressable 1 = bigship1.x 2 = Dolphin1.x 3 = GrannyRocks_wot.gr2 4 = Gryphon_stripped.gr2 5 = hetero50k 6 = movie_headers.bin 7 = orange_purple.BMP 8 = particles.max 9 = pixo_run_animonly_stripped.gr2 10 = poker.bin 11 = predsave.bin 12 = quick.exe 13 = RemoteControl_stripped.gr2 14 = ScriptVolumeMgr.cpp
"lztest" is not a stress test set, it's stuff I've gathered that I think is roughly reflective of what games actually compress. It's interesting that this data set causes lots of DNF's (did not finish) for MMC and LzFind.
Suffix5 (the real suffix trie) is generally slightly faster than the suffix array. It should be, of course, if I didn't do a bonehead trie implementation, since the suffix array method basically builds a trie in the sort, then reads it out to sorted indexes, and then I convert the sorted indexes back to match lengths.
Good old CCC (Calgary Compression Corpus) :
0 = BIB 1 = BOOK1 2 = BOOK2 3 = GEO 4 = NEWS 5 = OBJ1 6 = OBJ2 7 = PAPER1 8 = PAPER2 9 = PAPER3 10 = PAPER4 11 = PAPER5 12 = PAPER6 13 = PIC 14 = PROGC 15 = PROGL 16 = PROGP 17 = TRANSI won't be showing results on CCC for the most part because it's not very reflective of real world modern data, but I wanted to run on a set where MMC and LzFind don't DNF too much to compare their speed when they do succeed. Suffix Trie is almost always very close to the fastest except on paper4 & paper5 which are very small files.
0 = BIB 1 = BOOK1 2 = BOOK2 3 = GEO 4 = NEWS 5 = OBJ1 6 = OBJ2 7 = PAPER1 8 = PAPER2 9 = PROGC 10 = PROGL 11 = PROGP 12 = TRANSTwo new tests in the mix.
Test_Hash1 : traditional "Zip" style fixed size hash -> linked list. In this run there's no chain limit so matching is exact.
Test_Hash3 : uses cblib::hash_table (a reprobing ("open addressing" or "closed hashing", I prefer reprobing)) to hash the first 4 bytes then a linked list. I was surprised to find that this is almost the same speed as Hash1 and sometimes faster, even though it's a totally generic template hash table (that is not particularly well suited to this usage).
No comments:
Post a Comment