It's a shame it never got a good mainstream implementation. It could/should have been the LZ we all used for the past 10 years.
One of the little mistakes in LZX was the 21 bit offset limit. This must have seemed enormous back on the Amiga in 1995, but very quickly became a major handicap against LZs with unlimited windows.
LZX with unlimited window (e.g. on files less than 2 MB) is competitive with any modern LZ, especially on binary structured data where it really shines. In hindsight, LZX is the clear ancestor to LZMA and it was way ahead of its time. We've only been clearly beating it in the past year or two (!!).
2. RAR. The primary LZ in RAR is a pretty straightforward LZ-Huff (I believe). It's fine, it's nothing bad or special.
What makes RAR special is the filters. It still has the best filters of any compressor I know.
RAR+filters often *beats* LZMA and other very slow high ratio compressors.
The special thing about the RAR filters is that they aren't like most of the "precomp" solutions that just try to recognize WAV headers and things like that - RAR may do some of that (I have no idea) - but it also definitely finds filters that work on headerless data. Like, you can take a BMP or WAV and strip off the header and RAR will still figure out that there's data to filter in there; it must have some analysis heuristics, and they're better than anything else I've seen.
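To make "analysis heuristics" a bit more concrete, here is a minimal sketch of one way a compressor could find a filter on headerless data. This is a guess at the general idea, not RAR's actual method: it tries a byte-delta at several strides and keeps the one that lowers an order-0 entropy estimate the most. The names `order0_entropy_bits`, `delta_filter`, `guess_stride` and the `min_gain` threshold are all made up for illustration.

```python
import math
from collections import Counter


def order0_entropy_bits(data: bytes) -> float:
    """Total order-0 entropy in bits: a cheap compressibility estimate."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c * math.log2(c / n) for c in Counter(data).values())


def delta_filter(data: bytes, stride: int) -> bytes:
    """Replace each byte with its difference from the byte `stride` back."""
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] - data[i - stride]) & 0xFF)
    return bytes(out)


def guess_stride(data: bytes, max_stride: int = 16, min_gain: float = 0.05):
    """Return the best delta stride, or None if no stride gains at least min_gain.

    Assumption: order-0 entropy after the delta is a good enough proxy for
    what the real compressor would do. A real heuristic may be smarter.
    """
    best_stride = None
    best_bits = order0_entropy_bits(data) * (1.0 - min_gain)
    for stride in range(1, max_stride + 1):
        bits = order0_entropy_bits(delta_filter(data, stride))
        if bits < best_bits:
            best_stride, best_bits = stride, bits
    return best_stride


# Headerless interleaved 3-byte "pixels" with a smooth ramp in each channel.
pixels = bytes(
    v for i in range(2000) for v in (i & 255, (2 * i + 80) & 255, (3 * i + 160) & 255)
)
print(guess_stride(pixels))
```

Nothing here looks at a header; the stride falls out of the data statistics alone, which is the property that matters for the stripped BMP/WAV case.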
As an example of when RAR filters do magic, here's a 24-bit RGB BMP with the first 100k stripped, so it's headerless and not easily recognized by file-type-detection filters :
PDI_1200_bmp_no_header.zl8.LZNA,1241369
PDI_1200_bmp_no_header.nz,1264747
PDI_1200_bmp_no_header.BitKnit,1268120 // <- wow BitKnit !
PDI_1200_bmp_no_header.LZNA,1306670
PDI_1200_bmp_no_header.rar,1312621 // <- RAR filters!
PDI_1200_bmp_no_header.7z,1377603
PDI_1200_bmp_no_header.brotli10,1425996
PDI_1200_bmp_no_header_lp2.7z,1469079
PDI_1200_bmp_no_header.Kraken,1506915
PDI_1200_bmp_no_header.lzx21,1580593
PDI_1200_bmp_no_header.zstd060,1619920
PDI_1200_bmp_no_header.mc-.rar,1631419 // <- RAR unfiltered
PDI_1200_bmp_no_header.brotli9,1635105
PDI_1200_bmp_no_header.z9.zip,1708589
PDI_1200_bmp_no_header.lz4xc4,1835955
PDI_1200_bmp_no_header.raw,2500000
That said, it does make mistakes. Sometimes filters can make things way worse if they make a wrong decision. They don't have a "filters must help" safety check. This is easy to prevent: you just also run with no filter and keep the filtered result only if it's actually smaller. But they seem to not do that (presumably to save encode time) and the results can be disastrous :
lightmap.bc3.LZNA,361185
lightmap.bc3.7z,373909
lightmap.bc3_lp2.7z,387590
lightmap.bc3.brotli10,391602
lightmap.bc3.BitKnit,416208
lightmap.bc3.zstd060,417956
lightmap.bc3.Kraken,431476
lightmap.bc3.lzx21,441669
lightmap.bc3.mc-.rar,457893 // <- RAR with disabled filters
lightmap.bc3.brotli9,498802
lightmap.bc3.z9.zip,583178
lightmap.bc3.rar,778363 // <- RAR with filters huge fuckup !!
lightmap.bc3.raw,4194332

RAR filters fucking up on DXTC (BCn) is pretty consistent :
c.dds.nz,371610
c.dds.7z,371749
c.dds.zl8.LZNA,371783
c.dds.LZNA,373245
c.dds_lp2.7z,375384
c.dds.lzx21,395866
c.dds.brotli10,399674
c.dds.BitKnit,400528
c.dds.Kraken,405563
c.dds.mc-.rar,408363 // <- unfiltered is okay
c.dds.zstd060,411515
c.dds.brotli9,426948
c.dds.z9.zip,430952
c.dds.rar,438070 // <- oops!
c.dds.raw,524416

Sometimes it does magic :
horse.vipm_lp2.7z,925996
horse.vipm.LZNA,942950
horse.vipm.7z,945707
horse.vipm.brotli10,955363 // <- brotli10 big step
horse.vipm.rar,971716 // <- RAR with filters does magic
horse.vipm.BitKnit,1017740
horse.vipm.lzx21,1029541
horse.vipm.mc-.rar,1066205 // <- RAR with disabled filters
horse.vipm.zstd060,1100219
horse.vipm.Kraken,1106081
horse.vipm.brotli9,1108858
horse.vipm.z9.zip,1155056
horse.vipm.raw,1573070

Here's an XRGB dds where the RAR filters do magic :
d.dds.zl8.LZNA,352232
d.dds.nz,356649
d.dds.BitKnit,360220 // (at zl6 BitKnit beats LZNA ! crushes 7z! wow)
d.dds.LZNA,381250
d.dds.rar,382282 // <- RAR filter crushes 7z
d.dds_lp2.7z,427395
d.dds.7z,452898
d.dds.brotli10,471413
d.dds.Kraken,480257
d.dds.lzx21,520632
d.dds.mc-.rar,534913 // <- RAR unfiltered is poor
d.dds.brotli9,542792
d.dds.zstd060,545583
d.dds.z9.zip,560708
d.dds.raw,1048704
happy.zl8.LZNA,949709
happy.LZNA,955700
happy_lp2.7z,974550
happy.BitKnit,979832
happy.7z,1004359
happy.cOO.nz,1015048
happy.co.nz,1028196
happy.Kraken,1109748
happy.brotli10,1135252
happy.lzx21,1168220
happy.mc-.rar,1177426 // <- RAR unfiltered is okay
happy.zstd060,1199064
happy.brotli9,1219174
happy.rar,1354649 // <- RAR filters fucks up
happy.z9.zip,1658789
happy.lz4xc4,2211700
happy.raw,4155083

Not about RAR, but for historical comparison, lzt24 is another mesh (the "struct72" file here) :
lzt24.zl8.LZNA,1164216
lzt24.LZNA,1177160
lzt24.nz,1206662
lzt24_lp2.7z,1221821
lzt24.BitKnit,1224524
lzt24.7z,1262013
lzt24.Kraken,1307691
lzt24.brotli10,1323486
lzt24.brotli9,1359566
lzt24.lzx21,1475776
lzt24.zstd060,1498401
lzt24.mc-.rar,1612286
lzt24.rar,1612286
lzt24.z9.zip,2290695
lzt24.raw,3471552

Found another weird one where RAR filters do magic; lzt25 is super-structured 13-byte structs :
lzt25.rar,40024 // <- WOW RAR filters!
lzt25.nz,45397
lzt25.7z,51942
lzt25_lp2.7z,52579
lzt25.LZNA,58903
lzt25.zl8.LZNA,61582 // <- zl8 LZNA worse than zl6 - weird file
lzt25.lzx21,63198
lzt25.zstd060,64550 // <- ZStd does surprisingly well here, I thought you needed more reps on this file
lzt25.brotli9,67856
lzt25.Kraken,67986
lzt25.brotli10,68472 // <- brotli10 worse than brotli9 !
lzt25.BitKnit,92940 // <- BitKnit oddly struggling
lzt25.mc-.rar,106423 // <- unfiltered RAR is the worst of the LZ's
lzt25.z9.zip,209811
lzt25.lz4xc4,324125
lzt25.raw,1029744
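The "filters must help" safety check mentioned earlier is simple enough to sketch. This is only an illustration under loud assumptions: zlib stands in for the real LZ, a plain byte-delta stands in for RAR's filters, and `compress_with_safety_check` and `decompress` are hypothetical names, not anything from RAR.

```python
import zlib


def delta_filter(data: bytes, stride: int) -> bytes:
    """Stand-in filter: difference from the byte `stride` positions back."""
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] - data[i - stride]) & 0xFF)
    return bytes(out)


def undo_delta(data: bytes, stride: int) -> bytes:
    """Inverse of delta_filter (running sum at the same stride)."""
    out = bytearray(data)
    for i in range(stride, len(out)):
        out[i] = (out[i] + out[i - stride]) & 0xFF
    return bytes(out)


def compress_with_safety_check(data: bytes, stride: int):
    """Compress both ways; keep the filtered stream only if it is smaller.

    Returns (used_filter, payload). The flag has to be stored with the
    payload so the decoder knows whether to undo the filter.
    """
    plain = zlib.compress(data, 9)
    filtered = zlib.compress(delta_filter(data, stride), 9)
    if len(filtered) < len(plain):
        return True, filtered
    return False, plain


def decompress(used_filter: bool, payload: bytes, stride: int) -> bytes:
    raw = zlib.decompress(payload)
    return undo_delta(raw, stride) if used_filter else raw


ramp = bytes(i & 0xFF for i in range(4096))
used, payload = compress_with_safety_check(ramp, 1)
print(used, len(payload))
```

The cost is exactly the one guessed at above: the encoder compresses everything twice. The upside is that the filtered result can never come out larger than the unfiltered one.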
A lot of interesting things to pick out in those reports. (just saying, I'm not gonna address them all)
One general thing is that the performance of these LZ's is in no way consistent. You can't just say that "X LZ is 5% better than Y"; there's no really consistent pattern, and they have wildly variable relative performance.
There's a family of sort of normal LZ's: LZX, Brotli9, ZStd, & unfiltered RAR. Then there's the family of the high-compress LZ's: LZNA, 7z, nz. Each family stays pretty consistently together, and the two form the end-points.
But then there are the floaters. BitKnit, Kraken, Filtered RAR, and Brotli10 can jump around between the "normal LZ" and "high-compress LZ" region. BitKnit and Brotli10 are the most variable - they both can jump right up to the high-compress LZ's like 7z, but on other files they drop right down into the pack of normal LZ's (LZX, etc.).
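To put numbers on the "floater" behavior, here is a small sketch that takes the BitKnit and 7z sizes straight from the listings above and prints the size ratio per file. The only thing added is the arithmetic; the variable names are arbitrary.

```python
# (file, BitKnit bytes, 7z bytes), copied from the result listings above.
results = [
    ("PDI_1200_bmp_no_header", 1268120, 1377603),
    ("lightmap.bc3", 416208, 373909),
    ("c.dds", 400528, 371749),
    ("horse.vipm", 1017740, 945707),
    ("d.dds", 360220, 452898),
    ("happy", 979832, 1004359),
    ("lzt24", 1224524, 1262013),
    ("lzt25", 92940, 51942),
]

# Ratio < 1 means BitKnit is smaller than 7z on that file, > 1 means larger.
ratios = {name: bitknit / seven_zip for name, bitknit, seven_zip in results}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} BitKnit/7z = {ratio:.3f}")
```

On these eight files BitKnit lands anywhere from roughly 20% smaller than 7z (d.dds) to roughly 79% larger (lzt25), which is why a single "X is N% better than Y" number doesn't mean much.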
I have a guess about what's happening with Brotli. I haven't looked at the code at all, but my guess is that between level 9 and 10 the order-1 context optimization is turned on. In particular, there's this "signed int" context mode which I believe is what does the magic for brotli on things like horse.vipm (for example it has contexts for the case of last two bytes = 0x0000 , or last two bytes = 0xFFFF , which are pretty common on horse). My guess is that this mode is just not even tried at all at level 9, and at level 10 it turns on the code to pick the best context mode, and finds the signed int mode which is great on these files. Not sure.
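For reference, here is a sketch of what the signed context mode computes, going by the literal context modeling section of the brotli format spec (RFC 7932). The bucket boundaries are reproduced from memory of the spec's `Lut2` table, so treat them as an assumption and check them against the spec; `signed_bucket` and `signed_context` are names made up here. It says nothing about which encoder levels actually try this mode.

```python
def signed_bucket(b: int) -> int:
    """Map a byte to a 3-bit bucket; 0x00 and 0xFF each get their own bucket.

    Boundaries are an assumption recalled from the RFC 7932 Lut2 table.
    """
    if b == 0:
        return 0
    if b < 16:
        return 1
    if b < 64:
        return 2
    if b < 128:
        return 3
    if b < 192:
        return 4
    if b < 240:
        return 5
    if b < 255:
        return 6
    return 7


def signed_context(p1: int, p2: int) -> int:
    """6-bit literal context from the last byte (p1) and the one before it (p2)."""
    return (signed_bucket(p1) << 3) | signed_bucket(p2)


# The two cases called out above each get a dedicated context:
print(signed_context(0x00, 0x00), signed_context(0xFF, 0xFF))
```

With this mapping, "last two bytes = 0x0000" and "last two bytes = 0xFFFF" are each a context of their own, separate from every other byte pair, which fits the guess about why it helps on files full of small signed values.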