Slow slow compressors

lzma is really too slow to decode.

total                : Kraken     : 2.914 to 1 : 1053.961 MB/s
total                : lzma       : 3.186 to 1 : 52.660 MB/s

(Win64 Core i7-3770 3.4 GHz)

Kraken is around 20X faster than lzma, but lzma compresses better (about 9%). That's already a tip that something is horribly wrong; you have a 2000% speed difference and a 9% size difference.

If we look at the total time to load compressed from disk + decompress, we can make these speedup factor curves :

At very low disk speeds, the higher compression of lzma provides a speedup over Kraken. But how slow does the disk have to be? You can see the intersection of the curves is between 0 and 1 on the log scale, that's 1-2 MB/s !!

For any disk faster than 2 MB/s , load+decomp is *way* faster with Kraken. At a disk speed of 16 MB/s or so (log scale 4) the full load+decomp for Kraken is around 2X faster than with lzma. And that's still a very slow disk (around optical speed).
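The crossover can be sanity-checked with a few lines of arithmetic. This is only a rough sketch of the load-then-decode model, not the code that produced the curves. It assumes the MB/s figures in the table are measured in decompressed bytes per second, and the function names are made up for illustration.

```python
# Measured numbers from the table above (Win64 Core i7-3770).
KRAKEN = {"ratio": 2.914, "decode_mbps": 1053.961}
LZMA = {"ratio": 3.186, "decode_mbps": 52.660}

def load_then_decode_time(raw_mb, disk_mbps, codec):
    """Seconds to read the compressed file, then decode it (no overlap).

    Assumption: decode_mbps counts decompressed (raw) megabytes per second.
    """
    io_time = (raw_mb / codec["ratio"]) / disk_mbps
    decode_time = raw_mb / codec["decode_mbps"]
    return io_time + decode_time

def speedup_vs_uncompressed(disk_mbps, codec):
    """How much faster load+decode is than just loading the raw file."""
    raw_mb = 1.0  # the factor does not depend on file size in this model
    return (raw_mb / disk_mbps) / load_then_decode_time(raw_mb, disk_mbps, codec)

def crossover_disk_mbps(a, b):
    """Disk speed at which codecs a and b have equal load+decode time."""
    io_gap = 1.0 / a["ratio"] - 1.0 / b["ratio"]
    decode_gap = 1.0 / b["decode_mbps"] - 1.0 / a["decode_mbps"]
    return io_gap / decode_gap

if __name__ == "__main__":
    print("crossover disk speed (MB/s):", crossover_disk_mbps(KRAKEN, LZMA))
    for log2_speed in range(0, 11):
        disk = 2.0 ** log2_speed
        print(disk,
              speedup_vs_uncompressed(disk, KRAKEN),
              speedup_vs_uncompressed(disk, LZMA))
```

Under these assumptions the crossover lands at roughly 1.6 MB/s, which agrees with the 1-2 MB/s read off the chart.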

Now, this is a speedup factor for load *then* decomp. If you are fully overlapping IO with decompression, then some of the decode time is hidden.

*But* that also assumes that you have a whole core to give to decompression. And it assumes you have no other CPU work to do after loading.

The idea that you can hide decompressor time in IO time only works if you have enough independent loads so that there's lots to overlap (because if you don't, then the first IO and last decompress will never overlap anything), and it assumes you have no other CPU work to do.
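To see how much the overlap depends on having many independent loads, here is a minimal sketch of a two-stage pipeline (read chunk i+1 while decoding chunk i). It assumes equal-sized chunks, a dedicated decode core, unlimited buffering, and decode speed measured in decompressed MB/s. `overlapped_load_time` is a hypothetical helper, not anything from a real loader.

```python
def overlapped_load_time(raw_mb, disk_mbps, ratio, decode_mbps, num_chunks):
    """Total seconds when IO of one chunk overlaps decode of the previous one.

    Assumptions: equal-sized chunks, one core fully dedicated to decoding,
    and no other CPU work competing with the decoder.
    """
    chunk_raw_mb = raw_mb / num_chunks
    io = (chunk_raw_mb / ratio) / disk_mbps   # time to read one compressed chunk
    dec = chunk_raw_mb / decode_mbps          # time to decode one chunk
    # The first read and the last decode have nothing to overlap with;
    # in between, the slower of the two stages sets the pace.
    return io + (num_chunks - 1) * max(io, dec) + dec

if __name__ == "__main__":
    # 100 MB of raw data from an 80 MB/s HDD, lzma numbers from the table.
    for n in (1, 2, 10, 100):
        print(n, overlapped_load_time(100.0, 80.0, 3.186, 52.660, n))
```

With a single chunk this degenerates to plain load-then-decode. With many chunks the total approaches the slower of total IO time and total decode time, so a decoder slower than the disk stays the bottleneck no matter how well you overlap.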

In theory I absolutely love the idea that you just load pre-baked data which is all ready to go, and you just point at it, so there's no CPU work in loading other than decompression, but in practice that is almost never the case. eg. for loading compressed web pages, there's tons of CPU work that needs to be done to parse the HTML or JS or whatever, so the idea that you can hide the decompressor time in the network latency is a lie - the decompressor time adds on to the later processing time and adds directly onto total load latency.

The other factor that people often ignore is the fact that loading these days is heterogeneous.

What you actually encounter is something like this :

Download from internet ~ 1 MB/s
Load from optical disc ~ 20 MB/s
Load from slow HDD ~ 80 MB/s
Fast SSD ~ 500 MB/s
NVMe drive on PCIe ~ 1-2 GB/s
Load from cache in RAM ~ 8 GB/s

We have very heterogeneous loading - even for a single file loaded by the same application.

The first time you load it, maybe you download from the internet, and in that case a slow decompressor like lzma might be best. But the next time you load it's from the cache in RAM. And the time after that it's from HDD. In those cases, using lzma is a disaster (in the sense that the loading is now nearly instant, but you spend seconds decoding; or in the sense that just loading uncompressed would have been way faster).

One issue that I think is not considered is that making the right choice in the slow-disk zone is not that big of a deal. On a 1 MB/s disk, the difference in "speedup" between lzma and Kraken is maybe 2% in favor of lzma. But on a 100 MB/s disk it's something like 400% in favor of Kraken.
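The asymmetry is easy to tabulate across the source speeds listed earlier. This is a back-of-envelope sketch using the same load-then-decode model (no overlap) and assuming the decode rates are in decompressed MB/s. The 1500 MB/s figure for NVMe is just a representative pick from the 1-2 GB/s range, and `kraken_advantage` is an illustrative name.

```python
KRAKEN_RATIO, KRAKEN_DECODE_MBPS = 2.914, 1053.961
LZMA_RATIO, LZMA_DECODE_MBPS = 3.186, 52.660

def time_per_raw_mb(disk_mbps, ratio, decode_mbps):
    """Seconds of load + decode per decompressed MB (no IO/CPU overlap)."""
    return 1.0 / (ratio * disk_mbps) + 1.0 / decode_mbps

def kraken_advantage(disk_mbps):
    """lzma time divided by Kraken time; above 1.0 means Kraken wins."""
    lzma = time_per_raw_mb(disk_mbps, LZMA_RATIO, LZMA_DECODE_MBPS)
    kraken = time_per_raw_mb(disk_mbps, KRAKEN_RATIO, KRAKEN_DECODE_MBPS)
    return lzma / kraken

# Approximate source speeds from the list above, in MB/s.
SOURCES = [
    ("Download from internet", 1.0),
    ("Optical disc", 20.0),
    ("Slow HDD", 80.0),
    ("Fast SSD", 500.0),
    ("NVMe on PCIe", 1500.0),   # assumed midpoint of 1-2 GB/s
    ("Cache in RAM", 8000.0),
]

if __name__ == "__main__":
    for name, mbps in SOURCES:
        print(f"{name:24s} {mbps:8.0f} MB/s  lzma/Kraken time = {kraken_advantage(mbps):6.2f}")
```

In this model lzma comes out a few percent ahead at 1 MB/s, while at 100 MB/s Kraken is about 5X faster, which is the same lopsided picture as above.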

Now in theory maybe it would be nice to have different compressors for download & disk storage; like you use something like lzma for downloadable, and then decode and re-encode to ZStd for HDD loading. In practice nobody does that and the advantage over just using ZStd all the time is very marginal.

Also in theory it would be nice if the OS cache would cache the decompressed data rather than caching the compressed data.

TODO : time lzma on PS4. Usually PS4 is 2-4X slower than my PC, so that puts lzma somewhere in the 10-25 MB/s range, which is very borderline for keeping up with the optical disc.

DVD 16x is ~ 20 MB/s (max)
PS4 Blu-Ray is 6x ~ 27 MB/s (max)

PS4 transparently caches Blu-Ray to HDD

Of course because of the transparent caching to HDD, if you actually keep files in lzma on the disc, and they are cached to HDD, loading them from HDD is a huge mismatch and makes lzma the bottleneck.

That is, in practice on the PS4 when you load files from disc, they are sometimes actually coming from the HDD transparent cache, so you sometimes get 20 MB/s speeds, and sometimes 100 MB/s.

Now of course we'd love to have a higher-ratio compressor than Kraken which isn't so slow. Right now, we just don't have it. We have Kraken at 1000 MB/s , LZNA at 120 MB/s , lzma at 50 MB/s - it's too big of a step down in speed, even for LZNA.

In order for the size gain of lzma/LZNA to be worth it, it needs to run a *lot* faster, more like 400 MB/s. There needs to be a new algorithmic step in the high-compression domain to get there.
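One way to put a number on "a lot faster" is to ask what decode speed a higher-ratio codec needs just to tie Kraken on load-then-decode time at a given disk speed. This is a hedged sketch under the same simple model (no overlap, decode speed in decompressed MB/s). `break_even_decode_mbps` is a hypothetical helper, not how the 400 figure was derived.

```python
def break_even_decode_mbps(disk_mbps, ratio_high,
                           ratio_fast=2.914, decode_fast_mbps=1053.961):
    """Decode speed a higher-ratio codec needs to match the fast codec.

    Solves  1/(ratio_high*disk) + 1/x  ==  1/(ratio_fast*disk) + 1/decode_fast
    for x. Defaults are the Kraken numbers from the table at the top.
    """
    # Seconds per raw MB that the higher ratio saves in IO.
    io_saving = (1.0 / ratio_fast - 1.0 / ratio_high) / disk_mbps
    # The slower decoder may spend at most that much extra time decoding.
    return 1.0 / (1.0 / decode_fast_mbps + io_saving)

if __name__ == "__main__":
    for disk in (20.0, 80.0, 100.0, 500.0):
        print(disk, break_even_decode_mbps(disk, ratio_high=3.186))
```

With lzma's 3.186 ratio, the break-even on a 20 MB/s disc comes out around 414 MB/s, and it climbs toward Kraken's own speed as the disk gets faster.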

At the moment the only reason to use decoders slower than Kraken is if you simply must get smaller files and damn the speed; like if you have a downloadable app size hard limit and just have to fit in 100 MB, or if you are running out of room on an optical disc or whatever.
