2/12/2018

Oodle 2.6.0 : Impending Deprecation of pre-Kraken codecs

Starting in Oodle 2.6.0 the pre-Kraken codecs will be slowly phased out.

In 2.6.0 they will still be available for both encoding & decoding. However, they will be marked "deprecated" to discourage their use. I'm doing this in two ways.

One : if you encode with them, they will log a warning through the new Oodle "UsageWarning" system. Decoding with them will not log a warning. This warning can be disabled by calling Oodle_SetUsageWarnings(false). (Of course in shipping you might also have set the Oodle log function to NULL via OodlePlugins_SetPrintf, which will also disable the warning log). (Also note that in MSVC builds of Oodle the default log function only goes to OutputDebugString so you will not see anything unless you either have a debugger attached, change the log function, or use OodleX to install the OodleX logger).

Two : the enum for the old compressor names will be hidden in oodle2.h by default. You must define OODLE_ALLOW_DEPRECATED_COMPRESSORS before including oodle2.h to enable their definition.

As noted, decoding with the old codecs will not log a usage warning, and can be done without setting OODLE_ALLOW_DEPRECATED_COMPRESSORS. That is, Oodle 2.6.0 requires no modifications to decode old codecs and will not log a warning. You only need to consider these steps if you want to *encode* with the old codecs.
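For reference, here's a minimal sketch of what those two steps look like from the application side. The macro and the warning call are the ones named above; the wrapper function and everything else is just illustration, not a prescribed setup :

    // Example only : define this before including oodle2.h if you still need
    // to *encode* with one of the deprecated codecs (not needed for decoding).
    #define OODLE_ALLOW_DEPRECATED_COMPRESSORS
    #include "oodle2.h"

    void example_init()
    {
        // Optional : disable the deprecation usage warnings entirely
        // (the call named above; see oodle2.h for the exact prototype).
        Oodle_SetUsageWarnings(false);
    }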

In some future version the encoders for the old codecs will be removed completely. All decoders will continue to be shipped for the time being.

I'm intentionally making it difficult to encode with the old codecs so that you transition to one of the new codecs.


Long term support codecs :

Kraken, Mermaid, Selkie (and Hydra)

Codecs being deprecated :

LZNA, BitKnit, LZA, LZH, LZHLW, LZNIB, LZB16, LZBLW

The new codecs simply obsolete the old ones, which should not be used any more. The old codecs are also a mix of some fuzz safe, some not; the new codecs are all fuzz safe, and we want to remove support for non-fuzz-safe decoding so that it's not possible to do by accident.


In pursuit of that last point, another change in 2.6.0 is removing the default argument value for OodleLZ_FuzzSafe in the OodleLZ_Decompress call.

Previously that argument had a default value of OodleLZ_FuzzSafe_No, so that all the codecs could decompress and existing code kept working when the argument was introduced.

If you are not explicitly passing a value there, you will now get a compiler error and need to pick one.

If possible, you should pass OodleLZ_FuzzSafe_Yes. This will ensure the decode is fuzz safe (ie. won't crash if given invalid data to decode).

The only reason that you would not pass FuzzSafe_Yes is if you need to decode some of the older non-fuzz-safe codecs. We recommend moving away from those codecs to the new post-Kraken codecs (which are all fuzz safe).
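For illustration, the explicit call looks roughly like this. The argument order follows my reading of oodle2.h, and the trailing optional arguments are left at their defaults, so treat this as a sketch and check your header for the exact signature; the buffer names and the wrapper are placeholders :

    // Sketch only : compBuf/compLen/rawBuf/rawLen are placeholder names,
    // and the remaining optional arguments are left at their defaults.
    #include <cstdint>
    #include "oodle2.h"

    bool decode_fuzz_safe(const void * compBuf, intptr_t compLen,
                          void * rawBuf, intptr_t rawLen)
    {
        intptr_t got = OodleLZ_Decompress(
            compBuf, compLen,        // compressed data and its size
            rawBuf, rawLen,          // destination buffer and expected raw size
            OodleLZ_FuzzSafe_Yes);   // now explicit; returns failure instead of
                                     // crashing if the data is invalid
        return got == rawLen;        // assumes success returns the raw size;
                                     // see oodle2.h for the exact return values
    }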


Fuzz safe codecs :

Kraken, Mermaid, Selkie (and Hydra)
BitKnit, LZB16, LZH, LZHLW

Non-fuzz-safe codecs :

LZNA, LZA, LZNIB, LZBLW

If you need to decode one of the non-fuzz-safe codecs, you must pass FuzzSafe_No. If you pass FuzzSafe_Yes, and the decoder encounters data made by that codec, it will return failure.

Fuzz safety in Oodle means that unexpected data will not crash the decoder. Bad data will not necessarily be detected; the decode might still return success. For full safety, your systems that consume the data post decompression must all be fuzz safe.


Another semi-deprecation coming in Oodle is removing high-performance optimization for 32-bit builds (where 64-bit is available).

What I mean is, we will continue to support & work in 32-bit, but it will no longer be an optimization priority. We may allow it to get slower than it could be. (for example in some cases we just run the 64-bit code that requires two registers per qword; not the ideal way to write the 32-bit version, but it saves us from making yet more code paths to optimize and test).

We're not aware of any games that are still shipping in 32-bit. We have decided to focus our time budget on 64-bit performance. We recommend evaluating Oodle and always running your tools in 64-bit.

If you're a user who believes that 32-bit performance is important, please let us know by emailing your contact at RAD.

1/18/2018

The Natural Lambda

Every compressor has a natural, innate or implicit "lambda", the Lagrange parameter for space-speed optimization.

Excluding compressors like cmix that are purely aiming for the smallest size, every other compressor is balancing ratio vs. complexity in some way. The author has made countless decisions about space-speed tradeoffs in the design of their codec; are they using Huffman or ANS or arithmetic coding? are they using LZ77 or ROLZ or PPM? are they optimal parsing? etc. etc.

There are always ways to spend more CPU time and get more compression, so you choose some balance that you are targeting and try to make decisions that optimize for that balance. (as usual when I talk about "time" or "speed" here I'm talking about decode time ; you can of course also look at balancing encode time vs. ratio, or memory use vs. ratio, or code size, etc.)

Assuming you have done everything well, you may produce a compressor which is "pareto optimal" over some range of speeds. That is, if you do something like this : 03-02-15 - Oodle LZ Pareto Frontier (a speedup graph over various simulated disk speeds), you are on the pareto frontier if your compressor's curve is the topmost for some range. If your compressor is nowhere topmost (eg. see the graph with zlib at the bottom in Introducing Oodle Mermaid and Selkie), then go back and try again before proceeding.

The range over which your compressor is pareto optimal is the "natural range" for it. That is a range where it is working better than any other compressor in the world.

Outside of that range, the optimal way to use your compressor is to *not* use it. A client that wants optimal behavior outside of that range should simply switch to a different compressor.

As an example of this - people often take a compressor which is designed for a certain space-speed performance, and then ask it questions that are outside of its natural range, such as "well if I change the settings, just how small of a file can it produce?" , or "just how fast can it decode at maximum speed?". These are bogus questions because they are outside of the operating range. It's like asking how delicate an operation can you do with a hammer, or how much weight can you put on an egg. You're using the tool wrong and should switch to a different tool.

(perhaps a better analogy that's closer to home is taking traditional JPEG Huff and trying to use it for very low bit rates (as many ass-hat authors do when trying to show how much better they are than JPEG), like 0.1 bpp, or at very high bit rates trying to get near-lossless quality. It's simply not built to run outside of its target range, and any examination outside of that range is worse than useless. (worse because it can lead you to conclusions that are entirely wrong))

The "natural lambda" for a compressor is at the center of its natural range, where it is pareto optimal. This is the space-speed tradeoff point where the compressor is working its best, where all those implicit decisions are right.

Say you chose to do an LZ compressor with Huffman and to forbid overlapping matches with offset less than 16 (as in Kraken) - those decisions are wrong at some lambda. If you are lucky they are right decisions at the natural lambda.

Of course if you have developed a compressor in an ad-hoc way, it is inevitable that you have made many decisions wrong. The way this has traditionally been done is for the developer to just try some ideas; often they don't really even measure the alternatives. (for example they might choose to pack their LZ offsets in a certain way, but don't really try other ways). If they do seriously try alternatives, they just look at the numbers for compression & speed, sort of eyeball them, and make a judgement call. They do not actually have a well defined space-speed goal and a way to measure if they are improving that score.

The previously posted "correct" Weissman Score provides a systematic way of identifying the space-speed range you wish to target and thus identifying if your decisions are right in a space-speed sense.

Most ad-hoc compressors have illogical, contradictory decisions, in the sense that they have chosen to do something that makes them slower (to decode), but have failed to do something else which would be a better space-speed step. That is, say you have a compressor, and there are various decisions you face as an implementor. The implicit "natural lambda" for your compressor gives you a desired slope for space-speed tradeoffs. Any decision you can make (either adding a new mode, or disabling a current mode; for example adding context mixing, or disabling arithmetic coding) should be measured against that lambda.

Over the past few years, we at RAD have been trying to go about compressor development in this new systematic way. People often ask us for tips on their compressor, hoping that we have some magical answers, like "oh you just do this and that makes it work perfectly". They're disappointed when we don't really have those answers. Often my advice leads nowhere; I suggest "try this and try that" and they don't really pan out and they think "this guy doesn't know anything". But that's what we do - we just try this and that, and usually they don't work, so we try something else. What we have is a way of working that is careful and methodical and thorough. For every decision, you legitimately try it both ways with well-optimized implementations. You measure that decision with a space-speed score that is targeted at the correct range for your compressor. You measure on a variety of test sets and look at randomized subsets of those test sets, not just averages or totals. There's no magic, there's just doing the work to try to make these decisions correctly.

Usually the way that compressor development works starts from a spark of inspiration.

That is, you don't set out by saying "I want to develop a compressor for the Weissman range of 10 - 100 mb/s" and then proceed to make decisions that optimize that score. Instead it starts with some grain of an idea, like "what if I send matches with LZSA and code literals with context-mixed rANS". So you start working on that grain of an idea and get something working. Thus all compressor development starts out ad hoc.

At some point (if you work at RAD), you need to transition to having a more defined target range for your compressor so you can begin making well-justified decisions about its implementation. Here are a few ways that I have used to do that in the past.

One is to look at a space-speed "speedup" curve, such as the "pareto charts" I linked to above. You can visually pick out the range where your compressor is performing well. If you are pareto optimal vs the competition, you can start from that range. Perhaps early in development you are not yet pareto optimal, but you can still see where you are close, and you can spot the area where if you improved a bit you would be pareto optimal. I then take that range and expand it slightly to account for the fact that the compressor is likely to be used slightly outside of its natural range. eg. if it was optimal from 10 - 100 mb/s, I might expand that to 5 - 200 mb/s.

Once you have a Weissman range you can now measure your compressor's "correct" Weissman score. You then construct your compressor to make decisions using a Lagrange parameter, such as :


J = size + lambda * time

for each decision, the compressor tries to minimize J. But you don't know what lambda to use. One way to find it is to numerically optimize for the lambda that maximizes the Weissman score over your desired performance interval.
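As a toy example of what such a decision looks like inside an encoder (nothing Oodle-specific, just the J comparison; the names are mine) :

    // Hypothetical decision step inside an encoder : pick whichever candidate
    // encoding minimizes J = size + lambda * time at the target lambda.
    // "size" is coded bytes, "time" is an *estimated* decode cost in whatever
    // unit lambda is expressed against (cycles, seconds, ...).
    struct Candidate { double size; double time; };

    const Candidate & pick_by_J(const Candidate & a, const Candidate & b,
                                double lambda)
    {
        double Ja = a.size + lambda * a.time;
        double Jb = b.size + lambda * b.time;
        return (Ja <= Jb) ? a : b;
    }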

When I first developed Kraken, Mermaid & Selkie, I found their lambdas by using the method of transition points.

Kraken, Mermaid & Selkie are all pareto optimal over large ranges. At some point as you dial lambda on Kraken to favor more speed, you should instead switch to Mermaid. As you dial lambda further towards speed in Mermaid, you should switch to Selkie. We can find what this lambda is where the transition point occurs.

It's simply where the J for each compressor is equal. That is, an outer meta-compressor (like Oodle Hydra) which could choose between them would have an even choice at this point.


how to set the K/M/S lambdas

    measure space & speed of each

    find the lambda point at which J(Kraken) = J(Mermaid)
        that's the switchover of compressor choice

    call that lambda_KM

    size_K + lambda_KM * time_K = size_M + lambda_KM * time_M
    lambda_KM = (size_M - size_K) / (time_K - time_M)

    same for lambda_MS between Mermaid and Selkie

    choose Mermaid's lambda to be at the geometric mean of the two transitions :

    lambda_M = sqrt( lambda_KM * lambda_MS )

    then set lambda_K and lambda_S such that the transitions are at the geometric mean
        between the points, eg :

    lambda_KM = sqrt( lambda_K * lambda_M )
    lambda_K = lambda_KM^2 / lambda_M = lambda_KM * sqrt(lambda_KM / lambda_MS)
    lambda_S = lambda_MS^2 / lambda_M = lambda_MS * sqrt(lambda_MS / lambda_KM)
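
In code the recipe above is only a few lines. The inputs are measured (size, time) pairs for each codec on the same data and machine; the struct and function names here are mine, not any Oodle API :

    #include <cmath>

    struct Measured { double size; double time; };   // per-codec measurements
    struct KMSLambdas { double K, M, S; };

    KMSLambdas set_kms_lambdas(Measured k, Measured m, Measured s)
    {
        // transition lambdas : where J(Kraken)==J(Mermaid), J(Mermaid)==J(Selkie)
        double lambda_KM = (m.size - k.size) / (k.time - m.time);
        double lambda_MS = (s.size - m.size) / (m.time - s.time);

        // Mermaid sits at the geometric mean of its two transition points
        double lambda_M = std::sqrt(lambda_KM * lambda_MS);

        // place Kraken & Selkie so the transitions are the geometric means of
        // adjacent codec lambdas : lambda_KM = sqrt(lambda_K * lambda_M), etc.
        double lambda_K = (lambda_KM * lambda_KM) / lambda_M;
        double lambda_S = (lambda_MS * lambda_MS) / lambda_M;

        return { lambda_K, lambda_M, lambda_S };
    }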

Of course always beware that these depend on a size & time measurement on a specific file on a specific machine, so gather a bunch of measurements on different samples and take the median, discard outliers, etc.

(the geometric mean seems to work out well because lambdas seem to like to go in a logarithmic way. This is because the time that we spend to save a byte seems to increase in a power-scale way. That is, saving each additional byte in a compressor is 2X harder than the last, so compressors that decrease size by some linear amount will have times that increase by an exponential amount. Put another way, if J was instead defined to be J = size + lambda * log(time) , with logarithmic time, then lambda would be linear and the arithmetic mean would be appropriate here.)


It's easy to get these things wrong, so it's important to do many and frequent sanity checks.

An important one is that more options should always make the compressor better. That is, if the encoder makes a space-speed decision with the right lambda, if the rate measurement is correct, if the speed estimate is correct, and if there are no unaccounted-for side effects, then giving it more choices of encoding should lead to better space-speed performance. Any time you enable a new option and performance gets worse, something is wrong.

There are many ways this can go wrong. Non-local side effects of decisions (such as parse-statistics feedback) are a confounding one. Another common problem is the speed estimate not matching the actual performance of the code, or not matching the particular machine you are testing on.

An important sanity check is to have macro-scale options. For example, if you have other compressors to switch to, consider J against them; if they are being chosen in the range where you think you should be optimal, something is wrong. One "compressor" that should always be included in this is the "null" compressor (memcpy). If you're targeting high speed, you should always be space-speed testing against memcpy, and if you can't beat that, either your compressor is not working well or your supposed speed target range is not right.
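A sketch of that memcpy check, using the same J as above; measuring memcpy's copy time the same way you measure decode time is my assumption about how you'd set this up :

    // Sanity check : at your target lambda your codec should beat the "null"
    // compressor (memcpy), whose "compressed size" is just the raw size.
    bool beats_memcpy(double codecSize, double codecTime,
                      double rawSize,  double memcpyTime, double lambda)
    {
        double J_codec  = codecSize + lambda * codecTime;
        double J_memcpy = rawSize   + lambda * memcpyTime;
        return J_codec < J_memcpy;   // false => wrong codec or wrong target range
    }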

In the high compression range, you can sanity check by comparing to preprocessors. eg. rather than spending N more cycles making your decoder slower, compare to other ways of trading time for compression, such as word-dictionary preprocessing on text, delta filtering on wav, etc. If those are better options, then they should be done instead.


One of the things I'm getting at is that every compressor has a "natural lambda", even ones whose authors were never conscious of it. The fundamental way they function implies certain decisions about space-speed tradeoffs and where they are suitable for use.

You can deduce the natural lambda of a compressor by looking at the choices and alternatives in its design.

For example, you can take a compressor like ZStd. ZStd entropy codes literals with Huffman and other things (ML, LRL, offset log2) with TANS (FSE). You can look at switching those choices - switch the literals to coding with TANS. That will cost you some speed and gain some compression. There is a certain lambda where that decision is moot - either way gives equal J. If ZStd is logically designed, then its natural lambda must lie to the speed side of that decision. Similarly, try switching ML coding to Huffman; again you will find a lambda where that decision is moot, and ZStd's natural lambda should lie on the slower side of that decision.

You can look at all the options in a codec, consider the alternatives, and find a whole set of lambda points where each decision becomes moot. Some will constrain the natural lambda from above, some from below, and the codec should lie somewhere in the middle.
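A sketch of that bracketing process; variant A is what the codec actually does and B is the alternative you measured, both names being mine :

    // The moot point for one design decision is the lambda where J(A) == J(B).
    struct Variant { double size; double time; };

    double moot_lambda(Variant a, Variant b)
    {
        return (a.size - b.size) / (b.time - a.time);
    }

    // If the codec chose the *faster* variant, its natural lambda lies above
    // this moot point (a lower bound); if it chose the *slower* one, below it
    // (an upper bound). Collecting many decisions brackets the natural lambda.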

This is true and possible even for codecs that are not pareto optimal. However good they are, their construction implies a set of decisions that target an implicit natural lambda.
