6/26/2014

06-26-14 - VR Impressions

NOTE : changed post title to break the link.

Yesterday I finally went to Valve and saw "The Room". This is a rather rambly post about my thoughts after experiencing it.

For those who have been under a rock (like me), Valve has got this amazing VR demo. It uses unique prototype hardware that provides very good positional head tracking and very low latency graphics. It's in a calibrated room with registration spots all over the walls. It's way way better than any other VR, it's the real thing.

There is this magic thing that happens, it does tickle your brain intuitively. Part of you thinks that you're there. I had the same experiences that I've heard other people recount - your body starts reacting; like when a sphere moves towards you, you flinch and try to dodge it without thinking.

Part of the magic is that it's good enough that you *want* to believe it. It's not actually good enough that it seems real. Even in the carefully calibrated Valve room, it's glitchy and things pop a bit, and you always know you're in a simulation. But you choose to ignore the problems. It felt like when you're watching a good movie, and if you were being rational you would say that this is all illogical and the green screening looks fucking terrible and that is physically impossible what he just did, but if it's good you just choose to ignore all that and go along for the ride. VR felt like that to me.

One of the cool things about VR is that there is an absolute sense of scale, because you are always the size of you. This gives you scale reference in a way that you never have in games. Which is also a problem. It's wonderful if you're making games where you play as a human, but you can't play as a giant (if you just scale down everything else, it feels like you're you in a world where everything else is tiny, not that you're bigger; scale is no longer relative, you are always you). You can't make the characters run at 60 mph the way we usually do in games.

As cool as it is, I don't see how you actually make games with it.

For one thing there are massive short term technical problems. The headset is heavy and uncomfortable. The lenses have to be perfectly aligned to your eyes or you get sick. The registration is very easy to mess up. I'm sure these will be resolved over time. The headset has a cable which is always in danger of tripping or strangling you, which is a major problem and technically hard to get rid of, but perhaps possible some day.

But there are more fundamental inherent problems. When I stepped off the ledge, I wanted to fall. But of course I never actually can. You make my character fall, but not my body? That's weird. Heck if my character steps up on something, I want to step up myself. You can only make games where you basically stand still. In the room with the pipes, I want to climb on the pipes. Nope, you can't - and probably never can. Why would I want to be in a virtual world if I can't do anything there? I don't know how you even walk around a world without it feeling bizarre. All the Valve demos are basically you stuck in a tiny box, which is going to get old.

How do you ever make a game where the player character is moved without their own volition? If an NPC bumps me and pushes my avatar, what happens? You can't push my real human body, so it breaks the illusion. It seems to me that as soon as your viewpoint has a physical reaction with the virtual world and isn't just a viewer with no collision detection, it just doesn't work.

There's this fundamental problem that the software cannot move the player's viewpoint. The player must always get to move their own viewpoint with their head, or the illusion is broken (or worse, you get sick). This is just such a huge problem for games, it means the player can only be a ghost, or an omniscient observer in an RTS game, or other such things. Sure you can make games where you stand over an RTS world map and poke at it. Yay, it's a board game with fancy graphics. I see how it could be great as a sculpting or design tool. I see how it would be great for The Witness and similar games.

For me personally, it's so disappointing that you can't actually physically be in these worlds. The most exciting moments for me were some of the outdoor scenes, or the pipe room, where I just felt viscerally - "I want to run around in this world". What would be amazing for me would be to go in the VR world to alien planets with crazy strange plants and geology, and be able to run around it and climb on it. And I just don't see how that ever works. You can't walk around your living room, you'll trip on things or run into the wall. You can't ever step up or down anything, you have to be on perfectly flat ground all the time. You can't ever touch anything. (It's like a strip club; hey this is so exciting! can I interact with it? No? I have to just sit here and not move or touch anything? How fucking boring and frustrating. I'm leaving.)

At the very minimum you need gloves with amazing force feedback to give you some kind of tactile experience of the VR world, but even then it's just good for VR surgery and VR board games and things where you stand still and touch things. (and we all know the real app is VR fondling).

You could definitely make a killer car racing game. Put me in a seat with force feedback, and that solves all the physical interaction problems. (or, similarly, I'm driving a mech or a space ship or whatever; basically lock the player in a seat so you don't have to address the hard problems for now).

There are also huge huge software problems. Collision detection has to be polygon-perfect ; coarse collision proxies are no longer acceptable. Physics and animation have to be way better. Texture mapping and normal mapping just don't work. Billboard cards just don't work. We basically can't have trees or smoke or anything soft or complex for a long time, it's going to be a lot of super simple rigid objects. Skinned characters and painted on clothing (and just using textures to paint on geometry), none of it works. Flat shaded simple stuff is totally fine, but all the hacks we've used for so long are out the window.

I certainly see the appeal (for a software engineer) of starting from scratch on so many issues and working on the hard problems. Fun.

Socially I find VR rather scary.

One issue is the addictive nature of living in a VR world. Yes yes people are already addicted to their phones and facebook and WoW and whatever, but this is a whole new level. Plus it's even more disengaged from reality; it's one thing for everyone in a coffee shop these days to be staring at their laptops (god I hate you) but when they're all in headsets then interaction in the real world is completely over. I have no doubt that there will be a large class of people that live in the VR world and never leave their living room; Facebook will provide a "deliver pizza" button so that you don't even have to exit the simulation. It will be bad.

Perhaps more disturbing to me is how real and scary it can be. Just having a cube move into me was a kind of real physical fright that I haven't felt in a game. I think that being in a realistic VR world with people shooting each other would be absolutely terrifying and disgusting and really would do bad things to the brains of the players.

And if we wind up with evil overlords like Facebook or Apple or whoever controlling our VR world, that is downright dystopian. We all had our chance to say "no" to the rise of closed platforms when the Apple shit started to take off, and we all fucking dropped the ball (well, you did). Hell we did the same thing with the PATRIOT act. We're all just lying down and getting raped and not doing a damn thing about it and the future of freedom is very bleak indeed. (wow that rant went off the rails)

Anyway, I look forward to trying it again and seeing what people come up with. It's been a long time since I saw anything in games that made me say "yes! I want to play that!" so in that sense VR is a huge win.


Saved comments :

Tom Forsyth said... Playing as a giant is OK - grow the player's height, but don't move their eyes further apart. So the scale is unchanged, but the eyes are higher off the ground. July 3, 2014 at 7:45 PM

brucedawson said... Isn't a giant just somebody who is way taller than everybody else? So yeah, maybe if you 'just' scale down everyone else then you'll still feel normal size. But you'll also feel like you can crush the tiny creatures like bugs! Which is really the essence of being a giant. And yes, I have done the demo. July 3, 2014 at 8:56 PM

Grzegorz Adam Hankiewicz said... I don't understand how you say a steering wheel with force feedback solves any VR problem when the main reason I know I'm driving fast is how forces are being applied to my whole body, not that I'm holding something round instead of a gamepad. You mention it being jarring not being able to climb, wouldn't it be jarring to jump on a terrain bump inside your car and not feel gravity changes? Maybe the point of VR is not replicating dull life but simulating what real life can't possibly give us ever? July 4, 2014 at 3:08 AM

cbloom said... @GAH - not a wheel with force feedback (they all suck right now), but a *seat* like the F1 simulators use. They're quite good at faking short-term forces (terrain bumps and such are easy). I certainly don't mean that that should be the goal of VR. In fact it's quite disappointing that that is the only thing we have any hope of doing a good job of in the short term. July 4, 2014 at 7:33 AM

Stu said... I think you're being a bit defeatist about it, and unimaginative about how it can be used today. Despite being around 30 years old, the tech has only just caught up to the point whereby it can begin to travel down the path towards full immersion, Matrix style brain plugs, holodeck etc. This shit's gotta start somewhere, and can still produce amazing gaming - an obvious killer gaming genre is in any vehicular activity, incl. racing, normal driving, flying, space piloting, etc. Let the other stuff slowly evolve towards your eventual goal - we're in the 'space invaders' and 'pacman' era for VR now, and it works as is for a lot of gaming. July 4, 2014 at 9:11 AM

cbloom said... I'd love to hear any ideas for how game play with physical interaction will ever work. Haven't heard any yet. Obviously the goal should be physical interaction that actually *feels* like physical interaction so that it doesn't break the illusion of being there. That's unattainable for a long time. But even more modest is just how do you do something like walking around a space that has solid objects in it, or there are NPC's walking around. How do you make that work without being super artificial and weird and breaking the magic? In the short term we're going to see games that are basically board games, god games, fucking "experiences" where flower petals fall on you and garbage like that. We're going to see games that are just traditional shitty computer games, where you slide around a fucking lozenge collision proxy using a gamepad, and the VR is just a viewer in that game. That is fucking lame. What I would really hate to see is for the current trend in games to continue into VR - just more of the same shit all the time with better graphics. If people just punt on actually solving VR interaction and just use it as a way to make amazing graphics for fucking Call of Doody or Retro Fucking Mario Indie Bullshit Clone then I will be sad. When the top game is fucking VR Candy Soul-Crush then I will be sad. What is super magical and amazing is the feeling that you actually are somewhere else, and your body gets involved in a way it never has before, you feel like you can actually move around in this VR world. And until games are actually working in that magic it's just bullshit. July 4, 2014 at 9:36 AM

cbloom said... If you like, this is an exhortation to not cop the fuck out on VR the way we have in games for the last 20 years. The hard problems we should be solving in games are AI, animation/motion, physics. But we don't. We just make the same shit and put better graphics on it. Because that sells, and it's easy. Don't do that to VR. Actually work on how people interact with the simulation, and how the simulation responds to them. July 4, 2014 at 10:03 AM

Dillon Robinson said... Son, Bloom, kiddo, you've talking out of your ass again. Think before you speak.

.. and then it really went downhill.

6/21/2014

06-21-14 - The E46 M3

Fuck Yeah.

Oh my god. It's so fucking good.

When I'm working in my little garage office, I can feel her behind me, trying to seduce me. Whispering naughty thoughts to me. "Come on, let's just go for a little spin".

On the road, I love the way you can just pin the throttle on corner exit; the back end gently slides out, just a little wiggle. You actually just straighten the steering early, it's like half way through the corner you go boom throttle and straighten the lock and the car just glides out to finish the turn. Oh god it's like sex. You start the turn with your hands and then finish it with your foot, and it's amazing, it feels so right.

On the track there's a whole other feeling, once she's up to speed at the threshold of grip, on full tilt. My favorite thing so far is the chicane at T5 on the back side of pacific. She just dances through there in such a sweet way. You can just lift off the throttle to get a little engine braking and set the nose, then back on throttle to make the rear end just lightly step out and help ease you around the corner. The weight transfer and grip front to back just so nicely goes back and forth, it's fucking amazing. She feels so light on her feet, like a dancer, like a boxer, like a cat.

There are a few things I miss about the 911. The brakes certainly, the balance under braking and the control under trail-braking yes, the steering feel, oh god the steering feel was good and it fucking SUCKS in the M3, the head room in the 911 was awesome, the M3 has shit head room and it's really bad with a helmet, the visibility - all that wonderful glass and low door lines, the feeling of space in the cabin. Okay maybe more than a few things.

But oh my god the M3. I don't care that I have to sit slightly twisted (WTF); I don't care that there are various reliability problems. I don't care that it requires super expensive annual valve adjustments. I forgive it all. For that engine, so eager, so creamy, screaming all the way through the 8k rev range with not a single dip in torque, for the quick throttle response and lack of electronic fudging, for the chassis balance, for the way you can trim it with the right foot. Wow.

06-21-14 - Suffix Trie Note

A small note on Suffix Tries for LZ compression.

See previously :

Sketch of Suffix Trie for Last Occurance

So. Reminder to myself : Suffix Tries for optimal parsing are clean and awesome. But *only* for finding the length of the longest match. *not* for finding the lowest offset of that match. And *not* for finding the longest match length and the lowest offset of any other (shorter) matches.

I wrote before about the heuristic I currently use in Oodle to solve this. I find the longest match in the trie, then I walk up to parent nodes and see if they provide lower-offset / shorter-length matches, because those may also be interesting to consider in the optimal parser.

(eg. for clarity, the situation you need to consider is something like a match of length 8 at offset 482313 vs. a match of length 6 at offset 4 ; it's important to find that lower-length lower-offset match so that you can consider the cost of it, since it might be much cheaper)

Now, I tested the heuristic of just doing parent-gathers and limited updates, and it performed well *in my LZH coder*. It does *not* necessarily perform well with other coders.

It can miss out on some very low offset short matches. You may need to supplement the Suffix Trie with an additional short range matcher, like even just a 1024 entry hash-chain matcher. Or maybe a [256*256*256] array of the last occurrence location of a trigram. Even just checking at offset=1 for the RLE match is helpful. Whether they are important depends on the back-end coder, so you just have to try it.
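As a concrete illustration of the trigram idea, here's a minimal sketch (my own names and layout, not Oodle's actual matcher) of a [256*256*256] last-occurrence table that can report the offset to the most recent prior occurrence of the current trigram:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch : a last-occurrence table for trigrams, used to
// supplement the Suffix Trie with very-low-offset short matches.
struct TrigramLastOccurrence
{
    std::vector<int32_t> lastPos; // last position each trigram was seen; -1 = never

    TrigramLastOccurrence() : lastPos(1 << 24, -1) { }

    static uint32_t Trigram(const uint8_t * p)
    {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    }

    // Returns the offset to the most recent prior occurrence of the
    // trigram starting at buf[pos] (0 if none), then records pos.
    int32_t FindAndUpdate(const uint8_t * buf, int32_t pos)
    {
        uint32_t t = Trigram(buf + pos);
        int32_t prev = lastPos[t];
        lastPos[t] = pos;
        return (prev >= 0) ? (pos - prev) : 0;
    }
};
```

The 64 MB table is the obvious cost; whether that trade is worth it depends on the back-end coder, per the above.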

For LZA I ran into another problem :

The Suffix Trie exactly finds the length of the longest match in O(N). That's fine. The problem is when you go up to the parent nodes - the node depth is *not* the longest match length with the pointer there. It's just the *minimum* match length. The true match length might be anywhere up to *and including* the longest match length.

In LZH I was considering those matches with the node depth as the match length. And actually I re-tested it with the correct match length, and it makes very little difference.

Because LZA does LAM exclusion, it's crucial that you actually find what the longest ML is for that offset.

(note that the original LZMA exclude coder is actually just a *statistical* exclude coder; it is still capable of coding the excluded character, it just has very low probability. My modified version that only codes 7 bits instead of 8 is not capable of coding the excluded character, so you must not allow this.)

One bit of ugliness is that extending the match to find its true length is not part of the neat O(N) time query.
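The extra work is simple but it's an O(len) comparison per candidate, outside the trie query. A sketch of that extension step, under the constraint described above (node depth is only a *minimum* length for the stored pointer):

```cpp
#include <cstdint>

// Extend a candidate match found in the trie to its true length.
// minLen is the node depth (guaranteed minimum), maxLen is a bound
// such as the remaining buffer length.
int ExtendMatch(const uint8_t * buf, int pos, int matchPos, int minLen, int maxLen)
{
    int len = minLen;
    while (len < maxLen && buf[pos + len] == buf[matchPos + len])
        len++;
    return len;
}
```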

In any case, I think this is all a bit of a dead-end for me. I'd rather move my LZA parser to be forward-only and get away from the "find a match at every position" requirement. That allows you to take big steps when you find long matches and makes the whole thing faster.

6/18/2014

06-18-14 - Oodle Network Test Results

Well, it's been several months now that we've been testing out the beta of Oodle Network Compression on data from real game developers.

Most of the tests have been UDP, with a few TCP. We've done a few tests on data from the major engines (UE3/4, Source) that do delta property bit-packing. Some of the MMO type games with large packets were using zlib on packets.

This is a summary of all the major tests that I've run. This is not excluding any tests where we did badly. So far we have done very well on every single packet capture we've seen from game developers.


MMO game :
427.0 -> 182.7 average = 2.337:1 = 57.21% reduction
compare to zlib -5 : 427.0 -> 271.9 average


MMRTS game :
122.0 -> 75.6 average = 1.615:1 = 38.08% reduction


Unreal game :
220.9 -> 143.3 average = 1.542:1 = 35.15% reduction


Tiny packet game :
21.9 -> 15.6 average = 1.403:1 = 28.72% reduction


Large packet Source engine game :
798.2 -> 519.6 average = 1.536:1 = 34.90% reduction

Some of the tests surprised even me, particularly the tiny packet one. When I saw the average size was only 22 bytes I didn't think we had much room to work with, but we did!

Some notes :

  • Oodle Network Compression provides > 28% reduction in all cases seen so far.

  • Oodle works even on tiny packets!

  • Oodle greatly out-compresses zlib. Some developers think that zlib is a great compromise for space and speed. Oodle can do much better! Oodle is also generally faster to encode than zlib and uses less memory per channel.

  • Oodle works even on games that are already doing bit-packing and delta updates (like most of the major traditional engines do).

  • Oodle easily pays for itself; the savings in bandwidth is much greater than the cost of Oodle. In addition to the cost savings, Oodle means you can run more players.

6/16/2014

06-16-14 - Rep0 Exclusion in LZMA-like coders

For reference on this topic : see the last post .

I believe there's a mistake in LZMA. I could be totally wrong about that because reading the 7z code is very hard; in any case I'm going to talk about Rep0 exclusion. I believe that LZMA does not do this the way it should, and perhaps this change should be integrated into a future version. In general I have found LZMA to be very good and most of its design decisions have been excellent. My intention is not to nitpick it, but to give back to a very nice algorithm which has been generously made public by its author, Igor Pavlov.

LZMA does "literal-after-match" exclusion. I talked a bit about this last time. Basically, after a match you know that the next literal cannot be the one that would have continued the match. If it was you would have just written a longer match. This relies on always writing the maximum length for matches.
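Stated as code, the invariant is tiny (names here are mine, not LZMA's) — it only holds if the parser always emits maximum-length matches:

```cpp
#include <cstdint>

// LAM invariant sketch : the literal coded right after a maximal match
// at offset off can never equal the byte that would have extended the
// match, which is buf[pos - off].
bool LamInvariantHolds(const uint8_t * buf, int pos, int off)
{
    return buf[pos] != buf[pos - off];
}
```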

To model "LAM" exclusion, LZMA uses a funny way of doing the binary arithmetic model for those symbols. I wrote a bit about that last time, and the LZMA way to do that is good.

LZMA uses LAM exclusion on the first literal after a match, and then does normal 8-bit literal coding if there are more literals.

That all seems fine, and I worked on the Oodle LZ-Arith variant for about a month with a similar scheme, thinking it was right.

But there's a wrinkle. LZMA also does "rep0len1" coding.

For background, LZMA, like LZX before it, does "repeat match" coding. A rep match means using one of the last N offsets (usually N = 3) and you flag that and send it in very few bits. I've talked about the surprising importance of repeat matches before (also here and other places ).

LZMA, like LZX, codes rep matches with MML of 2.

But LZMA also has "rep0len1". rep0len1 codes a single symbol at the 0th repeat match offset. That's the last offset matched from. That's the same offset that provides the LAM exclusion. In fact you can state the LAM exclusion as "rep0len1 cannot occur on the symbol after a match". (and in fact LZMA gets that right and doesn't try to code the rep0len1 bit after a match).

rep0len1 is not a win on text, but it's a decent little win on binary structured data (see example at bottom of this post ). It lets you get things like the top byte of a series of floats (off=4 len1 match, then 3 literals).

The thing is, if you're doing the rep0len1 coding, then normal literals also cannot be the rep0len1 symbol. If they were, then you would just code them with the rep0len1 flag.

So *every* literal should be coded with rep0 exclusion. Not just the literal after a match. And in fact the normal 8-bit literal coding path without exclusion is never used.

To be concrete, coding a literal in LZMA should look like this :


cur_lit = buffer[ pos ];

rep0_lit = buffer[ pos - rep0_offset ];

if ( after match )
{
    // LAM exclusion means cur_lit should never be = rep0_lit
    ASSERT( cur_lit != rep0_lit );
}
else
{
    if ( cur_lit == rep0_lit )
    {
        // lit is rep0, send rep0len1 :
        ... encode rep0len1 flag ...

        // do not drop through
        return;
    }
}

// either way, we now have exclusion :
ASSERT( cur_lit != rep0_lit );

encode_with_exclude( cur_lit, rep0_lit );

and that provides a pretty solid win. Of all the things I did to try to beat LZMA, this was the only clear winner.


ADDENDUM : some notes on this.

Note that the LZMA binary-exclude coder is *not* just doing exclusion. It's also using the exclude symbol as modelling context. Pure exclusion would just take the probability of the excluded symbol and distribute it to the other symbols, in proportion to their probability.

It turns out that the rep0 literal is an excellent predictor, even beyond just exclusion.

That is, say you're doing normal 8-bit literal coding with no exclusion. You are allowed to use an 8-bit literal as context. You can either use the order-1 literal (that's buffer[pos-1]) or the rep0 literal (that's buffer[pos-rep0_offset]).

It's better to use the rep0 literal!

Of course the rep0 literal becomes a weaker predictor as you get away from the end of the match. It's very good on the literal right after a match (lam exclusion), and still very good on the next literal, and then steadily weaker.

It turns out the transition point is 4-6 literals away from the end of the match; that's the point at which the o1 symbol becomes more correlated to the current symbol than the rep0 lit.

One of the ideas that I had for Oodle LZA was to remove the rep0len1 flag completely and instead get the same effect from context modeling. You can instead take the rep0 lit and use it as an 8-bit context for literal coding, and should get the same benefit. (the coding of the match flag is implicit in the probability model).

I believe the reason I couldn't find a win there is because it turns out that LZ literal coding needs to adapt very fast. You want very few context bits, you want super fast adaptation of the top bits. Part of the issue is that you don't see LZ literals very often; there are big gaps where you had matches, so you aren't getting as much data to adapt to the file. But I can't entirely explain it.

You can intuitively understand why the rep0 literal is such a strong predictor even when it doesn't match. You've taken a string from earlier in the file, and blacked out one symbol. You're told what the symbol was before, and you're told that in the new string it is not that symbol. It's something like :


"Ther" matched before
'r' is to be substituted
"The?"
What is ? , given that it was 'r' but isn't 'r' here.

Given only the o1 symbol ('e') and the substituted symbol ('r'), you can make a very good guess of what should be there ('n' probably, maybe 'm', or 's', ...). Obviously more context would help, but with limited information, the substituted symbol (rep0 lit) sort of gives you a hint about the whole area.

An even simpler case is given just the fact that the rep0 lit is upper or lower case - you're likely to substitute it with a character of the same case. Similarly if it's a vowel or consonant, you're likely to substitute with one of the same, etc. Of course I'm just using English text because it's intuitive; it works just as well on binary structured data.


There's another very small flaw in LZMA's exclude coder, which is more of a minor detail, but I'll include it here.

The LZMA binary exclude coder is equivalent to this clearer version that I posted last time :


void BinaryArithCodeWithExclude( ArithEncoder * arith, int val, int exclude )
{
    int ctx = 1; // place holder top bit

    // first loop in the "matched" part of the tree :
    for(;;)
    {
        int exclude_bit = (exclude >> 7) & 1; exclude <<= 1;
        int bit = (val >> 7) & 1; val <<= 1;
        ASSERT( ctx < 256 );
        m_bins[256 + ctx + (exclude_bit<<8)].encode(arith,bit);
        ctx = (ctx<<1) | bit;
        if ( ctx >= 256 )
            return;
        if ( bit != exclude_bit )
            break;
    }
    
    // then finish bits that are unmatched :
    // non-matched
    do
    {
        int bit = (val >> 7) & 1; val <<= 1;
        m_bins[ctx].encode(arith,bit);
        ctx = (ctx<<1) | bit;
    }
    while( ctx < 256 );
}
This codes up to 8 bits while the bits of "val" match the bits of "exclude" , and up to 8 bits while the bits of "val" don't match.

Now, obviously the very first bit can never be coded in the unmatched phase. So we could eliminate that from the unmatched bins. But that only saves us one slot.

(and actually I'm already wasting a slot intentionally; the "ctx" with place value bit like this is always >= 1 , so you should be indexing at "ctx-1" if you want a tight packed array. I intentionally don't do that, so that I have 256 bins instead of 255, because it makes the addressing work with "<<8" instead of "*255")

More importantly, in the "matched" phase, you don't actually need to code all 8 bits. If you code 7 bits, then you know that "val" and "exclude" match in all top 7 bits, so it must be that val == exclude^1. That is, it's just one bit flip away; the decoder will also know that so you can just not send it.

The fixed encoder is :


void BinaryArithCodeWithExclude( ArithEncoder * arith, int val, int exclude )
{
    int ctx = 1; // place holder top bit

    // first loop in the "matched" part of the tree :
    for(;;)
    {
        int exclude_bit = (exclude >> 7) & 1; exclude <<= 1;
        int bit = (val >> 7) & 1; val <<= 1;
        ASSERT( ctx < 128 );
        m_bins[256 + ctx + (exclude_bit<<7)].encode(arith,bit);
        ctx = (ctx<<1) | bit;
        if ( bit != exclude_bit )
            break;
        if ( ctx >= 128 )
        {
            // I've coded 7 bits
            // and they all match
            // no need to code the last one
            return;
        }
    }
    
    // then finish bits that are unmatched :
    // non-matched
    do
    {
        int bit = (val >> 7) & 1; val <<= 1;
        m_bins[ctx].encode(arith,bit);
        ctx = (ctx<<1) | bit;
    }
    while( ctx < 256 );
}
Note that now ctx in the matched phase can only reach 128. That means this coder actually only needs 2*256 bins, not 3*256 bins as stated last time.

This is a little speed savings (tiny because we only get one less arith coding event on a rare path), a little compression savings (tiny because that bottom bit models very well), and a little memory use savings.

6/12/2014

06-12-14 - Some LZMA Notes

I've been working on an LZ-Arith for Oodle, and of course the benchmark to beat is LZMA, so I've had a look at a few things.

Some previous posts related to things I'll discuss today :

cbloom rants 09-27-08 - 2 On LZ and ACB
cbloom rants 10-01-08 - 2 First Look at LZMA
cbloom rants 10-05-08 - 5 Rant on New Arithmetic Coders
cbloom rants 08-20-10 - Deobfuscating LZMA
cbloom rants 09-03-10 - LZ and Exclusions

Some non-trivial things I have noticed :

1. The standard way of coding literals with a binary arithmetic coder has a subtle quirk to it.

LZMA uses the now standard fractional update method for binary probability modeling. That's p0 -= (p0 >> updshift) and so on. See for example : 10-05-08 - 5 : Rant on New Arithmetic Coders .

The fractional update method is an approximation of a standard {num0,num1} binary model in which you are kept right at the renormalization threshold. That is, a counting model does :

P0 = num0 / (num0+num1);
... do coding ...
if ( bit ) num1++;
else num0++;
if ( (num0+num1) > renorm_threshold )
{
  // scale down somehow; traditionally num0 >>= 1; num1 >>= 1;
}
The fractional shift method is equivalent to :

num0 = P0;
num1 = (1<<frac_tot) - P0;
if ( bit ) num1++;
else num0++;

// num0+num1 is now ((1<<frac_tot)+1); rescale :
P0 = num0 * (1<<frac_tot)/((1<<frac_tot)+1);

That is, it assumes you're right at renormalization threshold and keeps you there.
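A minimal sketch of the fractional-shift model being described (12-bit probabilities; the names and constants here are mine, not LZMA's):

```cpp
// Fractional-shift binary model : P0 is the probability of a 0 bit,
// kept as a fixed-point count out of 4096.
struct BinShiftModel
{
    int p0;       // probability of a 0 bit, out of 4096
    int updshift; // adaptation speed; smaller = faster

    explicit BinShiftModel(int shift = 5) : p0(2048), updshift(shift) { }

    void Update(int bit)
    {
        if (bit) p0 -= p0 >> updshift;          // saw a 1 : P0 shrinks
        else     p0 += (4096 - p0) >> updshift; // saw a 0 : P0 grows
    }
};
```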

The important thing about this is adaptation speed.

A traditional {num0,num1} model adapts very quickly at first. Each observed bit causes a big change to P0 because total is small. As total gets larger, it becomes more stable, it has more inertia and adapts more slowly. The renorm_threshold sets a minimum adaptation speed; that is, it prevents the model from becoming too full of old data and too slow to respond to new data.

Okay, that's all background. Now let's look at coding literals.

The standard way to code an N bit literal using a binary arithmetic coder is to code each bit one by one, either top down or bottom up, and use the previous coded bits as context, so that each subtree of the binary tree gets its own probability models. Something like :


ctx = 1;
while( ctx < 256 ) // 8 codings
{
    int bit = (val >> 7)&1; // get top bit
    val <<= 1; // slide val for next coding
    BinaryCode( bit, p0[ctx-1] );
    // put bit in ctx for next event
    ctx = (ctx<<1) | bit;
}

Okay.

Now first of all there is a common misconception that binary coding is somehow different than N-ary arithmetic coding, or that it will work better on "binary data" that is somehow organized "bitwise" vs text-like data. That is not strictly true.

If we use a pure counting model for our N-ary code and our binary code, and we have not reached the renormalization threshold, then they are in fact *identical*. Exactly identical.

For example, say we're coding two-bit literals :


The initial counts are :

0: 3
1: 1
2: 5
3: 4
total = 13

we code a 2 with probability 5/13 in log2(13/5) bits = 1.37851
and its count becomes 6

With binary modeling the counts are :

no ctx
0: 4
1: 9

ctx=0
0: 3
1: 1

ctx=1
0: 5
1: 4

to code a "2"
we first code a 1 bit with no context
with probability 9/13 in log2(13/9) bits = 0.53051
and the counts become {4,10}

then we code a 0 bit with a 1 context
with probability 5/9 in log2(9/5) bits = 0.84800
and the counts become {6,4}

And of course 1.37851 = 0.53051 + 0.84800

The coding is exactly the same. (and furthermore, binary coding top down or bottom up is also exactly the same).
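You can check the arithmetic with a couple of lines (just the numbers from the example above) :

```cpp
#include <cmath>

// Cost in bits of coding the "2" from the example, both ways.
double nary_cost()   { return std::log2(13.0 / 5.0); }  // one step : P = 5/13
double binary_cost() { return std::log2(13.0 / 9.0)     // top bit  : P = 9/13
                            + std::log2(9.0 / 5.0); }   // low bit  : P = 5/9
// The probabilities telescope : (9/13)*(5/9) == 5/13, so the costs match.
```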

However, there is a difference, and this is where the quirk of LZMA comes in. Once you start hitting the renormalization threshold, so that the adaptation speed is clamped, they do behave differently.

In a binary model, you will see many more events at the top bit. The exact number depends on how spread your statistics are. If all 256 symbols are equally likely, then the top bit's model is updated 128X more often than each individual bottom-bit model (and each model at the next level up is updated 64X as often, etc.). If only one symbol actually occurs then all the bit levels will be coded the same number of times. In practice it's somewhere in between.

If you were trying to match the normal N-ary counting model, then the binary model should have much *slower* adaptation for the top bit than it does for the bottom bit. With a "fractional shift" binary arithmetic coder that would mean using a different "update shift".

But LZMA, like most code I've seen that implements this kind of binary coding of literals, does not use different adaptation rates for each bit level. Instead they just blindly use the same binary coder for each bit level.

This is wrong, but it turns out to be right. I tested a bunch of variations and found that the LZMA way is best on my test set. It seems that having much faster adaptation of the top bits is a good thing.

Note that this is a consequence of using unequal contexts for the different bit levels. The top bit has 0 bits of context, while the bottom bit has 7 bits of context, which means its statistics are diluted 128X (or less). If you do an order-1 literal coder this way, the top bit has 8 bits of context while the bottom bit gets 15 bits.

2. The LZMA literal-after-match coding is just an exclude

I wrote before (here : cbloom rants 08-20-10 - Deobfuscating LZMA ) about the "funny xor thing" in the literal-after-match coder. Turns out I was wrong, it's not really funny at all.

In LZ coding, there's a very powerful exclusion that can be applied. If you always output matches of the maximum length (more on this later), then you know that the next symbol cannot be the one that followed in the match. Eg. if you just copied a match from "what" but only took 3 symbols, then you know the next symbol cannot be "t", since you would have just done a length-4 match in that case.

This is a particularly good exclusion because the symbol that followed in the match is what you would predict to be the most probable symbol at that spot!

That is, say you need to predict the MPS (most probable symbol) at any spot in the file. Well, what you do is look at the preceding context of symbols and find the longest previous match of the context, and take the symbol that follows that context. This is "PPM*" essentially.

So when you code a literal after a match in LZ, you really want to do exclusion of the last-match predicted symbol. In a normal N-ary arithmetic coder, you would simply set the count of that symbol to 0. But it's not so simple with the binary arithmetic coder.

With a binary arithmetic coder, let's say you have the same top 7 bits as the exclude symbol. Well then, you know exactly what your bottom bit must be without doing any coding at all - it must be the bit that doesn't match the exclude symbol. At the next bit level above that, you can't strictly exclude, but you can probabilistically exclude. That is :


Working backwards from the bottom :

At bit level 0 :

if symbol top 7 bits == exclude top 7 bits
then full exclude

that is, probability of current bit == exclude bit is zero

At bit level 1 :

if symbol top 6 bits == exclude top 6 bits
then

if symbol current bit matches exclude current bit, I will get full exclusion in the next level
so chance of that path is reduced but not zero

the other binary path is unaffected

that is, we're currently coding to decide between 4 symbols.  Something like :

0 : {A,B}
1 : {C,D}

we should have P0 = (PA+PB)/(PA+PB+PC+PD)

but we exclude one; let's say B, so instead we want to code with P0 = PA/(PA+PC+PD)

etc..

That is, the exclude is strongest at the bottom bit level, and becomes less strong as you go back up to higher bit levels, because there are more symbols on each branch than just the exclude symbol.
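Plugging toy counts into the 4-symbol case above (PA..PD are assumed numbers, purely illustrative) :

```cpp
// Binary decision between {A,B} (bit 0) and {C,D} (bit 1),
// with symbol B excluded.
const double PA = 3, PB = 2, PC = 4, PD = 1;

double p0_plain()    { return (PA + PB) / (PA + PB + PC + PD); } // 5/10
double p0_excluded() { return  PA       / (PA + PC + PD);      } // 3/8
```

Note the excluded P0 is lower but not zero : A still contributes probability to the 0 branch even though B is impossible.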

The LZMA implementation of this is :


  static void LitEnc_EncodeMatched(CRangeEnc *p, CLzmaProb *probs, UInt32 symbol, UInt32 matchByte)
  {
    UInt32 offs = 0x100;
    symbol |= 0x100;
    do
    {
      matchByte <<= 1;
      RangeEnc_EncodeBit(p, probs + (offs + (matchByte & offs) + (symbol >> 8)), (symbol >> 7) & 1);
      symbol <<= 1;
      offs &= ~(matchByte ^ symbol);
    }
    while (symbol < 0x10000);
  }

I rewrote it to understand it; maybe this is clearer :

void BinaryArithCodeWithExclude( ArithEncoder * arith, int val, int exclude )
{
    // same thing but maybe clearer :
    bool matched = true;        
    val |= 0x100; // place holder top bit

    for(int i=0;i<8;i++) // 8 bit literal
    {
        int exclude_bit = (exclude >> (7-i)) & 1;
        int bit = (val >> (7-i)) & 1;

        int context = val >> (8-i);
        if ( matched )
            context += exclude_bit?512:256;

        m_probs[context].encode(arith,bit);

        if ( bit != exclude_bit )
            matched = false;
    }
}

We're tracking a running flag ("matched" or "offs") which tells us if we are on the same path of the binary tree as the exclude symbol. That is, do all prior bits match. If so, that steps us into another group of contexts, and we add the current bit from the exclude symbol to our context.

Now of course "matched" always starts true, and only turns to false once, and then stays false. So we can instead implement this as two loops with a break :


void BinaryArithCodeWithExclude( ArithEncoder * arith, int val, int exclude )
{
    int ctx = 1; // place holder top bit

    // first loop in the "matched" part of the tree :
    for(;;)
    {
        int exclude_bit = (exclude >> 7) & 1; exclude <<= 1;
        int bit = (val >> 7) & 1; val <<= 1;
        m_bins[256 + ctx + (exclude_bit<<8)].encode(arith,bit);
        ctx = (ctx<<1) | bit;
        if ( ctx >= 256 )
            return;
        if ( bit != exclude_bit )
            break;
    }
    
    // then finish the bits that are unmatched :
    do
    {
        int bit = (val >> 7) & 1; val <<= 1;
        m_bins[ctx].encode(arith,bit);
        ctx = (ctx<<1) | bit;
    }
    while( ctx < 256 );
}

It's actually not weird at all, it's just the way to do symbol exclusion with a binary coder.

ADDENDUM : maybe I'm going too far saying it's not weird. It is a bit weird; sort of like point 1, it's actually not right, but in a good way.

The thing that's weird is that when coding the top bits, it's only using the bits seen so far of the exclude symbol. If you wanted to do a correct probability exclusion, you need *all* the bits of the exclude symbol, so that you can see exactly what symbol it is, how much probability it contributes to that side of the binary tree.

The LZMA way appears to work significantly better than doing the full exclude.

That is, it's discarding some bits of the exclude as context, and that seems to help due to some issue with sparsity and adaptation rates. LZMA uses 3*256 binary probabilities, while full exclusion uses 9*256. (Though in both cases not all probs are actually used; eg. the first bit is always coded from the "matched" probs, never the "un-matched".)

ADDENDUM2 : Let me say it again perhaps clearer.

The way to code a full exclude using binary modeling is :


coding "val" with exclusion of "exclude"

while bits of val coded so far match bits of exclude coded so far :
{
  N bits coded so far
  use 8 bits of exclude as context
  code current bit of val
  if current bit of val != same bit of exclude
    break;
}

while there are bits left to code in val
{
  N bits coded so far
  use N bits of val as context
  code current bit of val
}

The LZMA way is :

coding "val" with exclusion of "exclude"

while bits of val coded so far match bits of exclude coded so far :
{
  N bits coded so far
  use N+1 bits of exclude as context   // <- only difference is here
  code current bit of val
  if current bit of val != same bit of exclude
    break;
}

while there are bits left to code in val
{
  N bits coded so far
  use N bits of val as context
  code current bit of val
}

I also tried intermediate schemes like using N+2 bits of exclude (past bits+current bit+one lower bit) which should help a little to identify the exclusion probability without diluting statistics too much - they all hurt.

3. Optimal parsing and exclusions are either/or and equal

There are two major options for coding LZ-arith :

I. Do literal-after-match exclusion and always output the longest match. Use a very simplified optimal parser that only considers literal vs match (and a few other things). Essentially just a fancier lazy parse (sometimes called a "flexible parse").

II. Do not do literal-after-match exclusion, and consider many match lengths in an optimal parser.

It turns out that these give almost identical compression.

Case II has the simpler code stream because it doesn't require the literal-after-match special coder, but it's much much slower to encode at high compression because the optimal parser has to work much harder.

I've seen this same principle many times and it always sort of intrigues me. Either you can make a code format that explicitly avoids redundancy, or you can exploit that redundancy by writing an encoder that aggressively searches the coding space.

In this case the coding of exclude-after-match is quite simple, so it's definitely preferable to do that and not have to do the expensive optimal parse.

4. LZMA is very Pareto

I can't really find any modification to it that's a clear win. Obviously you can replace the statistical coders with either something faster (ANS) or something that gives more compression (CM) and you can move the space/speed tradeoff, but not in a clearly beneficial way.

That is, on the compression_ratio / speed / memory_use three-way tradeoff, if you hold any two of those constant, there's no improvement to be had in the other.

.. except for one flaw, which we'll talk about in the next post.
