2/20/2015

02-20-15 - Shoes All Suck

Shoes are fucking bullshit. This is another in the long series of "everyone else is a fucking moron and I should be in charge of everything".

Skippable background : {

My feet are wide in the toebox area (forefoot). This has caused me enormous problems over the years. Basically no shoes fit. As a result of wearing shoes that are too tight in the forefoot, I developed a neuroma. My neuroma is in the usual place, between the metatarsal heads. There's basically no treatment for neuromas. I've tried orthotics and cortisone shots and all the usual stuff. There are surgical options, but the success rate is extremely low and the problem often recurs worse after surgery. The neuroma goes from being a minor annoyance to being incredibly painful when inflamed. It's affected what I can do in my life; I never run on concrete, I try to avoid long walks or standing on concrete. I still do things like hiking and backpacking, but I know I'll be in great pain afterward and just take that as part of the deal.

Recently it's gotten much worse. The problem is I've subconsciously developed a gait that avoids putting pressure on the neuroma. For the past I don't know how many years, I've been walking really crooked. I didn't really notice until it started having bad effects up the chain. First I had an SI slip in my left hip. I got it popped back in, then a month later had another SI slip. (I now know how to pop it back in myself, it's pretty easy). I started going to physical therapy for the left hip, and that immediately set off a groin strain on my left pelvis, which has been lingering now for several months. Basically short version is that the left foot neuroma has now caused bad effects all the way up the chain due to subconscious favoring.

(I had the same kind of issue with my shoulder separation ; if I could just force my body to keep moving normally and ignore the pain, it actually doesn't have any bad mechanical consequences. In fact the surgical treatment for a neuroma is just to try to kill the nerve. The pain is not indicative of damage occurring, you want to just power through it. The problem is that subconsciously the body just takes over and modifies your movements to avoid the pain, and those modifications are actually much worse for you than the original injury. I'm still doing PT almost every day to try to convince my shoulder that it's allowed to move in a normal way and doesn't have to do janky things to avoid pops.)

So I decided that the neuroma pain that I've been just ignoring for a long time was something that I need to try to address if it's going to fuck up my hips and low back and so on. Part of that is trying to find shoes that give me enough toe room.

}

So. I need 4E wide shoes, which basically just don't exist. (New Balance is the only significant maker of wide shoes, and they're bullshit. They just aren't actually wide. NB makes its wide shoes by putting a larger upper on the same sole, which means the ball of my foot is hanging over the side of the sole. Fail New Balance.) Even the rare 4E shoes that do exist are wrong, which we will now discuss.

Anyway. That's just background. The point is shoes fucking suck. I'm going to go by type of shoes.

First, oxfords and traditional dress shoes and such. I don't know WTF the point of these shoes is. Raised heel = terrible for gait. No. Tapered toe box = neuroma pain. No. Hard leather sole = uncomfortable. Most have terrible grip soles. Overly rigid soles that don't bend with the step. Just non-functional all around.

It's a shame because I'm tempted to get custom made shoes with a last cut for my foot. The problem is that the custom shoe maker guys only make these fucking non-functional antiquated bullshit shoes.

Second, traditional running shoes.

The stupid minimalist community is all down on these, and I agree with some of the complaints, but there are good things about them.

Good : cushioning to protect against pain from concrete and pebbles. flexible but structured upper that you can tighten to really fit snug on the foot so it doesn't slide around (thus allowing you to make a hard athletic "cut". If you can't make a cut in a shoe without the shoe sliding around, then the shoe does not work and goes in the fucking trash). generally grippy soles that aren't made of something retarded like vibram or leather.

Bad : too much heel-to-toe drop. Too much heel height in general causing early heel strike and strange gait. Too little feel of the ground. Too much taper in the toe box.

Let's talk about the last one : almost every shoe in the world has too much taper in the toe box. Feet are not pointed at the front. It's like the shoe makers all think we have feet that are shaped like boats that come to a point. They're trying to sell us jester shoes.

In fact real feet (especially freak feet like mine) are pretty flat in front. The foot is widest at the ball of the foot and then does *NOT* taper ahead of the ball - the toes go straight ahead, and then there's a pretty straight line across the front edge. Not a pointy taper.

I will illustrate the problem on a shoe I actually like a lot. The Asics Gel Unifire TR 4E is one of the best shoes of the 100 I have tried in the past few months. It's a true 4E sole, has good structure in the heel so my foot doesn't slide all over, and is wide enough in the ball of the foot. *However* it's pointed, which makes it still a bit painful for me.

I believe these images are self explanatory :

... and everyone who makes shoes is fired.

(the other things I hate about the Unifire are : 1. they're ugly as piss, and 2. the tread chunks stick out past the edges of the upper, which makes them feel bigger than they are. A shoe should feel as small as possible, like it's not there. 3. the heel is way too high; I do not want or need high-heel sneakers.)

cliff note version :
These are better shoes than the stupid shit that mainstream shoe manufacturers make :

Now, you may object - "cbloom, you're taking your own personal weird foot issue and projecting on the masses; blaming shoe makers for making crap shoes when in fact their shoes work fine for most people". Mmm, yes, but I don't think so. The thing is, I don't think your shoes actually *do* work for you. You may wear something ridiculously narrow and pointed like a typical Nike shoe, and you've been lucky enough to not get any obvious painful problems like a bunion or hammertoe or neuroma, so you think your shoes are great. But I don't think they are. Most people who wear western shoes have got their pinky & big toes jammed inward. This is fucking up your feet. So you've endured the toe-smushing. It doesn't mean it's actually right, and I believe your feet would be better off in shoes that don't crush the toes.

Aside from the shape of the toe box, traditional running shoes would be easy to fix (as opposed to like, the average "casual shoe" or dress shoe which are completely hopeless). For me the perfect sole is something like what is on my beloved Onitsuka Tigers. I don't need a fucking one inch thick insane high-heel rubber sole like you get on New Balance or the Unifire. I want one centimeter like you get on the Tigers. (Tigers do have a tiny bit too much drop, I'd like a little more padding under the toes and a little less in the heel). I think that retro sneakers are as close as you can get to a perfect shoe. They're generally light weight, you can actually tie them tight to your foot, they have pretty good ground feel. They have the plusses of running shoes but without too much sole. Unfortunately for me there's not a single one of them that's made wide, and they're all too tapered in the front.

Summarizing the mainstream shoe varieties :

And the retro/fashion sneaker is the closest; it's almost there. Like, if feet were shaped like fucking torpedos it would be right. It would be right if the fucking jester shoes were right. If you only had two toes and your foot tapered down to those two toes like a fucking sloth it would be right. If you were a 19th century Japanese girl with bound feet it would be great. Not for an actual human being. But, hey it's the best that the shoe industry has done, so, we'll give it a thumbs up and an encouraging pat on the back with "almost".

So, let's talk about "minimal" shoes.

Minimal shoes are fucking bullshit. Shoes in the real world need some padding. Maybe if we only walked on grass and dirt and rubber they would be fine. But we don't, we walk on concrete and gravel and other painful shit. Like the fucking moron "paleo" eaters, they love to talk about how the "foot evolved to be barefoot" blah blah. Okay, first of all, a minimal shoe is not at all like actually being barefoot. That's just a lie. You don't move in the same way at all. You don't feel the ground, you can't grip with your toes, you don't even step the same way. Second of all, when the foot evolved we didn't walk on fucking concrete all day, we didn't have broken glass and nails, etc. Third of all, once apes evolved the ability to make things, one of the very first fucking things any primitive society makes is shoes.

(the only minimal shoe that I sort of believe in is the VFF, which at least allows your toes to grip the ground and so on, in theory. The "minimal" shoe I'm primarily objecting to here is something like the vivobarefoot, which is basically a leather sock. It provides no padding. It slips around on your foot if you try to do anything athletic in it. It removes your toes ability to actually feel the ground. It's total bullshit.)

(I do actually believe that truly barefoot is the best way to be, when possible. That's very different than "barefoot" shoes. It also only makes sense to me on surfaces that are sufficiently soft. eg. maybe if you have a nice rubber running track that you know to be free of nails and glass - run on it truly barefoot, and yeah I totally support that)

The distressing thing is that in some ways the minimal shoes are doing things right. Some of them actually have quite wide toe boxes (*) that let your feet spread nicely. I like zero drop. Actually for me I think a tiny bit of drop is better, but zero drop is better than the huge drop of traditional shoes. I like light weight. So that's all good. What I don't like is zero padding. I also don't like unstructured; if a shoe slips around it's not right.

(* = unfortunately almost all of the minimal shoes CLAIM to have "wide toe boxes" but very few of them actually do. Altra, innov8, not at all wide. VivoBarefoot is the best I've found in terms of shoe shape - actually wide and squared off in the front (they look like clown shoes), but they have literally zero padding and are unwearable. Reebok Crossfit Nanos and Lems shoes are probably good for someone with a normal width foot; they feel like a 2E to me, which is not wide enough for me but probably okay for others.)

Crossfit Nanos and Lems are what I would call "semi-minimal". They have a little more structure and actually have about 1 cm of sole thickness for some padding. Lems are the best designed of all the shoes I tried. If Lems fit me, that's what I would wear. Unfortunately, they only make whole Euro sizes (**), and they're also a tiny bit too narrow in the toes for me. The Nanos should be pretty great, but for some moronic reason they've made the bottom of the shoe out of hard plastic instead of soft rubber, so even though it's thick enough it's painful to walk on and not flexible enough. (yeah yeah weight lifting, bullshit, whatever).

(** = whole Euro sizes, WTF. The step between sizes is way too big.)

The best shaped shoe that I've found is the Patagonia Loulu. It's a hybrid shoe; semi-minimal, semi-sneaker. Unfortunately it's almost ruined by a fucking retarded synthetic sole. It's some recycled plastic garbage and it just doesn't work. It's widely known to wear away insanely quickly, and despite that it also has zero grip in the wet. It's like ice skating. Fucking awful moronic sole material. JUST USE NATURAL RUBBER you fucking morons, it was perfect a hundred years ago and you're only making it worse.

While I'm at it let me complain about shoe shopping in general :

1. How hard is it to fucking standardize the sizes? I should be able to measure my foot and buy a shoe that will fit. Instead a fucking "size 10" is slightly different in every damn brand.

2. Widths are even worse. Half the 4E shoes I bought were just not actually wide at all. Even the ones that were wide suffer from a fundamental problem with unclarity of the specification. There's an ambiguity in width. Is it just the toe box that's wide, and the heel is narrow? Are they both wide? Is it just the upper (bullshit) or the sole also? Furthermore, in a lot of brands "wide" actually means "fat fuck" and the shoe is just bigger all over.

3. How fucking terrible is the internet. I have to search on a whole mess of different web sites. They all have different searches, most of which are broken (like width is not an option, or I can't filter by size until I select a type of shoe). Lots of shoes are only for sale on the manufacturer's web site which I have to go digging around to find and then use their own custom terrible interface. But most of all -

- none of them actually take advantage of the internet. Shopping sites could easily be wikis where customers can mark up entries with extra information. Some people would be nerdy enough to actually measure shoes and add that information so I could search by true measurements instead of the stupid official sizes. I should be able to cross-index all the different shopping sites with one global index. But no. None of that. It's just so fucking primitive.

4. Why can't I get custom made sneakers? I should be able to measure my foot and provide specs and get shoes made to order for reasonable fees.

It's all so antiquated. It requires actually trying on 100's of shoes to find one that fits. There will of course always be some aspect of personal feel, but right now it's not even that. It's like I put on a shoe and immediately go "nope, this 4E is not actually wide at all" or "nope, this supposed size 10.5 is actually a 10 or 11". I'm just wasting all this time weeding out candidates that could have been done systematically.

2/14/2015

02-14-15 - Template Arithmetic Coder Generation

I did something for Oodle LZA that I thought worked out very neatly. I used templates to generate efficient & arbitrary variants of arithmetic coder decompositions.

First you define a pattern. Mine was :


"coder" pattern implements :

void reset();
 int decode( arithmetic_decoder * dec );
void encode( arithmetic_encoder * enc, int val );
int cost(int val) const;

where encode & decode also both adapt the model (if any).

You can then implement that pattern in various ways.

Perhaps the first "coder" pattern is just a single bit :


template <int t_tot_shift, int t_upd_shift>
struct arithbit_updshift
{
    U16  m_p0;

    void reset()
    {
        m_p0 = 1<<(t_tot_shift-1);
    }

    .. etc ..

};
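
For concreteness, here is a sketch of what the elided encode/decode might look like. The binary coder interface (decode_bit / encode_bit taking a probability-of-zero scaled to 1<<t_tot_shift) is a hypothetical API for illustration, not the actual Oodle code; the update rule is just the standard shift-register adaptation :

    int decode( arithmetic_decoder * dec )
    {
        // m_p0 is P(bit==0) scaled to 1<<t_tot_shift
        int bit = dec->decode_bit( m_p0, t_tot_shift );   // hypothetical API
        if ( bit ) m_p0 -= m_p0 >> t_upd_shift;                          // saw a 1 : p0 goes down
        else       m_p0 += ( (1<<t_tot_shift) - m_p0 ) >> t_upd_shift;   // saw a 0 : p0 goes up
        return bit;
    }

    void encode( arithmetic_encoder * enc, int bit )
    {
        enc->encode_bit( m_p0, t_tot_shift, bit );        // hypothetical API
        if ( bit ) m_p0 -= m_p0 >> t_upd_shift;
        else       m_p0 += ( (1<<t_tot_shift) - m_p0 ) >> t_upd_shift;
    }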

And then you do some N-ary coders. The standard ones being :


template <int t_alphabet_size>
struct counting_coder;

template <int t_numbits, typename t_arithbit>
struct bitwise_topdown_coder;

template <int t_numbits, typename t_arithbit>
struct bitwise_bottomup_coder;

template <int t_maxval, typename t_arithbit>
struct unary_coder;

I'm not going to show implementations of all of these in this post, but I'll do a few here and there for clarity. The point is more about the architecture than the implementation.

For example :


template <int t_numbits, typename t_arithbit>
struct bitwise_topdown_coder
{

    enum { c_numvals = 1<<t_numbits };
    //t_arithbit m_bins[c_numvals-1]; // <- what's needed (use ctx-1 index)
    t_arithbit m_bins[c_numvals]; // padded for simpler addressing (use ctx index)
    
    int decode( arithmetic_decoder * dec )
    {
        int ctx = 1;

        for(int i=0;i<t_numbits;i++)
        {
            int bit = m_bins[ctx].decode(dec);
            ctx <<= 1; ctx |= bit;
        }

        return ctx & (c_numvals-1);
    }

    etc ...
};
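
For reference, the matching encode (my guess at it, following the same padded ctx addressing) just walks the value's bits from the top down :

    void encode( arithmetic_encoder * enc, int val )
    {
        int ctx = 1;

        for(int i=t_numbits-1;i>=0;i--)
        {
            int bit = (val>>i)&1;
            m_bins[ctx].encode(enc,bit);
            ctx <<= 1; ctx |= bit;
        }
    }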

and "t_arithbit" is your base coder pattern for doing a single modeled bit. By making that a template parameter it can be something like

arithbit_countn0n1

or a template like :

arithbit_updshift<12,5>

etc.

The fun thing is you can start combining them to make new coders which also fit the pattern.

For example, say you want to do a bitwise decomposition coder, but you don't have an even power of 2 worth of values? And maybe you don't want to put your split points right in the middle?


// t_fracmul256 is the split fraction times 256
// eg. t_fracmul256 = 128 is middle splits (binary subdivision)
// t_fracmul256 = 0 is a unary coder

template <int t_numvals, int t_fracmul256, typename t_arithbit>
struct bitwise_split_coder
{
    enum { c_numvals = t_numvals };
    enum { c_numlo = RR_CLAMP( ((t_numvals*t_fracmul256)/256) , 1, (t_numvals-1) ) };
    enum { c_numhi = t_numvals - c_numlo };

    t_arithbit  m_bin;
    bitwise_split_coder<c_numlo,t_fracmul256,t_arithbit> m_lo;
    bitwise_split_coder<c_numhi,t_fracmul256,t_arithbit> m_hi;

    void reset()
    {
        m_bin.reset();
        m_lo.reset();
        m_hi.reset();
    }

    void encode(arithmetic_encoder * enc,int val)
    {
        if ( val < c_numlo )
        {
            m_bin.encode(enc,0);
            m_lo.encode(enc,val);
        }
        else
        {
            m_bin.encode(enc,1);
            m_hi.encode(enc, val - c_numlo );
        }
    }

    .. etc ..

};

+ explicit template specialize for t_numvals=1,2

This lets you compile-time generate funny branching trees to be able to handle something like "my alphabet has 37 symbols, and I want to code each interval as a binary flag at 1/3 of the range, so the first event is [0-12][13-37]" and so on.
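
A hypothetical usage example (the constants here are mine, picked to roughly match the 37-symbol / one-third split described above) :

// 90/256 ~= 1/3 ; c_numlo = clamp(37*90/256, 1, 36) = 13
// so the first binary event splits [0-12] vs [13-36], then each side recurses the same way

typedef bitwise_split_coder< 37, 90, arithbit_updshift<12,5> > length37_coder;

length37_coder coder;
coder.reset();
coder.encode( enc, 20 );   // enc is an arithmetic_encoder* ; codes the [13-36] flag, then subdivides within the high range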

And then you can make yourself some generic tools for plugging coders together. The main ones I use are :


// val_split_coder :
// use t_low_coder below t_low_count
// then t_high_coder above t_low_count

template <int t_low_count, typename t_low_coder, typename t_high_coder , typename t_arithbit>
struct val_split_coder
{
    t_arithbit  m_bin;
    t_low_coder m_lo;
    t_high_coder m_hi;

    void encode(arithmetic_encoder * enc,int val)
    {
        if ( val < t_low_count )
        {
            m_bin.encode(enc,0);
            m_lo.encode(enc,val);
        }
        else
        {
            m_bin.encode(enc,1);
            m_hi.encode(enc, val - t_low_count );
        }
    }

    .. etc .. 

};

// bit_split_coder :
// use t_low_coder for t_low_bits
// use t_high_coder for higher bits
// (high and low bits are independent)

template <int t_low_bit_count, typename t_low_coder, typename t_high_coder >
struct bit_split_coder
{
    t_low_coder m_lo;
    t_high_coder m_hi;

    void encode(arithmetic_encoder * enc,int val)
    {
        int low = val & ((1<<t_low_bit_count)-1);
        int high = val >> t_low_bit_count;
        m_lo.encode(enc,low);
        m_hi.encode(enc,high);
    }

    .. etc .. 
};

// bit_split_coder_contexted :
// split bits, code hi then low with hi as context
// (used for decomposition of a value where the bits are dependent)

template <int t_low_bit_count, typename t_low_coder, int t_high_bit_count, typename t_high_coder >
struct bit_split_coder_contexted
{
    t_high_coder m_hi;
    t_low_coder m_lo[(1<<t_high_bit_count)];

    void encode(arithmetic_encoder * enc,int val)
    {
        int high = val >> t_low_bit_count;
        int low = val & ((1<<t_low_bit_count)-1);

        m_hi.encode(enc,high);
        m_lo[high].encode(enc,low);
    }

    .. etc .. 
};

So that gives us a bunch of tools. Then you put them together and make complicated things.

For example, my LZ match length coder is this :


val_split_coder< 8 , 
    bitwise_topdown_coder< 3 , arithbit_updshift<12,5> > ,  // low val coder
    numsigbit_coder< unary_coder<16, arithbit_updshift<12,5> > > ,  // high val coder
    arithbit_updshift<12,5> >

which says : first code a binary event for length < 8 or not. If length < 8 then code it as a 3-bit binary value. If >= 8 then code it using a num-sig-bits decomposition where the # of sig bits are written using unary (arithmetic coded) and the remainder is sent raw.
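
numsigbit_coder isn't shown in this post; here's a sketch of what it might look like. The raw-bits call and the +1 bias are my assumptions for illustration, not the real implementation :

template <typename t_nsb_coder>
struct numsigbit_coder
{
    t_nsb_coder m_nsb;   // codes the top bit position (eg. a unary_coder of arithbits)

    void reset() { m_nsb.reset(); }

    void encode( arithmetic_encoder * enc, int val )
    {
        int v = val + 1;   // bias so the top bit position is defined for val = 0
        int nsb = bit_scan_top_bit_pos(v);   // index of the highest set bit
        m_nsb.encode(enc,nsb);               // # of sig bits, entropy coded
        if ( nsb > 0 )
            enc->put_raw_bits( v & ((1<<nsb)-1), nsb );   // assumed raw-bit call; bits below the top bit, not modeled
    }

    .. etc ..
};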

And the standard LZ offset coder that I described in the last post is something like this :


val_split_coder< 64 , 
    bitwise_topdown_coder< 6 , arithbit_updshift<12,5> > , // low vals
    bit_split_coder< 5 ,  // high vals
        bitwise_bottomup_coder< 5 , arithbit_updshift<12,5> > ,  // low bits of high vals
        numsigbit_coder< unary_coder<30, arithbit_updshift<12,5> > >   // high bits of high vals
    > ,
    arithbit_updshift<12,5> 
    >

To be clear : the advantage of this approach is that you can easily play with different variations and decompositions, plug in different coders for each portion of the operation, etc.

The code generated by this is very good, but once you fully lock down your coders you probably want to copy out the exact cases you used and hand code them, since the human eye can do things the optimizer can't.

2/10/2015

02-10-15 - LZ Offset Modeling Rambles

Canonical LZ offset coding :

The standard way to send LZ offsets in a modern LZ coder is like this :

1. Remove the bottom bits. The standard number is 4-7 bits.


low = offset & ((1<<lowbits)-1);
offset >>= lowbits;

then you send "low" entropy coded, using a Huffman table or arithmetic or whatever. (offsets less than the mask are treated separately).

The point of this is that offset bottom bits have useful patterns in structured data. On text this does nothing for you and could be skipped. On structured data you get probability peaks for the low bits of offset at 4,8,12 for typical dword/qword data.

You also get a big peak at the natural structure size of the data. eg. if it's a D3DVertex or whatever and it's 36 or 72 bytes there will be big probability peaks at 36,72,108,144.

2. Send the remaining high part of offset in a kind of "# sig bits" (Elias Gamma) coding.

Count the number of bits on. Send the # of bits on using an entropy coder.

Then send the bits below the top bit raw, non-entropy coded. Like this :


topbitpos = bit_scan_top_bit_pos( offset );
ASSERT( offset >= (1<<topbitpos) && offset < (2<<topbitpos) );

rawbits = offset & ((1<<topbitpos)-1);

send topbitpos entropy coded
send rawbits in topbitpos bits

Optionally you might also entropy-code the bit under the top bit. You could just arithmetic code that bit (conditioned on the position of the top bit as context). Or you can make an expanded top-bit-position code :


slot = 2*topbitpos + ((offset>>(topbitpos-1))&1)

send "slot" entropy coded
send only topbitpos-1 of raw bits

More generally, you can define slots by the number of raw bits being sent in each level. We've done :

Straight #SB slots :

{0,1,2,3,4,5,...}

Slot from #SB plus one lower bit :

{0,1,1,2,2,3,3,4,4...}

More generally it helps a little to put more slots near the bottom :

{0,0,1,1,1,2,2,2,3,3,4,4,5,6,7,8...}

but in the more general method you can't use a simple bitscan to find your slot.
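
Something like this (my sketch, not from any particular codec) - a table of raw bit counts per slot, and a small search instead of the bitscan :

// each slot covers 1<<raw_bits_per_slot[slot] consecutive offsets
static const int raw_bits_per_slot[] = { 0,0,1,1,1,2,2,2,3,3,4,4,5,6,7,8, /* ... enough entries to cover the max offset */ };

void send_offset_slotted( int offset )
{
    int slot = 0, base = 0;
    while ( offset >= base + (1<<raw_bits_per_slot[slot]) )
    {
        base += 1<<raw_bits_per_slot[slot];
        slot++;
    }

    // send "slot" entropy coded
    // send (offset - base) in raw_bits_per_slot[slot] raw bits
}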

The intermediate bits that are sent raw do have a slight probability decline for larger values. But there's very little to win there, and a lot of noise in the modeling. In something like a Huffman-coded-LZ, sending the code lengths for extra slots there costs more than the win you get from modeling it.

With this we're just modeling the general decrease in probability for larger offsets. Something that might be interesting that I've never heard of anyone doing is to fit a parametric probability function, Laplacian or something else, to offset. Instead of counting symbols to build the model, fit the parametric function, either adaptively or statically.


So, a whole offset is sent something like this :


offset bits (MSB on left) :

  SSRRRRRRRRLLLLL

S = bit sent in slot index, entropy coded
R = raw bit
L = low bit, entropy coded


Now some issues and followup.

I. The low bits.

The low-bits-mask method actually doesn't handle the structure size very well. It does well for the 4-8 dword/qword stuff, because those generally divide the low-bits mask evenly. (a file that had trigram structure, like an RGB bitmap, would have problems).

The problem is the structure size is rarely an exact multiple of the power of two "lowbits" mask. For example :


 36&0xF = 0x04 = 0100
 72&0xF = 0x08 = 1000
108&0xF = 0x0C = 1100
144&0xF = 0x00 = 0000

A file with structure size of 36 will make four strong peaks if the lower 4 bits are modeled.

If instead of doing (% 16) to get low bits, you did (% 36), you would get a perfect single peak at zero.

Any time the structure size doesn't divide your low bits, you're using extra bits that you don't need to.
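
That is, if you somehow knew the structure stride, the low part could be taken mod the stride instead of masking a power of two (sketch only; detecting the stride is its own problem) :

// stride = detected structure size, eg. 36
int low  = offset % stride;   // the peaks at 36,72,108,... all collapse to a single peak at 0
int high = offset / stride;

// send "low" entropy coded, send "high" with the usual #SB scheme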

But this issue also gets hidden in a funny way. If you have "repeat match" codes then on strongly structured data your repeat offsets will likely contain {36,72,...} which means those offsets don't even go into the normal offset coder. (the redundancy between low-bits modeling and repeat matches as a way of capturing structure is another issue that's not quite right).

II. Low bit structure is not independent of high bits.

Sometimes you get a file that just has the exact same struct format repeated throughout the whole file. But not usually.

It's unlikely that the same structure that occurs in your local region (4-8-36 for example) occurs back at offset one million. It might be totally different structure there. Or, it might even be the same structure, but shifted by 1 because there's an extra byte somewhere, which totally mucks up the low bits.

The simplest way to model this is to make the lowbits entropy coder be conditioned by the high bits slot. That is, different models for low offsets vs higher offsets. The higher offsets will usually wind up with low bits that are nearly random (equiprobable) except in the special case that your whole file is the same structure.
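
A sketch of that conditioning (names are made up; "order0_model" stands in for whatever adaptive multi-symbol model you're using for the low bits) :

enum { NUM_LOW_CONTEXTS = 4 };
order0_model low_bits_model[NUM_LOW_CONTEXTS];   // separate low-bits statistics per offset-size bucket

void send_low_bits( arithmetic_encoder * enc, int low, int topbitpos )
{
    int bucket = ( topbitpos < NUM_LOW_CONTEXTS ) ? topbitpos : NUM_LOW_CONTEXTS-1;
    low_bits_model[bucket].encode( enc, low );
}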

More generally, you could remember the local structure that you detect as you scan through the file. Then when you match back to some prior region, you look at what the structure was there.


An obvious idea is to use this canonical coding for offsets up to 32768 (or something), and then for higher offsets use LZSA.

So essentially you have a small sliding LZ77 window immediately preceding your current pointer, and as strings fall out of the LZ77 window they get picked up in a large LZSA dictionary behind.

2/09/2015

02-09-15 - LZSA - Some Results

So I re-ran some Oodle Network tests to generate some LZSA results so there's some concreteness to this series.

"Oodle Network" is a UDP packet compressor that works by training a model/dictionary on captured packets.

The shipping "OodleNetwork1 UDP" is a variant of LZP. "OodleStaticLZ" is LZSA-Basic and obviously HC is HC.

Testing on one capture with dictionaries from 2 - 64 MB :


test1   380,289,015 bytes

OodleNetwork1 UDP [1|17] : 1388.3 -> 568.4 average = 2.442:1 = 59.06% reduction
OodleNetwork1 UDP [2|18] : 1388.3 -> 558.8 average = 2.484:1 = 59.75% reduction
OodleNetwork1 UDP [4|19] : 1388.3 -> 544.3 average = 2.550:1 = 60.79% reduction
OodleNetwork1 UDP [8|20] : 1388.3 -> 524.0 average = 2.649:1 = 62.26% reduction
OodleNetwork1 UDP [16|21] : 1388.3 -> 493.7 average = 2.812:1 = 64.44% reduction
OodleNetwork1 UDP [32|22] : 1388.3 -> 450.4 average = 3.082:1 = 67.55% reduction
OodleNetwork1 UDP [64|23] : 1388.3 -> 390.9 average = 3.552:1 = 71.84% reduction

OodleStaticLZ [2] : 1388.3 -> 593.1 average = 2.341:1 = 57.28% reduction
OodleStaticLZ [4] : 1388.3 -> 575.2 average = 2.414:1 = 58.57% reduction
OodleStaticLZ [8] : 1388.3 -> 546.1 average = 2.542:1 = 60.66% reduction
OodleStaticLZ [16] : 1388.3 -> 506.9 average = 2.739:1 = 63.48% reduction
OodleStaticLZ [32] : 1388.3 -> 445.8 average = 3.114:1 = 67.89% reduction
OodleStaticLZ [64] : 1388.3 -> 347.8 average = 3.992:1 = 74.95% reduction

OodleStaticLZHC [2] : 1388.3 -> 581.6 average = 2.387:1 = 58.10% reduction
OodleStaticLZHC [4] : 1388.3 -> 561.4 average = 2.473:1 = 59.56% reduction
OodleStaticLZHC [8] : 1388.3 -> 529.9 average = 2.620:1 = 61.83% reduction
OodleStaticLZHC [16] : 1388.3 -> 488.6 average = 2.841:1 = 64.81% reduction
OodleStaticLZHC [32] : 1388.3 -> 429.4 average = 3.233:1 = 69.07% reduction
OodleStaticLZHC [64] : 1388.3 -> 332.9 average = 4.170:1 = 76.02% reduction

--------------

test2   423,029,291 bytes

OodleNetwork1 UDP [1|17] : 1406.4 -> 585.4 average = 2.402:1 = 58.37% reduction
OodleNetwork1 UDP [2|18] : 1406.4 -> 575.7 average = 2.443:1 = 59.06% reduction
OodleNetwork1 UDP [4|19] : 1406.4 -> 562.0 average = 2.503:1 = 60.04% reduction
OodleNetwork1 UDP [8|20] : 1406.4 -> 542.4 average = 2.593:1 = 61.44% reduction
OodleNetwork1 UDP [16|21] : 1406.4 -> 515.6 average = 2.728:1 = 63.34% reduction
OodleNetwork1 UDP [32|22] : 1406.4 -> 472.8 average = 2.975:1 = 66.38% reduction
OodleNetwork1 UDP [64|23] : 1406.4 -> 410.3 average = 3.428:1 = 70.83% reduction

OodleStaticLZ [2] : 1406.4 -> 611.6 average = 2.300:1 = 56.52% reduction
OodleStaticLZ [4] : 1406.4 -> 593.0 average = 2.372:1 = 57.83% reduction
OodleStaticLZ [8] : 1406.4 -> 568.2 average = 2.475:1 = 59.60% reduction
OodleStaticLZ [16] : 1406.4 -> 528.6 average = 2.661:1 = 62.42% reduction
OodleStaticLZ [32] : 1406.4 -> 471.1 average = 2.986:1 = 66.50% reduction
OodleStaticLZ [64] : 1406.4 -> 374.2 average = 3.758:1 = 73.39% reduction

OodleStaticLZHC [2] : 1406.4 -> 600.4 average = 2.342:1 = 57.31% reduction
OodleStaticLZHC [4] : 1406.4 -> 579.9 average = 2.425:1 = 58.77% reduction
OodleStaticLZHC [8] : 1406.4 -> 552.8 average = 2.544:1 = 60.70% reduction
OodleStaticLZHC [16] : 1406.4 -> 511.8 average = 2.748:1 = 63.61% reduction
OodleStaticLZHC [32] : 1406.4 -> 453.8 average = 3.099:1 = 67.73% reduction
OodleStaticLZHC [64] : 1406.4 -> 358.3 average = 3.925:1 = 74.52% reduction

Here's a plot of the compression on test1 ; LZP vs. LZSA-HC :

Y axis is comp/raw and X axis is log2(dic mb)

What you should see is :

OodleNetwork1 (LZP) is better at small dictionary sizes. I think this is just because it's a lot more tweaked out; it's an actual shipping quality codec, whereas the LZSA implementation is pretty straightforward. Things like the way you context model, how literals & lengths are coded, etc. needs tweakage.

At around 8 MB LZSA catches up, and then as dictionary increases it rapidly passes LZP.

This is the cool thing about LZSA. You can just throw more data at the dictionary and it just gets better. With normal LZ77 encoding you have to worry about your offsets taking more bits. With LZ77 or LZP you have to make sure the data's not redundant or doesn't replace other more useful data. (OodleNetwork1 benefits from a rather careful and slow process of optimizing the dictionary for LZP so that it gets the most useful strings)

Memory use of LZSA is quite a bit higher per byte of dictionary, so it's not really a fair comparison in that sense. It's a comparison at equal dictionary size, not a comparison at equal memory use.

2/07/2015

02-07-15 - LZSA - Index Post

LZSA series :

cbloom rants 02-04-15 - LZSA - Part 1
cbloom rants 02-04-15 - LZSA - Part 2
cbloom rants 02-05-15 - LZSA - Part 3
cbloom rants 02-05-15 - LZSA - Part 4
cbloom rants 02-06-15 - LZSA - Part 5
cbloom rants 02-06-15 - LZSA - Part 6
cbloom rants 02-09-15 - LZSA - Some Results

And other suffix related posts :

cbloom rants 09-27-08 - 2 (On LZ and ACB)
cbloom rants 09-03-10 - LZ and Exclusions
cbloom rants 09-24-11 - Suffix Tries 1
cbloom rants 09-24-11 - Suffix Tries 2
09-26-11 - Tiny Suffix Note
cbloom rants 09-28-11 - String Matching with Suffix Arrays
cbloom rants 09-29-11 - Suffix Tries 3 - On Follows with Path Compression
cbloom rants 06-21-14 - Suffix Trie Note
cbloom rants 07-14-14 - Suffix-Trie Coded LZ
cbloom rants 08-27-14 - LZ Match Length Redundancy
cbloom rants 09-10-14 - Suffix Trie EOF handling

02-07-15 - LZMA note

Originally posted at encode.ru

I just had a look in the LZMA code and want to write down what I saw. (I could be wrong; it's hard to follow!)

The way execution flows in LZMA is pretty weird, it jumps around :

The outer loop is for the encoding position. At each encoding position, he asks for the result of the optimal parse ahead. The optimal parse ahead caches results found in some sliding window ahead. If the current pos is in the window, return it, else fill a new window.

To fill a window :

find matches at current position
set window_len = longest match length
fill costs going forward as described by Bulat above
as you go forward, if a match length crosses past window_len, extend the window
if window reaches "fast bytes" then break out
when you reach the end of the window, reverse the parse and fill the cache
now you can return the parse result at the base position of the window

So, some notes :

1. LZMA in fact does do the things that I proposed earlier - it updates statistics as soon as it hits a choke point in the parse. The optimal-parse-ahead window is not always "fast bytes". That's a maximum. It can also stop any time there are no matches crossing a place, just as I proposed earlier.

2. LZMA caches the code costs ("prices") of match lengths and offsets, and updates them periodically. The match flags and literals are priced on the fly using the latest encoded statistics, so they are quite fresh. (from the start of the current parse window)

3. LZMA only stores 1 arrival. As I noted previously this has the drawback of missing out on some non-greedy steps. *However* and this is something I finally appreciate - LZMA also considers some multi-operation steps.

LZMA considers the price to go forward using :

literal
rep matches
normal matches

If you stored all arrivals, that's all you need to consider and it would find the true optimal parse. But LZMA also considers :

literal + rep0 match
rep + literal + rep0
normal match + literal + rep0

These are sort of redundant in that they can be made from the things we already considered, but they help with the 1 arrivals problem. In particular they let you carry forward a useful offset that might not be the cheapest arrival otherwise. ( literal + rep0 match is only tested if pos+1 's cheapest arrival is not a literal from the current pos )

That is, it's explicitly including "gap matches" in the code cost as a way of finding slightly-non-local-minima.

2/06/2015

02-06-15 - LZSA - Part 6

Some wrapping up.

I've only talked about LZSA for the case of static dictionaries. What about the more common case of scanning through a file, the dictionary is dynamic from previously transmitted data?

Well, in theory you could use LZSA there. You need a streaming/online suffix trie construction. That does exist, and it's O(N) just like offline suffix array construction. So in theory you could do that.

But it's really no good. The problem is that LZSA is transmitting substrings with probability based on their character counts. That's ideal for data that is generated by a truly static source (that also has no finite state patterns). Real world data is not like that. Real world sources are always locally changing (*). This means that low-offset recent data is much better correlated than old data. That is easy to model in LZ77 but very hard to model in LZSA; you would have to somehow work the offsets of the strings into their code space allocation. Data that has finite state patterns also benefits from numeric modeling of the offsets (eg. binary 4-8 patterns and so on).

(* = there's also a kind of funny "Flatland" type thing that happens in data compression. If you're in a 2d Flatland, then 2d rigid bodies keep their shape, even under translation. However, a 3d rigid body that's translating through your slice will appear to change shape. The same thing happens with data compression models. Even if the source comes from an actual static model, if your view of that model is only a partial view (eg. your model is simpler than the real model) then it will appear to be a dynamic model to you. Say you have a simple model with a few parameters, the best value of those parameters will change over time, appearing to be dynamic.)

Furthermore, at the point that you're doing LZSA-dynamic, you've got the complexity of PPM and you may as well just do PPM.

The whole advantage of LZSA is that it's an incredibly simple way to get a PPM-like static model. You just take your dictionary and suffix sort it and you're done. You don't have to go training a model, you don't have to find a way to compact your PPM nodes. You just have suffix sorted strings.


Some papers for further reading :

"A note on the Ziv - Lempel model for compressing individual sequences" - Langdon's probability model equivalent of LZ

"Unbounded Length Contexts for PPM" - the PPM* paper

"Dictionary Selection using Partial Matching" - LZ-PM , an early ROLZ

"PPM Performance with BWT Complexity" - just a modification of PPM* with some notes.

"Data compression with finite windows" - LZFG ; the "C2" is the Suffix Trie LZFG that I usually think of.

Mark Nelson's Suffix Tree writeup ; see also Larsson, Senft, and Ukkonen.

There's obviously some relation to ACB, particularly LZSA*, just in the general idea of finding the longest backwards context and then encoding the longest forward match from there, but the details of the coding are totally unrelated.

02-06-15 - LZSA - Part 5

LZSA is not really an LZ. This is kind of what fascinates me about LZSA and why I think it's so interesting (*). Ryg called it "lz-ish" because it's not really LZ. It's actually much closer to PPM.

(* = it's definitely not interesting because it's practical. I haven't really found a use of LZSA in the real world yet. It appears to be a purely academic exercise.)

The fundamental thing is that the code space used by LZSA to encode a match is proportional to the probability of that substring :


P(abc) = P(a) * P(b|a) * P(c|ab)

where P is the probability as estimated by observed counts. That kind of code-space allocation is very fundamentally PPM-ish.

This is in contrast to things like LZFG and LZW that also refer directly to substrings without offsets, but do not have the PPM-ish allocation of code space.

The funny thing about LZSA is that naively it *looks* very much like an LZ. The decoder of LZSA-Basic is :


LZSA-Basic decoder rough sketch :

{

handle literal case
decode match_length

arithmetic_fetch suffix_index in dictionary_size

match_ptr = suffix_array[suffix_index];

copy(out_ptr, match_ptr, match_length );
out_ptr += match_length;

arithmetic_remove get_suffix_count( suffix_index, match_length ) , dictionary_size

}

whereas a similar LZ77 on a static dictionary with a flat offset model is :

LZ77-static-flat decoder rough sketch :

{

handle literal case
decode match_length

arithmetic_fetch offset_index in dictionary_size

match_ptr = dictionary_array + offset_index;

copy(out_ptr, match_ptr, match_length );
out_ptr += match_length;

arithmetic_remove {offset_index,offset_index+1} , dictionary_size

}

they are almost identical. The only two changes are : 1. an indirection table for the match index, and 2. the arithmetic_remove can have a range bigger than one, eg. it can remove fewer than log2(dictionary_size) bits.

We're going to have a further look at LZSA as a PPM by examining some more variants :


LZSA-ROLZ :

ROLZ = "reduced offset LZ" uses some previous bytes of context to reduce the set of strings that can be matched.

This is trivial to do in LZSA, because we can use the same suffix array that we used for string matching to do the context lookup as well. All we have to do is start the suffix array lookup at an earlier position than the current pointer.

eg. instead of looking up matches for "ptr" and requiring ML >= 3 , we instead look up matches for "ptr-2" and require ML >= 5. (that's assuming you want to keep the same MML at 3. In fact with ROLZ you might want to decrease MML because you're sending offsets in fewer bits, so you could use an MML of 2 instead, which translates to a required suffix lookup ML of 4).

That is, say my string to encode is "abracket" and I've done "ab" so I'm at "racket". My static dictionary is "abraabracadabra". The suffix sort is :

a
aabracadabra
abra
abraabracadabra
abracadabra
acadabra
adabra
bra
braabracadabra
bracadabra
cadabra
dabra
ra
raabracadabra
racadabra

The dictionary size is 15.

With LZSA-Basic I would look up "racket" find a match of length 3, send ML=3, and index = 14 in a range of 15.

With LZSA-ROLZ-o2 I would do :


string = "ab|racket"

look up context "ab" and find the low & high suffix indexes for that substring
 (low index = 2, count = 3)

look up "abracket" ; found "abracadabra" match of length 5
 at index = 4

send ML=3

arithmetic_encode( suffix_index - context_low_index , context_count )
  (that's 2 in a range of 3)

You could precompute and tabulate the suffix ranges for the contexts, and then the complexity is identical to LZSA-Basic.
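
That tabulation is just one linear pass over the suffix sort, something like this (my sketch; it ignores the handful of suffixes shorter than the context order at the end of the dictionary) :

struct suffix_range { int low; int count; };
suffix_range o2_range[256*256];   // suffix sort range for each 2-byte context

void build_o2_ranges( unsigned char ** suffix_sort, int dictionary_size )
{
    for(int c=0;c<256*256;c++) { o2_range[c].low = 0; o2_range[c].count = 0; }

    // suffixes sharing the same first two bytes are contiguous in the sort,
    // so just record where each 2-byte prefix starts and how many entries it spans
    for(int s=0;s<dictionary_size;s++)
    {
        int ctx = (suffix_sort[s][0]<<8) | suffix_sort[s][1];
        if ( o2_range[ctx].count == 0 ) o2_range[ctx].low = s;
        o2_range[ctx].count++;
    }
}

Then the ROLZ encode is a table lookup to get {context_low_index, context_count} and a single arithmetic_encode within that range, exactly as in the example above.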

In LZSA-ROLZ you cannot encode any possible string, so MML down to 1 like LZSA-HC is not possible. You must always be able to escape out of your context to be able to encode characters that aren't found within that context.

LZSA-Basic had the property of coding from order-0,order-1,2,3,.. ML, jumping back to order-0. In LZSA-ROLZ, instead of jumping down to order 0 after the end of a match, you jump down to order-2 (or whatever order you chose for the ROLZ). You might then have to jump again to order-0 to encode a literal. So you still have this pattern of the context order going up in waves and jumping back down, you just don't jump all the way to order-0.


LZSA-ROLZ* :

(pronounced "ROLZ star" ; named by analogy to "PPM*" (PPM star) which starts from the longest possible context)

This idea can be taken further, which turns out to be interesting.

Instead of doing ROLZ with a fixed order, do ROLZ from the highest order possible. That is, take the current context (preceding characters) and find the longest match in the dictionary. In order to do that efficiently you need another lookup structure, such as a suffix trie on the reversed dictionary (a prefix tree). The prefix tree should have pointers to the same string in the suffix tree.

eg. say you're coding "..abcd|efgh.."

You look up "dcba..." in the prefix tree (context backwards). The longest match you find is "dcbx..". So you're at order-3. You take the pointer over to the suffix tree to find "bcd" in the suffix tree. You then try to code "efgh..." from the strings that followed "bcd" in the dictionary.

You pick a match, send a match length, send an offset.

Say the deepest order you found is order-N. Then if you code a match, you're coding at order-N+1,order-N+2,etc.. as you go through the match length.

The funny thing is : those are the same orders you would code if you just did PPM* and only coded symbols, not string matches.

Say you're doing PPM* and you find an order-N context (say "abcd"). You successfully code a symbol (say "x"). You move to the next position. Your context now is "abcdx" - well, that context must occur in the dictionary because you successfully coded an x following abcd. Therefore you will find an order-N+1 context. Furthermore there can be no longer context or you would have found a longer one at the previous location as well. Therefore with PPM* as you successfully code symbols you will always code order-N, order-N+1,order-N+2 , just like LZSA!

If LZSA-ROLZ* can't encode a symbol at order-N it must escape down to lower orders. You want to escape down to the next lower order that has new following characters. You need to use exclusion.

Remember that in LZSA the character counts are used just like PPM, because of the way the suffix ranges form a probability distribution. If the following strings are "and..,at..,boobs.." then character a is coded with a 2/3 probability.

There are a few subtle differences :

In real PPM* they don't use just the longest context. They use the *shortest* deterministic context (if there is any). If there is any deterministic context, then the longest context is deterministic, predicting the same thing. But there may be other mismatching contexts that predict the same thing and thus affect counts. That is :


..abcdefg|x    abcdefg is the longest context, and only x follows it
.....xefg|y    context "efg" sees both x and y.  So "defg" is the shortest deterministic context
....ydefg|x    but "ydefg" also predicts x

So the longest deterministic context only sees "x" with a count of 1

But if you use the shortest deterministic context, you see "x" with a count of 2

And that affects your escape estimation.

The big difference is in the coding of escapes vs. match lengths.

And if you code matches in LZSA* from orders lower than the deepest, that results in different order selection than PPM*.

2/05/2015

02-05-15 - LZSA - Part 4

When you don't find a match in LZSA that actually carries a lot of information.

Say you have an MML of 2 and encode a literal "a". That means all digrams in the dictionary that start with "a" are exclusions. The following character cannot be any of those, and those are the most likely characters so they are worth a lot.

Discussed previously here :

cbloom rants 09-03-10 - LZ and Exclusions

Even when you send a match, you have a powerful exclude from the characters that cannot extend that match. In a normal LZ77 like LZMA this is done for the *single* match offset that we sent with a literal-after-match exclude. In LZSA we can easily exclude *all* the characters that could have extended the match.

LZSA-HC = LZSA- High Compression

For LZSA-HC we will take MML down to 1, so literals and matches are unified and we are always writing "matches". However we will also write matches by first writing the first character of the match and then writing the rest of the match. I assume that you have ensured the dictionary contains every character at least once, so there's no issue of impossible encodings.

Algorithm LZSA-HC :


Data structures required are the same as LZSA-Basic

When the match lookup structure (typically suffix trie) returns a match
it should also provide the set of characters which were found in the dictionary
 but did not match the next character in the search string


To encode :

initialize exclude_set = { empty }

for all of ptr

encode current character (*ptr) using literal coder,
 doing exclusion from current exclude_set

look up the current string (ptr) in the match lookup structure
set exclude_set = characters found that followed match but != ptr[match_length]

transmit match length ( >= 1 )

if ( match_length > 1 )
{
  send the suffix substring matched :
  simply one arithmetic encode call

  we only need to send it within the range of suffixes of the already-sent first character
    char_suffix_low[ *ptr ] is a table lookup
    char_suffix_count = char_suffix_low[ *ptr + 1 ] - char_suffix_low[ *ptr ];

  arithmetic_encode( suffix_sort_low_index - char_suffix_low[ *ptr ], suffix_count , char_suffix_count );
}

ptr += match_length;

a few notes on this.

We only send matches, length 1,2,3... Then exclude all characters that could not come after the match. This exclusion is exactly the same as exclusion in PPM when you escape down orders. In LZ you escape from order-ML down to order-0.

Coding the first character of the match separately is just so that I can use a different coder there. I use order-1 plus some bits of match history as context. For purity, I could have left that out and just always coded a suffix index for the match. In that case "exclusion" would consist of subtracting off the char_suffix_count[] number of suffixes from each excluded character.

Because match_length is sent after the first character, I use the first character as context for coding the match length.

The "if ( match_length > 1 )" is actually optional. You could go ahead and run that block for match_length = 1 as well. It will arithmetic_encode a symbol whose probability is 1; that is a cumprob which is equal to the full range. This should be a NOP on your arithmetic encoder. Whether that's a good idea or not depends on implementation.

In practice the exclude_set is 256 bit flags = 32 bytes.
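
eg. something like this (one possible layout, not the actual code) :

struct exclude_set
{
    U32 bits[8];   // 256 flags = 32 bytes

    void clear() { for(int i=0;i<8;i++) bits[i] = 0; }
    void add( int c ) { bits[c>>5] |= 1u<<(c&31); }
    bool is_excluded( int c ) const { return ( bits[c>>5] >> (c&31) ) & 1; }
};

// the literal coder then skips the code space of excluded characters,
// and the match lookup fills the set with the characters that followed the match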

LZSA-HC must use greedy parsing (always send the longest possible match) because of full exclusion. There's no such thing as lazy/flexible/optimal parses.


To decode :

we now can't really use the faster backward tree
because we need a lookup structure that will also provide the exclude_set
(the characters that followed the match), ie. a real forward match lookup like the encoder's

initialize exclude_set = { empty }

for all of ptr

decode current character (*ptr) using literal coder,
 doing exclusion from current exclude_set

decode match_length

if ( match_length == 1 )
{
  // just precompute a table of exclude sets for the after-1-character case :
  exclude_set = o1_exclude_set[ *ptr ]
}
else
{
  decode the suffix index
 
  within the range of suffixes of the already-sent first character
    char_suffix_low[ *ptr ] is a table lookup
    char_suffix_count = char_suffix_low[ *ptr + 1 ] - char_suffix_low[ *ptr ];

  int suffix_index = arithmetic_fetch( char_suffix_count ) + char_suffix_low[ *ptr ];

  at this point {suffix_index, match_length} is our match string

  unsigned char * match_string = suffix_sort[suffix_index];
  copy_match( out_ptr , match_string, match_length );
  out_ptr += match_length;

  we also need the suffix low index & count to remove the arithmetic interval :

  suffix_sort_low_index, suffix_count = get_suffix_count( suffix_index, match_length );
 
  here get_suffix_count must be a real match lookup and should also fill exclude_set

  arithmetic_remove( suffix_sort_low_index, suffix_count , dictionary_size );
}

ptr += match_length

And that is LZSA-HC

02-05-15 - LZSA - Part 3

PPM.

You can of course implement a static PPM easily with a suffix array. (you would not want to do it for dynamic PPM because inserting into a sort is painful; the standard context tree used for PPM is a suffix trie)

Say my string is "abraabracadabra" (Steve Miller style). The suffix sort is :

a
aabracadabra
abra
abraabracadabra
abracadabra
acadabra
adabra
bra
braabracadabra
bracadabra
cadabra
dabra
ra
raabracadabra
racadabra
I want to encode the next character starting with order-3 PPM. My current context is "bra". So I look up "bra..." in the suffix sort. I've seen "a" and "c" before, each with count 1. So I have a total count of 2 and a novel count of 2. I can do the old fashioned PPM escape estimation from that, and either code a symbol in that context or escape. If I escape, I go to order 2 and look up "ra..." , etc.

Now, what if you did a funny thing with your PPM.

Start at order-0 and encode a character. At order-0 we have all the strings in the suffix array, so we just count the occurrence of each first character ( C(a) = 7, C(b)=3, C(c)=1, C(d)=1, C(r)=3 , Total = 15).

Say we encode an "a". So we send 7 in 15. (and a no-escape flag)

On the next character move to order-1. So we reduce to these strings :

a
aabracadabra
abra
abraabracadabra
abracadabra
acadabra
adabra
Within this context we have C(a)=1, C(b)=3, C(c)=1, C(d)=1 (to send an 'r' we would have to escape). Say we sent a "c".

Now we change to order-2. We only have this string at order-2 :

acadabra
So if our next character is "a" we can send that with just a no-escape flag. Otherwise we have to escape out of this context.

What we have now is a "deterministic context" and we can keep increasing order as long as we match from it.

This is LZSA !

(the only difference is that in LZSA the escape is only modeled based on context order, not the contents of the context, which it normally would be in PPM)


To be clear :

When LZSA encodes a reference to the string "abc" it does so with an arithmetic encoder, using the equivalent probability :


P = probability equivalent as used in arithmetic encoding

P = C("abc") / C_total

C("abc") = C("a") * C("ab")/C("a") * C("abc")/C("ab")

as the counts become large and match the true probabilities :

C("ab")/C("a") => P("b"|"a")   (reads count of "b" given a previous "a")


C("abc") => C("a") * P("b"|"a") * P("c"|"ab")

P encoded => P("a") * P("b"|"a") * P("c"|"ab")

That's order-0 * order-1 * order-2. (and so on for larger match lengths).

The match length can be thought of as unary. So ML=3 is "1110". We read that as "match,match,match, no-match".

In this way we see the match length as a series of escape/no-escape flags.


Another note on PPM.

In real-world modern PPM's you do LOE. (LOE = Local Order Estimation). In the olden days you just chose "we always use order-4 PPM", which means you start at order-4, and escape down to order-3,2,1. With LOE you look at local information and decide which order to use.

Now, let's say you had some data that was binary and had a repeated pattern that was like :

1 byte is random
3 bytes predictable
1 byte is random
7 bytes predictable
repeated. What orders should you use?
order-0 for random
order-0,1,2
order-0 for random
order-0,1,2,3,4,5,6
these are the orders that LZ uses!

This is part of why LZ can beat PPM on binaries but not on text, because the funny way that LZ jumps down to order-0 at the start of a match can actually be a good thing on binary.

Now, what if you had skip contexts?

(skip contexts are contexts that use some non-consecutive range of previous characters; eg. something like YYN is a two-symbol context that uses the 2nd & 3rd symbols back, but not the immediately preceding 1 symbol.)

If you have random bytes and skip contexts, then what you want is :

order-0 for random
order-0, Y, YY
order-0 for random
YYYN, YYYNY, YYYNYY, etc..
and this is what a "repeat match" is in LZSA.

Say I matched "abr" in the example above, but then my string is "abrd" , so I get a match of length 3, then a mismatch. I can then continue from the "repeat match" skip-context using YYYN of "abrX" :

YYYN...
abraabracadabra
abracadabra
so there are two strings available as matches for a "repeat match". If your next character is not "a" or "c" then you code an "escape" and drop the YYYN context, which is the same as saying you code a flag to select a normal match and not a repeat match.


If we like, we could make LZSA more of a true PPM.

In LZSA-Basic we encoded the match length and then the suffix index to select the match.

Instead, you could code the suffix index first. The decoder can then get the match string :


  int suffix_index = arithmetic_fetch( dictionary_size );

  at this point {suffix_index, match_length} is our match string

  unsigned char * match_string = suffix_sort[suffix_index];

but we still don't know the match length.

You can then encode "match length" unary style, a sequence of binary "more length" flags (these are PPM escape flags). Because we already know match_string, we can use the characters matched so far in our match flag coding. We could use them as contexts for the match flag. Or we could do a true PPM and use them to look up a context node and get total & novel counts to do PPM-style escape estimation.

If we do that, then LZSA really becomes a true PPM. It's a PPM that does this funny order selection : order-0,order-1,... order-N, then escapes back to order-0.

It has a neat advantage over traditional PPM - we are encoding the character selection in one single arithmetic coding operation, instead of one at each context level.

2/04/2015

02-04-15 - LZSA - Part 2

Algorithm LZSA-Basic :


LZSA-Basic encoder :

given a dictionary
form the suffix sort
make a match lookup structure on the suffix sort (suffix trie for example)

when you look up a string in the match structure
you are given the lowest index of that substring in the suffix sort
and also the number of entries that match that prefix

for example in a suffix trie
each leaf corresponds to an entry in the suffix sort
each node stores the lowest leaf index under it, and the number of leaves


To encode :

look up the current string in the match lookup structure
if match length < MML 
{
  flag literal & send literal
}
else
{
  flag not literal
  send match length

  send the suffix substring matched :
  simply one arithmetic encode call
  (dictionary size usually a power of 2 for more speed)

  arithmetic_encode( suffix_sort_low_index, suffix_count , dictionary_size );
}

Lazy parsing and other standard LZ things are optional.

Minimum Match Length , MML >= 2 as written. However, you could also set MML=1 and dispense with the match flag entirely. Then literals are written as a match of length 1, (and you must ensure every character occurs at least once in the dictionary). This is identical to order-0 coding of the literals, because the suffix ranges for matches of length 1 are just the order-0 counts! In practice it's better to code literal separately because it lets you do a custom literal coder (using order-1 context, or match history context, or whatever).


LZSA-Basic decoder :

decoder requires the suffix sort
it also requires the suffix count for the given match length
(see later)


To decode :

get match flag
if not match
{
  decode literal
}
else
{
  decode match length

  get the suffix index :

  int suffix_index = arithmetic_fetch( dictionary_size );

  at this point {suffix_index, match_length} is our match string

  unsigned char * match_string = suffix_sort[suffix_index];
  copy_match( out_ptr , match_string, match_length );
  out_ptr += match_length;

  we also need the suffix low index & count to remove the arithmetic interval :

  suffix_sort_low_index, suffix_count = get_suffix_count( suffix_index, match_length );

  this must be the same interval that was encoded :
  (suffix_index is somewhere in that range)
  (note that all suffix_index values in that range provide the same match string over match_length)

  arithmetic_remove( suffix_sort_low_index, suffix_count , dictionary_size );
}

easy peasy, and very fast. Decoding is just as fast as normal LZ77, except for one piece : get_suffix_count.

To implement get_suffix_count we need something like the suffix trie that was used in the encoder. But we can do something a bit more compact and efficient. Rather than a forward tree, we can use a backward only tree, because we have a leaf index to jump into, and we only need to go up to parents to find the right node.


get_suffix_count :


struct backward_suffix_node
{
  int parent; // node index
  int depth;
  int low,count; // suffix sort range
};

unsigned char * suffix_sort[ dictionary_size ];
int suffix_leaf_parent[ dictionary_size ];
backward_suffix_node suffix_nodes[ dictionary_size ];


suffix_sort_low_index, suffix_count = get_suffix_count( suffix_index, match_length )
{
    int node = -1;
    int parent = suffix_leaf_parent[ suffix_index ];
    while( match_length <= suffix_nodes[ parent ].depth ) // walk up while the parent node still covers the whole match
    {
        node = parent;
        parent = suffix_nodes[ node ].parent;
    }

    if ( node == -1 )
        return suffix_index, 1;
    else
        return suffix_nodes[ node ].low , suffix_nodes[ node ].count;
}

the logic here is just slightly fiddly due to path compression. With path compression, match_length can fall between the depths of two nodes, and when that happens you want to stay at the child node. The leaf nodes are implicit, and the number of internal nodes is always <= the number of leaves.

You could of course also accelerate the suffix_count lookup for low match lengths (at ML=3 or 4, for example) by just having a direct array lookup for that case.
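For example, the ML <= 2 case can be a table indexed by the first two bytes, built in one pass over the suffix sort. This is just a sketch (the table layout is made up, and it assumes every suffix has at least two readable bytes; pad the dictionary or special-case the final suffix) :

// Sketch : precompute the suffix-sort range for every 2-byte prefix so that
// get_suffix_count for match_length <= 2 is one table lookup, not a tree walk.
struct suffix_range { int low, count; };

static struct suffix_range range_for_pair[ 65536 ];   // indexed by first two bytes

void build_pair_ranges( unsigned char ** suffix_sort, int dictionary_size )
{
    for ( int p = 0; p < 65536; p++ )
    {
        range_for_pair[p].low = 0;
        range_for_pair[p].count = 0;
    }

    // suffixes sharing a 2-byte prefix are consecutive in the suffix sort,
    // so the first one seen gives the low index and the rest just add to count
    for ( int i = 0; i < dictionary_size; i++ )
    {
        int p = ( suffix_sort[i][0] << 8 ) | suffix_sort[i][1];
        if ( range_for_pair[p].count == 0 )
            range_for_pair[p].low = i;
        range_for_pair[p].count++;
    }
}

An ML=3 or 4 table is the same idea with a bigger index.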

In theory walking backward like this has a bad O(N^2) possible running time (if the tree is deep, but you're only getting short matches in it). Conversely, walking forward from the root ensures that decode time is O(N), because the trie walk is proportional to match length. In practice the backward walk is always significantly faster. (a single forward trie step can involve lots of linked list steps and jumping around in memory; the backward trie is much more compact and easier to walk without conditionals; you have to have a very deep average tree depth for it to be worse). If this were actually an issue, you could augment the backwards trie with a Fenwick/skip-list style larger parent step in a binary pattern (some halfway-to-root steps, some quarter-way-to-root steps, etc.). But it just isn't an issue.

02-04-15 - LZSA - Part 1

I'm going to introduce what I believe is a novel (*) compression algorithm. I'm calling it "LZSA" for "LZ Suffix Array" , though ryg rightly points out it's not really an LZ.

(* = if you're actually a scientist and not a cock-munching "entrepreneur" you should know that nothing is ever novel. This could be considered a simplification of ACB)

(Note to self : first emailed 07/27/2014 as "Internal Compression Blog")

I'm going to write a mini series about this.

Here's some previous posts that are related :

cbloom rants 09-27-08 - 2
cbloom rants 09-03-10 - LZ and Exclusions
cbloom rants 08-27-14 - LZ Match Length Redundancy
cbloom rants 07-14-14 - Suffix-Trie Coded LZ

So let's dive in.

Part 1 : motivation and background

I was working on compression from static dictionaries. The problem with traditional LZ77 on static dictionaries is that to get good compression you want a large dictionary, but then the offsets require more bits as well. In a normal dynamic scan dictionary, you have very strong offset modeling (offsets tend to be small, and to fall on binary patterns). In particular, short common strings will occur at low offset and thus not require many bits. But in a static dictionary all references take the same large number of bits, even if the match is short and the substring matched is very common. (*)

(* = obviously you could sort the substrings by frequency to try to create an optimal static dictionary that has strongly biased offsets; but then you also have to be aware of how adjacent substrings form larger strings (eg. "ab" next to "cd" also adds "abcd"), and have to make that whole grammar sorted, and that seems like a monstrously hard problem)

The problem is that common substrings occur all over the static dictionary (eg. in an english text dictionary "the" occurs in thousands of places), but in LZ77 you have to code an offset to one specific occurrence of that substring. In effect you are wasting log2(N) bits, where N is the count of that substring.
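To put rough, purely illustrative numbers on that :

dictionary size 4 MB            -> every offset costs log2(2^22) = 22 bits
substring occurs ~4096 times    -> log2(4096) = 12 of those bits are spent
                                   picking one arbitrary occurrence
real information in the offset  -> only ~10 bits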

In fact, the solution is very easy conceptually. Just take the static dictionary and do a suffix sort on it. Now all occurrences of "the" are consecutive in the suffix sort.

Say our dictionary is "banana", then the strings we can match are :

banana
anana
nana
ana
na
a
to code "ana" we could send index 1 or 3; they both decode as "ana" at length 3.

After suffix sort our strings are :

a
ana
anana
banana
na
nana
And to send "ana" we send index 1 or 2.
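(If you want to reproduce that list, a throwaway way is to just qsort the suffix pointers; strcmp order on the suffixes of "banana" gives exactly the sorted list above.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Throwaway sketch : build and print the suffix sort of "banana".
static int compare_suffixes( const void * a, const void * b )
{
    return strcmp( *(const char **)a, *(const char **)b );
}

int main( void )
{
    const char * dictionary = "banana";
    int n = (int) strlen( dictionary );
    const char * suffix_sort[6];

    for ( int i = 0; i < n; i++ )
        suffix_sort[i] = dictionary + i;

    qsort( suffix_sort, n, sizeof(suffix_sort[0]), compare_suffixes );

    for ( int i = 0; i < n; i++ )
        printf( "%d : %s\n", i, suffix_sort[i] );

    return 0;
}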

So now we need to send an integer, and we need it to be in a range, but we don't need to specify it exactly.

That is, we want to send an integer in the range {suffix_lo,suffix_hi} but we don't care what it is exactly in that range (because they all decode to the same string), and we don't want to waste bits unnecessarily specifying what it is in that region.

That's exactly what an arithmetic encoder does! We just need the low and high index of our substring in the suffix array, and we send that range with an arithmetic encoder.

It's exactly like a cumulative frequency table. The arithmetic encoder is guaranteed to send an integer that is somewhere in the range we need. We don't know which exact integer the decoder will see; it won't be determined until we do some more arithmetic encodings and the range is reduced further.

We're just treating the # of strings in the dictionary as the cumulative probability total. Then the low & high suffix indices that bound our substring are the cumulative probabilities we use to encode a "symbol" in the arithmetic coder. By coding that range we have specified a substring, and we save log2(substring_count) bits of unnecessary information.
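To make the correspondence explicit, here's the single encode step written as a bare-bones range-coder interval update. This is a sketch only (the struct and function name are mine, and renormalization / carry handling / byte output are omitted), but the arguments are exactly the (suffix low index, suffix count, dictionary size) triple described above :

struct range_coder { unsigned int low, range; };

// treat {suffix_low, suffix_count, dictionary_size} as a cumulative
// frequency entry : cumlow, freq, total
void arithmetic_encode_interval( struct range_coder * rc,
                                 unsigned int suffix_low,
                                 unsigned int suffix_count,
                                 unsigned int dictionary_size )
{
    unsigned int r = rc->range / dictionary_size;
    rc->low   += r * suffix_low;
    rc->range  = r * suffix_count;
    // cost is log2( dictionary_size / suffix_count ) bits, so matching a
    // more common substring (bigger suffix_count) costs fewer bits
}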

Next post I'll describe the algorithm more precisely and then I'll talk about it.

2/01/2015

02-01-15 - Fucking Fuck the Fucking Web

ADDED PREFACE : This is a pretty ranty rant. I write these on a semi-regular basis, and pretty much always delete them before posting them these days. I thought I would leave this one for old time rants' sake.

So I was forced to upgrade my Firefox from 15 to 35. AAAARRGGGGH.

Of course first thing is "Classic Theme Restorer" to make it look reasonable again. (christ fucking rounded shit). Ok, waste a bunch of time as usual learning new settings and updating. Whatever, fucking awful life as usual, no biggie.

But it just SUCKS it sucks it sucks. I want fucking FF 15 back.

It's so much *less* responsive and less asynchronous than before despite the big rewrite to make it async. Loading a page now, the elements keep popping in and re-flowing and shit moves all over. It takes way longer for the focus text box to become active. I have to either manually click in it or wait forever for most of the page to load for it to activate. So bad. Even typing in the address bar to go to a new page (while a page is still loading) is all hitchy now; that is so fucking retarded and borked, your basic GUI elements need to always be independent of the page loading.

The HTML5 video player is just so not ready for primetime. Fucking buffering doesn't work right, it hitches and stalls out and then won't restart. Audio cuts out; you can't seek; etc. It's just so frustrating I want to smash my goddamn computer. I was hoping to run without Flash now, but it's just not ready. You can't fucking ship it like that.

Every so often I log in from weird places and Google decides that's "suspicious activity". So now I have to pick a new password.

If my password isn't good enough, the entry form just gets wiped, and I have to enter my old password again, and the new one twice. Nope, not good enough, wipe the form.

God damn you. I don't want to enter a complicated password because I have to type the damn thing in on my fucking PHONE with fucking nightmare starred-out stupid fucking touch keyboard, where things like going caps and lower case take forever. I want my password to be "hello" or some shit. Nope, blank the form.

This is really fucking important Google. Because the big security leak is hackers just guessing passwords. It's not when fucking Target or Visa or someone just loses an entire mainframe full of personal account info.

Actually the biggest leak is when a major retailer literally GIVES AWAY everyone's personal account info. Because most of the big hacks are "human engineering" which just means the hackers called someone up and that person gave them access to their back end.

So yeah, really fucking helpful that you're making me put punctuation in my password and starring it out so that no one in the room can get it. Thanks a lot. That's going to help when Home Depot loses millions of credit card numbers.

Oh yeah, and when I get suspicious activity, Google locks my account and wants to send me an SMS.

My phone number is a FUCKING GOOGLE VOICE NUMBER. How am I supposed to get an SMS? Oh, I'll send it to my actual hardware phone number (which I never use and don't even know, but I have it written down somewhere.) Okay, let's try that... MY PHONE GETS SMS WITH FUCKING GOOGLE HANGOUTS. And it won't let me log in and get my SMS because Google has flagged my account for suspicious activity. Amazing.

(and when Google Hangouts can't login with my primary Google account it automatically switches to my RAD account without asking. No retry / enter a new password prompt or anything. Awesome.)

Oh my god, I hate the new fucking google maps. Every time I want a map now the process is :

1. Huh, WTF, why is it loading so slow ?
2. Oh my god the fucking nav buttons still haven't finished popping in?
3. Aw crap, it's the new maps I forgot about this nightmare
4. How do I go back to the old maps? Is it settings? Nope
5. "Help" button.
6. Select the reason why I want old maps : "I want to punch you in the nuts".
7. Return to Classic Google Maps

Fucking christ. If it's slower it's worse. Don't fucking ship it. My god it's so fucking slow.

And how is the fucking most basic search interaction of maps & normal search still so fucking garbage. Basic shit like if I do a normal search and I get various search results, I can't just click one and go "show on maps". If I'm in maps, I can't just take my current search text and switch back to a web search. If I'm in maps trying to get directions, and I put in a description search instead of a specific address, it kicks me out of directions mode when I choose the address. WTF WTF. This is what you fucking do and you're so garbage at it. Everyone is fired.

I was going to write "I don't understand how companies look at these things and think they're okay" but actually I know exactly how it happens.

The fundamental way that tech is developed is all wrong.

There's this very weird non-results-oriented thing that can happen with developers. You sort of go into this rabbit hole where you have a blind spot for what the actual user experience is.

In any product development cycle, it starts out with reasonably good intentions. You want to solve a problem and make things good for the user. But then you translate that into todos which are technical, like "rewrite the back end in such a way". The weird thing that then happens is the developers focus on the technical todos, and they are happy if they accomplish them, even if the end result is actually worse for the user.

It's like :

1. We want to make it good for the user
2. To do that we have to figure out some tech todos
3. So we decide to do A, B, C
4. We go away and put our heads down and crank on A, B, & C
5. Great success!  We have accomplished A !  Pat ourselves on the back
6. We have totally forgotten about #1.

(though realistically, even at the start there are usually lots of "todos" on the list that have nothing to do with improving user experience. There will be things like "make it IEEE PPTP RFC 40187 compatible" which you feel is really important but in fact for the user means almost nothing. There will also be lots of pet todos from the developers like "make it fine-grain threaded" which again, if they actually result in a user experience that's more snappy, great, but if not then it's just tech masturbation).

As far as how the horrific moronic GUIs keep getting green lit, I don't really know. I can only guess. I guess that there are probably GUI mockup meetings where things are approved based on still screen shots. I guess that managers who are in charge of these things have no fucking clue about the most basic principles of good GUI design, like "buttons should obviously be buttons", "focus should never move without the user doing it", "GUI elements should not trickle in or change unexpectedly", "buttons should always be in the same place", "delays in rendering should not change the meaning of UI input", etc. etc.

And there's a general lack of testing. You have to get people in and watch them use your software. People who aren't familiar with it. Don't give them any instructions. Any time they need the help or get frustrated, you have failed. It can't be a last minute thing when you aren't willing to make drastic changes. It has to be early and often and you have to have many months left to make changes before shipping. I cannot believe that these companies actually get anyone to try their product (or web page), and see how fucking shitty the user experience is, and actually respond to it.

(Nobody who ever uses Outlook.com comes away doing anything but screaming their head off)

Fucking Amazon sucks as a UI. Fucking ebay sucks. Fucking facebook sucks. Not like sucks a little bit, like they're just awful. Our fucking most valuable, biggest most important software products are just fucking garbage. Why is it so fucking hard for me to spend my money on the internet? It's ridiculous.

Part of the problem is that by the time you have something to show, the team is usually so weary and calcified, that they don't really want to listen to any feedback. They'll make lots of excuses to not change things.

Managers are hesitant to ever admit a mistake or cut a huge feature or redo work, because it reflects badly on them. That's terrible. It means that broken awful shit gets shipped just because they politically have to keep insisting "we did a great job". In a better world, a manager would be able to say "actually our GUI rewrite for VC2010 is a total fuckup and we're going to just bin the entire project and go back to VC6". Unfortunately in corporations, like real politics, it serves you better to insist that you totally believe in all your choices still even when they have obviously turned out to be huge fuckups. Kind of mind boggling that people like it better when their politician is walking down the street naked saying "look at my fabulous choice of clothes" than if they would just admit that they got duped.

I think an interesting idea for development would be to have a totally separate "finishing team". You get all new developers and managers on at the end, so that they aren't wedded to any of the ideas, they aren't building fences around their work and unwilling to change anything, they aren't prideful and unable to admit that it sucks. The finishing team is only concerned with the *results*, what the actual user experience is, and they can mercilessly cut up the product to make it better. Politically the finishers have no stake in keeping work, so they can even revert huge chunks or just cut features.

We've had great results at RAD with ryg as a sort of ruthless finisher. Developers very regularly get some crazy ideas in their head about why they absolutely must do something in a crazy over-complicated way, and they cling to some silly reason why they can't just do it the simple obvious way. It takes a 3rd party coming in to rip up the code and take out the stupidity. (This used to be sort of my specialty).

(sort of like how Hollywood brings in separate writers to finish screenplays; the analogy's not great though because in writing there's a lot to be said for the flawed artistry of a single voice, but in web apps not so much)

I certainly have been guilty of this sin myself many times. It's just very fundamental to the developer personality. We all get balkanized and prideful and defensive and calcified.

One example that I regret is the camera on Stranger's Wrath. I wrote the camera; I had this vision of things that I wanted to do, partly based on mistakes I'd seen in the past. I wanted it to be very robust, physical and analog feeling (not digital), to not get stuck or ever jump ahead or go through objects, framerate independent, very solid. And for the most part I accomplished those and thought that I was very clever and had done an amazing thing.

But people didn't like it. Almost right away I started getting feedback from within the company that was not glowing; why can't it be more like this other game, can it move faster, can it do this or that. Later as we started getting outside play testing it was a common complaint. And basically I did nothing about it. We did some minor tweaking to speed it up a bit based on feedback, but I didn't have the "come to jesus" moment of hey this doesn't actually work as something that makes users happy. It may have hit all the todos that I set out for it originally, but it didn't accomplish the only goal that really matters which is to make users go "this feels great".

One of the stupid excuses that I made with the camera, which is a very common one, was : "people are just used to other cameras (Mario 64, Jak & Daxter and so on), so they're expecting that and just think it feels bad because it's different. Once they get used to it they'll like it."

NO!

Even if that's true, you're wrong. The fact that people are used to some other thing is a fact of the world and you have to work within that. They're used to GUIs in a certain style. They're used to the controls in their car being in a certain place. Maybe you have thought of a better way to do it, but if that feels bad to people in the real world because they're used to the old way, YOU'RE WRONG. You have to make it good within the context of actual users. (that's not to say you can't ever change any GUI, but the changes should feel good to someone who knows the old way; and of course if it's something like a car where the user will switch between various vehicles, then fuck you just make it standard and don't change things that don't actually make anything better)

Whoah. Enough ranting.

old rants