10-31-11 - What are these pipes -

So there's this low area along the path by my house that gathers water. I went out to fix it and digging around I found this colony of pipes :

The two small ones are two inch diameter, the big one is four inch; they're black PVC with hammered in caps (that I can easily pull out by hand). I dug down another foot or so and from what I could see they just continue straight down.

Pretty much all my utilities flow past that spot so they could be related to almost anything. In particular the sewer goes past there so I think they might be an outside sewer access. Kind of wierd that's there three of them though instead of just one.

I don't think that we have any kind of french drain system, though it would make sense to have a low point there with a french drain drawing the water off.

Anyhoo, kind of curious what they are before I cover up that spot with a bunch of rock and dirt to raise it up. Photos backing up :

10-31-11 - Small poker note

This year's WSOP final table has perhaps the best tournament poker players ever at a WSOP final table. Maybe at any major live tournament (?). I don't really follow tournaments much, but there's only one fish at the table, and he's not even a huge fish, he's just a "solid" old player, and everyone else is an internet kid, which means that they actually know what things like "fold equity" and "tournament chip EV vs. real dollar EV" is.

Martin Staszko (40.1 million in chips)
Eoghan O'Dea (33.9 million)
Matt Giannetti (24.7 million)
Phil Collins (23.8 million)
Ben Lamb (20.8 million)
Badih Bounahra (19.7 million)
Pius Heinz (16.4 million)
Anton Makiievskyi (13.8 million)
Samuel Holden (12.3 million)

I'm sure it will be horrible TV ; for one thing, ESPN will just show the all-ins which is horrible boring poker broadcasting. But beyond that, the young internet players are just SOoooo boring to watch. God, cash in some of your poker winnings and buy a personality, please. You may as well point a camera at me while I'm coding.

It's too bad that Daniel Negreanu doesn't have enough humility to buy some lessons from JMan or someone good, because it's so much more entertaining to see someone who actually interacts at the table. And his live metagame skills are very good, so if he would just play better technically he could do well.

10-31-11 - I hate the web

Google search no longer supports +. The Google response does a great job of making it worse by responding in super-douche corporate bullshit speak, like "actually you didn't want that feature, we know better than you, it's better without it" and also the power-douche "we hear your concerns and are glad for your feedback but fuck you we're going to ignore you".

Google has done some similarly epically retarded shit with Google+ . Like, hey dumb asses, if businesses want to be on Google+ and you don't have the business accounts features done yet, why don't you just let them keep using the normal Google+ and migrate them over when you have the business features done? Oh no, let's kick everyone off and cause a big shit storm because we know the "right way" and you will thank us because it's "worth the wait". Epically retarded.

Anyway, Google Reader got a new look and it seems to be neither good nor bad, but it PISSES ME OFF.

I fucking hate it when shit that I use as a basic part of my day changes under me randomly. It's like if somebody semi-randomly periodically came into your house and swapped out your clothes or your appliances. Sometimes they fuck up some feature that you really liked. But most of all it's just distracting and annoying and ruins your familiarity with a tool.

Web software in general sucks because of this. Even ignoring all the bullshit about how slow it is, or the fact that you need a live connection, or the fact that you can't download web pages properly, etc. it sucks because people are constantly changing it out under your feet.

10-31-11 - Photos , Mostly Enchantments

Colchuk lake is a beautiful turqoise :

I'm in love with this rock face. It towers over Colchuk and feels like a real living being, it has such presence, and you're not sure if it's protective or menacing; The super-difficult barely-a-trail up to the enchantments is on the left :

Sun shining through larches :

The actual enchantments area is a weird top-of-the-world wasteland :

This is from a hike to Snow Lake earlier in the year; I noticed these vortices sheeting off a rock in the river; the river had perfect laminar flow and the rock edge disturbance was shedding this regular "street" of round vortices that then acted as lenses. The lenses were so perfect they were creating caustic rings of light focused and defocused on the river bed. You can see some of the lenses in the lower left of the photo and the light through them hits the creek bed near the top of the photo.

This guy visited our back yard a few days ago. Pearlescent feathers. No idea what he is, but he seemed real tame like he was probably a pet.

This is the view from my new home office (it's downtown Seattle in the distance). We're in a slight microclimate where we get fractionally less wetness (it's not like a real San Francisco microclimate that's dramatically different; I think we get 90-95% of the wetness); the storms hit us later and leave us sooner, so I get to watch them roll in to Seattle, and I get to watch them clear up. The result is a lot of rainbows. I took a photo of the first one I saw. (BTW whoever has that house with the red roof, I thank you, it really adds some spice to my view).


10-27-11 - Tiny LZ Decoder

I get 17 bytes for the core loop (not including putting the array pointers in registers because presumably they already are there if you care about size) .

My x86 is rusty but certainly the trick to being small is to use the ancient 1 byte instructions, which conveniently the string instructions are. For example you might be tempted to read out length & offset like this :

        mov cl,[esi]    // len
        mov al,[esi+1]  // offset
        add esi,2

but it's smaller to do

        lodsb  // len
        mov cl,al
        lodsb  // offset

because it keeps you in 1 byte instructions. (and of course any cleverness with lea is right out). (naively just using lodsw and then you have len and offset in al and ah is even better, but in practice I can't make that smaller)

Anyhoo, here it is. I'm sure someone cleverer with free time could do better.

__declspec(noinline) void TinyLZDecoder(char * to,char * fm,char * to_end)
        mov esi,fm
        mov edi,to
        mov edx,to_end
        xor eax,eax
        xor ecx,ecx
        movsb   // literal
        lodsb   // len
        mov cl,al
        lodsb   // offset
        push esi
        mov esi,edi
        sub esi,eax
        rep movsb   // match
        pop esi
        cmp edi,edx
        jne more



        movsb   // literal
00401012  movs        byte ptr es:[edi],byte ptr [esi] 
        lodsb   // len
00401013  lods        byte ptr [esi] 
        mov cl,al
00401014  mov         cl,al 
        lodsb   // offset
00401016  lods        byte ptr [esi] 
        push esi
00401017  push        esi  
        mov esi,edi
00401018  mov         esi,edi 
        sub esi,eax
0040101A  sub         esi,eax 
        rep movsb   // match
0040101C  rep movs    byte ptr es:[edi],byte ptr [esi] 
        pop esi
0040101E  pop         esi  
        cmp edi,edx
0040101F  cmp         edi,edx 
        jne more
00401021  jne         more (401012h) 

Also obviously you would get much better compression with a literal run length instead of a single literal every time, and it only costs a few more bytes of instructions. You would get even better compression if the run len could be either a match len or a literal run len and that's just another few bytes. (ADDENDUM : see end)

A small "Predictor/Finnish" is something like this :

ByteLoop:   lodsb   // al = *esi++ // control byte
            mov edx, 0x100
            mov dl, al
            shr edx, 1
            jz  ByteLoop
            jnc zerobit
            mov [ebx], al
zerobit:    mov al, [ebx]
            mov bh, bl
            mov bl, al
            stosb  // *edi++ = al
            jmp BitLoop

the fast version of Finnish of course copies the bit loop 8 times to avoid looping but you can't do that if you want to be small.

I'm quite sure this could be smaller using some clever {adc esi} or such. Also the sentry bit looping is taking a lot of instructions and so on.

Note the crucial trick of "Finnish" is that the hash table must be 64k in size and 64k aligned, so you can do the hash update and the table address computaton just by cycling the bottom two bytes of ebx. (I use the name "Predictor" for the generic idea of the single bit prediction/literal style compressor; "Finnish" is the specific variant of Predictor that uses this bx trick).

(note that this is not remotely the fast way to write these on modern CPU's)

ADDENDUM : Better (in the sense of more compression) LZ decoder in 22 bytes (core loop only) :

__declspec(noinline) void TinyLZDecoder(char * to,char * fm,char * to_end)
        mov esi,fm
        mov edi,to
        mov edx,to_end
        xor eax,eax
        xor ecx,ecx
        mov cl,[esi]    // len
        inc esi
        shr cl,1        // bottom bit is flag
        jc literals
        lodsb   // offset -> al
        push esi
        mov esi,edi
        sub esi,eax
        rep movsb   // copy match
        pop esi
        // ecx is zero, just drop through
        rep movsb  // copy literal run
        cmp edi,edx
        jne more


Note that obviously if your data is bigger than 256 bytes you can use a two byte match offset by doing lodsw instead of lodsb.

x86 of course is a special case that's particularly amenable to small LZ decoders; it's not really fair, all other instruction sets will be much larger. (which is sort of ironic because RISC is supposed to make instruction encoding smaller) (addendum : not actually true, that last bit)

10-27-11 - Metrics

The best thing you can ever have in software development is a good metric that you are trying to optimize for. A repeatable test case that produces a score and all you have to do is maximize that score.

However, having a performance metric that doesn't exactly match what you want to optimize can be very harmful. You have to be very careful about how you set your metric and over-training for it.

If you make a speed test where you run the same bit of code over and over a thousand times, you wind up creating code that overlaps well with itself and runs fast when it's hot in cache, and maybe code that factors out to a precompute then repeat - not necessarily things that you wanted.

If you set a compression challenge based on the Calgary Corpus, you wanted to get just great compressors, but instead you get compressors specifically tuned for those files (mainly english text).

An example that has misled many people is automated financial trading software. It might seem that that is the ideal case - you get a score, how much money it makes - and you just optimize it to make more money and let it run. But that's not right, because there are other factors, such as risk. If you just train the software to maximize EV it can wind up learning to do very strange things (like massive leverage circular arbitrage trades that require huge leverage to squeeze tiny margins; this is part of what killed LTCM for example).

The only time you can really go nuts optimizing for a metric is when the metric is the real final target of your application. Otherwise, you have to be really careful and take the metric with a grain of salt; you optimize for the metric, but also keep an eye on your code and use your human judgement to decide if the changes your making are good general changes or are just over-specific to the metric.

Anyone in software should know this.

Which is what makes it particularly disturbing that the Gates Foundation supports moronic metric-based education.

When you set simple performance metrics for big bureaucracies, you don't make things better. You make the bureaucracies better at optimizing those metrics. And since they have limited resources and limited amounts of time and energy, that typically makes everything else worse.

Granted, Gates is not so moronic as to advocate "teaching to the test", but even a more complicated cocktail of metrics (which they have yet to define, instead pouring money into metrics research) will not be any different. If you're going to pay and hire and fire people based on metrics you create a horrible situation where any creative thought is punished.

(I think Gates' opposition to small class sizes reflects an over-attention to test results (which have been shown to not correlate strongly to class size) and a lack of common sense about what actually happens in a class room)

The irony is that it's just like the way that horrible teachers grade their students. It's like those essay questions on AP exams where they don't actually read your essay and appreciate what you're saying at all, the grader just looks for the key words that you're supposed to have used if your answer is correct, so you could actually write something that doesn't make sense at all, as long as it has the correct "thesis/evidence/summary" structure you get full points.

It gets me personally hot and bothered. My experience of American public schools was that they were generally absurdly soul-crushing in a bureaucratic Kafka-esque way; like you would be tested for your creativity and independent though process, and the way that was done was you had to recite back the specific problem solving steps that you had been told was the method. In that general depressing stupidity, I was lucky enough to have a few teachers that really touched me and helped me through life because they just engaged me as a human being and were flexible about how I was allowed to go about things. In terms of objective evaluations I'm sure many of those special teachers would have done very poorly. They often spent entire class sessions rambling on about their personal lives or about politics - and that was great! That was so much more valuable to a child than taking turns reading out of the textbook or following the official lesson plan.


10-26-11 - Tons

In the US, a "ton" = 2000 pounds.

In the UK a "ton" is 2240 pounds (which comes from twenty "hundredweights" where a "hundredweight" is eight stone, and a stone is 14 pounds, WTF Britain).

A "metric ton" is obviously 1000 kg. In the UK this is officially called a "tonne" which you will see in technical documents, but I don't see that used much in casual writing, and it's certainly confusing when spoken since it sounds the same. (but a UK ton is very close to a metric ton (2204.6 pounds) so the mixup here surely happens all the time and is not a huge problem).

(when you hear someone in the UK phonetically say "ton" do they mean "tonne" or imperial ton?)

To differentiate the US ton vs UK ton they can be called "short ton" or "long ton".

On a related note, a pint is not a pound *anywhere* in the world.

In the UK, 1 oz by volume of water = 1 oz of weight. But a "pint" in the UK is 20 oz. So a pint is 1.25 pounds (a gallon is exactly 10 pounds)

In the US, 1 oz by volume of water = 1.041 oz of weight, so a pint = 1.041 pounds. (and a gallon = 8.33 pounds).

(neither liquid ounce is anything neat in terms of volume; the only nice whole number unit is the US gallon which is 231 cubic inches)

(the weight measures are the same in the US and UK, it's the US volume measure which went weird (1.041), and I believe it was done in order to make the gallon an integer number of cubic inches)

If you want to get technical, a (US) "pint's a pound" at some high temperature. (...some digging...) actually it's very close just before boiling. It looks like 98 C water is almost exactly a pound per pint (US).

Actually there is a sort of cute book-end of the ranges of water density there :

Very close to freezing (4 C) water is 1 g/ml , and very close to boiling (98 C) it's a pound per (US) pint. The difference is a factor of about 0.96.

10-26-11 - The Eight Month Cruise

If you're buying a home in Seattle, you should always try to do so in early spring. This will give you a few months of wetness to see any problems, and then you'll have the whole summer to deal with them before the wet sets in again. (it also lets you see the homes during rains, which lets you look for water incursions while you shop). I tried to time it that way, but the home shopping took too long and I didn't wind up buying until late summer.

The problem with Seattle is that once it starts raining (around Oct 1 pretty reliably) it literally does not stop for the next 8 months. Sure maybe it stops for a day or two, but never long enough for the whole house to dry out and then give you a big chunk of dry days to do something like replace the roof or paint the exterior.

Fortunately I don't have any problem so large as that, but even for minor things it's damn annoying. For example some time around September I realized that I really need to get a coat of waterproofer on all my decking. Oh well, it's gonna have to wait 8 months. There's a couple of spots I need to touch up exterior paint, but you can't really do a good job of painting without a solid 5 days of dry and decent warmth.

It's almost like our houses are boats, and we go on an 8 month aquatic voyage every year. You really need to use those 4 months as a chance to get your boat up on dry dock, scrub off the moss, dig out the dry rot, apply epoxy, sand and paint, etc.

10-26-11 - Some Things I Find Appalling

The US continues to be the largest provider of arms around the world, including to questionable third world countries and private militias.

The FBI continues to use entrapment techniques on suspected possible terrorists in which they provide a more radical undercover agent who provides the arms and encouragement. (just like the 60's, man)

The FBI continues to spy on non-criminals inside the US.

The US government creates semi-hidden propaganda to sell its policies to US citizens.

US journalists are not allowed to cover our own wars any more. Don't be misled by "embeds" or other government-provided "news" footage.

The executive continues to hide its actions under the cloak of "privilege" or "national security" , way beyond what is remotely reasonable.

Private companies are paid to imprison our citizens. Privitization of prisons is just insane, but of course it's only natural when you have private military forces, which are not only illegal but paid for by our own government. WTF.

We continue to use terrorism as a thin excuse for deporting or imprisoning ("detention") thousands of immigrants.



10-24-11 - We Cut Costs ...

"We cut costs, and pass the savings on to .... us (!?) "

That's the motto of the modern era. Your Jeff Bezoses and Steve Jobs of the world are like a late night "mattress king" but with much less integrity, since the mattress king also slashes prices.

Music and books are digital. The producers save 90% of their costs. No manufaturing. No distribution. No retail space. Where are the savings? In their pocket.

Nobody has a physical location anymore where you can go and get help. Even the horrible call centers are often just automated, or moved to the web. Savings for consumers? Nope.

It struck me when I got an "Orca card" , the local transit pass. I have to load up an e-wallet so they get to hold onto a bunch of my money all the time. All my fares are prepaid and processed in one big transaction. They save massive amounts (vs. collecting physical money to pay for rides). Where are the savings? Not for me of course.

It's gotten so bad that when a new technology comes around and the savings are not passed on to the consumer we don't even blink. They never are.

10-24-11 - LZ Optimal Parse with A Star Part 1

First two notes that aren't about the A-Star parse :

1. All good LZ encoders these days that aren't optimal parsers use complicated heuristics to try to bias the parse towards a cheaper one. The LZ parse is massively redundant (many parses can produce the same original data) and the cost is not the same. But in the forward normal parse you can't say for sure which decision is cheapest, so they just make some guesses.

For example the two key crucial heuristics in LZMA are :

// LZMA lastOffset heuristic :
  if (repLen >= 2 && (
        (repLen + 1 >= mainLen) ||
        (repLen + 2 >= mainLen && mainDist >= (1 << 9)) ||
        (repLen + 3 >= mainLen && mainDist >= (1 << 15))))
    // take repLen instead of mainLen

// LZMA lazy heuristic :

        // Truncate current match if match at next position will be better (LZMA's algorithm)
        if (nextlen >= prevlen && nextdist < prevdist/4 ||
            nextlen == prevlen + 1 && !ChangePair(prevdist, nextdist) ||
            nextlen > prevlen + 1 ||
            nextlen + 1 >= prevlen && prevlen >= MINLEN && ChangePair(nextdist, prevdist))
             return MINLEN-1;
        } else {
             return prevlen;

One is choosing a "repeat match" over a normal match, and the next is choosing a lazy match (literal then match) over greedy (immediate match).

My non-optimal LZ parser has to make similar decisions; what I did was set up a matrix and train it. I made four categories for a match based on what the offset is : {repeat, low, medium, high } , so to decide between two matches I do :

  cat1 = category of match 1 offset
  cat2 = category of match 2 offset
  diff = len1 - len2

  return diff >= c_threshold[cat1][cat2]
so c_threshold is a 4x4 matrix of integers. The values of threshold are in the range [0,3] so it's not too many to just enumerate all possibilities of threshold and see what's best.

Anyway, the thing about these heuristics is that they bias the parse in a certain way. They assume a certain cost tradeoff between literals vs. repeat matches vs. normal matches, or whatever. When the heuristic doesn't match the file they do badly. Otherwise they do amazingly well. One solution might be having several heuristics trained on different files and choose the one that creates the smallest output.

Also I should note - it's not trivial to tell when you have the heuristic wrong for the file. The problem is that there's a feedback loop between the parse heuristic and the statistical coder. That causes you to get into a local minimum, and you might not see that there's a better global minimum which is only available if you make a big jump in parsing style.

Repeating that more explicitly : the statistical coder will adapt to your parse heuristic; say you have a heuristic that prefers low offsets (like the LZMA heuristic above), that will cause your parse to select more low offsets, that will cause your statistical backend to code low offsets smaller. That's good, that's what you want, that's the whole point of the heuristic, it skews the parse intentionally to get the statistics going in the right direction. The issue then is that if you try to evaluate an alternative parse using the statistics that you have, it will look bad, because your statistics are trained for the parse you have.

2. It's crazy that LZ compression can do so well with so much redundancy in the code stream.

Think about it this way. Enumerate all the compressed bit streams that are created by encoding all raw bit streams up to length N.

Start with the shortest compressed bit stream. See what raw bits that decodes to. Now there may be several more (longer) compressed bit streams that decode to that same raw bit stream. Mark those all as unused.

Move to the next shortest compressed bit stream. First if there is a shorter unused one, move it into that slot. Then mark all other output streams that make the same raw bits as unused, and repeat.

For example a file like "aaaaaaa" has one encoding that's {a, offset -1 length 6} ; but there are *tons* of other encodings, such as {a,a,[-1,5]} or {a,[-1,3],[-1,3]} or {a,[-1,2],[-2,4]} etc. etc.

All the encodings except the shortest are useless. We don't want them. But even the ones that are not the best are quite short - they are small encodings, and small encodings take up lots of code space (see Kraft Inequality for example - one bit shorter means you take up twice the code space). So these useless but short encodings are real waste. In particular, there are other raw strings that don't have such a short encoding that would like to have that output length.

Anyhoo, surprisingly (just like with video coding) it seems to be good to add even *more* code stream redundancy by using things like the "repeat matches". I don't think I've ever seen an analysis of just how much wasted code space there is in the LZ output, I'm curious how much there is.

Hmm we didn't actually get to the A Star. Next time.


10-18-11 - To wrap it up or move on

One of the things I've really struggled with in the last few years at RAD is the trade off between wasting time on a side shoot of the main code line vs. really wrapping something up while your focus is on it.

Generally after I've spent a week or two on a topic I start feeling like "okay that's enough time on this I need to move on" ; I guess that's my internal project manager watching my schedule. But on the other hand, there's such a huge advantage to staying on a topic while it fills your mind. You just lose so much momentum if you have to come back to it later.

For example I wish I had finished my JPEG decoder back when I was working on it, that was an important piece of software I believe, but I felt like I needed to move on to more practical tasks, and now it's all out of my head and would take me several weeks to figure out all the nuances again.

Currently I'm working on a new way to do LZ optimal parsing, and I feel like I need to move on because it's not that important to my product and I need to get onto more practical tasks, but at the same time I hate to leave it now because I feel close to a breakthrough and I know that if I stop now I may never come back to it, and if I do it will take forever to get back into the flow.

10-18-11 - StringMatchTest - Hash 1b

For reference :

The good way to do the standard Zip-style Hash -> Linked List for LZ77 parsing.

There are two tables : the hash entry point table, which gives you the head of the linked list, and the link table, which is a circular buffer of ints which contain the next position where that hash occured.

That is :

  hashTable[ hash ]  contains the last (most recent preceding) position that hash occured
  chainTable[ pos & (window_size-1) ]  contains the last position of the hash at pos before pos

To walk the table you do :

  i = hashTable[ hash ];
  while ( i in window )
    i = chainTable[ i & (window_size-1) ]

To update the table you do :

  head = hashTable[ hash ];
  hashTable[hash] = pos;
  chainTable[ pos & (window_size-1) ] = head;

And now for some minor details that are a bit subtle. We're going to go through "Hash1" from StringMatchTest which I know I still haven't posted.

int64 Test_Hash1(MATCH_TEST_ARGS)
    uint8 * buf = (uint8 *)charbuf;
    const int window_mask = window_size-1;
    vector<int> chain; // circular buffer on window_size
    int * pchain = chain.data();
    const int hash_size = MIN(window_size,1<<20);
    const int hash_mask = hash_size-1;
for small files or small windows, you can only get good per-byte speed if you make hash size proportional with that MIN. (what you can't see is outside the Test_ I also made window_size be no bigger than the smallest power of 2 that encloses file size).

    vector<int> hashTable; // points to pos of most recent occurance of this hash
    int * phashTable = hashTable.data();

As noted previously, for large hashes you can get a big win by using a malloc that gives you zeros. I don't do it here for fairness to the other tests. I do make sure that my initialization value is zero so you can switch to VirtualAlloc/calloc.

    int64 total_ml = 0;
    // fiddle the pointers so that position 0 counts as being out of the window
    int pos = window_size+1;
    buf -= pos;
    ASSERT( (char *)&buf[pos] == charbuf );
    size += pos;

I don't want to do two checks in the inner loop for whether a position is a null link vs. out of the window. So I make the "null" value of the linked list (zero) be out of the window.


        // grab 4 bytes (@ could slide here)
        uint32 cur4 = *((uint32 *)&buf[pos]);
        //ASSERT( cur4 == *((uint32 *)&buf[pos]) );

On PC's it's fastest just to grab the unaligned dword. On PowerPC it's faster to slide bytes through the dword. Note that endian-ness changes the value, so you may need to tweak the hash function differently for the two endian cases.

        // hash them 
        uint32 hash = hashfour(cur4) & hash_mask;
        int hashHead =  phashTable[hash];
        int nextHashIndex = hashHead;
        int bestml = 0;
        int windowStart = pos-window_size;
        ASSERT( windowStart >= 0 );
        #ifdef MAX_STEPS
        int steps = 0;

Start the walk. Not interesting.

        while( nextHashIndex >= windowStart )
            uint32 vs4 = *((uint32 *)&buf[nextHashIndex]);
            int hi = nextHashIndex&window_mask;
            if ( vs4 == cur4 )
                int ml = matchlenbetter(&buf[pos],&buf[nextHashIndex],bestml,&buf[size]);
                if ( ml != 0 )
                    ASSERT( ml > bestml );
                    bestml = ml;
                    // meh bleh extra check cuz matchlenbetter can actually go past end
                    if ( pos+bestml >= size )
                        bestml = size - pos;

            #ifdef MAX_STEPS
            if ( ++steps > MAX_STEPS )
            nextHashIndex = pchain[hi];

This is the simple walk of the chain of links. Min match len is 4 here which is particularly fast, but other lens can be done similarly.

"MAX_STEPS" is the optimal "amortize" (walk limit) which hurts compression in an unbounded way but is necessary for speed.

"matchlenbetter" is a little trick ; it's best to check the character at "bestml" first because it is the most likely to differ. If that char doesn't match, we know we can't beat the previous match, so we can stop immediately. After that I check the chars in [4,bestml) to ensure that we really do match >= bestml (the first 4 are already checked) and lastly the characters after, to extend the match.

The remainder just updates the hash and is not interesting :

        ASSERT( bestml == 0 || bestml >= 4 );
        total_ml += bestml;
        // add self :
        //  (this also implicitly removes the node at the end of the sliding window)
        phashTable[hash] = pos;
        int ci = pos & window_mask;
        pchain[ci] = hashHead;
        if ( greedy && bestml > 0 )
            int end = pos+bestml;
            ASSERT( end <= size );
                uint32 cur4 = *((uint32 *)&buf[pos]);
                // hash them 
                uint32 hash = hashfour(cur4) & hash_mask;
                int hashHead =  phashTable[hash];
                phashTable[hash] = pos;
                int ci = pos & window_mask;
                pchain[ci] = hashHead;      
            pos = end;
    return total_ml;

Note that for non-greedy parsing you can help the O(N^2) in some cases by setting bestml to lastml-1. This helps enormously in practice because of the heuristic of matchlenbetter but does not eliminate the O(N^2) in theory. (the reason it helps in practice is because typically when you find a very long match, then the next byte will not have any match longer than lastml-1).

(but hash1 is really just not the right choice for non-greedy parsing; SuffixArray or SuffixTrie are vastly superior)

10-18-11 - Occupy Personal Responsibility

I am happy to see the "occupy" movements ; it's nice just to see people trying to do something about our fucked up politics.

Using "the top 1%" as your scapegoat is very clever, because it's such a narrowly defined group that it can actually get some majority support behind it. Past democratic/populist movements have tried to blame "the rich" or the top 10% , but that never works since 20-30% of people think they are in the top 10% (or will be soon), and then another 20% will oppose you just because you're a democrat, and another 20% will oppose any kind of redistribution, and the result is that you can't get a majority. The "top 1%" is a mysterious group that nobody personally knows, they live in crystal castles and somehow screw us all over.

But that's where I find the whole movement to be rather depressing. It's not actually a new movement towards more realistic, humble governance. It's yet another call for a free lunch.

The real problem with American politics goes back to the voters. Nobody is ever willing to see the big picture and do what's good for the country. Everyone wants their taxes lower. They want their services increased. It's somebody else who can get higher taxes and get a service cut.

And the whole Occupy/1%er movement is the same thing. It's not our fault that the country is so fucked and we don't have jobs. It couldn't be because we act like a herd of buffalo bidding up houses and jumping on stocks just before they tank. It couldn't be because we ran up massive personal debts to buy imported crap. It couldn't be because we chose to get useless educations. It couldn't be because we refuse to raise taxes. It couldn't be because we slash education and infrastructure spending that would help our country develop. No, it's those top 1% ! They're somehow manipulating markets and controlling government and screwing us all over!

(not related to my main point, but another funny bit is that the *actual* poor are completely missing from this movement; it's always the middle class or maybe lower-middle class who are in a bit of a hard spot; if you watch the news about foreclosures, it's always some white people in a suburban 3000 square foot tract home whining about their foreclosure. We have 15-20% of our population in serious fucking big time poverty (the official poverty line is crazy low; $10k a year for one person, $20k for a family of four, a more reasonable definition would make the number even higher). This is not a small group, but they are completely invisible from modern politics, the news, and all these populist movements. That's very intentional, I believe, because the democrats/populists know that the truly poor are politic poison. When you get a bunch of blacks, homeless, immigrants, etc. in your rallies, that's when the Republican opposition calcifies against you. Plus it's just not news; we know we have massive embarassing ghettos where people are barely scraping by, and we don't care and we don't want to hear about it).


10-17-11 - Sensor Dry

Do the laundry manufacturers of the world really believe that "slightly damp" is the correct amount of dryness?

So far as I can tell all the fancy "Energy Star" low-energy dryers just work by NOT FUCKING DRYING. Oh, big woop you saved energy because you just fucking didn't heat up the clothes. I can make a dryer from 1960 use zero watts if I'm allowed to not actually dry the clothes.

It's so disrespectful. It treats me, the user of the device, like a fucking piece of shit. I told you to do something, but who the fuck am I? Just some moron consumer. I'm sure the engineers know what I want better than I do, so they'll detect that the clothes are dry and stop the machine. NO! I didn't fucking tell you to stop it, so don't stop it.

My range hood has a similar thing; a "heat sensor" feature which automatically detects a certain temperature and kicks it on high. FUCK YOU. You will run when I tell you to run, you bitch. If I want to boil a pot of water and leave it on the stove and not run the hood then I fucking get to, it's my god damn house. What a fucking cock sucking feature. And of course none of these things can be turned off.

All modern cars are essentially the same way. You wanted to do an abrupt engine brake to get some off-throttle oversteer? Nope, sorry, that throttle release is low-passed. You want to abruptly jump on the throttle to speed away? Nope, sorry, the ingition timing was advanced to save fuel. You wanted to get the weight unbalanced side to side to initiate a flick? Nope, sorry, the magnetic suspension detected the sway and adjusted to stop it. We (the engineers, the ECU, the manufacturer) know better than you. You don't get the freedom to do what you want with your own tools.

Of course they're right most of the time. People are morons. But we should be allowed to be morons. I hate this shit.

It's an unfortunate truth that any time you get software involved, things become shit.

Part of the problem, I believe, is that software allows products to be designed by the marketers.

When products were designed by engineers and scientists, and had to be pressed out of metal, and some big custom machine had to be made to do the pressing, there was a long turn around time, and they just tried to design it to work as well as it could. Some marketer could come in and say "can we make it automatically turn off after 10 minutes?" and the engineer would say "well, that would require this extra part, and it would take 6 months to change the machines, ..." so it wouldn't get done. You couldn't risk chasing the latest trend because by the time your products got through the cycle the trend would be different.

Software is just too easy to change. And programmers are cheap and easy to replace, and have no personal ethics about the code they're writing anyway.

So some moron Producer/Marketer can call a meeting at the last minute and say "what if we add vending machines that sell Sobe drink powerups?" or "what if we sell hats?" , the engineer says "yeah, umm, I guess we could do that, I don't really think it's a good idea..." but it's not your job to have ideas, just go write the code.

10-17-11 - Crap Products

Am I the only person in the world that's just disgusted by how the moron marketting/design people are ruining all modern cars? This is not the correct amount of glass in a car :

Oh sure, I can sacrifice something minor like seeing *out* of the car for something more important like how "aggressive" it looks (that seems to be the key word for car marketting morons ons at the moment). I guess with high end cars like the LFA it actually is more important to the average buyer how it looks from the outside than how it is to drive, but sadly this stupidity has infected all levels of car design.

Modern bed designers either are intentionally out to bruise a lot of shins or they're just morons. I like to think it's the former. They love the fact that all the pretentious douches who just buy things because they're "stylish" without thinking about whether they're actually good are walking around with constantly bruised shins.

If you google "modern platform bed" almost every single one is a shin minefield.

I picked the particular one because it exhibits another piece of retarded design - the integrated nightstand. It seems sort of like a nice idea at first; some of them have little cubbies or nooks in the headboard where you can stash your book, a lamp, a glass of water. But then I started thinking...

Do you people never have sex? When you do, is it lying almost perfectly still with a sheet between you?

Also on the bed topic - "Eco Leather" ? Seriously? Hmm, let's see... 70% polyurethane. That's plastic my friend. I believe that's "pleather", or "leatherette". Fucking eco leather. May as well call it "iLeather".

(there's some inconsistency about what the term "eco leather" means : the smugstainability junta want "eco leather" to mean recycled leather or some such shit, the ridiculously-obsessive-mom junta want "eco leather" to mean leather processed without chemicals so it's safe for their ever-so-fragile babies, but in fact the furniture manufacturers of the world have just said "nope, fuck you, it means plastic").


10-12-11 - Subaru VIP program

FYI, just found out about this. Buy a Subaru for 2% under invoice with no haggling. WRX for $24,300 for example. Easy to get "VIP" status by joining some charity or other but requires six months of lead time.

Info at legacygt.com and cars101.com

10-12-11 - Some random politico-economics

I believe it's possible to slant recent events such that Greenspan is the central villain. He presided over both the .com bubble and the real estate bubble, fueling both with super low interest rates and his approval (eg. the "new economy" that can grow forever!). He presided over a Fed that did nothing to control the banks it was supposed to be monitoring. Perhaps most damning though is how he steered the Clinton presidency, which led to the creation of this entire modern financial era.

One version of the myth is that Clinton arrived at the white house all liberal and bright eyed, only to find it infested by a den of wolves who forced him into realpolitik compromises. One of those wolves was Greenspan, who met with Clinton and said something along the lines of "if you don't cut federal spending, I will raise interest rates". Which is basically a threat - if you don't do what I like (small government) then I will destroy your presidency by constricting the economy in an already recessed time. I believe the fact of this threat is public record though you may believe Greenspan's version that the reason behind it was "advice" that runaway government debt would force his hand to raise interest rates. In any case, what followed was deregulation, low interest rates, and the inevitable disaster.

Anyhoo, I don't really believe that reading of history, but it makes a good story.

One thing that strikes is all the articles these days that describe the "Great Recession" as a "unique time in American history" or "unprecedented income inequality". Uhh, hello? Did you go to high school? Maybe if you qualified that with the addendum "since WW2". Basically it was completely standard pre-1910 American economics, with the only exception being that our new modern government caught the crash and smoothed it into a recession rather than just letting the bubble pop. If they hadn't caught the crash it would have been a generic "Panic of XXXX" (1857, 1873, 1884, 1890, 1893, 1896, 1907, 1910, 1914, 1920, 1929) many of which were unsurprisingly similar to our current fuckup (massive speculation in a bubbled asset leading to a crash in the value of that asset which leads to a liquidity crisis and bank failures); we could have dropped our monocle in shock as the robber-baron speculators caused a run on the banks.

Ever since the 80's we've been talking about how we need to transform America into a new "information economy" or "service economy". With NAFTA, etc. whenever we lost jobs, the response would be we need new types of jobs, new education etc. I don't recall much effort by anyone to stop and ask - do we want to live in a service economy? I mean, at the moment we are even failing to have a healthy service economy, in the sense that there's lots of unemployment, so all the cries are just to *fix* the service economy. But really, even if it was fixed, if there were call center jobs and retail jobs for everyone, that's a pretty fucking bad world to live in. An "information economy" is inherently bad for almost everyone in it, because "intellectual property" is owned by the very few and is where all the money is.

I believe that raising capital gains taxes to match income taxes would help. Not only is it just inherently fair to tax all income the same way and would eliminate the tax rate dip of the super rich vs the merely rich (a low capital gains tax is a regressive benefit to the super rich, since the poor don't make any income from capital gains), it would greatly reduce capital movement.

I also think that "retirement accounts" and all that shit should go away. It's an unnecessary complexity. Instead just make the capital gains tax go down based on the length held. Maybe reduce by 1% per year held, so after 30 years the tax goes to zero. This further encourages "buy and hold" type investing.

I'm more and more convinced that massive rapid capital movement isn't actually good for anyone but the very top echelon of financiers. It creates panics and crashes in small emerging markets. It creates bubbles, and it sucks profit out of markets. Obviously capital markets are beneficial to get money to companies that need it. All structures that are permitted in society need to justify themselves as being beneficial to the greater good.

I believe that allowing bankruptcy to wipe out student loans would provide a valuable incentive to colleges to keep tuition low, and to keep education useful. Right now it's far too easy to go and get a $200k education in the cultural impact of deconstruction in gay/lesbian underwater basket weaving, and wind up in a position with no job and no hope to ever pay it back. Obviously it's the fault of the student for doing that, but they're children, they shouldn't be expected to make such a huge financial life decision in the summer after high school when someone is handing them a huge blank check. If the colleges had to underwrite the student loans themselves, they would have an inherent interest to provide good useful educations, and to simply refuse to admit students that shouldn't be going to college.

I believe the most basic fundamental thing that we should do to fix the US economy (aside from the obvious things like eliminating off-book derivatives and restoring bank/investment separation and so on) is to make it cheaper to employ Americans.

It's simply too expensive to employ Americans. You can try to prop up our manufacturing sector in various ways, but you have to get to the root cause. It's insane how expensive labor is here, even only semi-skilled labor, vs. buying manufactured products.

I believe the best way to reduce the cost of employees is to get rid of all the additional costs to the employer. Health care, workers' comp, pensions, 401k's, all of this nonsense that employers in America have to administer and pay for. If these were provided by the government, the total cost for them would be the same, but it would be paid by corporate profits whether you hired those workers or not, so you may as well hire them here rather than outsourcing. You would greatly reduce the incremental cost of adding a new employee, which would encourage business to hire. Furthermore, without having to worry about so much hiring and firing paperwork, businesses would be more willing to hire in times of uncertain growth.

Ideally you would get to a scenario of no hiring/firing paperwork at all. This would make it far easier for businesses to find the best employees, and easier for employees to move on to better jobs without fear of being without health care for a while or whatever.

You might note that the heavy social welfare countries don't exactly have lively agile businesses. I believe that is not the fault of the government providing social care, but rather because those countries tend to also have very heavy regulatory structures that make doing business difficult and expensive. My proposal is to make small businesses much much easier to start and run, and much cheaper.

There's this whole modern movement to "save money" (for the government) by pushing the maintenance of social welfare programs onto the businesses (health care, retirement, etc.) But that doesn't actually save money for society, it just makes someone else pay it. The proponents will say they "reduced taxes" but they also reduced your pay check because the business now has to cover those costs. And it's worse than that, because it increases the cost per employee to the business, it makes them prefer to hire fewer people, or outsource some work, or just buy pre-made pieces from overseas.

10-12-11 - Bad profiling

I've seen quite a few posts recently that purport to do some profiling and show you which option is faster. The worst is probably spin locks are faster than mutexes but this kind of cache line study is not awesome either.

(Bouliii's test is not actually a demonstration of cache line sharing at all; it's a demonstration that threading speedup depends on data independence; to demonstrate cache line effects you should have lots of threads atomically incrementing *different* variables that are within the same cache line, not *one* variable; if you did that you would see that cache line sharing causes almost as much slow down as variable sharing)

This kind of profiling is deeply flawed and I believe is in fact worse than nothing.

Timing threading primitives in isolation or on synthetic task sets is not useful.

Obviously if all you measure is the number of clocks to take a lock, then the simplest primitives (like spinlock) will appear the fastest. Many blogs in the last few years have posted ignorant things like "critical section takes 200 clocks and spin lock is only 50 clocks so spin lock is faster". Doof! (bang head). What you care about in real usage is things like fairness, starvation, whether the threads that have CPU time are the ones that are able to get real work done, etc.

So say you're not completely naive and instead you actually cook up some synthetic work for some worker threads to do and you test your primitives that way. Well, that's better. It does tell you what the best primitive is for *that particular work load*. But that might not reflect real work at all. In particular homogenous vs. heterogenous threads (worker threads that are interchangeable and can do any work vs. threads that are dedicated to different things) will behave very differently. What other thread loads are on the system? Are you measuring how well your system releases CPU time to other threads when yours can't do any more? (a fair threading benchmark should always have a low priority thread that's counting the number of excess cycles that are released). (spinning will always seem faster than giving up execution if you are ignoring the ability of the CPU to do other work)

Furthermore the exact sharing structure and how the various workers share cache lines is massively important to the results.

In general, it is very easy for bad threading primitives to look very good in synthetic benchmarks, but to be disastrous in the real world, because they suffer from things like thundering herd or priority inversion or what I call "thread thrashing" (wasted or unnecessary thread context switches).

You may have noticed that when I posted lots of threading primitives a month or two ago there was not one benchmark. But what I did talk about was - can this threading primitive spin the CPU for a long time waiting on a data item? (eg. for an entire time slice potentially) ; can this threading primitive suffer from priority inversion or even "priority deadlock" (when the process can stall until the OS level set manager saves you) ; how many context switches are needed to transfer control, is it just one or are there extra? etc. these are the questions you should ask.

This has been thoroughly discussed over the years w.r.s.t. memory allocators, so we should all know better by now.

Also, as has been discussed thoroughly wrst allocators, it is almost impossible to cook up a thorough benchmark which will tell you the one true answer. The correct way to decide these things is :

1. Understand the issues and what the trade-offs are.

2. Make sure you pick an implementation that does not have some kind of disastrous corner case. For example make sure you allocator has decent O() behavior with no blow-up scenario (or understand the blow-up scenario and know that it can't happen for your application). Particularly with threading I believe that for 99.99% of us (eg. everyone except LRB) it's more important to have primitives that work correctly than to save 10 clocks of micro efficiency.

3. Try to abstract them cleanly so that they are easily swappable and test them in your real final application with your real use case. The only legitimate test is to test in the final use case.

10-12-11 - Post-backpacking Recalibration

1. 65 degree house is fucking blazing hot at night.

2. Of course I'll walk a mile in the rain to get lunch. No big deal.

3. Everything is fucking DELICIOUS! Jar of spaghetti sauce, om nom nom. Beef stew yum.

4. None of the usual hesitation in sitting on the semi-shit-covered office toilet after having had to sit on the very-shit-covered wilderness toilet.


10-11-11 - Comcast

Comcast keeps calling me with some horrible auto-dialer trying to tell me something. Unfortunately., if I just ignore it or screen it, it leaves me an automated message on my voicemail.

I try to ignore it for a while, but constantly having voicemails to go through is damn annoying so I finally call them up.

Enter your damn phone number. Bleep bloop. Urg I hate you. Mash buttons through the menu. Urg.

Me : "Why the fuck are you calling me over and over?"

Comcast : "Sir that's because you have a late a balance due ... "

Me (interrupting) , "Umm, my account is on credit card pay, why the fuck do I have a late balance?"

Comcast : "Sir that's because credit card payments aren't processed for 2-3 months after setting up the account"

Me : "Umm, okay so can I just pay the balance now so they stop pestering me?"

Comcast : "No it will be charged automatically in the next billing cycle"

Me : "So can you stop calling me?"

Comcast : "Unfortunately I can't do that, but I can give you a number to call ..."

Me : (sighs and rolls eyes)

Comcast : "..it's 877-824-2288"

Me : "Umm.. that's the generic comcast customer support number"

Blah. Boring, I know. So frustrating. I have absolutely no recourse, I can't pick a different internet provider, and I can't fight with these people because they just send you around in circles and you never get to talk to anyone with power. You always get some damn call center person who says "I'm sorry I can't do that".


10-06-11 - Fiberglass

Don't put this shit in your house. It's toxic, it's poison, it's the modern day asbestos. There are plenty of alternatives.

Even in the attic or walls, sure it's sealed up most of the time, but any time you have to go in there to work you stir it up, then you get glass shards in the air which get in your eyes and lungs, so you have to wear safety suits and respirators and so on just to do basic maintenace work.

If anything ever goes wrong with it, it's a nightmare to dispose of.

Worst of all is using it to wrap ducts. The problem is that all ducts will leak eventually. Maybe not right away, but in 10 years cracks will form. Then the air return ducts will start sucking in at the seams, and eventually that will be sucking glass fibers in, then blowing them out all over the house.

Just say no to toxic shit in your home.


10-03-11 - Amortized Hashing

The "hash1" simple Zip-style matcher was very fast. So why don't we love it?

The problem is amortized hashing. "Amortized" = we just stop walking after A steps. This limits the worst case. Technically it makes hash1 O(N) (or O(A*N)).

Without it, hash1 has disastrous worst cases, which makes it just not viable. The problem is that the "amortize" can hurt compression quite a bit and unpredictably so.

One perhaps surprising thing is that the value of A (the walk limit) doesn't actually matter that much. It can be 32 or 256 and either way you will save your speed from the cliff. Surpisingly even an A of 1024 on a 128k window helps immensely.

not "amortized" :

 file,   encode,   decode, out size
lzt00,  225.556,   35.223,     4980
lzt01,  143.144,    1.925,   198288
lzt02,  202.040,    3.067,   227799
lzt03,  272.027,    2.164,  1764774
lzt04,  177.060,    5.454,    11218
lzt05,  756.387,    3.940,   383035
lzt06,  227.348,    6.078,   423591
lzt07,  624.699,    4.179,   219707
lzt08,  311.441,    7.329,   261242
lzt09,  302.741,    3.746,   299418
lzt10,  777.609,    1.647,    10981
lzt11, 1073.232,    4.999,    19962
lzt12, 3250.114,    3.134,    25997
lzt13,  101.577,    5.644,   978493
lzt14,  278.619,    6.026,    47540
lzt15, 1379.194,    9.396,    10911
lzt16,  148.908,   12.558,    10053
lzt17,  135.530,    5.693,    18517
lzt18,  171.413,    6.003,    68028
lzt19,  540.656,    3.433,   300354
lzt20,  109.678,    5.488,   900001
lzt21,  155.648,    3.605,   147000
lzt22,  118.776,    6.671,   290907
lzt23,  103.056,    6.350,   822619
lzt24,  218.596,    4.439,  2110882
lzt25,  266.006,    2.498,   123068
lzt26,  118.093,    7.062,   209321
lzt27,  913.469,    4.340,   250911
lzt28,  627.070,    2.576,   322822
lzt29, 1237.463,    4.090,  1757619
lzt30,   75.217,    0.646,   100001

"amortized" to 128 steps :

 file,   encode,   decode, out size
lzt00,  216.417,   30.567,     4978
lzt01,   99.315,    1.620,   198288
lzt02,   85.209,    3.556,   227816
lzt03,   79.299,    2.189,  1767316
lzt04,   90.983,    7.073,    12071
lzt05,   86.225,    4.715,   382841
lzt06,   91.544,    6.930,   423629
lzt07,  127.232,    4.502,   220087
lzt08,  161.590,    7.725,   261366
lzt09,  119.749,    4.696,   301442
lzt10,   55.662,    1.980,    11165
lzt11,  108.619,    6.072,    19978
lzt12,  112.264,    3.119,    26977
lzt13,  103.460,    6.215,   978493
lzt14,   87.520,    5.529,    47558
lzt15,   98.902,    7.568,    10934
lzt16,   90.138,   12.503,    10061
lzt17,  115.166,    6.016,    18550
lzt18,  176.121,    5.402,    68035
lzt19,  272.349,    3.310,   304212
lzt20,  107.739,    5.589,   900016
lzt21,   68.255,    3.568,   147058
lzt22,  108.045,    5.867,   290954
lzt23,  108.023,    6.701,   822619
lzt24,   78.380,    4.631,  2112700
lzt25,   93.013,    2.554,   123219
lzt26,  108.348,    6.143,   209321
lzt27,  103.226,    3.468,   249081
lzt28,  145.280,    2.658,   324569
lzt29,  199.174,    4.063,  1751916
lzt30,   75.093,    1.019,   100001

The times are in clocks per byte. In particular let's look at some files that are really slow without amortize :

no amortize :

 file,   encode,   decode, out size
lzt11, 1073.232,    4.999,    19962
lzt12, 3250.114,    3.134,    25997
lzt15, 1379.194,    9.396,    10911
lzt27,  913.469,    4.340,   250911
lzt29, 1237.463,    4.090,  1757619

amortize 128 :

 file,   encode,   decode, out size
lzt11,  108.619,    6.072,    19978
lzt12,  112.264,    3.119,    26977
lzt15,   98.902,    7.568,    10934
lzt27,  103.226,    3.468,   249081
lzt29,  199.174,    4.063,  1751916

Massive speed differences. The funny thing is the only file where the compression ratio is drastically changes is lzt12. It's changed by around 4%. (lzt29 is a bigger absolute difference but only 0.34%)

So amortized hashing saved us massive time, and only cost us 4% on one file in the test case. Let me summarize the cases. There are three main classes of file :

1. 80% : Not really affected at all by amortize or not. These files don't have lots of degeneracy so there aren't a ton of links in the same hash bucket.

2. 15% : Very slow without amortize, but compression not really affected. These files have some kind of degeneracy (such as long runs of one character) but the most recent occurances of that hash are the good ones, so the amortize doesn't hurt compression.

3. 5% : Has lots of hash collisions, and we do need the long offsets to make good matches. This case is rare.

So obviously it should occur to us that some kind of "introspective" algorithm could be used. Somehow monitor what's happening and adjust your amortize limit (it doesn't need to be a constant) or switch to another algorithm for part of the hash table.

The problem is that we can't tell between classes 2 and 3 without running the slow compressor. That is, it's easy to tell if you have a lot of long hash chains, but you can't tell if they are needed for good compression or not without running the slow mode of the compressor.


10-02-11 - How to walk binary interval tree

I noted previously that SA3 uses a binary interval tree. It's obvious how that works but it always takes me a second to figure it out so let's write it down.

This is going to be all very similar to previous notes on cumulative probability trees (and followup ).

Say you have an array of 256 entries. You build a min tree. Each entry in the tree stores the minimum value over an interval. The top entry covers the range [0,255] , then you have [0,127][128,255] , etc.

If you're at index i and you want to find the next entry which has a value lower than yours. That is,

find j such that A[j] < A[i] and j > i

you just have to find an interval in the min tree whose min is < A[i]. To make the walk fast you want to step by the largest intervals possible. (once you find the interval that must contain your value you do a normal binary search within that interval)

You can't use the interval that overlaps you, because you only want to look at j > i.

It's easy to do this walk elegantly using the fact that the binary representation of integers is a sort of binary interval tree.

Say we start at i = 37 (00100101). We need to walk from 37 to 256. Obviously we want to use the [128,256) range to make a big step. And the [64,128). We can't use the [32,64) because we're inside that range - this corresponds to the top on bit of 37. We can use [48,64) and [40,48) because those bits are off. We can't use [36,40) but we can use [38,40) (and the bottom on bit corresponds to [37,38) which we are in).

Doing it backwards, you start from whatever index (such as i=37). Find the lowest on bit. That is the interval you can step by (in this case 1). So step by 1 to i=38. Stepping by the lowest bit always acts to clear that bit and push the lowest on bit higher up (sometimes more than 1 level). Now find the next lowest on bit. 38 = 00100110 , so step by 2 to 40 = 00101000 , now step by 8 to 48 = 00110000 , now step by 16 to 64 = 01000000. etc.

In pseudo code :

Start at index i

while ( i < end )
  int step = i & (-i); // isolate bottom bit
  // I'm looking at the range [i,i+step]
  int level = BitPos(step);
  check tree[level][i>>level];
  i += step;
Now this should all be pretty obvious, but here comes the juju.

I've written tree[][] as if it is layed out in the intuitive way, that is :

tree[0] has 256 entries for the 1-step ranges
tree[1] has 128 entries for 2-step ranges

The total is 512 entries which is O(N). But notice that tree[0] is only actually ever used for indexes that have the bottom bit on. So the half of them that have the bottom bit off are not needed. Then tree[1] is only used for entries that have the second bit on (but bottom bit off). So the tree[1] entries can actually slot right into the blanks of tree[0], and half the blanks are left over. And so on...

It's a Fenwick tree!

So our revised iteration is :

// searching :
Start at index i
(i starts at 1)

while ( i < end )
  int step = i & (-i); // isolate bottom bit
  // I'm looking at the range [i,i+step]
  check fenwick_tree[i];
  i += step;

// building :

for all i
  int step = i & (-i); // isolate bottom bit
  fenwick_tree[i] = min on range [i,i+step]

(in practice you need to build with a binary recursion; eg. level L is built from two entries of level L-1).

Note that to walk backwards you need the opposite entries. That is, at level 7 (steps of 128) you only need [128,256) to walk forward, never [0,128) because a value in that range can't take that step. To walk backwards, you only need [0,128) , never [128,256). So in fact to walk forward or backward you need all the entries. When we made the "Fenwick compaction" for the forward walk, we threw away half the values - those are exactly the values that need to be in the backward tree.

For the three bit case , range [0,8) , the trees are :

Forward :

|              0-8              |
|       ^       |      4-8      |
|   ^   |  2-4  |   ^   |  6-8  |
| ^ |1-2| ^ |3-4| ^ |5-6| ^ |7-8| 

where ^ means go up a level to find your step
the bottom row is indexed [0,7] and 8 is off the end on the right
so eg if you start at 5 you do 5->6 then 6->8 and done

Backward :

|              8-0              |
|      4-0      |       ^       |
|  2-0  |   ^   |  6-4  |   ^   |
|1-0| ^ |3-2| ^ |5-4| ^ |7-6| ^ | 

the bottom row is indexed [1,8] and 0 is off the end of the left
so eg if you start at 5 you go 5->4 then 4->0 and done

You collapse the indexing Fenwick style on both by pushing the values down the ^ arrows. It should be clear you can take the two tables and lay them on top of each other to get a full set.


10-01-11 - String Match Results Part 6

You knew that couldn't be the end.

SuffixArray3 : suffix array string matcher which uses a min/max tree to find allowed offsets.

The min/max tree is a binary hierarchy ; at level L there are (size>>L) entries, and each entry covers a range of size (1 << L). Construction is O(N) because N/2+N/4+N/8 ... = N

The min/max tree method is generally slightly slower than the elegant "chain of fences" approach used for SuffixArray2, but it's close. The big advantage is the min/max tree can also be used for windowed matching, which is not easy to integrate in SA2.

(ADDENDUM : this is super unclear, see more at end)

First check that it satisfies the O(N) goal on the stress tests :

0 = stress_all_as
1 = stress_many_matches
2 = stress_search_limit
3 = stress_sliding_follow
4 = stress_suffix_forward
5 = twobooks

Yep. Then check optimal parse, large window vs. the good options :

0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin

The good Suffix Trie is clearly the best, but we're in the ballpark.

Now optimal parse, 17 bit (128k) window :

Test_MMC2 : DNF 
Test_LzFind2 : 506.954418 , 1355.992865 
Test_SuffixArray3 : 506.954418 , 514.931740 
Test_MMC1 : 13.095260 , 1298.507490 
Test_LzFind1 : 12.674177 , 226.796123 
Test_Hash1 : 503.319301 , 1094.570022 

Finally greedy parse, 17 bit window :

Test_MMC2 : 0.663028 , 110.373098 
Test_LzFind2 : DNF 
Test_SuffixArray3 : 0.663036 , 236.896551 
Test_MMC1 : 0.663028 , 222.626069 
Test_LzFind1 : 0.662929 , 216.912409 
Test_Hash1 : 0.662718 , 62.385071 

average match length :

And once more for clarity :

Greedy parse, 16 bit window , just the good candidates :

Test_SuffixArray3 : 0.630772 , 239.280605 
Test_LzFind1 : 0.630688 , 181.430093 
Test_MMC2 : 0.630765 , 88.413339 
Test_Hash1 : 0.630246 , 51.980073 

It should be noted that LzFind1 is approximate, and Hash1 is even more approximate. Though looking at the match length chart you certainly can't see it.

ADDENDUM : an email I wrote trying to explain this better :

First a reminder of how the normal suffix array searcher works :

We're at some file position F
We look up sortIndex[F] to find our sort position S
we know our longest matches must be at sort positions S-1 or S+1
we step to our neighbors

The problem with windowed matching is our neighbors may be way out of the window

So what we ant is a way to step more than 1 away to find the closest neighbor in the sort order that is within our desired window.

So really all we are doing is adding another search structure on top of the suffix array.  It's a search structure to go in either direction from S and find the closest spot to S that has file position in the range [F-window,F-1] 

To do this I just build a tree.  It's indexed by sort position, and its content is file positions.

Level L of the tree contains nodes that cover an interval of size 1<<L 

That is, level 0 has intervals of size 1, that's just

Tree_0[S] = sortIndexInverse[S] 
  (eg. it's just the file position of that sort pos)
  (in fact we don't make this level of the tree, we just use sortIndexInverse which we already have)

Tree_1[i] covers steps of size 2, that is :
  file positions sortIndexInverse[i*2] and sortIndexInverse[i*2+1]
  I store the min and max of that

Tree_2[i] covers steps of size 4, that is min and max of Tree_1[i*2] and Tree_1[i*2+1]

Now once you have this binary interval tree you can do the normal kind of binary tree walk to find the closest neighbor that's in the range you want.

Also see the following blog on how I walk binary interval trees and the released source code for StringMatchTest contains the full code for this matcher.

10-01-11 - Seattle Stop Shitting on my Face

I've been thinking about upgrading my neoprene gear so that I can swim in the lake through the fall/spring, not just the summer.

I fucking hate swimming laps in a pool during official lap swim hours, with all your "rules" and your system keeping me down. And it just doesn't make sense to go into some nasty indoor crowded box when I'm literally surrounded by miles of beautiful open water.

But there's a bit problem with this idea. Seattle is shitting on my face.

The fall/spring, when a wet suit would help, is when it rains. When it rains, the sewers overflow and drain into the lake. Then you get itchy bumps and vomiting and so on.

Seattle is basically doing nothing about it. There are some little programs to do "rain gardens", but those are sort of like using a tampon to stop elephant piss. What we need is serious fucking civil engineering. ("rain gardens" are also better known as "mosquito breeders"; we Houstonians are always amazed and delighted about the lack of mosquitos here; it's because Seattle is hilly and surrounded by lakes so the water doesn't pool, but the city is doing their best to ruin that). (a much bigger impact than piddly residential rain gardens would be to outlaw concrete parking lots; grass/gravel/lattice parking lots work perfectly fine for holding cars).

Of course this is a problem that is occuring all over the US. I hear NY has a major sewer problem as well. The population and development of most US cities has outstripped their infrastructure, and in our shitty faux-libertarian plutocracy of course there's no money for basic civil engineering. Only the heavy hand of EPA orders is forcing these dumb ass local governments to do anything.

The real solution is something like :

1. Separate the sewage and rain runoff. Run sewage to treatment plants in a closed system so it can never get out. (probably not realistic; alternatively, add a new clean storm water only system)

2. Since you will allow non-sewer storm water to drain to the lake, make the pollutants that run off to the lakes illegal. Fertilizers, pesticides, etc. everything that's water soluble and washes into the lakes is illegal right fucking now.

10-01-11 - More Reliable Timing on Windows

When profiling little code chunks on Windows, one of the constant annoyances is the unreliability of times due to multithreading.

Historically the way you address this is run lots of trials (like 100) and take the MIN time of any trial.

(* important note : if you aren't trying to time "hot cache" performance, you need to wipe the cache between each run. I dunno if there's an easy instruction or system call that would invalidate all cache pages; what I usually do is have a routine that goes and munges over some big array).

It's a bit better these days because of many cores. Now you can quite often find a core which is unmolested by annoying services popping up and stealing CPU time and messing up your profile. But sometimes you get unlucky, and your process runs on an IdealProc that has some other shite.

So a simple loop helps :

template <typename t_func>
uint64 GetFuncTime( t_func * pfunc )
    HANDLE proc = GetCurrentProcess();
    HANDLE thread = GetCurrentThread();
    DWORD_PTR affProc,affSys;
    uint64 tick_range = 1ULL << 62;
    for(int rep=0;rep<24;rep++)
        DWORD mask = 1UL<<rep;
        if ( mask & affProc )

        uint64 t1 = __rdtsc();
        uint64 t2 = __rdtsc();

        uint64 cur_tick_range = t2 - t1;
        tick_range = MIN(tick_range,cur_tick_range);



    return tick_range;

which makes it reasonably probable that you get a clean run on some core. For published results you will still want to repeat the whole thing N times.

10-01-11 - Used Prices

I really like the idea of buying used. Save some money, maybe prevent some goods from going to the land fill, fuck the retailer, etc. But used prices are just really fucking out of whack.

Used good need to have a *huge* discount. If you look at the utility, when I see something for sale on craigslist,

I can't just click "buy it" and it shows up at my door. So I need a big discount for that.

It's usually badly described and photographed, so I have to do a bunch of research. Discount.

I can't pay with CC at all so I have to go fetch some cash. Discount.

The seller has a 50% chance of being a flake in some way, like telling me its been sold after I show up for my appointment, or putting up the wrong item. Discount.

If it's broken or wrong or whatever, I can't return it. Big discount for this.

There's always an information gap problem when buying used - the seller knows more about the item than you. This fucks you with new items too, of course, but it's worth with used. eg. there's a chance that it's in worse condition than you know. Discount for this.

Even if some of the bad eventualities don't happen on a particular purchase, the price needs to be discounted by the probability that they happen times the cost to you when they do. eg. if your $100 purchase has a 10% chance of being fucked and worthless, you need a $10 discount on all purchases PLUS a discount for the trouble in that event, which is probably another $100 times 10%, so $20 total discount.

The result when you add up all the factors is that something that sells for $1000 new needs to be under $500 or so for it to be +EV to buy used. And it just isn't. In fact it's usually over $800. Used prices are uniformly too high.

Part of the problem is that the price of goods has gotten so out of whack with the price of labor.

old rants