10/26/2012

10-26-12 - Simple Stuff C Should Have

I'm not interested in pie-in-the-sky weirdo shit, only very simple stuff that I believe would make real world coding massively better.

1. Global variables should not be allowed without a "global" prefix. (optionally). eg.


int x;  // compile failure!
static int y;
global int x; // ok

(this should be turned on as an option). I should be able to search for "global" and find them all (and remove them, because globals are pointless and horrible, especially in the modern age of multi-threading).

2. Name-hiding should require an explicit "hide" prefix, eg.


int x;

{
  int x;  // compile failure !
  hide int x; // ok

}

I hate name-hiding and think it never should have been in the language. It does nothing good and creates lots of bugs. Similarly :

3. Overloads and virtual overrides should have an explicit prefix.

This ensures that you are adding an overload or virtual override intentionally, not by accident just because the names happen to line up. The entire C++ overload/override mechanism only works by coincidence; it's a coincidence that these names are the same, so they become an overload; there's no way to say "I intend this to be an overload, please be an error if it's not" (and the opposite: "I intend this to be a unique function, please error if it is an overload").

Similarly, it catches the very common error that you wrote a virtual override and it all worked and then somebody changes the signature in the base class and suddenly you have code that still compiles but the virtuals no longer override.
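For the virtual half of this, C++11's override specifier does essentially what I want; a minimal sketch :

struct Base
{
    virtual void OnEvent( int code ) { }
};

struct Derived : public Base
{
    virtual void OnEvent( int code ) override { } // ok : really overrides

    // if someone later changes the base signature to OnEvent(long),
    // this line becomes a compile error instead of a silent non-override
};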

4. C-style cast should be (optionally) an error.

I should be able to flip a switch and make C-style casts illegal. They are the devil, too easy to abuse, and impossible to search for. There should be a c_cast that provides the same action in a more verbose way.
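A minimal sketch of what I mean by c_cast, assuming all you want is grep-ability with identical semantics :

template <typename t_to, typename t_from>
t_to c_cast( t_from from )
{
    return (t_to) from;
}

// usage : int i = c_cast<int>( some_float );  // searchable, unlike (int)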

5. Uninitialized variables should be an error. (optionally).


int x; // compile failure!
int x(0); // ok

Duh. Of course. Same thing with member variables in class constructors. It's ridiculous that it's so easy to use uninitialized memory.

6. Less undefined behavior.

Everything that's currently "undefined" should be a compile error by default. Then let me make it allowed by setting pragmas, like :


#require signed_shift

#require flat_memory_model

which then changes those usages from compile errors to clearly defined operations.

7. Standardize everything that's sort of outside the standard, such as common pragmas, warning disablement, etc.

The way to do this is to have a chunk of standard that's like "you don't have to implement this, but if you do the syntax must be like so" and then provide a way to check if it's implemented.

Just a standard syntax for warning disablement would be great (and of course the ability to do it in ranges or even C scopes).

Things like SIMD could also be added to the language in this way; just simple things like "if you have a simd type it is named this" would massively help with standardizing shit which is different on every platform/compiler for no good reason.

8. (optionally) disable generation of all automatic functions, eg. assignment, copy construct, default construct. The slight short-term convenience of these being auto-generated is vastly outweighed by the bugs they create when you use them accidentally on classes that should not be copied. Of course I know how to put non-copyable junk in classes but I shouldn't have to do that; that should be the default, and they should only be copyable when I explicitly say it's okay. And you shouldn't have to write that out either, there should be a "use default copy" directive that you put in the class when you know it's okay.
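C++11 does provide the pieces for this with "= delete" and "= default"; a sketch of the pattern (though I want non-copyable to be the default, not something I opt into) :

class NoCopy
{
public:
    NoCopy() { }
private:
    NoCopy( const NoCopy & ) = delete;               // no copy construct
    NoCopy & operator = ( const NoCopy & ) = delete; // no assignment
};

class YesCopy
{
public:
    YesCopy() = default;
    YesCopy( const YesCopy & ) = default;            // the "use default copy" directive
    YesCopy & operator = ( const YesCopy & ) = default;
};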

9. Trivial reflection. Just give me a way to say "on all member variables". There's no reason not to have this, it's so easy for the compiler to add, just give me a way to do :

template <typename t_op>
void Reflect( t_op & op )
{
    op(member1);
    op(member2);
    ...
}
In which the compiler generates the list of members for me, I don't have to manually do it.
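For concreteness, a sketch of how that gets used today, with the Reflect written by hand (which is exactly the part the compiler should generate) :

#include <stdio.h>

struct PrintOp
{
    template <typename T>
    void operator () ( const T & member ) { printf("%g\n", (double) member); }
};

struct Thingy
{
    int   member1;
    float member2;

    template <typename t_op>
    void Reflect( t_op & op )
    {
        op(member1);  // today : maintained by hand, easy to get out of sync
        op(member2);
    }
};

// Thingy t = { 1, 2.5f }; PrintOp op; t.Reflect(op);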

10-26-12 - Oodle Rewrite Thoughts

I'm getting increasingly annoyed with the C-style Oodle threading code. It's just such a nightmare to manually manage things like object lifetimes in an async / multi-threaded environment.

Even something as simple as "write part of this buffer to a file" constantly causes me pain, because implied in that operation is "the buffer must not be freed until the write is done" , "the buffer should not be changed in the area being written until the write is done" , and "the file should not be closed until the write is done".

When you first start out and aren't doing a lot of complicated ops, it doesn't seem too bad, you can keep those things in your head; they become "comment-enforced" rules; that is, the code doesn't make itself correct, you have to write comments like "// write is pending, don't free buffer yet" (often you don't actually write the comments, but they're still "comment-enforced" as opposed to "code-enforced").

I think the better way is the very-C++-y Oodle futures.

Oodle futures rely on every object they take as inputs having refcounts, so there is no issue of objects being freed before the op completes. Some key points about the Oodle futures that I think are good :

A. Dependencies are automatic based on your arguments. You depend on anything you take as arguments. If the arguments themselves depend on async ops, then you depend on the chain of ops automatically. This is super-sweet and just removes a ton of bugs. You are then required to write code such that all your dependencies are in the form of function arguments, which at first is a pain in the ass, but actually results in much cleaner code overall because it makes the expression of dependencies really clear (as opposed to just touching some global deep inside your function, which creates a dependency in a really nasty way).

B. Futures create implicit async handles; the async handles in Oodle futures are all ref-counted so they clean themselves up automatically when you no longer care about them. This is way better than the manual lifetime management in Oodle right now, in which you have to manually hold a bunch of handles and free them at the right time.

C. It's an easy way to plug in the result of one async op into the input of the next one. It's like an imperative way of using code to do that graph drawing thing ; "this op has an output which goes into this input slot". Without an automated system for this, what I'm doing at the moment is writing lots of little stub functions that just wait on one op, gather up its results and start the next op. There's no inefficiency in this, it's the same thing the future system does, but it's a pain in the ass.
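To show the flavor, here's a toy sketch using std::future machinery and a hypothetical then() helper (not the actual Oodle internals) : the dependency is the argument, and shared ownership keeps the input alive until the op is done :

#include <future>
#include <cstdio>

// hypothetical : run fn when its input is ready; the shared_future
// both expresses the dependency and keeps the input object alive
template <typename t_fn, typename T>
auto then( std::shared_future<T> input, t_fn fn )
    -> std::shared_future< decltype( fn(input.get()) ) >
{
    return std::async( std::launch::async,
        [input,fn]{ return fn(input.get()); } ).share();
}

int main()
{
    std::shared_future<int> a =
        std::async( std::launch::async, []{ return 42; } ).share();

    auto b = then( a, [](int x){ return x*2; } ); // depends on a
    auto c = then( b, [](int x){ return x+1; } ); // depends on b, hence on a

    printf("%d\n", c.get()); // 85
    return 0;
}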

If I was restarting from scratch I would go even further. Something like :

1. Every object has a refcount AND a read-write lock built in. Maybe the refcount and RW lock count go together in one U32 or U64 which is maintained by lockfree ops (a sketch of that packing follows this list).

Refcounting is obvious. Lifetimes of async ops are way too complicated without it.

The RW lock in every object is something that sophomoric programmers don't see the need for. They think "hey it's a simple struct, I fill it on one thread, then pass it to another thread, and he touches it". No no no, you're a horrible programmer and I don't want to work with you. It seems simple at first, but it's just so fragile and prone to bugs any time you change anything, it's not worth it. If every object doesn't just come with an RW lock it's too easy to be lazy and skip adding one, which is very bad. If the lock is uncontended, as in the simple struct handoff case above, then it's very cheap, so just use it anyway.

2. Whenever you start an async op on an object, it takes a ref and also takes either a read lock or write lock.

3. Buffers are special in that you RW lock them in ranges. Same thing with textures and such. So you can write non-overlapping ranges simultaneously.

4. Every object has a list of the ops that are pending on that object. Any time you start a new op on an object, it is delayed until those pending ops are done. Similarly, every op has a list of objects that it takes as input, and won't run until those objects are ready.
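Here's a minimal sketch of the packed refcount + RW lock word from point 1 (my own guess at a layout, not anything from Oodle) :

#include <atomic>
#include <stdint.h>

// hypothetical layout : low 16 bits = refcount, next 15 bits = reader count,
// top bit = writer flag; all maintained lockfree on one U32
struct ObjectHeader
{
    std::atomic<uint32_t> state; // init to 1 : one ref, unlocked

    static const uint32_t kRefMask    = 0x0000FFFF;
    static const uint32_t kReaderInc  = 0x00010000;
    static const uint32_t kReaderMask = 0x7FFF0000;
    static const uint32_t kWriterBit  = 0x80000000;

    void AddRef() { state.fetch_add(1, std::memory_order_relaxed); }

    // returns true when the last ref is gone and the object should be freed
    bool Release() { return ( state.fetch_sub(1, std::memory_order_acq_rel) & kRefMask ) == 1; }

    bool TryReadLock()
    {
        uint32_t s = state.load(std::memory_order_relaxed);
        while ( ! (s & kWriterBit) ) // no writer -> may add a reader
            if ( state.compare_exchange_weak(s, s + kReaderInc, std::memory_order_acquire) )
                return true;
        return false;
    }
    void ReadUnlock() { state.fetch_sub(kReaderInc, std::memory_order_release); }

    bool TryWriteLock()
    {
        uint32_t s = state.load(std::memory_order_relaxed);
        while ( ! (s & (kWriterBit|kReaderMask)) ) // no readers, no writer
            if ( state.compare_exchange_weak(s, s | kWriterBit, std::memory_order_acquire) )
                return true;
        return false;
    }
    void WriteUnlock() { state.fetch_and(~kWriterBit, std::memory_order_release); }
};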

The other big thing I would do in a rewrite from scratch is the basic architecture :

1. Write all my own threading primitives (semaphore, mutex, etc) and base them on a single waitset. (I basically have this already).

2. Write stackful coroutines.

3. When the low level Wait() is called on a stackful coroutine, instead yield the coroutine.

That way the coroutine code can just use Semaphore or whatever, and when it goes to wait on the semaphore, it will yield instead. It makes the coroutine code exactly the same as non-coroutine code and makes it "composable" (eg. you can call functions and they actually work), which I believe is crucial to real programming. This lets you write stackful coroutine code that does file IO or waits on async ops or whatever, and when you hit some blocking code it just automatically yields the coroutine (instead of blocking the whole worker thread).

This would mean that you could write coroutine code without any special syntax; so eg. you can call the same functions from coroutines as you do from non-coroutines and it Just Works the way you want. Hmm I think I wrote the same sentence like 3 times, but it's significant enough to bear repetition.

10/22/2012

10-22-12 - Windows 7 Start Menu Input Race

I've been super annoyed by some inconsistent behavior in the Windows 7 start menu for a while now. Sometimes I hit "Start - programname - enter" and nothing happens. I just sort of put it down to "god damn Windows is flakey and shit" but I finally realized yesterday exactly what's happening.

It's an input race, as previously discussed here.

What happens is, you hit Start, and you get your focus in the type-in-a-program edit box. That part is fine. You type in a program name. At that point it does the search in the start menu thing in the background (it doesn't stall after each key press). In many cases there will be a bit of a delay before it updates the list of matching programs found.

If you hit Enter before it finds the program and highlights it, it just closes the dialog and doesn't run anything. If you wait a beat before hitting enter, the background program-finder will highlight the thing and hitting enter will work.

Very shitty. The start menu should not have keyboard input races. In this case the solution is obvious and trivial - when you hit enter it should wait on the background search task before acting on that key (but if you hit escape it should immediately close the window and abort the task without waiting).

I've long been an advocate of video game programmers doing "flakiness" testing by playing the game at 1 fps, or capturing recordings of the game at the normal 30 fps and then watching them play back at 1 fps. When you do that you see all sorts of janky shit that should be eliminated, like single frame horrible animation pops, or in normal GUIs you'll see things like the whole thing redraw twice in a row, or single frames where GUI elements flash in for 1 frame in the wrong place, etc.

Things like input races can be very easily found if you artificially slow down the program by 100X or so, so that you can see what it's actually doing step by step.

I'm a big believer in eliminating this kind of flakiness. Almost nobody that I've ever met in development puts it as a high priority, and it does take a lot of work for apparently little reward, and if you ask consumers they will never rate it highly on their wish list. But I think it's more important than people realize; I think it creates a feeling of solidness and trust in the application. It makes you feel like the app is doing what you tell it to, and if your avatar dies in the game it's because of your own actions, not because the stupid game didn't jump even though you hit the jump button because there was one frame where it wasn't responding to input.

10-22-12 - LZ-Bytewise conclusions

Wrapping this up + index post. Previous posts in the series :

cbloom rants 09-02-12 - Encoding Values in Bytes Part 1
cbloom rants 09-02-12 - Encoding Values in Bytes Part 2
cbloom rants 09-02-12 - Encoding Values in Bytes Part 3
cbloom rants 09-04-12 - Encoding Values in Bytes Part 4
cbloom rants 09-04-12 - LZ4 Optimal Parse
cbloom rants 09-10-12 - LZ4 - Large Window
cbloom rants 09-11-12 - LZ MinMatchLen and Parse Strategies
cbloom rants 09-13-12 - LZNib
cbloom rants 09-14-12 - Things Most Compressors Leave On the Table
cbloom rants 09-15-12 - Some compression comparison charts
cbloom rants 09-23-12 - Patches and Deltas
cbloom rants 09-24-12 - LZ String Matcher Decision Tree
cbloom rants 09-28-12 - LZNib on enwik8 with Long Range Matcher
cbloom rants 09-30-12 - Long Range Matcher Notes
cbloom rants 10-02-12 - Small note on LZHAM
cbloom rants 10-04-12 - Hash-Link match finder tricks
cbloom rants 10-05-12 - OodleLZ Encoder Speed Variation with Worker Count
cbloom rants 10-07-12 - Small Notes on LZNib
cbloom rants 10-16-12 - Two more small notes on LZNib

And some little additions :

First a correction/addendum on cbloom rants 09-04-12 - LZ4 Optimal Parse :

I wrote before that going beyond the 15 states needed to capture the LRL overflowing the control byte doesn't help much (or at all). That's true if you only go up to 20 or 30 or 200 states, but if you go all the way to 270 states, so that you capture the transition to needing another byte, there is some win to be had (LZ4P-LO-332 got lztestset to 12714031 with small optimal state set, 12492631 with large state set).

If you just do it naively, it greatly increases memory use and run time. However, I realized that there is a better way. The key is to use the fact that there are so many code-cost ties. In LZ-Bytewise with the large state set, often the coding decision in a large number of states will have the same cost, and furthermore often the end point states will all have the same cost. When this happens, you don't need to make the decision independently for each state; instead you make one decision for the entire range of states, and you store that one decision for the range, instead of one per state.

eg. to be explicit, instead of doing :


in state 20 at pos P
consider coding a literal (takes me to state 21 at pos P+1)
consider various matches (takes me to state 0 at pos P+L)
store best choice in table[P][20]

in state 21 ...

do :

in states 16-260 at pos P
consider coding a literal (takes me to states 17-261 at pos P+1 which I saw all have the same cost)
consider various matches (takes me to state 0 at pos P+L)
store in table[P] : range {16-260} makes decision X

in states 261-263 ...

so you actually can do the very large optimal parse state set with not much increase in run time or memory use.

Second : I did a more complex variant of LZ4P (large window). LZ4P-LO includes "last offset". LZ4P-LO-332 uses a 3-bit-3-bit-2-bit control word (as described previously here : cbloom rants 09-10-12 - LZ4 - Large Window ) ; the 2 bit offset reserves one value for LO and 3 values for normal offsets.

(I consider this an "LZ4" variant because (unlike LZNib) it sends LZ codes as a strictly alternating LRL-ML pairs (LRL can be zero) and the control word of LRL and ML is in one byte)

Slightly better than LZ4P-LO-332 is LZ4P-LO-695, where the numbering has switched from bits to number of values (so 332 should be 884 for consistency). You may have noticed that 6*9*5 = 270 does not fit in a byte, but that's fixed easily by forbidding some of the possibilities. 6-9-5 = 6 values for literals, 9 for match lengths, and 5 for offsets. The 5 offsets are LO + 2 bits of normal offset. So for example one of the ways that the 270 values is reduced is because an LO match can never occur after an LRL of 0 (the previous match would have just been longer), so those combinations are removed from the control byte.

LZ4P-LO-695 is not competitive with LZNib unless you spill the excess LRL and ML (the amount that is too large to fit in the control word) to nibbles, instead of spilling to bytes as in the original LZ4 and LZ4P. Even with spilling to nibbles, it's no better than LZNib. Doing LZ4P-LO-695, I found a few bugs in LZNib, so its results also got better.

Thirdly, current numbers :

             raw       lz4   lz4p332 lz4plo695  lznib d8      zlib OodleLZHLW
lzt00      16914      6473      6068      6012      5749      4896      4909
lzt01     200000    198900    198880    198107    198107    198199    198271
lzt02     755121    410695    292427    265490    253935    386203    174946
lzt03    3471552   1820761   1795951   1745594   1732491   1789728   1698003
lzt04      48649     16709     15584     15230     14352     11903     10679
lzt05     927796    460889    440742    420541    413894    422484    357308
lzt06     563160    493055    419768    407437    398780    446533    347495
lzt07     500000    265688    248500    240004    237120    229426    210182
lzt08     355400    331454    322959    297694    302303    277666    232863
lzt09     786488    344792    325124    313076    298340    325921    268715
lzt10     154624     15139     13299     11774     11995     12577     10274
lzt11      58524     25832     23870     22381     22219     21637     19132
lzt12     164423     33666     30864     29023     29214     27583     24101
lzt13    1041576   1042749   1040033   1039169   1009055    969636    923798
lzt14     102400     56525     53395     51328     51522     48155     46422
lzt15      34664     14062     12723     11610     11696     11464     10349
lzt16      21504     12349     11392     10881     10889     10311      9936
lzt17      53161     23141     22028     21877     20857     18518     17931
lzt18     102400     85659     79138     74459     76335     68392     59919
lzt19     768771    363217    335912    323886    299498    312257    268329
lzt20    1179702   1045179    993442    973791    955546    952365    855231
lzt21     679936    194075    113461    107860    102857    148267     83825
lzt22     400000    361733    348347    336715    331960    309569    279646
lzt23    1048576   1040701   1035197   1008638    989387    777633    798045
lzt24    3471552   2369885   1934129   1757927   1649592   2289316   1398291
lzt25    1029744    324190    332747    269047    230931    210363     96745
lzt26     262144    246465    244990    239816    239509    222808    207600
lzt27     857241    430350    353497    315394    328666    333120    223125
lzt28    1591760    445806    388712    376137    345343    335243    259488
lzt29    3953035   2235299   1519904   1451801   1424026   1805289   1132368
lzt30     100000    100394    100393    100010    100013    100020    100001
total   24700817  14815832  13053476  12442709  12096181  13077482  10327927

And comparison charts on the aggregated single file lzt99 :

Speeds are the best of 20 trials on each core; speed is the best of either x86 or x64 (usually x64 is faster). The decode times measured are slightly lower for everybody in this post (vs the last post of this type) because of the slightly more rigorous timing runs. For reference the decode speeds I measured are (mb/s) :


LZ4 :      1715.10235
LZNib :     869.1924302
OodleLZHLW: 287.2821629
zlib :      226.9286645
LZMA :       31.41397495

Also LZNib current enwik8 size : (parallel chunking (8 MB chunks) and LRM 12/12 with bubble)

LZNib enwik8 mml3 : 30719351
LZNib enwik8 stepml : 30548818

(all other LZNib results are for mml3)

10/16/2012

10-16-12 - Two more small notes on LZNib

Followup to Small Notes on LZNib

1. Because cost ties are common, and ties are not actually ties (due to "last offset"), just changing the order that you visit matches can change your compression. eg. if you walk matches from long to short or short to long or low offset to high offset, etc.

Another important way to break ties is for speed. Basically prefer long matches and long literal runs vs. a series of shorter ones that make the same output length. Because the code cost is integer bytes, you can do this pretty easily by just adding a small bias to the cost (one thousandth of a byte or whatever) each time you start a new match or LRL.

(more generally in an ideal world every compressor should have a lagrange parameter for space-speed tradeoff, but that's the kind of thing nobody ever gets around to)
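A sketch of that speed tie-break, assuming costs are kept in fixed point with some fractional bits per byte (the exact numbers here are made up) :

// costs in fixed point : 12 fractional bits per byte (assumption)
const int c_costOneByte = 1<<12;
const int c_speedBias   = 4; // ~ 1/1000 of a byte per packet

// each new packet (match or literal run) pays the tiny bias, so among
// parses with equal byte cost, the one with fewer packets wins
int CostLiteralRun( int lrl )       { return lrl * c_costOneByte + c_speedBias; }
int CostMatch     ( int codeBytes ) { return codeBytes * c_costOneByte + c_speedBias; }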

2. Traditional LZ coders did not output matches unless they were cheaper than literals. That is, say you send a match len in 4 bits and an offset in 12 bits, so a match is 2 bytes - you would think that the minimum match length should be 3 - not 2 - because sending a 2 byte match is pointless (it's cheaper or the same cost to send those 2 bytes as literals (cheaper as literals if you are in a literal run-len already)). By using a larger MML, you can send higher match lengths in your 4 bits, so it should be a win.

This is not true if you have "last offset". With LO in your coder, it is often beneficial to send matches which are not a win (vs literals) on their own. eg. in the above example, minimum match length should be 2 in an LO coder.

This is one of those cases where text and binary data differ drastically. If you never tested on structured data you would not see this. Really the nature of LZ compression on text and binary is so different that it's worth considering two totally independent compressors (or at least some different tweaked config vals). Text match offsets fall off very steadily in a perfect curve, and "last offsets" are only used for interrupted matches, not for re-using an offset (and generally don't help that much). Binary match offsets have very sparse histograms with lots of strong peaks at the record sizes in the file, and "last offset" is used often just as a way of cheaply encoding the common record distance.

On text, it is in fact best to use an MML which makes matches strictly smaller than literals.

If I keep at this work in the future I'm sure I'll get around to doing an LZ specifically designed for structured data; it's sort of hopeless trying to find a compromise that works great on both; I see a lot more win possible.

10-16-12 - Thoughts on Bit-Packing Structs Before Compression

If you're trying to transmit some data compactly, and you are *not* using any back-end compression, then it's relatively straightforward to pack the structs through ad-hoc "bit packing" - you just want to squeeze them into as few bits as possible. But if you are going to apply a standard compressor after bit packing, it's a little less clear. In particular, a lot of people make mistakes that result in larger final data than necessary.

To be clear, there are two compression steps :


{ Raw structs } --[ad hoc]--> { Bit Packed } --[compressor]--> { Transmitted Data }

What you actually want to minimize is the size of the final transmitted data, which is not necessarily achieved with the smallest bit packed data.

The ideal scenario is if you know your back-end compressor, simply try a variety of ways of packing and measure the final size. You should always start with completely un-packed data, which often is a reasonable way to go. It's also important to keep in mind the speed hit of bit packing. Compressors (in particular, decompressors) are very fast, so even though your bit-packing may just consist of some simple math, it actually can very easily be much slower than the back-end decompressor. Many people incorrectly spend CPU time doing pre-compression bit-packing, when they would be better off spending that same CPU time by just running a stronger compressor and not doing any twiddling themselves.

The goal of bit-packing should really be to put the data in a form that the compressor can model efficiently. Almost all compressors assume an 8-bit alphabet, so you want your data to stay in 8-bit form (eg. use bit-aligned packing, don't use non-power-of-2 multiplies to tightly pack values if they will cross a byte boundary). Also almost all compressors, even the best in the world (PAQ, etc) primarily achieve compression by modeling correlation between neighboring bytes. That means if you have data that does not have the property of maximum correlation to its immediate neighbor (and steady falloff) then some swizzling may help, just rearranging bytes to put the correlated bytes near each other and the uncorrelated bytes far away.

Some issues to consider :

1. Lossy bit packing.

Any time you can throw away bits completely, you have a big opportunity that you should exploit (which no back end compressor can ever do, because it sends data exactly). The most common case of this is if you have floats in your struct. Almost always there are several bits in a float which are pure garbage, just random noise which is way below the error tolerance of your app. Those bits are impossible to compress and if you can throw them away, that's pure win. Most floats are better transmitted as something like a 16 bit fixed point, but this requires application-specific knowledge about how much precision is really needed.
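eg. a sketch of the float-to-fixed idea, assuming the app knows its values fit in [-1,1] :

#include <stdint.h>

// lossy pack : float in [-1,1] -> 16 bit fixed point (~3e-5 max error)
static uint16_t FloatToFixed16( float f )
{
    if ( f < -1.f ) f = -1.f;
    if ( f >  1.f ) f =  1.f;
    return (uint16_t) ( (f + 1.f) * 32767.5f ); // maps [-1,1] -> [0,65535]
}

static float Fixed16ToFloat( uint16_t u )
{
    return (float) u * (2.f/65535.f) - 1.f;
}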

Even if you decide you can't throw away those bits, something that can help is just to get them out of the main stream. Having some random bytes mixed in to an otherwise nicely compressible stream really mucks up the order-0 statistics, so just putting them on the side is a nice way to go. eg. you might take the bottom 4 or 8 bits out of each float and just pass them uncompressed.

(in practical bone-head tips, it's pretty common for un-initialized memory to be passed to compressors; eg. if your structs are padded by C so there are gaps between values, put something highly compressible in the gap, like zero or a duplicate of the neighboring byte)

2. Relationships between values.

Any time you have a struct where the values are not completely independent, you have a good opportunity for packing. Obviously there are cases where one value in a struct can be computed from another and should just not be sent.

There are more subtle cases, like if A = 1 then B has certain statistics (perhaps it's usually high), while if A = 0 then B has other statistics (perhaps it's usually low). In these cases there are a few options. One is just to rearrange the transmission order so that A and B are adjacent. Most back end compressors model correlation between values that are adjacent, so putting the most-related values in a struct next to each other will let the back end find that correlation.

There are also often complicated mathematical relationships. A common case is a normalized vector; the 3 values are constrained in a way that the compressor will never be able to figure out (proof that current compressors are still very far away from the ideal of perfect compression). When possible you want to reduce these related values to their minimal set; another common case is rotation matrices, where 9 floats (36 bytes) can be reduced to 3 fixed points (6-9 bytes).

This is really exactly the same as the kinds of variable changes that you want to do in physics; when you have a lot of values in a struct that are constrained together in some way, you want to identify the true number of degrees of freedom, and try to convert your values into independent unconstrained variables.

When numerical values are correlated to their neighbors, delta transformation may help. (this particularly helps with larger-than-byte values where a compressor will have a harder time figuring it out)
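A sketch of the delta transform (done mod 2^16 so it's exactly reversible; hypothetical helper names) :

#include <stdint.h>
#include <stddef.h>

// in-place delta transform for neighbor-correlated 16-bit values
void DeltaEncodeU16( uint16_t * v, size_t n )
{
    uint16_t prev = 0;
    for ( size_t i = 0; i < n; i++ )
    {
        uint16_t cur = v[i];
        v[i] = (uint16_t)(cur - prev); // wraps mod 2^16, fully reversible
        prev = cur;
    }
}

void DeltaDecodeU16( uint16_t * v, size_t n )
{
    uint16_t prev = 0;
    for ( size_t i = 0; i < n; i++ )
    {
        prev = (uint16_t)(v[i] + prev);
        v[i] = prev;
    }
}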

3. Don't mash together statistics.

A common mistake is to get too aggressive with mashing together values into bits in a way that wrecks the back-end statistical model. Most back end compressors work best if the bytes in the file all have the same probability histogram; that is, they are drawn from the same "source". (as noted in some of the other points, if there are multiple unrelated "sources" in your one data stream, the best thing to do is to separate them from each other in the buffer)

Let me give a really concrete example of this. Say you have some data which has lots of unused space in its bytes, something like :


bytes in the original have values :

0000 + 4 bits from source "A"
0001 + 4 bits from source "B"

(when I say "from source" I mean a random value drawn under a certain probability distribution)

You might be tempted to bit-pack these to compact them before the back end compressor. You might do something like this :


Take the top 4 bits to make a flag bit
Take 8 flag bits and put them in a byte

Then take the 4 bits of either A or B and put them together in the high and low nibble of a byte

eg, in nibbles :

0A 1B 1B 0A 0A 0A 1B 0A 

--[bit packed]-->

01100010 (binary) + ABBAAABA (nibbles)

(and A and B are not the hex numbers but mean 4 bits drawn from that source)

It looks like you have done a nice job of packing, but in fact you've really wrecked the data. The sources A and B had different statistics, and in the original form the compressor would have been able to learn that, because the flag bit was right there in the byte with the payload. But by packing it up tightly what you have done is made a bunch of bytes whose probability model is a mix of {bit flags},{source A},{source B}, which is a big mess.

I guess a related point is :

4. Even straightforward bit packing doesn't work for the reasons you think it does.

Say for example you have a bunch of bytes which only take on the values 0-3 (eg. use 2 bits). You might think that it would be a big win to do your own bit packing before the compressor and cram 4 bytes together into one. Well, maybe.

The issue is that the back end compressor will be able to do that exact same thing just as well. It can see that the bytes only take values 0-3 and thus will send them as 2 bits. It doesn't really need your help to see that. (you could help it if you had say some values that you knew were in 0-3 and some other values you knew were in 0-7, you might de-interleave those values so they are separated in the file, or somehow include their identity in the value so that their statistics don't get mixed up; see #5)

However, packing the bytes down can help in some cases. One is if the values are correlated to their neighbors; by packing them you get more of them near each other, so the correlation is modeled at an effective higher order. (eg. if the back end only used order-0 literals, then by packing you get order-3 (for one of the values anyway)). If the values are not neighbor-correlated, then packing will actually hurt.

(with a Huffman back end packing can also help because it allows you to get fractional bits per original value)

Also for small window LZ, packing down effectively increases the window size. Many people see advantages to packing data down before feeding it to Zip, but largely that is just reflective of the tiny 32k window in Zip (left over from the DOS days and totally insane that we're still using it).

5. Separating values that are independent :

I guess I've covered this in other points but it's significant enough to be redundant about. If you have two different sources (A and B); and there's not much correlation between the two, eg. A's and B's are unrelated, but the A's are correlated to other A's - you should try to deinterleave them.

A common simple case is AOS vs SOA. When you have a bunch of structs, often each value in the struct is more related to the same value in its neighbor struct than to other values within its own struct (eg. struct0.x is related to struct1.x more than to struct0.y). In this case, you should transform from array-of-structs to struct-of-arrays ; that is, put all the .x's together.
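A sketch of the transform (hypothetical struct; whether it wins depends on your data) :

#include <stddef.h>

struct Particle { float x, y, z; };

// AOS -> SOA : gather each field into its own stream so that
// neighbor-correlated values end up adjacent for the compressor
void AosToSoa( const Particle * in, size_t n,
               float * xs, float * ys, float * zs )
{
    for ( size_t i = 0; i < n; i++ )
    {
        xs[i] = in[i].x;
        ys[i] = in[i].y;
        zs[i] = in[i].z;
    }
}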

For example, it's well known that DXT1 compresses better if you de-interleave the end point colors from the palette interpolation indices. Note that the AOS-SOA transformation is very slow if done naively so this has to be considered as a tradeoff in the larger picture.

More generally when given a struct you want to use app-specific knowledge to pack together values that are strongly correlated and de-interleave values that are not.

10/15/2012

10-15-12 - Treat People Like Children

One of the things I've realized in the last year or two is that you should treat people like children. It's what they really want, and it's also just better for yourself.

What I mean is, when you treat someone "like an adult", you let them be responsible for their own decisions, you let them suffer the ill consequences of their own mistakes, and you listen to their words as if they mean what they say literally. When you treat someone "like a child" , you clean up after them, you fix their mistakes for them, you assume that when they say something wrong they didn't mean it, etc.

I think some examples may clarify what I mean.

Say you're going hiking in the mountains with a friend. You notice that they have not brought a jacket and you know it will be cold up there. You say "hey do you want to borrow a jacket?" and they say "nah, I'll be fine". You know fully well they will not be fine. If you "treat them like an adult", you would just let them suffer the ill consequences of their bad decision, but the result will be unpleasant for you as well, they will complain, they'll be all pouty and in a bad mood, they'll want to leave quickly, it will suck. Either you can say "fuck you, I told you to bring a jacket, I want to stay, suck it up!" or you can accommodate them and leave early, and either way sucks. So the better thing is to "treat them like a child" and just say at the start "well I'll bring an extra one anyway in case you want it". (with particularly childish friends you shouldn't even say anything and just silently bring an extra one).

(The same goes with snacks and water and such obviously; basically you're better off being like a mom and carrying a pouch of supplies to keep all the "children" (ie. all humans) from getting cranky).

Say you're driving with your dad and you're lost and he doesn't want to stop for directions. If you treat him "like an adult" you would either just speak to him rationally and say "hey this is silly you need to stop and ask someone, don't be so childish" or you would just let him suffer the ill consequences of being lost. But of course neither of those would actually work (almost nobody responds well to having their bad behavior pointed out to them). What you need to do is treat him like a pouty child and fix the situation yourself; eg. say you really need to pee, can we stop for that please, and then ask for directions yourself.

A very common one is just when someone is really pouty or starts acting like a jerk to you. You could "treat them like an adult" and assume they are aware of what they are saying and actually mean to be a jerk to you. But in reality they probably don't, they are just hungry or cranky or need a poop (they are a child) and you shouldn't take it personally. If you need to interact with them, you should get them some food and water and try to fix their crankiness before proceeding.

I find in general that interactions with people work much better if I treat them like a child. (and the same goes in reverse - I get along much better with people who treat me like a child). (basically the idea of the rational self-responsible adult is an invention that does not correspond to reality)

(I guess a related thing that everyone in "communication" knows is you can't just criticize someone and expect them to rationally accept the information and decide if it is useful or not; you have to butter them up first and do it really gently and all that, just like you were trying to critique your child's drawing ("that tree is awful, trees don't look like lollipops, you moron"))

(I guess 99% of modern publicity is just treating people like children. It doesn't matter how good your product is if it has enough stars on the box; your store can sell garbage if it smells like cookies; it doesn't matter what the president actually says as long as he has good hair. I feel like in the 50's before PR was figured out, that media actually treated adults like adults a bit more, and the cleverness of the modern age is realizing that everyone is an easily manipulated pouty child (suck on your iNipple))


Related : thoughts on using money.

I have enough money now to live comfortably, much more so than when I was a child. The little differences are really what strike me. When I was a child of course you would buy store brand aluminum foil, of course you would use coupons, those dollars all mattered. Buying food at Disneyland was a huge luxury (to be avoided if possible, you can wait till we get out of the park, right?) because it was marked up so much. So the first good use of money is just hey you can buy whatever basic necessities you want and not waste your time worrying about the price.

I've tried various ways of spending money now and think that I've made some discoveries. Fancy cars and fancy houses are not good ways to spend money. They are not any better and don't improve your life. In general buying stuff/goods/toys is not helpful (except when it allows you to do an activity that you could not have otherwise done, and you actually do that activity and enjoy it; eg. buying fancy road bikes has zero value if you already had a bike that was good enough and you enjoyed riding; if it doesn't change your ability to do an activity, just making it faster or easier or whatever has zero actual value; but if you had no bike and buy one and then actually ride it, okay that's a good use of money).

Anyway, one of the best uses of money is just to fix all those little moments of crankiness. Like you're in a museum and you're kind of tired or hungry or thirsty; you start to get cranky and not enjoy it. My depression-era upbringing tells me to just gut it out; stay the hell away from the museum cafe, because it's crap food and it's way overpriced. But with money you can just buy the ten dollar tuna sandwich and it will fix your bad mood; that's a good use of money. (in my youth we would have brought homemade sandwiches).

10-15-12 - Joy

Paso Robles Joy :

The sun, the heat. The big open spaces and shade trees. It makes me want to get naked and feel the hand of the sun on my skin, run around in the field, it's the way human beings should be. The vast rolling hills just beg you to get on a horse and ride (though our shitty world of fences and private property make that only a fantasy).

The biking is just fantastic. Pretty deserted roads (though there's more traffic than I remember being here 10 years ago), decent pavement. Grape vines and oak trees all around, and lovely rolling terrain. I hate flats, and I kind of hate endless climbs; here I have the sweet mix, a hard sprint climb, and then a fun windy descent, then a bit of a gradual climb, up and down, lots of variety, never boring. Really great riding, and so many different routes with varying difficulty levels, all out in the country but close by.

The smell; maybe above all the thing that hits me any time I come back to California are the sweet smells; sage and grass up on the dry hillsides, and bay laurel down in the river hollows, the gentle breezes just rich with the wild smells.

October might be the best time here; the grapes are ripe and just about to be picked (in fact there are pickers working right now); you can smell them as you ride around, or stop and have a snack. Wine grapes are super delicious; they have much more interesting flavors than the garbage you get in grocery stores, tons of weird musky notes and caramel and just lots of complexity, not so sweet and boring.

I love that everyone drives fast in California. It just makes life so much easier for me, because I'm not constantly fighting the general flow around me. (in fact being used to Northwest driving, I'm often the slowest person here). I know that it doesn't mean that people here are actually more intelligent or better drivers, they're just following the regional habit the same way Northwest people are (the way people so uniformly just go with the habit of their area is a great demonstration of how little actual individuality anyone has; 99% of your "personality" is just where you live and the time and place you were raised), but man it is a fucking bummer driving in the Northwest with all the up-tight busy-body dumbass passive-aggressive speed-limit followers (who are actually very dangerous drivers, because they don't adapt their behavior to the situation at hand).

Seattle Joy :

This is a memory of July/August, trying to remind myself of the good things.

Being able to walk down and swim in the lake is incredibly nice. I love to just swim out a hundred feet or so and float; getting away from shore you get a view of Mt Rainier and the city skyline. Incredibly it really never gets too crowded in the lake, and even on busy boating days it clears out around twilight, which is one of the best times to be in the lake.

It's really magical when the blackberries get ripe all over the city. The sweet rich smell fills the air and you get it just everywhere as you walk around town. You can ride around Mercer Island and stop and snack as you go.

I've found some pretty decent river swims; they aren't the river swims of my dreams (too cold, and not private enough to get naked), but they are a joy on those rare hot summer days, when you get a bracing dip that shocks you and makes you feel alive.

One of the things that I totally take for granted is that we have no bugs. I completely forget that it's true until I go visit some place like PA or The South where you just can't even sit outside at all without being attacked. It absolutely sucks to live in places with bugs and it's some kind of bizarre miracle that we don't have them (it makes no sense to me that we don't, there's lots of water, and it doesn't get that cold, it should be ideal mosquito land, wtf).

Of course the high mountains are really incredible. Once again I only got out backpacking twice; every year I tell myself that I need to go more next year, but it doesn't happen. One problem is that I feel like I can't take that much time off work; another problem is that just staying in the city and swimming and biking near home is so sweet in the summer that the motivation to go way out to the woods is reduced. Anyway, once again I swear I'll try to get out more often next year. It would be easier if I could work in the mountains.


Comparing the Northwest with California, I've had some revelations about what makes for really great driving/riding roads. The driving & riding around Seattle just sucks, and surprisingly it's way better in CA, which is a much much more populous state and doesn't really seem to be that much older.

The key thing for great roads is that they are somewhat old and now disused. That is, there had to be some reason to put in good country roads long ago (mining, farming) but now there is not much reason for people to be on those roads, so they are low traffic. They have to be old enough to be made before earth-moving equipment, so they are nice and windy and narrow.

The problem with the Northwest is it's just too young. Habitation in the area is only 100 years old; there aren't farm roads from 150 years ago. The only old roads are logging roads and those are/were dirt and temporary. There's only a handful of nice windy old mountain pass roads, and they all are popular tourist attractions which makes them no good for me.

Of course one of the things that makes the Central Coast area so great is the strict development controls that keep the towns from creeping into the countryside and devouring it with endless suburbs. With no housing subdivisions on these old farm roads there's not much use for them, and that makes them heaven for a windy road lover.


Being back in SLO gives me some perspective on how badly I've lived my life. Walking around downtown Tasha asked me if I did this or that, did I go to college parties? did I surf? did I make wine? No, not really. What did I do all the time that I lived here? Pretty much just worked. What a retard.

I feel like I accomplish a lot more than the average programmer, and I like to think that it's because I'm smart and more efficient; I think I have good methodology and solve problems more directly, but maybe I don't, maybe I just work more. When I'm in the moment I can't see it, but any time I look back at my life with 10 years or more of distance I go "wtf I was just doing nothing but working the whole time".

Maybe that's just the way it is for everyone; you work and buy groceries and sleep and go through life without ever doing much.

In related news, I think going out to dinner and going to movies and such is a really horrible way to spend your time. It doesn't really impact your life, you don't remember it down the road, it's just a way of killing time, it's not much better than watching TV or drinking booze (which is the ultimate in "please just make this lifetime go away with as little involvement from me as possible").

10/14/2012

10-14-12 - Ranting

We're staying in this cabiny rental in Paso Robles. I walk in the door and one of the first things I see is a sign in the kitchen saying "don't cut on the butcher block counter; if you spill wine please use stain remover; etc..". What the fuck is wrong with everyone!? It is mind boggling how bad you all are at life. (my ex-landlady had the same dumb rule about the butcher block). First of all, the whole point of a butcher block counter is to cut on; you dumb suburban ladies see them in magazines and think it looks nice and don't actually understand them. But whatever, even if you don't agree with that - don't put it in a fucking rental unit if you don't want people to cut on it, it's just so incredibly stupid, it's creating problems for yourself. Of course renters are going to use it in a reasonable way (eg. cut on it) not in your unreasonable uptight way. (it's particularly ridiculous here because it's a rustic cabiny thing with super-shitty home-improver home-depot fixtures; ooo wouldn't want to get a water mark on your plywood furniture ooo)

It sort of reminds me of all the terrible park designers who make these circuitous awkward paths such that the direct route to get through the park is straight through the greenery, and then put up signs saying "stay off the grass". If you wanted people to stay off the grass you should have put the path in the natural place to walk. Of course people are going to cut off your dumb artsy loop. It's not their fault for walking on the grass, they're not doing anything wrong, you did something wrong by building your paths dumbly.

Driving down I-5 to get here, you're on this seemingly endless straight flat stretch of highway; I spent the whole time thinking about what incredible morons everyone around me was. First of all, hardly anybody seems to use cruise control, so I'm going along at 70 and people keep passing me and then slowing down in front of me and such annoying shit. Dumbasses. Even the people who are reasonably adept at controlling a car are just such inconsiderate assholes. For example people would constantly pull out to pass me at like 71 mph when I'm going 70, which causes them to box me in on the left for like half an hour because it takes them so long to pass; inevitably some truck is in the right lane and I get trapped. It was so consistent that I started to just hold the left lane (which I hate to do and felt like an asshole) until another car proved to me that they were a decent human being (eg. would pass me quickly). I'm trying to be less courteous to strangers; my new rule is that you have to give me some kind of sign that you aren't a waste of oxygen before I get upset with myself for inconveniencing you. (it's not really working yet, I still instinctively get out of the way of assholes who are rudely barging through a crowd, etc)

Tasha and I have both gotten speeding tickets recently while passing, right in that brief moment when we sped up to get the pass over with quickly. Speeding tickets in general are obviously a farce, so this is no surprise, but it's completely absurd to suggest that you should pass without speeding. If someone is going 3 mph under the limit and you want to pass (in a 65 mph zone, and assuming you need 200 feet of clearance on each side to pass safely) it would take 1.64 miles to make the pass. Of course the safest way to make a pass is to get it over with as quickly as possible, eg. pop up to 90 mph briefly; in a reasonable world you should get a ticket for passing without speeding up enough (which I occasionally see and is very dangerous) (or of course for blocking up traffic and not pulling out).
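(checking that math, using the 200 foot clearances from above :)

relative speed = 3 mph = 4.4 feet/sec
distance to gain = 200 + 200 = 400 feet (ignoring car lengths)
time to gain it = 400 / 4.4 = 91 seconds
distance covered at 65 mph (95.3 feet/sec) in 91 seconds = 8670 feet = 1.64 miles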

Because I have a ticket I'm trying to be really careful and not speed at all, and it *sucks* god it sucks. It's not the speed I miss, I'm actually totally fine with just driving slowly for a while, it's the ability to get away from all the dumb fuckers out there. If you actually drive the speed limit everywhere, you are constantly surrounded by other cars and they are just constantly doing cock-ass-motherfucker things like changing lanes right into me without signalling so that I have to take evasive action, or just hanging out right in my blind spot and matching my speed, or speeding up to pull in front of me and then slowing down again, etc. etc. It's just awful driving in a pack. I actually think it was much safer when I was speeding, because I would use it to find empty spots on the freeway and just get alone. I also think it's a lot safer to always be slightly faster than the average traffic, because then it's all laid out in front of you for you to see, rather than buzzing around and coming up from behind. (obviously this is a local optimization not a global one) (I think that most drivers are not actually watching for other cars the way I do; that's why nobody but me seems to care about the fact that almost every modern car has absolutely horrid visibility. When I drive I know exactly where every car around me is, so that I can always make an evasive move without looking, because I know there's a space on my left or right.)

In other news of "my god everyone is so incredibly dumb" I had three retail experiences in a row with the exact same bizarre dumb interaction. I went to this gross mongolian wok place and asked the cashier for a "grilled pork bowl" and she looked at me like I had just said "blerkabootyppsh" , she was like "err, what is that?"; eventually I re-checked the menu and said "barbecue pork bowl" and she was like "oh, okay". Huh? What? You can't figure out that maybe that's what I meant? It's not a very hard puzzle, the only things you sell are "chicken bowl" , "pork bowl", and "beef bowl", so just because I put the wrong adjective in front shouldn't have blown your mind (and FYI it's actually grilled not barbecued); it's like there's just empty space behind those eyes.

The one that really boggles my mind is the constant level of stupidity in coffee shops; I ordered a doppio somewhere and the girl was like "err.. uhh.. do you mean a double espresso?" , uh yeah, you work in a fucking espresso shop and you've never heard of a doppio before? Of course it always kind of blows my mind the way people can do a job day after day and not be at all interested in learning about it or doing it well.

10/13/2012

10-13-12 - Sports

Random thoughts from someone who doesn't know much about sports.

1. Of course Lance Armstrong was on drugs; if you didn't know that, you're a moron. He completely dominated a field which was full of dopers, winning in the mountains and on the flats; if he could do that naturally he would be some kind of super-human abnormality, which of course he wasn't. It doesn't diminish his amazing achievements at all. Everyone he was competing against was on drugs too, so it was a totally level playing field. Everyone in cycling has been on drugs since maybe the 30's or so when they took straight amphetamines. (so did most athletes in those days). You do know that Eddy Merckx was on drugs, right? And pretty much every TdF winner ever. Everyone in every sport in the world has been on drugs for roughly the past century, it's bizarre to act like it's a scandal. It's sort of like a man admitting that he thinks about women other than his wife and everyone gets all upset about it (harrumph and drops their monocle); it's fucking retarded to have these societal faux-pas that publicly we decry and nobody can admit to, but anybody with a brain knows that everybody does it.

Even if you have perfect drug testing of pro athletes, it wouldn't diminish the importance or usage of drugs in sport. eg. say you tested every single day and the tests could detect everything so no doping was possible. That would just make it even more important for the kids to dope in high school before going pro and getting into the testing regime (which is what happens in NFL football these days; to be a football player you must use steroids in high school).

(The French obsession with taking down Lance for doping is particularly ridiculous; they're upset that Lance beat all their French stars so badly, and just generally upset that French cyclists all suck so bad these days, but of course the only French cyclists who have had any success at all recently (eg. Richard Virenque) were huge huge dopers (which is inevitable when you are carrying the expectations of a nation))

2. Sebastien Loeb was probably the greatest racing driver of all time. Unfortunately for him, the WRC format has just not been very interesting during his reign, and he didn't have the fortune of a good foil - he needed a rival to seriously challenge him and make it interesting for the fans, but nobody ever could. (it also didn't help that the cars are so boring now; historic rallies are probably the best rallies to watch now; I love to watch the old Ford Escorts in the 2wd historic rallies hanging the tail around every corner).

Obviously 9 straight championships speaks of his dominance, but if you actually watched some of the races you would appreciate that his supremacy was at a level even beyond what the numbers show. You could tell that he was playing it safe most of the time, that he always had a little more speed in the bank. Of course that's smart, and part of what made Loeb the greatest of all time, that he was not only skilled but crafty and good at managing the risk and percentages. He would drive just fast enough to win and no faster; sometimes he would fall behind in an early stage of a race, and then he would push a little harder and just rip time out of the competition, showing how much speed he really had.

3. The current Spanish national soccer team is a real joy to watch; one of the best international teams I've ever seen (but I don't watch a lot of soccer). The thing that makes them so great is they play a dynamic style with lots of movement (their movement off the ball is particularly good; they run great "give and go's" in basketball lingo), and they make goals from the natural flow of play. Way too many international teams play a very boring defensive style, where they just randomly launch the ball forward or rely on set pieces (corner kicks, penalties) for goals. It's so much nicer to see goals come out of the flow of play rather than set pieces. The German teams in particular are always very effective but just agony to watch. Even the great Brazilian teams have fun individual flair but actually play a pretty defensive configuration most of the time and rely on just a few forwards to make something happen.

Soccer, like most sports, is clearly broken. By "broken" I mean that the rules of the game do not encourage beautiful play; in fact they punish it. A well designed game system makes it so that playing smartly (eg. to maximize the chance of winning) also causes you to play in a way that is elegant and nice to watch and in the spirit of the game. I don't think that playing 8 defenders and winning on penalty kicks is in the spirit of the game and the rules should not let that be such a profitable strategy.

4. I've been enjoying watching F1 recently, mainly as a nice way to zone out (it's very boring, a bit like watching golf or something, just a nice bit of mindless background).

One of the very annoying things about it is that all the video feeds are provided by the F1 organization (not each TV channel) and they are just terrible. The director seems to have very little clue about what is interesting to watch in racing. They're constantly showing cars in 20th place (HRT or whatever) going into the pits; oo 20th place pitted, better cut away to that, fascinating. And of course they cut away from the leaders right when they are setting up for a pass.

F1, like almost all sports, also has just terrible announcers. There are lots of interesting things happening all the time that you would not have any clue about unless you really know racing, because the commentators don't tell you. For example smart drivers like Alonso are very clever about how they interact with other cars; if he's trying to make a pass, he will pester the car in front in areas of the track where he is not planning to pass; this makes the leading car use up its KERS unwisely, meanwhile Alonso is saving all his KERS for the spot where he is planning to make the pass. Sometimes a tough pass is set up with fakes (bluffs) over the course of several laps; you show that you want to pass in one part of the track, so that the leading driver starts going defensive in that spot, then you actually make the pass in a totally other section where you have previously bluffed that you don't have pace. Good commentators should be telling you about these dynamics, as well as just constantly telling stories about the drivers to give you some background on how they interact with each other. If you actually stop and think about how good commentating could be, and how shitty it actually is, the gulf is massive. We've been kind of inured to just atrocious sports commentary, so much so that it is the expected norm (it doesn't feel right to watch football without some commentator saying "they need to get more physical").

I feel like Red Bull is actually way way better than all the other teams, but it doesn't seem that way on the surface because they keep getting torpedoed by the FIA. Every time they make another advancement that lets them run away with races, the rules get changed to make their innovation illegal. Certainly the racing is more interesting if the teams are close by, but constantly changing the rules to hinder the leader is not a good way to make a sport competitive.

10/08/2012

10-08-12 - Oh dear god, what a relief

For the past few days I've had a terrible heisen-bug. It mainly showed up as decompression failing to reproduce the original data. At first I thought it was just a regular bug, some weird case exposing a broken pathway, but I could never get it to repro, and it only seemed to happen when I ran very long tests; like if I compressed 10,000 files it would show up in the 7000th one, but then if I ran on just the file that failed it would never repro.

I do a lot of weird threading stuff so my first fear was that I had some kind of race. So I turned off all my threading, but it kept happening.

My next thought was some kind of uninitialized memory problem or out-of-bounds problem. The circumstances of failure jibe with the bug only happening after I have touched a lot of memory and maybe moved into a weird part of address space, or maybe I'm writing past the end of a buffer somewhere and it doesn't show up and hurt me until much later.

So I turned on my various debug allocator features and tried a bunch of things to stress that, but still couldn't get it to fail in any kind of repeatable way.

Yesterday I saw the exact same kind of bug happen in a few of my different compressors and the lightbulb finally came on in my head : maybe I have bad RAM. I ran Memtest86 and just a few seconds in, yep, bad RAM.

Phew. As pissed as I am to have to deal with this (getting into the RAM on my lappy is a serious pain in the ass) it's nice to not actually have a bizarro bug.

The failure rate of RAM in desktop-replacement lappies is around 100% in my experience. I've had two different desktop replacement lappies in the past 8 years and I have burned out 3 RAM chips; I've blown the OEM RAM on both of them and on this one I also toasted the replacement RAM. Presumably the problem is that it just gets too hot in there and they don't have sufficient cooling. (and yes I keep them on a screen for air flow and all that, and never actually use them on a lap or pillow or anything bad like that). (perhaps I should get one of those laptop stands that has active cooling fans).

Also, shouldn't we have better debugging features by now?

I should be able to take any range of memory, not just page boundaries, and mark it as "no access". So for example I could take compression buffers and put little no access regions at the head and tail.
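You can get a crude version of this today with guard pages, but only at page granularity, which is exactly the limitation I'm complaining about. A sketch of the usual trick (POSIX calls; "guarded_alloc" is my name for it, not any real API) :

#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

// allocate "size" bytes with no-access guard pages at head and tail;
// the buffer is placed flush against the tail guard so overruns fault
// at the exact write that does them
void * guarded_alloc(size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t body = (size + page - 1) & ~(page - 1);  // round up to whole pages
    char * base = (char *) mmap(NULL, body + 2*page, PROT_READ|PROT_WRITE,
                                MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if ( base == (char *) MAP_FAILED ) return NULL;
    mprotect(base, page, PROT_NONE);                // head guard
    mprotect(base + page + body, page, PROT_NONE);  // tail guard
    return base + page + (body - size);
}

Because of the rounding, small overruns into the slack bytes go undetected, and you have to pick between catching overruns or underruns; arbitrary-range access flags would fix all that.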

For uninitialized memory you want to be able to mark every allocation as "fault if it's read before it's written". (this requires a bit per byte which is cleared on write).
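Lacking hardware support you can only fake this in software, and only for accesses that go through your own accessors (tools like Valgrind get full coverage by instrumenting every load, at a huge speed cost). The bookkeeping itself is trivial; a toy sketch :

#include <vector>
#include <cstddef>
#include <assert.h>

// one "written" flag per byte; reads assert if the byte was never written
struct ShadowBuf
{
    std::vector<unsigned char> bytes;
    std::vector<bool> written;

    explicit ShadowBuf(size_t n) : bytes(n), written(n,false) { }

    void put(size_t i, unsigned char v) { bytes[i] = v; written[i] = true; }

    unsigned char get(size_t i) const
    {
        assert( written[i] && "read before write" );
        return bytes[i];
    }
};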

You could enforce true const in C by making a true_const template that marks its memory as read-only.
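eg. something like this sketch (POSIX again; it assumes T fits in one page, so it wastes most of the page, and error checking is omitted) :

#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>
#include <new>

// backs the value with a page that's mprotect'd read-only after
// construction; any cast-away-const write faults at the faulting write
template <typename T>
class true_const
{
    T * m_ptr;
public:
    explicit true_const(const T & init)
    {
        size_t page = (size_t) sysconf(_SC_PAGESIZE);
        void * mem = mmap(NULL, page, PROT_READ|PROT_WRITE,
                          MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        m_ptr = new (mem) T(init);
        mprotect(mem, page, PROT_READ);   // now truly read-only
    }
    ~true_const()
    {
        size_t page = (size_t) sysconf(_SC_PAGESIZE);
        mprotect(m_ptr, page, PROT_READ|PROT_WRITE);
        m_ptr->~T();
        munmap(m_ptr, page);
    }
    const T & operator * () const { return *m_ptr; }
    const T * operator -> () const { return m_ptr; }
};

With that, true_const<int> c(7); *(int *)&*c = 0; actually crashes instead of silently scribbling.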

I've ranted before about how thread debugging would be much better if we could mark memory as "fault unless you are thread X", eg. give exclusive access of a memory region to a thread.

I see two good solutions for this : 1. a VM that could run your exe and add these features, or 2. special memory chips and MMU's for programmers. I certainly would pay extra for RAM that had an extra 2 bits per byte with access flags. Hell with how cheap RAM is these days I would pay extra for more error-correction bits too; maybe even completely duplicate bytes. And self-healing RAM wouldn't be bad either (just mark a page as unusable if it sees failures in that page).

(for thread debugging we should also have a VM that can record exact execution traces and replay them, of course).

10/07/2012

10-07-12 - Small Notes on LZNib

Some little thoughts.

1. It's kind of amazing to me how well LZNib does. (currently 30,986,634 on enwik8 with parallel chunked compress and LRM). I guess it's just the "asymptotic optimality" of LZ77; as the dictionary gets bigger, LZ77 approaches perfect compression (assuming the data source is static, which of course it never is, which is why LZ77 does not in fact approach the best compressor). But anyway, the point is with basic LZ the way matches are encoded becomes less and less important as the window gets bigger (and the average match length thus gets longer).

2. With byte-wise coders you have something funny in the optimal parser that you don't run into much with huffman or arithmetic coders : *ties*. That is, there are frequently many ways to code that have exactly the same code length. (in fact it's not uncommon for *all* the coding choices at a given position to produce the same total length).

You might think ties don't matter but in fact they do. One way you can break a tie is to favor speed; eg. break the tie by picking the encoding that decodes the fastest. But beyond that if your format has some feedback, the tie is important. For example in LZNib the "divider" value could be dynamic and set by feedback from the previous encoding.

In my LZNib I have "last offset" (repeat match), which is affected by ties.
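The tie-break itself is just a second comparison in the cost update; something like this sketch (the Choice struct and the speed ranking are hypothetical, not my actual LZNib code) :

#include <stddef.h>

struct Choice { int cost; size_t offset; bool is_rep; };

// prefer the choice that decodes fastest : repeat matches beat normal
// matches (no offset to fetch), and lower offsets beat higher ones
// (better cache locality on the copy source)
static bool faster_to_decode(const Choice & a, const Choice & b)
{
    if ( a.is_rep != b.is_rep ) return a.is_rep;
    return a.offset < b.offset;
}

static void consider(Choice & best, const Choice & cand)
{
    if ( cand.cost < best.cost ||
       ( cand.cost == best.cost && faster_to_decode(cand,best) ) )
        best = cand;
}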

3. My current decoder is around 800 mb/s on my machine. That's about half the speed of LZ4 (around 1500 mb/s). I think there are a few things I could do to get a little more speed, but it's never going to get all the way. Presumably the main factor is the large window - LZ4 matches mostly come from L1 and if not then they are in L2. LZNib gets a lot of large offsets, thus more cache misses. It might help to do a lagrangian space-speed thing that picks smaller offsets when they don't hurt too much (certainly for breaking ties). (LZNib is also somewhat more branchy than LZ4 which is the other major source of speed loss)

4. One of the nice things about optimal parsing LZNib is that you can strictly pick the set of matches you need to consider. (and there are also enough choices for the optimal parser to make interesting decisions). Offsets can be sent in 12 bits, 20 bits, 28 bits, etc. so for each offset size you just pick the longest match in that window. (this is in contrast to any entropy-coded scheme where reducing to only a few matches is an approximation that hurts compression, or a fixed-size scheme like LZ4 that doesn't give the optimal parser any choices to make)
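So the match gathering reduces to something like this (brute force to show the idea; a real match finder obviously uses hash tables or suffix structures, not a linear scan) :

#include <stddef.h>

struct Match { size_t len; size_t offset; };

// brute force : longest match at "pos" whose offset fits in "window"
static Match longest_in_window(const unsigned char * buf, size_t size,
                               size_t pos, size_t window)
{
    Match best = { 0, 0 };
    size_t lo = ( pos > window ) ? pos - window : 0;
    for (size_t s = lo; s < pos; s++)
    {
        size_t len = 0;
        while ( pos+len < size && buf[s+len] == buf[pos+len] ) len++;
        if ( len > best.len ) { best.len = len; best.offset = pos - s; }
    }
    return best;
}

// one candidate per offset size class (12/20/28 bit offsets); a bigger
// window is only worth keeping if it found a strictly longer match
static int gather_candidates(const unsigned char * buf, size_t size,
                             size_t pos, Match out[3])
{
    const size_t windows[3] = { (size_t)1<<12, (size_t)1<<20, (size_t)1<<28 };
    int n = 0;
    for (int i = 0; i < 3; i++)
    {
        Match m = longest_in_window(buf, size, pos, windows[i]);
        if ( m.len >= 3 && ( n == 0 || m.len > out[n-1].len ) )
            out[n++] = m;
    }
    return n;
}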

5. As usual I'm giving up some compression in the optimal parser by not considering all possible lengths for each match. eg. if you find a match of length 10 you should also consider using it at lengths 3,4,5... ; I don't do that, I only consider lengths that result in a shorter match length code word. That is a small approximation but helps encoder speed a lot.

6. Since LZNib uses "last offset", the optimal parse is only approximate and that is an unsolved problem. Because big groups of offsets code to the same output size, the choice between those offsets should be made by how useful they are in the future as repeat matches, which is something I'm not doing yet.

10/05/2012

10-05-12 - Picture of the Day

(Animated gifs are so annoying.)

10-05-12 - OodleLZ Encoder Speed Variation with Worker Count

Thought I would look into this. One thing I've been wondering is whether putting workers on the hyper-threads helps or not.

Measured speed on enwik8. This is the slow optimal encoder to give it something to do. enwik8 is encoded by breaking into 4 MB chunks (24 of them). Each chunk gets 4 MB of dictionary overlap precondition. Matches before the overlap are found using the LRM (Long Range Matcher). The LRM is created for the whole file and shared between all chunks.
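For concreteness, the shape of that setup is roughly this (a sketch; CompressChunk is a stand-in, not the Oodle API, and the real thing runs on a worker/job system rather than std::async) :

#include <vector>
#include <future>
#include <algorithm>
#include <cstddef>

typedef std::vector<unsigned char> Buf;

// stand-in : a real encoder would LZ-compress [chunk,chunk+chunkLen)
// and be allowed to match against the preceding "dict" overlap bytes
static Buf CompressChunk(const unsigned char * dict, size_t dictLen,
                         const unsigned char * chunk, size_t chunkLen)
{
    (void)dict; (void)dictLen;
    return Buf(chunk, chunk + chunkLen);   // placeholder : no compression
}

// split into 4 MB chunks, each with up to 4 MB of preceding bytes as
// dictionary overlap, and encode all the chunks in parallel
static std::vector<Buf> ParallelCompress(const unsigned char * data, size_t size)
{
    const size_t kChunk = (size_t)4<<20;   // 4 MB
    std::vector< std::future<Buf> > jobs;
    for (size_t pos = 0; pos < size; pos += kChunk)
    {
        size_t len  = std::min(kChunk, size - pos);
        size_t dict = std::min(kChunk, pos);
        jobs.push_back( std::async(std::launch::async, CompressChunk,
                                   data + pos - dict, dict, data + pos, len) );
    }
    std::vector<Buf> out;
    for (size_t i = 0; i < jobs.size(); i++)
        out.push_back( jobs[i].get() );
    return out;
}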

What we see :

The speed dip from 0 to 1 workers is expected, it's the cost of firing up threads and communication and chunking and such. (0 = synchronous, just encode on the main thread).

My machine has 4 real cores and 8 hyper-cores. From 1-4 workers we see not-quite-linear speedup, but big steps. Once we get into the hyperthreads, the benefit is smaller but I'm still seeing steady speedup, which surprises me a bit, I thought it would flatten out more after 4 workers.

(the wiggle at 7 is probably just a random fluctuation in Windows (some service doing something I didn't ask it to do, you bastards); I only ran this test once so the numbers are not very solid; normally I run 40 trials or so when measuring speeds on Windows).

And here's the Oodle ThreadProfile of the encode showing what's happening on all the threads :



Of course part of the reason for the not-quite-linear speedup is the gap at the end when not all the workers are busy. You can fix that by using smaller chunks, but it's really not anything to worry too much about. While it does affect the latency of this single "encode enwik8" operation, it doesn't affect throughput of the overall system under multiple workloads.


OodleLZHLW enwik8 compressed size variation with different chunkings :


28,326,489   4 MB chunks - no LRM
27,559,112   4 MB chunks with LRM
27,098,361   8 MB chunks with LRM , 4 matches
26,976,079   16 MB chunks , 4 matches
26,939,463   16 MB chunks , 8 matches
26,939,812   16 MB chunks , 8 matches, with thresholds

In each case the amount of overlap is = the chunk size (it's really overlap that affects the amount of compression). After the first one, all others are with LRM. Note that the effective local dictionary size varies as you parse through a chunk; eg. with 4 MB chunks, you start with 4 MB of overlap, so you have an effective 4 MB local window; as you parse, your window effectively grows up to a max of 8 MB, so the end of each chunk is better compressed than the beginning.

My LZHLW optimal parse only considers 4 matches normally; as the overlap gets bigger, that becomes a worse compromise. Part of the problem is how those matches are chosen - I just take the 4 longest matches (and the lowest offset at each unique length). Normally this compromise is okay, you get a decent sampling of matches to choose from; on moderate file sizes the cost of going from infinite to 16 to 4 matches is not that great, but as the dictionary gets bigger, you will sometimes fill all 4 matches with high offsets (because they provide the longest match lengths) and not any low offsets to try.

At 16 MB chunks (+16 overlap = 32 MB total window) it becomes necessary to consider more matches. (in fact there's almost no benefit in going from 8 MB to 16 MB chunks without increasing the number of matches).

I tried adding "thresholds"; requiring that some of the matches found be in certain windows, but it didn't help; that merits more investigation. Intuitively it seems to me that the optimal parser wants to be able to choose between some long high-offset matches and some shorter low-offset matches, so the question is how to provide it a few good selections to consider. I think there's definitely some more win possible in my optimal parser by considering more matches, or by having a better heuristic to choose which matches to consider.

10/04/2012

10-04-12 - Work and Life Patterns

Some random thoughts.

1. I'm pretty sure that people who have "work-life balance" are not actually working. Not by my standard of "work". I see these people sometimes who manage to exercise every day, take a nice relaxing lunch break, stop working to be sweet to their wife or play with their kids. No fucking way those guys are working, you can't put in a solid self-destructing day when you're doing that.

It seems like you should be able to stop and take 30 minutes off for stretching or whatever, but in fact you can't. For one thing, if you are really in deep "work mode", it takes at least an hour to get out of it. Then if you really were working hard, your body and mind are exhausted when you get out so you don't want to do anything. Then when you do go back to work, it takes hours to really get your mind back up to full speed.

The worst are the motivational speaker douchebags who will tell you that you can get more done if you only work 1 hour a day, or the dot-com-luckboxes who made millions over some trivial bullshit and now think they are business geniuses. I get more done in 5 minutes than you fuckers have done in your entire lives. I don't think you have any concept of what people do when they're actually working.

2. I've been in crazy crunch all summer long, and only in the last few weeks have kind of "hit the wall" where it's become a bit unpleasant. Not just in terms of job work, but also exercising, house work, etc. it's been a summer of work work work, take a break from one kind of work to do another kind of work.

(aside : actually taking a break from work to do other work is wonderful; I find that almost any day on which I do a variety of jobs I'm quite happy; like 6 hours of job work, then a few hours of wood working in the garage, then a few hours of gardening; that's a nice day. Any day that I spend doing all the same work all day is a sucky horrible day; obviously job work all day long is miserable, but so is home improving all day long. I've never been much for socializing or relaxing or whatever you're supposed to do when you're not working, so a lifestyle of hobbies and chores is okay with me.

Sometimes I see these old guys, generally 50-60 or so, wirey leather-hard old guys, who are just always doing something, they built their own house, they're overhauling an old engine, carrying bags of feed; you know they're really miserable inside their own brains which is why they never stop working to just sit and think or talk with the family, but they've found a way to live by just keeping themselves busy all the time. I look at those old guys and think yeah I could see myself getting through life that way.)

Anyhoo, now that fall is rolling in my body & mind want to quit. It occurred to me that this is the ancient Northern European life cycle; when spring rolls around you kick into high gear and take advantage of the long days and work your ass off for a while, then fall rolls around and you retreat into your dens. In the long ago, Northern Europeans actually almost hibernated in the winter; they would sleep for 16+ hours a day, and their heart rates and calorie consumption would drastically lower.

One of the problems with the modern world is that Northern Europeans won. With the advent of artificial light and heat, they can keep that Northern European summer work ethic going all year round. Back in the ancient days if you lived somewhere where you could work year round (eg. the tropics) then of course you took a slower pace. It's a real un-human situation we've gotten ourselves into. The Northern Europeans had to work their asses off in the summer because they didn't have much opportunity; and they had to be really careful uptight jerks, cache their food carefully and repair their shelters and such, because if they didn't they would die in the winter.

To be repetitive : in the ancient days you had the tropical peoples that lived a slower pace year round, and the northern peoples who lived very intensely, but only for the brief summer. What we've got now is basically that intense summer pace of life, but year round.

(as usual I have no idea if this is actually true; but a good parable is much better than factual accuracy).

3. Work life quality is obviously a race for the bottom. Basically capitalism is a pressure against life quality. I suppose the fundamental reason is that productivity is ever increasing (as knowledge increases, the value of each laborer goes down), and population is also increasing. But anyway, it's clear that the fundamental direction of capitalism is towards worse life quality (*). There are two factors I see that resist it : 1. unions , and 2. new fields of work. (or 3. get to the top of the hierarchy)

(* = this is clearly a half baked thought, as there are various ways in which capitalism is a pressure towards better life quality overall. I guess I'm specifically talking about the pressure of competition in a field where the number of people that want to be in it is greater than what's really needed. All fields go through a progression where at first the number of people trying to do it is very small, there are great opportunities in that phase, but at some point it becomes a well known thing to do and then the pressure is towards worse and worse life quality. I'm also normalizing out the general improvement in life quality for all, since human perception also normalizes that out and it doesn't affect our perception of our life quality)

The "race for the bottom" basically works like this : say you have some job that pays decently and gives you decent life quality; someone else sees that and says "hey I'll do the same job but for 90% of the pay" or more often "I'll take the same pay but work 120% of the hours". Because there is excess labor, the life quality for the worker goes down and down.

New areas of work, where there is a relatively small pool of competent labor, are one of the few ways to avoid this. Software has been new enough to be quite nice for some time, but for your standard low-level computer programmer it is already no longer so, and it will only get worse as the field becomes more mature.

The "race for the bottom" also occurs due to competition. Say you're an independent, maybe you make an archiver like WinPackStuffSmall, if your competition starts working crazy hours adding more and more features, suddenly you have to do the same to compete; then anybody else who wants to get into that business has to work even harder; over time the profit gets smaller and the work conditions get worse. This has certainly happened in games; it's almost impossible to make a competitive game with a small budget in a small amount of time without just killing the employees.

Anyway, I certainly feel it in data compression; there are so many smart guys putting in tons of work on compression for free because it's fun work, that you can't compete unless you really push hard. If you're going for the maximum compression prize and somebody else is putting in just killer work to do all the little things that get you a little more win, you can't compete unless you do it too. Being more efficient or having better ideas wins you a little bit of relaxation, but in the end some grindstone time is inevitable.

4. I really want my cabin in the woods to go off and work. It's too hard for me to try to work and live a normal life at the same time; I'd like to be able to just go out and be alone and eat gruel and code for a week straight.

For a while I was thinking about buying my own piece of land and building a little basic shack. But now that I own a house I'm not so sure. Owning property fucking sucks, it's a constant pain in the ass. (the only thing worse is renting in America, where the landlords have all the rights and are even worse pains in the ass). Sometimes I think that it would be nice to own a piece of mountain land, maybe an orchard, a beach house in the tropics, that that would be a legacy to pass on to my children, to stay in the family, but god damn it's a pain in the ass maintaining properties.

I wish I could find a rental, but I just cannot find anything decent, which is very odd, I know it must be out there. If I went out to my coding shack and I owned it, I would spend much of the time stressed out about the fixes I should be doing to it, at least with a rental I can go "yeah this place sucks but it's not my problem".

I sort of vaguely considered going backpacking-working, but I can't stand working on laptops, and carrying out the standing desk seems a bit difficult. (I said to James when we were backpacking that if I was rich it would be sweet to go backpacking and have a mule team or something carry in a nice bit of kit for you, way back into the inaccessible wilderness, so you could be out there all alone but with a nice supply of non-freeze-dried food (and a keyboard and standing desk) (like an old timey British explorer; have a coolie carry my oak desk into the woods on his back).)

I do think the best way for a programmer to work (well, me anyway) is not the steady put in 8 hours every day and plod along. It's take a few weeks off and basically don't work at all, then go heavy crunch for a few months where you just dive in and every thought is about work. It's so much better if you can stay focused on the problem and not have to pop the stack and try to be relaxed and social and such. I'm not fucking relaxed, I can't chit chat with you, I have shit to get done! Unfortunately the periodic lifestyle doesn't work very well with other people in your life. (and mainstream employers expect you to do the crunch part but not the take a break part).

5. I've always thought that the ideal job would be a seasonal/periodic one. Something like being an F1 engineer, or an NFL coach. (NFL coach was my dream job in college; now I think F1 engineer looks mighty attractive; you get the fun of competition, and then you get to go back to your shed and study film and develop strategies and run computer models). There's some phase when you're "on" where you just work like crazy, and then you get a little bit of a break in the off season. (unfortunately, due to the "race to the bottom", the break in these kinds of jobs is disappearing; back in the ancient days they really were seasonal, in the off season everyone would just go relax, but then uptight assholes started taking the jobs and working year round, and now that's more the norm).

The other awesome thing about F1 engineer or NFL coach is that you get a big feedback moment (eg. "I won" or "I lost") which is very cathartic either way and gives you nice resolution. For me the absolute worst kind of work is the never-ending maintenance; you do some work, and then you do some more; guess what, next year you do some more; there's no real end point. Working on games at least does have that end point (whew, we shipped!) but they're way too far apart to be a nice cyclical lifestyle; you want it once a year, not once every 3-4 years.

I also like the overt competition in those kind of jobs. Real intellectual competition is one of the most fun things in the world; it's what I loved about poker, about going after the most compression, the Netflix prize, etc. It's so cool to see someone else beat you, and you get motivated and try to figure out how they did it, or take the next step yourself and come back and top them. Love that. And you don't have to listen to any dumb fucker's opinion about what the best way is, you go out and prove it in the field of combat; if your ideas are right, you win.

(for quite a while I've been thinking about making my own indie game, solely for the competitive aspect of it; I want to prove I can make a game faster and better than my peers. I really have very little interest in games, for me the "game" is the programming; I want to win at programming. Good lord that is a childish bad reason to make a game. Anyway that part of me is slowly dying as I get older so the chance of me actually making an indie game declines by the day.)

6. I can be quite happy with a simple lifestyle : work really hard, then exercise hard to release the stress and relax the body, then sleep a lot. It actually feels pretty great. The problem is it's an unstable equilibrium, like a pendulum on its tip. The slightest disturbance sends it toppling down.

Any day you don't get enough sleep, suddenly the work is agony and you don't feel like exercising, and then you carry the stress and it's a disaster. In this lifestyle I feel very productive and healthy, but I'm also very prickly; you have to be quite self-defensive to make it work, you can't let people sap your energy because you are so close to using all the energy you have. You will seem quite unreasonable to others; if someone asks you for a little favor or even just wants to socialize or something; no, sorry I can't do it; I have to work and then I have to go swim or everything is going to come crashing down.

I see a lot of the type A successful douchebag types living this lifestyle, and I've never quite put my finger on it about what makes it so douchey. I suppose part of it is jealousy; somebody who actually manages to put in a hard day of work and then exercise off the stress and have a good evening is something that I am rarely able to do, and I'm jealous of people who pull it off. But part of it is that it is a very self-centered lifestyle; you have to be very selfish to make it work.

7. I certainly am aware that I am using work to avoid life at the moment. I've got a bunch of home improving I need to do and other such shite that I really don't want to deal with, so every morning I wake up and just get straight on the computer to do RAD work so that I don't have to think about any of the other things I should be doing.

Of course that's not unusual. I have quite a few friends/acquaintances around here who very reliably use work to avoid life; they can't do this or that "because they have to work". Bullshit, of course you don't have to work right at that moment, you almost never do, you're just avoiding life. It's not really even a lie; if you think it's a lie it's just because you're listening too literally; they're really just saying "no I don't want to" or "my head is all fucked up right now and it's better if I don't spend time in the normal world".

A few months ago I had a fence put in, and on the day that the guys were doing the layout, I felt like I had to be at the office. Of course they did some kind of fucked up things because I wasn't there to supervise, and of course looking back now I can't even remember why it was I felt like I really had to go to work that day, of course I didn't.

8. The times that I really kill myself working are 1. when a team depends on me; like if I made a bug that's blocking the artists, of course I'll kill myself until it's fixed (and you're an asshole if you don't), 2. when I'm working on something that I kind of am not supposed to be; eg. back when I did VIPM at WildTangent or cube map lights at Oddworld or many things at RAD (LZNib the latest); even if it's sort of within the scope of what I should be doing, if it's not what I would have told myself to do if I was the producer, then I feel guilty about it and try to get it over with as quickly as possible and feel bad about it the whole time. 3. when I'm embarrassed; that's maybe the biggest motivator. If I release a product that has bugs, that's embarrassing, or if I claim something is the best and then find out it's not, I have to go and kill myself to make it right.

Right now I'm embarrassed about how long Oodle has taken to get out, so I'm trying to fix that.

9. There's a kind of mania you can get into when you're working a lot where you stop seeing the forest for the trees. You can dive down a hole, and you just keep doing stuff, you're knocking items off the todo list, but you aren't seeing the big picture. It's like the more you work the more you only see the foreground, the details. You have to stop and take a break to take stock and realize you should move onto a different topic.

Sometimes when you are faced with a mountain of tasks and are kind of overwhelmed about where to start, the best thing is to just pick one and do it, then the next, and eventually you will be done. But that rarely works with code, because there are really an infinite number of tasks, doing each one creates two new ones, so "putting your head down" (as producers love to say) can be non-productive.
