9/29/2008

09-29-08 - 1

There's a certain file IO library for a certain console platform that recently came out which is rather similar to the Oodle stuff I'm working on. I think that's actually good for me, since people who use that will already have set up their game to work with a decent layer, so they can easily switch their calls to talk to me instead, and they'll get platform independence and all the other good stuff I do.

Anyhoo, this certain library is a ridiculously overcomplicated mess of heavy C++. And it's on the least C++ friendly major platform in existence. WTF. The whole advantage of making libraries for a single platform is that you can make things very light and easy to use and tailored to the strengths of that platform.

9/28/2008

09-28-08 - 1

I just don't understand the anger against regulation and big government. People act like regulation killed their mother. Here we are in the middle of the nation's second largest ever financial collapse, caused almost 100% by lack of regulation, and still people are railing about how government needs to be smaller. Taxes have been slashed for the super rich, and 90% of Americans are far worse off for it (even though their taxes were slightly cut, the cost to them in inflation and loss of services and the national debt and the failure of our government to take care of the country is worth much more) - and yet still people rail about how taxes are too high. Certainly most of the politicians are simply in the pocket of industry; when they push for deregulation it's just so their donors can make more money at the expense of the safety and security of the average citizen, but a lot of the voters actually believe this malarkey.

Bill Clinton was on the Daily Show last week, and he was charming and charismatic as ever. He gave a very simple analysis of the problem, which I largely agree with. After the dot com crash, we were basically in a recession. The Fed and GWB tried to block the recession by cutting taxes and slashing the Fed overnight rate to ridiculous lows, pumping the US economy full of tons of cheap credit. Unfortunately that money wasn't invested into anything productive since all our industries were tanking; the only profitable thing they could find to jam it into was real estate. It's easy to be seduced by Bill, but we should remember that he's one of the people behind this mess. The 1999 bank deregulation (also here ) that he passed was the opening of the floodgates for financial irresponsibility. Those regulations were put in place in the Depression because the financial industry had been so reckless and caused that mess, and guess what, you repeal them and they immediately do it again, big surprise! (Clinton's first step was followed up wholeheartedly by GWB, who gutted the SEC and just about every other regulatory agency through appointment of people who were incompetent or simply not interested in enforcing the law; this is the real tragic legacy of the Bush administration IMO, not the Iraq war).

Of course nobody's talking about fixing the corruption that would really help. How about opening up the government-sponsored monopolies of the cable and cell networks to true open competition so that we can get some innovation and compete with other countries on tech infrastructure? How about opening up immigration for qualified applicants? Tech companies desperately want to hire more skilled labor; hell, the entire Indian tech boom would be diminished or delayed if we'd just let them come to America the way they all wanted to.

This crisis also brings into focus something I have often written about here : the corruption & giant loophole of the corporate veil. It's almost like you can call yourself a corporation, then go on a killing spree, then when the cops come looking for you, you go "oh yeah, that corporation really fucked up, unfortunately it went bankrupt and dissolved" and the cops just say okay and go home. You can basically do whatever the fuck you want as a corporation, no laws or code of ethics binds you in any way, because if your malfeasance ever catches up with you, you just go bankrupt and everyone walks away clean. In the meantime you paid yourself huge profits. This is assuming that everyone in charge is smart enough to not link themselves personally to any of the bad deeds or do something stupid with finances to pierce the corporate veil, but that's not a question of how bad their deeds are, it's just a question of how good their lawyers are and how careful they are about working the loophole.

9/27/2008

09-27-08 - 3

I stumbled on some interesting interviews with Stepanov, the main guy behind the STL, here and here . He's a bit of a nut; there are lots of little funny gems in there, like : "I think that object orientedness is almost as much of a hoax as Artificial Intelligence."

Followup on yesterday's rant : apparently the guy who did the STL sort is Dave Musser; he has an article on his introsort which is apparently the good sort that's in the SGI/STLport STL.

09-27-08 - 2

On LZ and ACB

LZ77 coders are hugely inefficient. Everybody knows the basic idea of LZ77. You compress data using back references to previous data. You either write a match, consisting of an offset & length, or a literal, consisting of one raw character.

The inefficiencies are intuitively obvious. If the coding was perfect, then if you were the decoder, the encoded stream should be totally random, you shouldn't be able to predict the code stream, but of course you can. One way is with context modeling. If you see a context like "what the ", then it's highly likely the next sequence is "fuck" so offsets which point to that are far more likely. Similarly, certain offsets point at certain words, and there are natural likely match lengths for those as well.

Someday soon I'm going to write about how I'm doing "Optimal Parsing". But the very fact that you can do optimal parsing is a sign of the flaw of LZ77. The encoder has many choices of different ways to write the code stream, which means there is massive redundancy in the code space. There are many different LZ77 code streams which all correspond to the same input. The optimal parser tries to pick the best one of these, but it doesn't change the fact that all the ways to encode that you did not choose are just wasted probability space in the coded bits.

Let's look at how you might reduce the redundancy in LZ77 if you for some reason had to do LZ coding and wanted to go as far with it as you can :

First of all, whether or not a match will be found is highly predictable. Most high compression LZ coders do a lot of context modeling for the match flag, because you can tell by looking at the local neighborhood whether it's an area that's likely to match. There are also finite-state patterns that arise in many types of data (particularly binary data), where it may be very likely that the data goes match-literal-match-literal , or mat-lit-lit-mat-lit-lit-mat , or whatever. Apparently LZMA does this.

Modeling the redundancy in the offset and the length is more interesting and difficult. The first thing we see is that they are not really two separate parameters; they are highly correlated, but not in a trivial way. For example, say we simply find the longest possible match and send the {offset,length} in the normal LZ77 way. There were very similar shorter matches that we passed up. When we send just the {offset} , the decoder knows what the match string we want looks like, and it knows that we passed up all the other matches, so this offset must be the longest one. That means it can mostly infer the match length. The decoder gets the offset, which specifies the match string; then the decoder finds the other string in the window that has the longest shared prefix with the match string. The actual match must be at least that prefix length + 1. You can then either just output that match, or you could send the amount by which the length exceeds it.

Let me show an example for clarity :

T A B C X X A B C D T T * A B C D ...

current code pointer is at (*)

encoder send match offset "-6"

decoder sees the match string "ABCDTT"

it looks for the longest other matching prefix and finds "ABCXX" with shared length 3

therefore the match must be at least length 4

decoder outputs "ABCD"
(then possibly also an additional match length starting at 0)
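Just to make the decoder-side inference concrete, here's a rough sketch (my own, not from any real coder) of how the decoder could recover the implied minimum match length; it ignores ties and window-edge details :

static int shared_prefix_len(const unsigned char * a, const unsigned char * b, int maxLen)
{
	int len = 0;
	while ( len < maxLen && a[len] == b[len] )
		len++;
	return len;
}

// window = already-decoded data, pos = current output position, offset = transmitted match offset
static int implied_min_match_len(const unsigned char * window, int pos, int offset, int maxLen)
{
	const unsigned char * match = window + pos - offset;
	int best = 0;
	// compare the chosen match string against every other candidate string in the window :
	for(int off = 1; off <= pos; off++)
	{
		if ( off == offset ) continue;
		int len = shared_prefix_len(match, window + pos - off, maxLen);
		if ( len > best ) best = len;
	}
	// the encoder sent the longest match, so it must beat all the others by at least one :
	return best + 1;
}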

Now, we're still sending the offsets raw, but of course the offset is highly predictable too. Given the current context (or whatever type of model you want to use), you're much more likely to match certain offsets than others. You shouldn't just code the offset as a backward index into the window, you should assign a probability to each. The "ROLZ" and "PMR" LZ coders of today do this to some extent.

(aside : ROLZ = reduced offset LZ; basically means only code matches that share some amount of context, perhaps also MTF the offsets within the contexts; you code many fewer matches, but that's a good thing because ROLZ coders generally use a strong context model for literals; in fact if you use a powerful PPM for literals then the fewer matches you code the better your compression is. PMR = previous match reference, just codes references to previous matches in fewer bits because they are more likely, encoder also favors choosing PMR's over other offsets).

We can code an arbitrary offset with some modeling by reordering them. Make offset 0 be the most likely, up to offset N the least likely. One way to do this is to order by context match length. Say the longest context match is L symbols, then all the offsets that match with a context of length L are first, then all the offsets that match at length (L-1), down to length (0). Within each context, order so that more recently seen & used offsets are lower. So to code an offset, you still find the longest match, then you see where that offset is in the list ordered by likelihood, and output that index. (alternatively you could just context code the offset in the normal context way).

This is very closely related to LZP, Block Sorting, and Fenwick's Symbol Ranking stuff ( for example ).

We can do something similar in a slightly different way by sorting. Consider all previous possible match spots. Sort them by their context string, backwards, like for PPM. Now to match your current string, find the longest match with your forward string. This is the offset you want to send. To send it, find your current context in the sorted list - you'll find the longest match (actually you'll be between two strings in the list). Send the difference between your position in the list and the position of the longest match. Hopefully this delta is usually small, because the best match usually comes from a context similar to yours. The match length works like before - the offset implicitly gives a minimum length, and you send the additional length beyond that.

Obviously our goal has been to make the offset and match length both very close to zero. I mentioned block sort and symbol ranking before; the statistics of the codes we're generating here should look very much like those statistics.

Well, holy crap, this is exactly ACB !! (ACB = Associative Coding by Buyanovsky). I found a paper on ACB by Tomas Skopal or you can read this or this on my site.

ACB was never given much attention for a few reasons. 1) George's English is awful and he could never write a decent paper on it, so it never got into academia, and his terminology is all non-standard and very confusing, 2) it's way the hell off the optimal curve in terms of speed vs. compression. It's slower than PPM and can't compress as well. It is, however, interesting to me in a theoretical sense, because it's sort of as good as you can do with dictionary compression, and as we see it's extremely similar to LZP, block sorting, and symbol ranking. In fact all of these coders wind up sucking in the same way, because by simply turning your correlation knowledge into an index with low mean, you're giving away a lot of good local information that a plain context modeller could do more with.

09-27-08 - 1

It actually sort of sucks that there are so many super tuned tweaked compressors out there. It's really stifling to innovation. It means if you try something new and interesting, it's almost guaranteed to look bad by comparison because it's not as refined. This is a general problem that I often see in development. Somebody has a stupid O(N^2) sort that they put in assembly and optimized for ages. You try something better like an O(N log N) sort, but the speed numbers in your test application are worse. Well, that's not fair, your new algorithm hasn't gotten all the same tweaking and refinement. But now if I want to compare a bunch of possible solutions, I have to waste a ton of time tweaking them all out. The very existence of that one super optimized possibility really fucks up my entire algorithm search.

The thing that makes this super extra hard is that different algorithms lend themselves to more tweaking. So you really can't just implement them all in a rough high-level fashion and know that you're picking the right thing. In compression for example, you might look at Block Sorting and PPM and say hey these are both very simple elegant high level algorithms, and they perform very comparably. But they are not equal. PPM lends itself to far more tweaking and refinement and tricks and specialization, while with block sorting you're stuck in a dead end where the general solution is all you can do. Of course you see the same thing in speed and optimization.

9/26/2008

09-26-08 - 2

If you search for pages on sorting (real world practical fast sorting), you might find Paul Hsieh's page or this comparison and good implementation or this detailed page with lots of graphs . But all of them are slower than std::sort. Where's the guy who wrote std::sort ? Why doesn't he have a web page? And why aren't there tons of web pages of sorts that beat it? std::sort should certainly NOT be the fastest sort at everything. It's a very very very good implementation, but it's a compromise that trades off reasonable code size and flexibility, and of course sorts are fickle bitches and it should be easy to cook up some test set where your weird sort is better than std::sort. It's kind of like these guys came up with compression algorithms that are worse than pkzip, but they published them anyway, when something better is right there for anyone to use for free.

Of course it's all sort of pointless because if you actually are doing something where sort speed really matters you should use a sort that's customized for your application. For string suffixes I'm currently using Michael Maniscalco's . For floats you can use Pierre Terdiman's radix sort. Hmm.. I just searched for the link and Google tells me the site will harm me. Better wear an e-condom if you go to coder corner. Oh yeah, apparently Michael Herf has an improved version of Pierre's radix, so just go there.

Actually it's not pointless. Having the std::sort as a good baseline is extremely valuable. It's a sanity check that lets you know if you're doing something reasonable. Good libraries like this are extremely valuable *even when you don't use them* because you can test your code against them and know you're not off in lala land. Stuff like dlmalloc and tcmalloc are great for the same reason; I've never actually shipped a product with either, but I consider them invaluable because I can easily link with them and make sure my allocator is at least as good as they are.

09-26-08 - 1

Sarah Silverman is coming back! I heard she was cancelled but she's not, OMG OMG OMG.

Plus "Sunny" is back on. Season 3 was kind of hit & miss, partly because they started using writers other than the original creators, such as in "The Gang Gets Invincible" and "The Gang Gets Taken Hostage" which were very sitcom. Hopefully the new stuff is back on form.

9/25/2008

09-25-08 - 3

Compression's really fun to work on. For one thing the algorithms are fun, the theory is complex and mathematical, but the practice involves a lot of trial and error and hacking. It's a nice intersection of science and engineering. And then you have an absolute metric to test yourself on. In a lot of fields I find people going "oh yeah our method for this and that is awesome". Is it? Really? Did you test it? Against the known best algorithms, on the exact same data that they used, with a non-subjective measure of performance? No, you probably didn't, and in fact your method probably sucks bawls and you're just kidding yourself. Compression lets you check your numbers and there's no way to hide.

09-25-08 - 2

I think the obsession with perfecting minutia is very unhealthy and illogical. Everything is so incredibly fucked up that going off and iterating over and over on some little thing is usually not the best bang for your buck. Certainly in game development you usually want to attack your biggest flaws, which are all over in systems that people don't usually spend much time on (UI, load times, controls, etc.) At RAD I'm seeing some of the upside to iterating on minutia. When you make one really great tool, even if it is rather narrow in focus and everything around it still sucks, it means you can use that one tool with confidence and not worry about it sucking.

09-25-08 - 1

I found a cool windows tip . Windows uses the read only flag on *dirs* to mark them as special. If you're like me and you hate all the windows special folder shit, you can disable it easily by doing "attrib /d -r dirname". For example this will make it stop calling the fucking "My Videos" folder in shared documents "Shared Videos". Just show me the real fucking dir name mkay thanks.

9/24/2008

09-24-08 - 2

Fucking upstairs neighbor woke up at 6 and clomped around in the bedroom over my head. I lay in bed clutching the pillow over my ears, dreaming of beating the bloody shit out of him (literally), imagining the pleasant weight of a hammer as I swung it and cracked his skull, I pictured lurking in the stairwell waiting for him to come down and jumping out and pushing him down the stairs. I haven't slept well in days and it's making me depressed. I have no energy, a mild headache, and a sour demeanor. Some days it feels like all I do is fucking drive back and forth to work. I hate all this rushing. I cram breakfast down my throat at 10 AM when I realize I've been working all morning and haven't eaten yet and I need to get to work. At night I get home at 9 and realize I'm starving because I skipped lunch and have to just cram dinner down my throat. I like to take my meals slow, cook, enjoy it. Eating lunch feels like such a huge waste of time, I feel like I have no hours to work if I take a lunch break.

The commute here is really awful. It's almost impossible to avoid traffic. It's trafficy in the morning from about 7 AM to 10 AM. In the evening it's trafficy from 3 PM to 8 PM. The really frustrating thing to me is that it's such a short distance, and there really aren't that many cars on the road. The traffic is 90% just because these people are all fucking retards. They see a car half a mile ahead touch the brakes, so they slam on the brakes, and then everyone behind them slams on the brakes and we get a huge lockup. I hate to be that California guy who's always talking about how it's so much better there, but my god the drivers here are so much worse. Every single merge is a clusterfuck. The people merging don't get up to speed, the people being merged into don't change lanes, and they try to be "courteous" by slamming on the brakes to let the damn slow merger in.

Fuck, this rant tires and bores me.

I'm sick of my energy and vitality being drained by people who are just fucking bores or assholes or morons. I don't want to deal with any of you. I can't teach you, I can't improve you, you fucking suck, get away from me and stop dragging me down. I want a cabin in the woods. Perhaps a yurt on a mining claim. I'll run naked in the grass, feel the sun on my skin, jump in the freezing water, slink through the forest, speak with the deer, and be happy.

The drivers here are like the holier-than-thou, know-it-all, nosey minister's wife type who think they know best, but are also retarded. You get a lot of scenes like when two people pull up to a stop sign and they both sit there and wave for each other to go. That is not courtesy, that's just retarded. And in fact it's very discourteous to the people behind you. Somebody just fucking go!

09-24-08 - 1

So here's my screen saver : cbsaver zip 112k ; with source code ; it's crufty as hell but IMO it works pretty nicely. It doesn't do multimon or a lot of other junk screen savers should do, like the preview drawing is just a pink fill. I should also make the transitions actually use MMX or something fast, since like hey I work for the company that does the fastest damn software color junk in the world.

9/23/2008

09-23-08 - 1

So I wrote my own screen saver. I just wanted a fucking decent "My Pictures" screen saver. My media PC runs on an old CRT TV, and those things have this property that if the screen image ever changes really abruptly, they make a weird hiss sound, like electricity crackling. So I tried turning on "transition" in the My Pictures screen saver. LOL. What it does is first completely blacks the screen, and then does a transition effect to bring in the new image. Then when it's time to bring in the next image, pop to black, then some lame effect.

So my screen saver just cycles through your pictures and actually does smooth transitions. And hey, I also made it grab the desktop and use that as the first image to transition from, so it comes on smoothly too. And I wrote a couple little transition effects to start. I'm doing them in software in C so they're slow as hell at the moment (at 1920x1200 or higher). It would be kind of fun to do transitions with D3D but I don't really want to worry about my screensaver using the 3d hardware right in all cases that it can come on.

I was pleased to find that MS actually made it pretty nice and easy to make screensavers. Hey it's just an EXE that you rename to a ".SCR" and you can do whatever you want in there.

Anyhoo, I'll post it in a few days once I get a freaking settings dialog box and such done.

9/22/2008

09-22-08 - 1

It's interesting to me the way H264 is designed. The coder is so simple, it really relies on the good motion compensation + bit allocation to do a good job, and it does very well. What this means is that when the image is easily predicted from motion vectors, you get a very high quality reproduction. When the image does not map to motion vectors, you get noise until it settles down. This is exactly the right way to create error from a human visual point of view. As long as there are still or simple translation elements that our eyes can track, we are very sensitive to error, and H264 does very well in those cases. When something changes in a non-linear way, like it rapidly changes direction randomly, or a new object is created via explosion or expulsion or whatever, then it appears distorted until it becomes linear, but that's exactly when our own visual system sort of fuzzes out the object anyway!

You can see my point better perhaps if you contrast with the opposite way of doing things. Something like motion-jpeg2000 might have roughly the same CPU usage, but it's not doing any motion comp and instead spends that CPU time on a better transform & still image coder. The total error rate is the same (pretend), but the motion-jpeg2000 coder is putting error all over the place, and we'll see horrible wiggles in still spots or simple translating spots.

9/21/2008

09-21-08 - 5

Followups :

Lomont says you can't rely on the behavior of shifts on negative numbers. IMO if you start dealing with really weird platforms then you need to make wrapper functions for everything and customize them for each platform.

I've been thinking about negatives and two's complement and such. It's one of those things that's common knowledge and all, but it seems new and confusing every time I think about it.

The easiest way for me to understand it is to think about ints as a ring. Unsigned ints are a ring that go from [0,FFFF] and then loop back around. Don't think about the integer line as a straight line, it's a circle like a clock. You count up to 65535 then +1 and you're back at 0. Similarly you can subtract by going backwards on the clock, so 3 - 7 = 65532. Sometimes I get confused when I'm doing subtracts with unsigned values, but of course subtracting unsigned just means walk backwards around the ring.

Now, signed ints are just exactly the same thing, but offset into the middle of the range. Note with signed ints, 0 is part of the "positive" half of the range. It helps to think of 0 as positive, and the actual middle of the range is at -0.5. So the positives are [0,32767] and negative is [-32768,-1]. This is also a ring, it loops around, when you take 32767 and add 1 you get -32768.

Conceptually you could pretend that a signed value is unsigned offset by half, like S = U - 32768. That's not how the bits are actually stored, but it gives you equivalent operations. It makes it obvious that addition and subtraction and so on produce the exact same values, because you're just offset by a constant amount around the ring.
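A tiny demo of the "same ring, different labels" point; on two's complement hardware the add/subtract bit patterns come out identical for signed and unsigned :

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint16_t u = 3;
	int16_t  s = 3;
	u = (uint16_t)(u - 7);	// walks backwards around the ring : 65532
	s = (int16_t)(s - 7);	// same bits, read as -4
	printf("%u %d same bits : %d\n", (unsigned)u, (int)s, (int)((uint16_t)s == u));
	return 0;
}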

In terms of data compression if you think about the ring, in unsigned ints, 0 and FFFF are actually neighbors, but they look very different bitwise. As an example, we can consider the JPEG-LS or PNG style bytewise delta operations :

D = A - B;
A = D + B;
These are reversible in 8 bits. That's not completely obvious. The range of D is actually [-255,255], it would seem we've expanded our 8 bit values to 9 bits. But depending on the value of B, not all of that range is possible. A - B is the delta from B, which is the number of steps on the ring from B in the positive direction. If A is less than B, you'll loop all the way around the ring, eg. 3 - 7 = 252 steps around the ring forward. That number of steps is always in [0,255]. To undo the delta you just take the same number of steps. Obviously you don't want to transmit this raw because its distribution isn't centered, but
D = A - B + 128;
A = D + B - 128;
is nicely centered. Again the ring wraps: when A-B goes from 127 to 128 the code wraps around and you get a bitwise discontinuity, but that's okay because in compression that should be very rare, and those values have equal probability.
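Here's the centered delta as code, just to show the round trip; wrapping unsigned char math makes it exactly reversible in 8 bits :

static unsigned char delta_encode(unsigned char A, unsigned char B)
{
	return (unsigned char)(A - B + 128);
}

static unsigned char delta_decode(unsigned char D, unsigned char B)
{
	return (unsigned char)(D + B - 128);
}

// for all A,B : delta_decode( delta_encode(A,B), B ) == A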

Speaking of probabilities, I wrote a little thing before about how to shift and preserve average by making half round down and half round up. It's an exercise for the reader to make that work on negative numbers too (aka, I don't want to do it myself).

Writing with Chris made me realize that division has a very weird property with negatives. Int division goes to zero (in C). Shifts always go "down" (towards minus infinity (on x86 anyway)). Int division basically takes the abs, does a positive divide, then puts the sign back (it's not actually done this way of course, that's just the concept). Shifts are actual bitwise register shifts, so the top sign bit gets in. What this means is that integer division actually makes you switch directions of rounding at zero. That never really seemed like a big deal to me, but man it's really weird. For example, if you have a bunch of signed ints and take the average, now you divide them all by two in ints - take the average of the new numbers and multiply that up by two. With real numbers you'd have the same value. With positive ints, what you get is everything shifted down towards zero a bit, so the result is less by 0.5. If you had a bunch of numbers whose average was zero before, the average is still zero! Because half went down and half went up!

In terms of probabilities, if you have a bunch of ints scattered evenly over the int line, and you divide them all by two - the probabilities of the result are no longer evenly distributed! The probability of a zero is 3/2 the probability of any other value. All values in the result come from 2 values in the source, eg. {7,6} -> 3 , except for zero which comes from 3 values! {-1,0,1} -> 0. This makes it obvious why you can never use division of negatives for wavelet transforms or any kind of reversible mapping, because you are mapping too many values to zero. BTW in a "deadzone" style quantizer we actually do this exact thing on purpose, we make the zero more likely than anything else.

09-21-08 - 4

I just had this idea for embedded unit tests. My big problem with unit tests has always been that they're off in separate code that's hard to maintain and gets out of date and is too much of a pain to create and people don't do it. Also it's not as valuable as an assert that's right there in the code as documentation when you read it.

Now, I have a COMPILER_ASSERT thing and love that, but it can't call functions, you can only check constant conditions. What if you could do a compiler assert that was right there, something like :


REQUIRE( ilog2ceil(256) == 8 );
REQUIRE( ilog2ceil(11) == 4 );

Now, in the actual compile there would just be a

#define REQUIRE(exp)

that makes those expressions do nothing, but you would run your own custom build step that scans the file for "REQUIRE" and grabs all those expressions and puts them in their own little .cpp file, compiles it and runs it with

#define REQUIRE(exp)	if ( ! (exp) ) { make bad error report, send people email, whatever }

The stuff in REQUIRE might use things in the file, so the easiest thing would be for the autogen to make the test function at the bottom of each .cpp, then call them all from main. The autogen would also have to disable the original main somehow to run its own main. In VC you would make the autogen thing a custom build step as part of your project build, so that a unit test failure would result in a failure to build.

Actually I can do something without autogen that's not bad. With cblib right now I can do :


#define REQUIRE(exp)	AT_STARTUP( ASSERT_RELEASE(exp) )

That executes at startup so it can't make the build fail, but hey I can do it right now without any autogen code madness.
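For reference, here's a minimal sketch of how an AT_STARTUP-style macro can work (the real cblib versions surely differ) : stuff the expression into the constructor of a file-scope dummy object so it runs during static initialization.

#include <stdio.h>

#define AT_STARTUP_JOIN2(a,b)	a##b
#define AT_STARTUP_JOIN(a,b)	AT_STARTUP_JOIN2(a,b)

#define AT_STARTUP(exp) \
	namespace { struct AT_STARTUP_JOIN(AtStartup,__LINE__) { \
		AT_STARTUP_JOIN(AtStartup,__LINE__)() { exp; } \
	} AT_STARTUP_JOIN(atStartupInstance,__LINE__); }

#define ASSERT_RELEASE(exp)	do { if ( ! (exp) ) fprintf(stderr,"REQUIRE failed : %s\n",#exp); } while(0)
#define REQUIRE(exp)	AT_STARTUP( ASSERT_RELEASE(exp) )

REQUIRE( 1 + 1 == 2 );	// runs before main()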

For longer unit tests you can just put them in a function, like :

static bool CheckLotsOfStuff()
{
	.. blah blah ... do unit testy stuff here ...
	return true;
}

REQUIRE( CheckLotsOfStuff() );

09-21-08 - 3

}:<

Fu Manchu emoticon

09-21-08 - 2

I've been noticing recently that everyone seems even more retarded than usual. I just got a hint what the problem might be. Low dose daily Cialis. Everyone's walking around with a halfie and no blood in their brain.

09-21-08 - 1

As a walker and cyclist, I fucking hate hybrids. They creep up on me silently, I pull out to make a left turn and WTF there's a car behind me! I rely on my ears to know when cars are coming up from behind me. They should stick baseball cards in their spokes.

9/20/2008

09-20-08 - 2

Macros are really the fucking devil. It's unfortunate that they still are faster than inline functions. Maybe on MS compilers with LTCG the inline function is pretty close, but most compilers just don't inline like they should. Macros are hard to use, you have no argument info, when you pass in the wrong thing the error messages are impossible, they aren't type safe which can make them fail very mysteriously, you can't trace in with the debugger; I'm finding myself wasting a lot of time debugging stupid things because I'm using the macros wrong. If I never use another macro again in my life it would be a happy life indeed. It would be ideal to just write inline functions and have them act like macros in the compiler.

I really fucking hate unsigned types. I've got unsigned types all over for no good reason at the moment and they keep giving me bugs where I try to count backwards or something, like


for( ; pos>=0 ; pos--)
{
}

Hey I just wrote an infinite loop cuz freaking pos is unsigned and I wasn't thinking about that and I don't want to have to worry about that all the time. IMO it's best to just use signed types unless you specifically need to allow two billion - and as Won points out if you actually are trying to handle two billion in an unsigned int you have to be super careful (eg. never add two numbers that can cause a carry, etc.).
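For what it's worth, the two idioms I know for counting backwards safely when the index is stuck being unsigned (just a sketch, the bodies are placeholders) :

void walk_backwards(unsigned count)
{
	for( unsigned pos = count; pos-- > 0 ; )
	{
		// pos goes count-1 down to 0 and the loop terminates cleanly
	}

	// or just use a signed index :
	for( int pos = (int)count - 1; pos >= 0; pos-- )
	{
		// ...
	}
}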

I really really like parameter checking, but people mostly do it wrong and don't get the point. Checking pointers for NULL, for example, is pretty worthless (unless it's a release-mode check that throws; I'll assume your parameter checking is just an assert). When someone passes in a NULL they will get a very obvious crash, the system provides a nice message for that already, and you can trivially see what caused the problem by going "hey that pointer is NULL". The good parameter checking is for conditions that will cause non-obvious failures that may be hard to debug and may not cause a crash at all, just incorrect operation. For example, things like :


U32 ShiftRight(U32 value, U32 shift)
{
	ASSERT( shift >= 0 && shift < 32 ); // !! 32 is not valid !
	return ( value >> shift );
}

or

void memcpy256(void * dest,void * src,int count)
{
	ASSERT( abs( (int)dest - (int) src ) >= 32 ); // must be >= 256 bits apart !!

	// ... do 256 bit memcpy
}

The other thing people often do wrong in parameter checking is go the opposite way and check too strictly on conditions that are just how they think things should be, but aren't actually necessary for the function to get right. Stuff like :


void Matrix::Transpose()
{
	ASSERT( IsOrthonormal() );

	// ...
}

where some jackass put too strict an assert because he was thinking you wanted the inverse of the matrix, but that's not the only reason to use transpose. Asserts should check for cases that will cause failure, not for being in the expected bounds of operation. Another common kind of bogus parameter check is stuff like :

float lerp(float from,float to,float t)
{
	ASSERT( t >= 0 && t <= 1 );

	return from + (to - from) * t;
}

Umm, no, this parameter check is all wrong. It doesn't prevent any bugs, extrapolating is perfectly valid and will return a reasonable and correct answer.

I also think that result checking is perhaps more important than parameter checking. Especially in cases where a function does something really complex that you're not sure you got right, but is easy to check. For example :


uint ilog2ceilasm(uint val)
{
	__asm
	{
		FILD val
		FSTP val
		mov eax,val
		add eax,0x7FFFFF // 1<<23 - 1
		shr eax,23
		sub eax,127
	}
}

uint ilog2ceil(uint val)
{
	ASSERT( val > 0 );

	uint ret = ilog2ceilasm(val);

	ASSERT( (1 << ret) >= val && val > (1 << (ret-1)) );

	return ret;
}

The result check does more than just make sure I wrote my code right. It also implicitly checks the parameters because invalid parameters will cause the result verification to fail. It also provides a constraint for modifications. It makes it clear what contract this function has made with its clients, so that if you try to optimize in the future you know exactly what you must satisfy, and the assert tells you if you screw up. (obviously we'd love compile time automatic dynamic analysis instead of relying on the actual bad case to run and trigger the assert)

Of course asserts and parameter checking and such should never be used at all on stuff that comes from users or artists. That stuff should be manually checked with code and they should be informed about bounds violation etc. We had a pretty good rule at Oddworld I think - any condition violation that can only be generated by coders errors is an assert and you can halt the game if necessary, any condition violation that can be caused by bad data or artist error should be a log/throw/warning & you try to keep running despite the error.

09-20-08 - 1

Ten years ago people were predicting the end of consumer electronics. The argument basically went like this : computers can do everything - play CD's, be a word processor, be a TV, etc. so eventually you will just have one powerful computer in your house and all the consumer electronics will go away. The idea was that the consumer would rather have one multi-purpose device that can do anything rather than a bunch of single-function devices.

The reality is that powerful complex devices like a multipurpose computer don't work very well. It's hard to make good UI's that are right for all those different functions. It's hard to make absolutely sure that they work right all the time and never crash, because they're so complex. And it's hard to teach your grandma how to use them.

The exact opposite of that old prediction now seems to be the trend of the future - rather than 1 powerful computer, you will still have tons of single-function devices - they will just all be computers! A computer to watch TV, a computer for video games, a computer to browse the web, a computer for business software. They'll be very powerful, but very cheap, with special single-function software that's easy to use and reliable. This is of course already happening.

9/18/2008

09-18-08 - 1

Fucking Eudora 5 handles multiple personalities really badly. When you reply to a mail, it should always use the personality that received that mail as the originator of the reply. It doesn't. Instead the "current personality" is sticky and gets set when you do anything with that personality, including just receiving mail. This is screwing up my new rad personality.

And no I won't use any web mail ever, or any type of mail where they aren't just raw text files on my own computer. I tried going up to Eudora 7 a while ago and there were some things I didn't like but I forget what they were, and I don't recall if they fixed the dumb personality behavior either.

How fucking hard is it to write a decent mail client? It's one of those things I should just do myself. Right after I write a fucking music library browser and a fucking image editor. (yes, I'm looking at you, iTunes and Photoshop).

9/17/2008

09-17-08 - 3

Vector dot products and matrix multiplies need to be accumulated in doubles. This is super obvious and I'm sure all the scientific computing people are going "duh" but I bet 99.9% of game engines don't do this. A matrix multiply is just a bunch of dot products, so let's consider the dot product :

dot = ax * bx + ay * by + az * bz;

or

dot = x + y + z;

since all that really matters here is the accumulation. I'm going to assume that this is done like :
dot = x + y;
dot += z;
though the exact order of summation depends on the language convention and compiler and whatnot. The really bad case happens when 'x' is small, 'y' is large, and 'z' is large and nearly exactly negative 'y'. When you compute (x+y) the precision in x is killed, then you add on 'z' and it cancels y, but you get something that's nowhere near where it should be (in a relative sense), eg. (x+y)+z is way off from x+(y+z).

This may seem rather obscure and arcane but in fact it happens all the time in very common cases. When 'a' and 'b' are two vectors that are mostly in the YZ plane (small x component) and are nearly 90 degrees off each other, this is what happens.

It's also super trivial to fix, and the performance cost is near zero on platforms where the native float is double anyway, like the PC.
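The fix is literally just accumulating in double even when the inputs are float; a minimal sketch (not RAD code, and obviously your vector types will differ) :

static float dot3(const float a[3], const float b[3])
{
	// promote each product to double so the accumulation doesn't cancel catastrophically
	double dot = (double)a[0]*b[0] + (double)a[1]*b[1] + (double)a[2]*b[2];
	return (float)dot;
}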

I've had so many problems in the last few years with mathematical code failing due to precision, that I now just use doubles everywhere until I see some very good reason to go to floats. Stuff like the SVD and PCA can very easily lead to total garbage if you don't have enough precision in your temporaries. I learn the lessons of coding over and over. Don't prematurely optimize. Always be extra safe and robust until you have a good reason not to be. And check your results even when you're sure you don't have to. I only know that I have had precision problems because I do things like after I get the singular vectors out of the SVD, I go back through and dot them all against each other and assert that they're actually orthogonal to each other.

09-17-08 - 2

Block-based compressors can be lapped and use fancy coders and all that to compete with wavelets in most every way. But there is a really nice thing about wavelet compression as opposed to block-based. It's the resolution independence. If I take a photo at 1024x1024, and take the same photo at 2048x2048 and compress them both - the wavelet coder has the smaller one as a subset of what it does; it basically just keeps doing the same thing, it still codes the LL DPCM band at the same resolution, 32x32 or whatever you chose. In contrast, the block-based coder is now working on very different data. It still works on 8x8 blocks, but they are now half as big in real space, they contain detail at a different scale. The DC coefficient set is now 256x256 instead of 128x128 and the DPCM properties of it are quite different.

One thing this means is that the block based coders that are super tuned (like JPEG,MPEG,H264) are actually dialed in for a certain era and a certain typical image resolution.

09-17-08 - 1

Wow, some wacky shit has been going on here.

A few nights ago a girl came on my front lawn. I had just gotten home from work around 9:30 and was setting up my laptop and stuff when I hear this loud moaning coming from somewhere. I listened to try to hear where it was coming from, and it seemed like it was coming from the window, so I walked over and looked outside. There was a couple on the sidewalk, arms around each other, and the guy had his hand up the girls skirt, vigorously finger-fucking her. She literally collapsed onto the lawn and he fell down with her and kept going at it for a minute or so with her wailing. My front lawn is not large and in no way is it at all private, there are apartments all around and no bushes or any kind of shelter. It was quite surprising.

Tonight I watched my upstairs neighbor buy drugs. I was standing near the front window lifting weights and watching TV. I hear neighbor come pounding down the stairs, and just as he does a car pulls up in front of the building and drives right up onto the curb (our street is only one lane wide, there's nowhere to pull over). Upstairs neighbor gets in the passenger side of the car and they just sit there. I can see very clearly into the car whenever the door is opened because the light comes on, but it's less clear once he closes the door. I see the driver go into the center console and pull out a ziploc bag. Upstairs neighbor rustles around in his pockets. After 30 seconds or so, the passenger door opens again and I see upstairs neighbor making sure what's in his pocket is definitely inside. He gets out and comes back in, car drives off. This explains the coming home at 3 or 4 AM and clomping around on the floor above my brain. I just wish he would get on heroin instead of coke.

9/16/2008

09-16-08 - 1

I've been trying to figure out what's happened in data compression in the last 10 years. Here's my quick run-down :

H.264 : the new video codec; a very simple 4x4 transform used with highly tuned quantizers and predictive coders, and very sophisticated motion estimation and bit allocation. Many different profiles support different bit rates. There are *TONS* of papers on this beast and every aspect has been highly tuned.

PPMii / PPMd / PPMonstr / Durilca : stuff by Dmitry Shkarin. This was a while ago, it's the successor to PPMZ and very good. Some code and papers are available. One of the things that he does that everyone does now is detect data types and do some special munging to make the data more compressible.

PAQ by Matt Mahoney : bitwise coder with lots of context models and weighting, sort of like CTW, but really more like PPMZ. Uses SSE similar to the SEE in PPMZ. Model weighting is well done, uses many types of models such as models with skips. The best compressor at the moment, very slow. Bitwise compressors like PAQ are somewhat easier to write than byte-wise, because you don't have the issue of escapes. Escapes and sparse contexts are really hard to get perfectly right, with bits you always have a 0 and 1, you just need to put a probability on each.

Preprocessors & dictionaries are super common now. Lots of people use a variant of an "LZP" preprocessor, Shkarin and Mahoney both have one. Most of the top compressors will decompress already packed data so they can recompress it, they do an E8E9 transform on exes, and most are conditioned with some training model. (E8E9 transform changes relative jumps to absolute jumps; that way function calls to "strcpy" or whatever always look the same in bytes, so they're much more compressible). Most compressors also detect binary data semi-automatically and use skip-contexts.

ROLZ : aka "reduced offset LZ" ; apparently this is in RKive and many of the Russians are using it, eg. I believe it's in 7-Zip. I haven't found any actual papers or good descriptions of how they do it exactly, but the basic idea is to do something like an LZP to build a list of only a few possible match indexes using context prediction, write a code to select one of those with context modeling.

There are a bunch of big packages with source code, such as 7-Zip, LZMA, FreeArc, BALZ, and I'm sure many more. While this is cool, they seem to be huge messes with little or poor documentation. Holy crap the Russians write really god awful unreadable code. A lot of them look like they were intentionally written for an obfuscated C coding contest, stuff like :

m = H(++*p, i & ( -i >> DS )) && *++p | ( dec(m.[(i>>MS)&0xFF) + t );
These are some actual lines of code from Shkarin's PPMD (and seriously, he is definitely one of the cleanest Russian coders) :
    SubRange.HighCount=(SubRange.scale += (SubRange.LowCount=LoCnt));
        psee2c += 2*(2*NumStats < t+NumMasked)+Flags;
        (FoundState=p)->Freq=(HiCnt += 4);  SummFreq += 4;

In the super high lossless compression area, there's a bunch of talk in the Russian forums such as encode.ru . It's super hard to actually find the good information in there though.

Lapped image transforms. Malvar's page at MS is a good place to start on this. Basically improves quality & compression of block based image coding by using basis functions that extend past the blocks. An alternative viewpoint is that a Lifting based smearing filter is run on the blocks at decompress, and the inverse filter is run at compress.

There's some new interesting stuff on blocksort. Jurgen Abel after BWT , and Michael Maniscalco has a bunch of stuff.

For lossless image compression, it seems to be stuff like GLICBALS and MRP , basically still DPCM but with rigorous linear predictor training.

There are some new arithmetic coders. One I found is "rc.c" in the MRP distribution which is called the "Russian Range coder". It's slightly different than the standard Schindler implementation that I use, but seems to perform almost identically. Another one is the coder used by PAQ; the best one seems to be "fpaq0p". Again it's a very slightly different binary arithcoder implementation than the one I use but seems to perform almost identically. The big new one is the strange paper by Jarek Duda called "Asymmetric Binary System". The paper is totally unreadable IMO, it's more useful to try to find people's notes on it around the web, such as this .

9/13/2008

09-13-08 - 2

I'm pretty fucking delighted with safeprintf. I forget that I'm even using it most of the time, since I mapped the names "printf" and such to redirect through my safeprintf. And then today I run my app and it says :

safeprintf type error; wanted (int), found (float)
safeprintf err in fmt "got option : quantizer = %d

but then it just keeps rolling like nothing happened!

09-13-08 - 1

All the theory of digital signal processing can get a little silly. People do all this analysis of the aliasing properties and frequency response of these various filters, they come up with these massively complex things like sinc functions windowed by Kaiser-Bessel functions - and then you wind up using it to change an image size by 2X, which means you actually only evaluate those functions at 5 discrete values. So it's just a table of 5 floats that looks sort of like a hump.

I'm doing a little lapped image transform. To do a 4x4 DCT you need an 8x8 lap window, but by symmetry and reversibility it's separable to 8 taps, 4 are just mirrored, and half of those are constrained by the mirrored sum = 1 rule, so there are only 2 free floats. So your lap window can be a kaiser or a sin(sin^2) window, or you can just parameterize by two floats and tweak them to maximize PSNR and subjective quality.

9/11/2008

09-11-08 - 3

Seattle's really quite beautiful all summer long; our neighborhood here is delightful, tree lined streets and old houses. There are tons of great parks and we've been tossing the frisbee when I get away from work.

09-11-08 - 2

So I did my first project for RAD, which was using the PCA to find the KLT for color conversions. We're also going to be doing optimal channel decorrelation for audio, and in fact it can be used for other things, such as coordinates in geometry; basically any channels of data that are correlated is interesting to consider.

A comprehensive list of (decent) color conversions for image compression :

General note : color conversions which involve floats should & can be used for lossy image compression, but require a whole pipeline made for floats; many color conversions scale the components, so again your quantization system must be aware of that. Lossless color conversions that go int->int are easier to analyze. They generally use lifting of some sort.

YUV or YCbCr like in JPEG ; this is lossy and crappy.

KLT : matrix multiply by the PCA of the color planes of the image.

fixed KLT : constant KLT matrix optimized on the Kodak image set; you can find this in the FastVDO H.264 paper.

DCT3 : 3-sample DCT. Whoah, when I thought of this it gave me a huge "duh" moment. It's actually a very very good float->float matrix color conversion. Implemented as just a 3x3 matrix multiply. In fact there's a theoretical result that the DCT optimally decorrelates any data which is just Gaussian noise with order-1 correlation, and in fact hey color is very close to that.
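For concreteness, here's the 3-point DCT written out as a matrix (my own sketch; sign and row-order conventions vary). The rows are the orthonormal basis vectors, with constants 1/sqrt(3), 1/sqrt(2), 1/sqrt(6) :

static const double c_dct3[3][3] =
{
	{  0.57735027,  0.57735027,  0.57735027 },	// "luma" : (R+G+B)/sqrt(3)
	{  0.70710678,  0.0       , -0.70710678 },	// (R-B)/sqrt(2)
	{  0.40824829, -0.81649658,  0.40824829 },	// (R-2G+B)/sqrt(6)
};

static void dct3_forward(const double in[3], double out[3])
{
	for(int i=0;i<3;i++)
		out[i] = c_dct3[i][0]*in[0] + c_dct3[i][1]*in[1] + c_dct3[i][2]*in[2];
}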

YCoCg : both lossless and lossy versions. See Malvar papers at microsoft research or his H.264 submission. BTW this is equivalent to doing Haar[R,B] then Haar[G,B] (the Haar acts in place like {x,y} <- { (x+y)/2, y-x })
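If it helps, a sketch of the lossless lifting form (usually called YCoCg-R); this is from memory of Malvar's formulation, so check the paper before trusting the exact shifts :

static void rgb_to_ycocg_r(int R, int G, int B, int * Y, int * Co, int * Cg)
{
	*Co = R - B;
	int t = B + (*Co >> 1);
	*Cg = G - t;
	*Y  = t + (*Cg >> 1);
}

static void ycocg_r_to_rgb(int Y, int Co, int Cg, int * R, int * G, int * B)
{
	int t = Y - (Cg >> 1);
	*G = Cg + t;
	*B = t - (Co >> 1);
	*R = *B + Co;
}

// exactly reversible for any consistent >> , since the same half-values are added and subtracted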

CREW aka RCT aka J2K-lossless : older lossless transform; pretty much always worse than YCoCg

FastVDO : another lossless transform proposed for H.264 ; matches the KLT slightly better than YCoCg , but actually usually codes worse than YCoCg.

This last one leads me to a general issue that was somewhat confounding :

Decorrelation is not the same as real world coding performance. That is, the most decorrelating color transform (the KLT) is not the best for coding in most cases. In fact, the KLT was quite poor. I did come up with some heuristic tricks to make a pseudo-KLT that does code quite well.

There's a theoretical measure of "coding gain" and the KLT maximizes that, but when run through real coders it falls down. I'm not sure at this point exactly what's happening. I have some theories; one issue is that the original RGB is not a Gaussian float, it's integers, so things don't behave smoothly; for example, long long ago I wrote on here about how D(Q) is not smooth in the real world, that is the distortion for a given quantizer does not increase monotonically with Q; it has special peaks when Q hits rational numbers, because those values map ints to ints better. All the theoretical literature on rate-distortion is almost garbage because D(Q) and R(Q) are so non-smooth in the real world. My other theories are that the oblique rotations the KLT sometimes takes is essentially making the bottom bit random which is hurting the spatial prediction of later coding stages.

One interesting case for games is compressing images with an alpha channel. In that case, the alpha channel can often be losslessly predicted by a linear combination of RGB, which is a very good model of many alpha channels and leads to the whole channel packing into only a few bytes.

09-11-08 - 1

So my main project for RAD that I'll be on for the next N months is a big file system / data packaging / DVD / paging thing. If you are a game developer and have thoughts on your content pipeline & loading system, I'd like to hear it; particularly if you like your pipeline and think it would be good for me to learn from, or if you have some ideas on how things should be, or if you think it really needs to handle a certain something for you to use it.

9/10/2008

09-10-08 - 1

WTF , there's no fucking perforce option to tell it to NOT FUCKING CHANGE MY LINE ENDINGS !! That's so fucking retarded. Those guys definitely suffer from the deplorable mania of "we know the Right Way and we don't care if 99% of our user base wants something different, we refuse to give it to them". Witness for example their refusal to provide some candy syntax for a rename despite it being the #1 question in the FAQ for lo these last 10 years.

9/09/2008

09-09-08 - 1

Int division takes you toward zero; shifting right takes you toward -infinity :
-1/4 == 0
-1>>2 == -1
-5/4 == -1
-5>>2 == -2

To divide an int by 2 and preserve exact average of a bunch of ints, you need to push the odd values up half the time and down half the time; this is :

( val + ((val>>1)&1) ) >> 1

One way to get rid of signs is to interleave the positive and negative halves of the lines : (0,1,2) -> (0,2,4) and (-1,-2,-3) -> (1,3,5)

if ( val >= 0 ) return val*2;
else return -val*2 - 1;
I wouldn't actually code from this directly, however, what you can do is first code the 0 or not-0 flag, then code the sign bit, which is just (val&1) , then code the magnitude :
if ( positive ) mag = val>>1;
else mag = (val+1)>>1;
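The fold and unfold as a pair, just to be explicit (my sketch of the mapping above) :

static unsigned fold_sign(int val)
{
	return ( val >= 0 ) ? (unsigned)val*2 : (unsigned)(-val)*2 - 1;
}

static int unfold_sign(unsigned u)
{
	// odd codes are the negatives : 1,3,5 -> -1,-2,-3
	return ( u & 1 ) ? -(int)((u+1)>>1) : (int)(u>>1);
}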

To code a value with a Laplacian probability distribution ( P(x) = c * r^x ), simply use a Golomb code with W = average ; the code is :

top = value / W
bottom = value mod W = value - top * W
send top with unary
send bottom raw
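A sketch of that, for the common case where W is a power of two (ie. a Rice code); the bit output here is a toy that just prints characters so the snippet stands alone :

#include <stdio.h>

static void put_bit(int b) { putchar( b ? '1' : '0' ); }
static void put_unary(unsigned n) { while(n--) put_bit(1); put_bit(0); }
static void put_bits(unsigned v, unsigned count) { while(count--) put_bit( (v >> count) & 1 ); }

static void rice_encode(unsigned value, unsigned W_log2)
{
	put_unary( value >> W_log2 );			// top = value / W , sent in unary
	put_bits( value & ((1u<<W_log2)-1), W_log2 );	// bottom = value mod W , sent raw
}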

9/08/2008

09-08-08 - 3

There are two ways to think about the Haar transform. Haar takes two values {A,B} and makes the average and difference. Naively it seems like this might be impossible to do losslessly in integers because the average has a divide by two. It's crucial that you make average, not sum, because you don't want "bit expansion" (that is, requiring one more bit each time you do a Haar). BTW Haar aka S-transform.

One way is this magic sort of trick with where the LSB goes. You form

L = (A+B)/2 
H = B - A

(all divides floored on integers); how do I get back my original values? The trick is getting the bottom bit right, and the key is that when H is odd it means you need to set a bottom bit on either A or B; if H is positive you should put it on B, if negative put it on A. This means :

A = L + (1-H)/2
B = L + (H+1)/2
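As code, the round trip looks like this (a sketch; I'm using >> as the floored divide, which as noted above is relying on x86-style arithmetic shift for negatives) :

static void haar_fwd(int A, int B, int * L, int * H)
{
	*L = (A + B) >> 1;	// floor( (A+B)/2 )
	*H = B - A;
}

static void haar_inv(int L, int H, int * A, int * B)
{
	*A = L + ((1 - H) >> 1);	// floor( (1-H)/2 )
	*B = L + ((H + 1) >> 1);	// floor( (H+1)/2 )
}

// haar_inv exactly undoes haar_fwd for all in-range int A,B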

An easier way to see it is via the generic lifting/predictor mechanism. You can always create a transform that is reversible by taking a series of predictor steps. One predictor step is :

A -= P(B)
You predict A only using B, and subtract it off; you can invert it of course just by doing +=. To make the Haar transform this way you do :
B -= A;
A += B/2;
To invert you flip the signs and reverse the order :
A -= B/2;
B += A;

P() the lifting predictor can actually be any function, it need not be linear to be reversible, but if we restrict ourselves to linear transforms, then it can be written as a 2x2 matrix multiply :

A = [ 1 X ] [ A ]
B   [ 0 1 ] [ B ]
Any transform like this with 1's on the diagonal and half the matrix zero is a valid lifting transform. This is a "shear". One useful fact is that any rotation can be written as a shear. In particular :
A)
[ c -s ] = [ 1   0 ] [ 1  -sc ] [ c  0  ]
[ s  c ]   [ s/c 1 ] [ 0   1  ] [ 0 1/c ]

B)
t = (c-1)/s
[ c -s ] = [ 1 t ] [ 1 0 ] [ 1 t ]
[ s  c ]   [ 0 1 ] [ s 1 ] [ 0 1 ]
Form A uses a scale first, which is not reversible; form B is pure shears. Form B is good except when the sine "s" becomes small while "c" is near -1, in which case the temp value "t" = (c-1)/s = -tan(theta/2) becomes very large (when "c" is near +1, "t" goes to zero instead). Even ignoring divergence, a large "t" value is very bad for accuracy and bit expansion. You don't really want to be doing lifting shears with multipliers much bigger than 2.0 ; fortunately, when "s" is small, "c" is very close to 1 (or -1), which means the scale term in form A is nearly identity. In that case we can just approximate it as exactly one and not do the scale step at all. If we drop the scale a more accurate lifting rotation is just :
near s = 0
[ c -s ] ~= [ 1 0 ] [ 1 -s ] (* -1 if c < 0)
[ s  c ]    [ s 1 ] [ 0  1 ]
Using this, any matrix that can be written as rotations (eg. any orthonormal matrix) can be decomposed into lossless integer lifting steps.
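
A minimal sketch of a reversible integer "rotation" built from the three shears of form B ; it assumes sin(theta) is not tiny (otherwise use the approximation above), and it is exactly invertible because each shear rounds a deterministic function of the other coordinate :

#include <math.h>

static int Round(double x) { return (int)floor(x + 0.5); }

static void ShearRotateForward(int *x, int *y, double theta)
{
    double s = sin(theta);
    double t = (cos(theta) - 1.0) / s;   // = -tan(theta/2)
    *x += Round( t * (*y) );
    *y += Round( s * (*x) );
    *x += Round( t * (*y) );
}

static void ShearRotateInverse(int *x, int *y, double theta)
{
    double s = sin(theta);
    double t = (cos(theta) - 1.0) / s;
    *x -= Round( t * (*y) );
    *y -= Round( s * (*x) );
    *x -= Round( t * (*y) );
}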

There's a beautiful paper by Srinivasan called Modulo Transforms - An Alternative to Lifting . I don't think it's actually very practical because division and mod are so expensive, but it is a very nice explanation of integer transforms. The description of the quantization lattice and volume expansion is the best I've seen.

Let's look at the Haar again. It starts as a 45 degree rotation :

[ sqrt(1/2)  -sqrt(1/2) ] =  sqrt(1/2) * [ 1 -1 ]
[ sqrt(1/2)   sqrt(1/2) ]                [ 1  1 ]
This is orthonormal, which means it's L2 preserving, and BTW it's what the KLT would tell you to do if your data is 100% correlated; that is, the fit is through x = y.

To do this on ints, the scaling factor sqrt(1/2) ruins int->int , so the easiest thing to do is just don't do the scale :

[ 1 -1 ]
[ 1  1 ]
BTW this is called the "Hadamard transform". It's int->int, but it's expanding. It uniformly expands the squared L2 norm (the variance) by *2, and the determinant of the matrix, which is 2 here, is the volume expansion factor. On ints, this expansion means that half of the output values in the valid range are impossible. In particular the two outputs always have the same bottom bit. In the Srinivasan paper he has some nice pictures of how int transforms give you a lattice of valid values with lots of blanks. Obviously we can get rid of the volume expansion by quantizing the output space. This just means dividing one of the two outputs by 2.

This is where it gets subtle. We can divide either of the two outputs by 2. Both choices eliminate the volume expansion, but neither choice is L2 preserving, and the L2 norm of the two choices is different.

V(x) = < x^2 > is the variance of x assuming zero mean
< y > means average of y

The Hadamard expansion is :

V(a+b) + V(a-b) = 2 * ( V(a) + V(b) )

The normal Haar (quantizing the sum) gives variance :

V( (a+b)/2 ) + V( a-b ) = 5/4 * ( V(a) + V(b) ) - 3/2 * < a*b >

Quantizing the difference gives :

V(a+b) + V( (a-b)/2 ) = 5/4 * ( V(a) + V(b) ) + 3/2 * < a*b >
Note that quantizing the difference is the same as negating one of the values before doing the Haar.

You can see the Haar transform is not variance preserving. It reduces the variance if the correlation between the values is high enough, in particular :

Haar reduces variance if :

< a*b >  >=  (1/6) ( V(a) + V(b) )

This is interesting; the Haar may actually be better than the original 45 degree rotation it approximates. There is some implicit data modelling going on in the Haar; like most wavelets it is not norm preserving, not orthonormal, and it behaves well (reduces norm) when the input data fits the predicted pattern.

Addendum : I found another nice paper ( PL Haar PDF ), on something called the "PL Haar" (Piecewise Linear Haar) ; the PLHaar transform is itself interesting, but aside from that they go over the same points I'm talking about here re: the Haar and range expansion, etc. , and they have some nice pictures.

BTW you can see the implicit modeling I'm talking about in the PL Haar very neatly as well. By design, the PLHaar preserves Linf norm, not L2 norm. L2 norm is (roughly) what matters for coding. In the square domain, stuff in the corners has the max L2 norm, and stuff along the sides has the lower L2 norm. By doing the PL 45 degree rotation, PLHaar takes the large L2 norm stuff along the x=y lines and rotates it to the sides of the square, giving it the lower norm.

09-08-08 - 2

Some things naive wavelet coders get wrong :

1. Extra truncations that just throw away data. One way to say this is they basically do math in floats and then convert to int, and do it multiple times. Often they do their math "in ints" and right shift away extra bits, which is the same mistake. The easiest way to get this right is just to read your input image into floats, do your color transform, your wavelet transform, all in floats. Then when you quantize, quantize float->int and do your coding on ints. (Of course you could also just code the floats; instead of doing bit masks for bit-plane coding you check if value >= thresh.)

2. Quantizing wrong. Quantizing is pretty simple if you think about it in terms of buckets and medians. You're taking a certain range and labeling that with an int. When you dequantize back to floats, you should restore to the middle of the range. (Actually you should restore to the most likely value in the range, which might not be the middle in a typical skewed distribution, but that's a very minor detail.) Many wavelet quantizers intentionally use an ESZZ (Extra Size Zero Zone) quantizer, which is an artificial way of getting more zeros; that's generally a win because the coder likes lots of zeros - but an ESZZ is wrong on the DC LL band! When you have zeros you don't need to send sign bits. One thing people frequently get wrong is the handling of negatives; you have to be careful about dividing negatives and doing float-to-int conversions; I find it easiest to take the sign off, do my math, then put the sign back. (A small sketch of this kind of quantizer follows this list.)

3. Not accounting for norm scaling. If all your transforms were orthonormal you wouldn't have to do this, but most wavelet transforms are not. What that means is that a value of 1 in the LL and a value of 1 in the HH do not have the same L2 norm after untransforming. That means you can't quantize them with the same value, or use the same bitplane truncation. Quantizers need to be scaled, as do any rate-distortion functions.

4. Not sending the most important bits first (assuming you want an "embedded" aka truncatable stream, which you do). Obviously when you are planning to chop off the end of your data you need to sort it by importance. To really get this right you need to be able to shuffle in different bits. You want to send the Y LL first, then the U LL, V LL, then the Y LH, etc. but not necessarily in a fixed order; in some images you might want to send the whole Y band before you send any U. That's global level stuff that even value-based coders should do. In the extreme case you can do things like the bit-plane coder EBCOT that splits the bits of a row into chunks to try to get the most important bits first in the stream. Note that each bit of the compressed stream always has the same amount of information; what we want to do is put the bits that have the biggest L2 norm significance first; that is to say, bits that wind up affecting a large region of the image with large value changes.
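
Here's the quantizer sketch mentioned in #2 ; Q is the step size, the sign comes off before the divide, and dequantization restores to the middle of the bucket. Note the truncation toward zero naturally gives a double-width zero bin (the usual deadzone). This is just a minimal illustration, not anything fancy :

#include <math.h>
#include <stdlib.h>

static int Quantize(float val, float Q)
{
    float a = fabsf(val);
    int   q = (int)(a / Q);          // bucket index ; bucket 0 is the zero zone
    return (val < 0) ? -q : q;
}

static float Dequantize(int q, float Q)
{
    if ( q == 0 ) return 0.f;
    float a = ((float)abs(q) + 0.5f) * Q;   // middle of the bucket
    return (q < 0) ? -a : a;
}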

09-08-08 - 1

Some further little notes about PCA :

The primary principal component is just a linear fit to the data; then the other axes are perp and point in the next primary axes of variance. This is why a lot of people heuristically think of it as the "best fit" to the data.

For doing 2-means or BSP splitting, the primary principal component of course tells you the axis to split along. To find the best BSP plane, you can search along that axis in O(NlogN) by sorting all the data by their dot along that axis and trying the plane between each consecutive pair of points. Since you're incrementally stepping you can just increment to count the number of points on each side, as well as the mean & sdev of points of each side.
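
A minimal sketch of that search ; "axis" is the primary principal component, the split cost here (sum of squared deviations on each side, kept incrementally) is just one reasonable example, and the names are only for illustration :

#include <vector>
#include <algorithm>

struct Vec3 { float x,y,z; };
static float Dot(const Vec3 &a, const Vec3 &b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

static float FindBestSplit(const std::vector<Vec3> &points, const Vec3 &axis)
{
    if ( points.size() < 2 ) return 0.f;

    std::vector<float> d(points.size());
    for(size_t i=0; i<points.size(); i++) d[i] = Dot(points[i], axis);
    std::sort(d.begin(), d.end());

    double sum = 0, sumSq = 0;
    for(size_t i=0; i<d.size(); i++) { sum += d[i]; sumSq += (double)d[i]*d[i]; }

    double leftSum = 0, leftSumSq = 0, bestScore = 1e30;
    float  bestPlane = d[0];
    for(size_t i=1; i<d.size(); i++)    // try the plane between d[i-1] and d[i]
    {
        leftSum += d[i-1]; leftSumSq += (double)d[i-1]*d[i-1];
        double nL = (double)i, nR = (double)(d.size() - i);
        double ssdL = leftSumSq - leftSum*leftSum/nL;                            // sum of squared deviations, left side
        double ssdR = (sumSq - leftSumSq) - (sum - leftSum)*(sum - leftSum)/nR;  // right side
        if ( ssdL + ssdR < bestScore ) { bestScore = ssdL + ssdR; bestPlane = 0.5f*(d[i-1] + d[i]); }
    }
    return bestPlane;   // the plane is { dot(p,axis) == bestPlane }
}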

You should remove the means to calculate the covariance, but of course you don't have to remove the means to transform the data. The mean is just multiplied by the basis transform and becomes the mean in the new coordinate system. If you do remove the mean, then of course the mean in the new space is also zero.

You should *NOT* divide by sdevs in the covariance. The true "correlation" does divide by sdevs to give you a relative correlation measure that's scaled to [-1,1], where 1 = 100% correlation. The "covariance" without the sdevs divided out will have units of [value]^2 , just like the variance. A good thing to display is the *root* of the covariance, since that's in [value] units. I actually find root-covariance a more useful measure than the true correlation, because it tells you how significant the value correlation is in the original scale, and it won't exaggerate up small wiggles; obviously this depends on the problem you are solving. eg. [4 0 4 0 4 0 4] [1 0 1 0 1 0 1] has a root-covariance of 2, a correlation of 1.0 , while [8 0 8 0 8 0 8 0 8] [8 0 8 0 8 0 8 0] has a root covariance of 8.

9/06/2008

09-06-08 - 1

I've been working way too hard; I'm having trouble getting anything done, partly because I'm really uncomfortable in the new coding environment, and it makes me want to just work harder to get past the problems. I have this personality flaw where when I'm focused on some problem, I just can't stop thinking about it. I wake up first thing in the morning and todo lists start popping into my head and I feel antsy until I start doing them. When I finally decide to quit working for the day, my brain is fried, I'm exhausted, I can hardly manage a grunt hello for my sweet girlfriend, and then the coding problems start going around in my head again. It's never pressure from the job that does this, it's the obsessive in me.

I miss the creature comforts of home at the office. The home is really a lovely place to be, I've got natural light, wood floors, windows that actually open for fresh air, and also crucially a kitchen. I miss being able to stop coding and go cook myself something. I've found myself feeling miserable at the office, and I think it's largely the light and air. I'm very sensitive to environments; of course it's my own fault for staying inside there too much.

I think the ideal work situation for me would be lounging in a lush courtyard like some greek philosopher, snacking on olives and almonds in the warmth and fresh air, chatting with other wise men, tended by attractive youths, and occasionally doling out words of wisdom to customers.

Universities have pretty sweet offices in the older buildings, courtyards or lawns around, lots of trees and windows, cozy rooms for the offices with lots of brick and wood and windows that open; they feel warm and cozy, where you could hole up for hours and puff on a pipe and think about general relativity.

9/05/2008

09-05-08 - 2

Anybody out there have some large lossless test images? I'm having trouble finding good ones; all the old test images are like 512x512 and not really representative of modern content. Maybe some of you photographers that take RAW format photos could send me some.

Ah, Won pointed out to me that Google has a filetype thing in image search so you can ask for big PNG's.

09-05-08 - 1

URGH why in the fuck is the windows Task Manager not always resident!?!? An OS should have a tiny super-stable core that maybe even lives in ROM. Then there should be a next super minimal layer that can run very simple apps, and stuff like a light task manager should live in that layer. Then all the complicated stuff can get built on that, and the complicated stuff can crash and be all kinds of mess and never take down the little layer that lets you still control your machine.

9/04/2008

09-04-08 - 3

I can't get Visual Assist X to behave like Visual Assist .NET ; it keeps wanting to either not do completion or do this god-awful fucking listbox popup thing. For the moment I've reverted back to using the old version, but if there's a VAX user out there who can help me, please do.

09-04-08 - 2

P4SCC :

Known Problems

      (Bug #14194) 
      Error adding files already under source control to project: 
      "chmod: : 
      The system cannot find the file specified."

      Workaround: Quit and relaunch the IDE, open the project,
      and revert the file, then retry.

Translation : "our shit is totally fucking broken, but we're gonna release it anyway. Deal with it."

09-04-08 - 1

Yay, our media machine is sort of working. I bailed on the LCD and am just giving up on TV for now, I'll keep using my TiVo for what tiny bit of real TV I watch. But I have video out and the music library interface of MediaPortal is pretty okay.

MediaPortal has a really nice crossfade when you fuck around with music; it's so obviously the right thing to do, like when you skip or mute or anything it's not abrupt. The default setting is like 4 seconds which is retarded awful laggy, but you can turn that down to like 0.5 seconds and it's nice.

The LEDs are retardedly bright; I need to just unplug them. All this makes me appreciate my old analog amp even more. It's got nice subtle LEDs. It's got a heavy smooth big knob for the volume. All the buttons are switches that actually flip so you can tell which way they're set and they feel solid. And there's never any incompatibility problem. Audio comes in on wires. Audio goes out on wires. That's it.

9/03/2008

09-03-08 - 3

"At work" is like the opposite of "in bed" ; you can stick it on any phrase and it instantly becomes uninteresting and unappealing. Like :

"I have some good ideas of things to do" ... at work.

You get excited for a second, hey he has some good ideas, oh, it's at work, bah, boring.

"Something funny happened today" ... at work.

Oh dear god please no more work stories they are not funny.

09-03-08 - 2

I'm sitting at home waiting for the traffic on the 520 to die down. I've been ready to work for an hour. This fucking sucks. I like to wake up in the morning and start working right away.

09-03-08 - 1

WTF, Yelp just closed my account. As best I can figure out, I wrote some negative reviews of places, and the people who run those businesses went on a Flag war flagging me like crazy to get the reviews removed and my account banned. I'm torn between just saying "fuck them" and staying away, or making a bunch of fake accounts and messing with them.

9/02/2008

09-02-08 - 1

We were talking about SVD and PCA and KLT and such yesterday and it made me realize I'm still a little fuzzy on that stuff and it could use a really simple explanation. They're actually all exactly the same thing, and they're also closely related to eigen decomposition. Basically the PCA (Principal Component Analysis) or SVD (Singular Value Decomposition) gives you a new coordinate space (aka a set of basis vectors) that decorrelate the input data and concentrate as much energy as possible on the primary axes. If you like, they find the subset of the space that your data primarily spans, and the magnitudes tell you to what extent your data really needs all the extra dimensions. If you prefer to think in terms of transform theory, they construct the best possible linear matrix transform for decorrelating and concentrating data (like what the DCT does for JPEG). The KLT is just the name of the transform when you use the PCA vectors as your matrix transform.

As usual the Wikipedia pages are good. I'm going to be a bit sloppy with my maths and just try to hand wave a bit so you get the concept.

After I posted this Ignacio pointed me at a very cool tutorial by Jon Shlens : his page contains that and some other very cool stuff; it's a good tutorial, similar to mine but more correct and it's much more readable since it's PDF and all that.

There are two ways of thinking about this. One is you have a bunch of K-element vectors, N of them, it's like a bunch of samples of an experiment; you can of course put them together in a KxN matrix; in this mode you are usually looking for K-element basis vectors. The other way to think about it is just that you have some KxN matrix and you wish to find an approximation of it. In either case we'll call this matrix of KxN things M.

You need K basis vectors to span your K element vectors, of course. You will also get K N-element "cobasis" vectors. Your basis transform on your vectors will be a KxK matrix. Obviously the identity matrix is no transform and it corresponds to using a basis like {1000},{0100},{0010},{0001}. That is, instead of thinking of your data element as {a,b,c} think of it as a*(100) + b*(010) + c*(001). That original basis is not necessarily the best, or even the "right" one.

We can think about this in terms of data compression. For example maybe you have a sound file with K tracks and N samples in each track. If you just compress the tracks one by one in that space, there's a lot of correlation between the tracks you're wasting. What you want is to find a different basis in which all the same parts of the tracks are mashed together and each track only has the new unique bits of that track. Then, for example if two tracks are exactly the same, a whole track will be zeros. In general you can model any linear relationship between tracks with a linear transform like a matrix multiply, eg. stuff like D = A + B - C can be completely modeled out.

The best possible linear basis transform is the PCA. The PCA finds the "principal components" of the set. These basis functions are orthogonal and completely decorrelate the dimensions (in the strict linear correlation sense; more on this later). We can figure out the PCA from this :

The "covariance" of dimension i with dimension j is :

Cij = Sum(n to N) { M'in * M'jn }

(M' = M with the mean of each dimension subtracted off; note this is almost the "correlation" but the sdevs are not divided out). I'm going to drop the prime on the M and just assume that you took out the mean from now on, but make sure you do that.
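
A minimal sketch of forming C from a pile of samples ; data[n][k] is N rows of K-element vectors, the mean of each dimension is removed first, and I don't divide by N here so that it matches the sum above (dividing by N wouldn't change the eigenvectors anyway). The names are just for illustration :

#include <vector>

typedef std::vector< std::vector<double> > Matrix;

static Matrix Covariance(const Matrix &data)   // data[n][k] ; assumes N >= 1
{
    size_t N = data.size(), K = data[0].size();

    std::vector<double> mean(K, 0.0);
    for(size_t n=0; n<N; n++)
        for(size_t k=0; k<K; k++)
            mean[k] += data[n][k] / (double)N;

    Matrix C(K, std::vector<double>(K, 0.0));
    for(size_t n=0; n<N; n++)
        for(size_t i=0; i<K; i++)
            for(size_t j=0; j<K; j++)
                C[i][j] += (data[n][i] - mean[i]) * (data[n][j] - mean[j]);
    return C;
}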

Of course we can just write this same thing in matrix notation :

C = M'^T * M'

^T = Transpose

The diagonal elements of C are the variances of the elements (the true correlation with yourself is 1). C is symmetric, and the off-axis elements tell you the correlations of one dimension with another. We want a basis transform that makes the cross-correlations zero. That is we want to find :

D = B * C * B^T

where D is some diagonal matrix and B is a basis transform.

For those who know something about eigenvectors you can see that the basis transform we want is just the eigenvectors of C.

The eigenvector has the property (by definition) that

C * e_i = k_i * e_i

That is, multiplying by C gives you the same vector scaled by some number. That is, C does not rotate the eigenvector at all, it stays in the same direction. That means if the eigenvectors were our basis vectors, then multiplying by C on any vector would only scale the coordinates in each dimension, it would not ever munge one coordinate into another.

Any matrix can always be written as a sum of outer products of a spanning set of vectors. That is :

C = Sum{ij} Cij * |i> * <j|

where |i> is the ith basis vector and <j| is the jth basis vector transposed.

If we write it in terms of the eigenvectors, then C must be diagonal because we know C turns e_i into e_i , thus all the off-diagonal terms must be zero, so

C = Sum{i} k_i * |e_i> * <e_i|

since <e_i|e_j> = delta_ij , that is they are orthonormal. The basis matrix B that we wanted is just all the eigenvectors stacked up, and the diagonal matrix D, that we get, is just the eigenvalues down the diagonal.

So, we see that the eigenvectors of C are a diagonal representation of C, which means that the non-self correlations are zero, which is exactly what we want, which means this B is exactly the basis transform we want for PCA.

Now, the eigenvalues also tell us how much energy is in each axis. If you look back at the definition of the correlation matrix, you see the diagonal terms are just the L2 norm of that dimension. Now if we look at C in terms of eigenvectors we can see the eigenvalues of C are just equal to the L2 norms of the data in that direction. If we sort the eigenvalues so k0 is the biggest, k1 is the next biggest, etc. this is the PCA.
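
If you just want the primary principal component and don't have an eigensolver handy, a minimal sketch is power iteration : repeatedly multiply a vector by C and renormalize, and it converges to the eigenvector with the largest eigenvalue. This is just one way to do it (a real implementation would use a proper symmetric eigensolver); Matrix is the typedef from the covariance sketch above :

#include <vector>
#include <math.h>

static std::vector<double> PrimaryComponent(const Matrix &C, int iters)
{
    size_t K = C.size();
    std::vector<double> v(K, 1.0);   // arbitrary nonzero start
    for(int it=0; it<iters; it++)
    {
        std::vector<double> w(K, 0.0);
        for(size_t i=0; i<K; i++)
            for(size_t j=0; j<K; j++)
                w[i] += C[i][j] * v[j];

        double len = 0;
        for(size_t i=0; i<K; i++) len += w[i]*w[i];
        len = sqrt(len);
        for(size_t i=0; i<K; i++) v[i] = w[i] / len;
    }
    return v;   // the eigenvalue (the variance along this axis) is then v^T * C * v
}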

Now let me look at the exact same thing a completely different way in terms of successive approximation.

You want to construct basis vectors which are orthogonal and decorrelating. You want the 0th to have most of the signal energy, then the 1st the next most, etc. You can do this just by learning each one one by one.

First find a basis vector b such that the linear correlation of b to the vectors in M is maximized. That's actually the same as minimizing the least square error of a fit with arbitrary coefficient :

Err = Sum{i in K, n in N} { (M_in - a_n * b_i)^2 }

for some vector a ; b is a basis vector that has K elements, a is a "cobasis" that has N elements. Either you can think about this as trying to find b and the a's are just arbitrary coefficients, or you can think of it as trying to approximate the matrix with an outer product of two vectors.

This gives your first b. Then take the outer product |a> * <b| , subtract it off of M, and repeat the fit on the remainder to get the next b. The result that we get is a representation of M in terms of a sequence of outer products :

M = Sum{i} |ai> * <bi|

The a are N-vectors and the b are K-vectors, our bases. If we normalize the a and b and pull out their lengths this is :

M = Sum{i} si * |a'i> * <b'i|

or M = A * S * B

Where S is a diagonal matrix of the si values, A is NxK and B is KxK. Now C the correlation matrix = M^T * M = B^T * S * A^T * A * S * B = B^T * S*S * B , since A^T * A is the identity.

This is exactly like our eigenvector decomposition, except we have S*S instead of the eigenvalues. Well they must be the same thing. So these B that we found by successive fitting are in fact exactly the same as the eigenvectors of the correlation.

Well this successive fitting is just the SVD, the Singular Value Decomposition. The S are the singular values - and we see now that the S are the square root of the eigenvalues of the correlation matrix. Or rather, if you square the singular values you get the L2 norm of the data in each basis, and they are sorted from largest to smallest.

What we have found is a K -> K orthogonal basis transform that takes us to a coordinate space where the dimensions are linearly decorrelated and as much energy (in an L2) sense is in the lowest dimensions.

This is often done as a way of reducing the number of dimensions you have to deal with. You can see how much L2 energy is in each dimension, and just say I'll use enough dimensions to capture 99% of the energy and just ignore the rest. It also lets you see when you have redundant dimensions. If one dimension can be written as a linear combo of another dimension, that's equivalent to saying you actually have lower dimensional data embedded in higher dimension space, like your data is actually planar but it's embedded in 3d space. The PCA vectors will be two primary vectors in the plane of the data, and one vector with very low energy orthogonal to them.

When this is done in data compression, it's often called the Karhunen-Loève transform (KLT). PCA/SVD/KLT is obviously good for data compression not just because it decorrelates (linearly), but also because it puts as much energy as possible up front and then you can just truncate the stream.

The KLT is orthonormal, which means it's just a rotation. Orthonormal transforms are not required for reversibility; obviously any matrix that has an inverse is a reversible transform. The nice thing about Orthonormal is that the L2 norm of the data is preserved, which means that the magnitudes of the transformed data is in a space you can understand and compare to untransformed norms. That means you can measure the distortion in transformed space, you can quantize in transformed space, etc. and not have to worry about how that scales to the original space. In fact Wavelet transforms are not orthogonal, but are reversible, and the "short time fourier transform" that's sometimes used for audio compression is not even reversible the way it's often implemented. I should also note that if your input data is in integers, an arbitrary matrix may not map ints to ints and thus not be reversible. However, since the KLT is orthonormal it can be turned into a lifting transform that does map ints to ints, at the cost of not exactly decorrelating. (actually, most Wavelet transform implementations are not norm-preserving, which is fine, but the vast majority of implementations that I've seen screw up and fail to account for that when doing quantization and measuring error).

It's interesting to compare the KLT to something like the Fourier transform. The KLT has the property that the best subset of the data is the prefix. That is, the best possible 4 value representation is just the first 4 values, not ever the 5th or the 6th. (this is true over the whole set that the KLT was made for, not necessarily true on any one given sample).

The Fourier Transform, or more correctly the DFT, is also a basis transform that can be written as a matrix transform. (we don't actually implement the DFT as a matrix transform because a matrix multiply is N^2 and the DFT can be done in NlogN due to the magic of the FFT). The Fourier Transform also has the optimal prefix property (called the "least squares property") - that is the first K terms of the DFT are the best possible K terms of any trigonometric transform (that is, no later terms could be substituted). This is very useful because it means we can truncate the terms correctly; the first ones are always the most important. However, unlike the KLT, the DFT is not necessarily fully decorrelating, and that means that the energy in the lower dimensions is not maximized.

You could imagine taking a ton of images and gathering all the runs of 8 pixels into samples, so you have K=8 and N samples. Do the PCA to find the best spanning 8-vectors and use these instead of the DCT in something like JPEG. What you would find is that your PCA vectors would look very much like the DCT basis vectors. And you broke the magic NlogN fastness of the FFT.

BTW people in video games have long used the eigenvectors of the correlation matrix (aka the PCA) as basis vectors to find bounding volumes around points. This is complete crap. The PCA has absolutely zero connection to what axes would make good bounding volume axes. There are established very good algorithms for fitting bounding volumes that work in totally different ways.

Why don't people use KLT transforms more in data compression? Well, for one thing, you have to transmit your basis. If you use something like the DCT the basis is implicit, you're assuming some knowledge about the type of data you're working on which saves you bits in the stream. Also obviously it's slower, though the slowness on encode isn't as big a deal as the slowness on decode (there are lots of applications where slow encode would be fine). The main reason is that getting your transform perfect is actually not that important to modern data compression. If you are using a fancy quantizer and a fancy context coder on the back end, then mistakes in the transform can largely be hidden. For example, if your transform is not perfectly decorrelating, your entropy coder can largely model any correlation that's left over and hide that mistake. The simpler your back end is, the more your transform matters.

09-02-08 - 2

I want to set up a server at home so I can get to my home machines from work. It's somewhat complicated by the fact that I don't have static IP's at home (though I do have an IP at cbloom.com that maybe I could redirect to my home machines somehow).

There seem to be a bunch of remote control apps that are sort of intended for this. GoToMyPC is the old craptastic one; the newer ones are LogMeIn, WebEx PCNow, and BeAnywhere.

Really I don't want a custom web interface; I want to hook up to my home machines over regular Windows file sharing. I guess I could have my home machines dial in to the VPN before I leave, and then I should be able to get to them, but that messes up the network at home and relies on me making sure I put them in the VPN whenever I want them.

9/01/2008

09-01-08 - 2

GLICBAWLS is a really sweet little lossless image compression algorithm by B. Meyer (the guy who made TMW). It's a beautiful, simple, elegant paper (get the PDF here ) ; the only thing I don't like about it is the way he introduces the one-over-sigma weighting at the very end as if it's no big deal. It uses a clever running sum that graphics people should be very familiar with. I'm still not sure if it should be pronounced "glick balls" or "gee lick balls".

Holy fucking crap. g-lick-balls is not only one of the sweetest and most obviously right little compression algorithms ever, with the best acronym, it also has the absolute coolest source code . The original web page and PDF are gone from the net now, but fortunately these pieces remain. The hint text that went with the source code also contains some tidbits that weren't in the PDF article.

One of the other major lossless image compression algorithms is RkIm, part of the Rkive suite by my old compadre Malcolm Taylor. Malcolm was always a bit of a troubled weird character, and he seems to have disappeared off the face of the earth. His page is gone and I can't find any information about him or his work around on the net any more. Malcolm made the original Rkive right around the same time I wrote the original PPMZ; we were working in very similar territory, we had the top two text compressors in the world for a long time, he helped me work out many of the details of PPMZ and he made the best implementation of it. He was just a hobbyist, not employed, and I invited him to come out to Wild Tangent to interview when I worked there. He came out and was kind of weird, and then disappeared; he never replied to any followup emails or calls. I wonder what's become of him.

09-01-08 - 1

I went in for my first day of work today. Ugh, work is tiring. I fell right back into bad work habits : I got really thirsty and had to pee really bad, and rather than get up and get a drink and pee, I kept sitting there working away at the bastard computer. On the plus side, the work situation is really good and I'm excited to do some good coding.

I set up a VPN to RAD on my laptop, and I've been having this problem that the fucking VPN connection dialog randomly pops up when I'm home browsing. Fucking Windows. This happens because VPN is treated like a dial up adapter, and Windows has junk in it to automatically dial your dialup connection whenever you're disconnected. The best fix is to go to Control Panel -> Internet Options -> Connections and select "Never Dial a Connection".
