Anyhoo, this certain library is a ridiculously overcomplicated mess of heavy C++. And it's on the least C++ friendly major platform in existence. WTF. The whole advantage of making libraries for a single platform is that you can make things very light and easy to use and tailored to the strengths of that platform.
9/29/2008
09-29-08 - 1
9/28/2008
09-28-08 - 1
Bill Clinton was on the Daily Show last week, and he was charming and charismatic as ever. He gave a very simple analysis of the problem, which I largely agree with. After the dot com crash, we were basically in a recession. The Fed and GWB tried to block the recession by cutting taxes and slashing the Fed overnight rate to ridiculous lows, pumping the US economy full of tons of cheap credit. Unfortunately that money wasn't invested into anything productive since all our industries were tanking, the only thing profitable they could find to jam it in was real estate. It's easy to be seduced by Bill, but we should remember that he's one of the people behind this mess. The 1999 bank deregulation (also here ) that he passed was the opening of the floodgates for financial irresponsibility. Those regulatations were put in place in the depression because the financial industry had been so reckless and caused that mess, and guess what, you repeal them and they immediately do it again, big surprise! (Clinton's first step was followed up whole heartedly by GWB, who gutted the SEC and just about every other regulatory agency through appointment of people who were incompetent or simply not interested in enforcing the law; this is the real tragic legacy of the Bush administration IMO, not the Iraq war).
Of course nobody's talking about fixing the corruption that would really help. How about opening up the government-sponsored mopolies of the cable and cell networks to true open competition so that we can get some innovation and compete with other countries on tech infrastructure? How about opening up immigration for qualified applicants? Tech companies desperately want to hire more skilled labor; hell, the entire Indian tech boom would be diminished or delayed if we'd just let them come to America the way they all wanted to.
This crisis also brings into focus something I have often written about here : the corruption & giant loop hole of the corporate veil. It's almost like you can call yourself a corporation, then go on a killing spree, then when the cops come looking for you, you go "oh yeah, that corporation really fucked up, unfortunately it went bankrupt and dissolved" and the cops just say okay and go home. You can basically do whatever the fuck you want as a corporation, no laws or code of ethics binds you in any way, because if your malfeasance ever catches up with you, you just go bankrupt and everyone walks away clean. In the mean time you paid yourself huge profits. This is assuming that everyone in charge is smart enough to not link themselves personally to any of the bad deeds or do something stupid with finances to pierce the corporate veil, but that's not a question of how bad their deeds are, it's just a question of how good their lawyers are and how careful they are about working the loop hole.
9/27/2008
09-27-08 - 3
Followup on yesterday's rant : apparently the guy who did the STL sort is Dave Musser; he has an article on his introsort which is apparently the good sort that's in the SGI/STLport STL.
09-27-08 - 2
LZ77 coders are hugely inefficient. Everybody knows the basic idea of LZ77. You compress data using back references to previous data. You either write a match, consisting of an offset & length, or a literal, consisting of one raw character.
The inefficiencies are intuitively obvious. If the coding was perfect, then if you were the decoder, the encoded stream should be totally random, you shouldn't be able to predict the code stream, but of course you can. One way is with context modeling. If you see a context like "what the ", then it's highly likely the next sequence is "fuck" so offsets which point to that are far more likely. Similarly, certain offsets point at certain words, and there are natural likely match lengths for those as well.
Someday soon I'm going to write about how I'm doing "Optimal Parsing". But the very fact that you can do optimal parsing is a sign of the flaw of LZ77. The encoder has many choices of different ways to write the code stream, which means there is massive redundancy in the code space. There are many different LZ77 code streams which all correspond to the same input. The optimal parser tries to pick the best one of these, but it doesn't change the fact that all the ways to encode that you did not choose are just wasted probability space in the coded bits.
Let's look at how you might reduce the redundancy in LZ77 if you for some reason had to do LZ coding and wanted to go as far with it as you can :
First of all, whether or not a match will be found is highly predictable. Most high compression LZ coders do a lot of context modeling for the match flag, because you can tell by looking at the local neighborhood whether it's an area that's likely to match. There are also finite-state patterns that arise in many types of data (particularly binary data), where it may be very likely that the data goes match-literal-match-literal , or mat-lit-lit-mat-lit-lit-mat , or whatever. Apparently LZMA does this.
Modeling the redundancy in the offset and the length are more interesting and difficult. The first thing we see is that they are not really two separate parameters, they are highly correlated, but not in a trivial way. For example, say we simply find the longest possible match and send the {offset,length} in the normal LZ77 way. There were very similar shorter matches that we passed up. When we send just the {offset} , the decoder knows what the match string we want looks like, and it knows that we passed up all the other matches, so this offset must be the longest one. That means it can mostly infer the match length. The decoder gets the offset, that specifies the match string, then the decoder finds the other string in the window that has the longest prefix length with the match string. The actual match must be at least the prefix length + 1. You can then either just output that match, or you could send the amount that the lengths exceeds that.
Let me show an example for clarity :
T A B C X X A B C D T T * A B C D ... current code pointer is at (*) encoder send match offset "-6" decoder sees the match string "ABCDTT" it looks for the longest other matching prefix and finds "ABCXX" with shared length 3 therefore we must match of at least length 4 decoder outputs "ABCD" (then possibly also an additional match length starting at 0)
Now, we're still sending the offsets raw, but of course the offset is highly predictable too. Given the current context (or whatever type of model you want to use), you're much more likely to match certain offsets than others. You shouldn't just code the offset as a backward index into the window, you should assign a probability to each. The "ROLZ" and "PMR" LZ coders of today do this to some extent.
(aside : ROLZ = reduced offset LZ; basically means only code matches that share some amount of context, perhaps also MTF the offsets within the contexts; you code many fewer matches, but that's a good thing because ROLZ coders generally use a strong context model for literals; in fact if you use a powerful PPM for literals then the fewer matches you code the better your compression is. PMR = previous match reference, just codes references to previous matches in fewer bits because they are more likely, encoder also favors choosing PMR's over other offsets).
We can code an arbitrary offset with some modeling by reordering them. Make offset 0 be the most likely, up to offset N the least likely. One way to do this is to order by context match length. Say the longest context match is L symbols, then all the offsets that match with a context of length L are first, then all the offsets that match at length (L-1), down to length (0). Within each context, order so that more recently seen & used offsets are lower. So to code an offset, you still find the longest match, then you see where that offset is in the list ordered by likelihood, and output that index. (alternatively you could just context code the offset in the normal context way).
This is very closely related to LZP, Block Sorting, and Fenwick's Symbol Ranking stuff ( for example ).
We can do something similar in a slightly different way by sorting. Consider all previous possible match spots. Sort them by their context string, backwards, like for PPM. Now to match your current string, find the longest match with your forward string. This is the offset you want to send. To send it, find your current context in the sorted list - you'll find the longest match (actually you'll be between two strings in the list). Send the difference between your position in the list and the position of the longest match. Hopefully this delta is usually small, because the best match usually comes from a context similar to yours. The length of match is like before - the offset implicility send the length, you send the additional length after the implicit.
Obviously our goal has been to make the offset and match length both very close to zero. I mentioned block sort and symbol ranking before; the statistics of the codes we're generating here should look very much like those statistics.
Well, holy crap, this is exactly ACB !! (ACB = Associative Coding by Buyanovsky). I found a paper on ACB by Tomas Skopal or you can read this or this on my site.
ACB was never given much attention for a few reasons. 1) George's english is awful and he could never write a decent paper on it, so it never got into academia, his terminology is all non-standard and very confusing, 2) it's way the hell off the optimal curve in terms of speed vs. compression. It's slower than PPM and can't compress as well. It is however, interesting to me in a theoretical sense, because it's sort of like as good as you can do with dictionary compression, and as we see it's extremely similar to LZP, block sorting, and symbol ranking. In fact all of these coders wind up sucking in the same way, because by simply turning your correlation knowledge into an index with low mean, you're giving away a lot of good local information that a plain context modeller could do more with.
09-27-08 - 1
The thing that makes this super extra hard is that different algorithms lend themselves to more tweaking. So you really can't just implement them all in a rough high-level fashion and know that you're picking the right thing. In compression for example, you might look at Block Sorting and PPM and say hey these are both very simple elegant high level algorithms, and they perform very comparably. But they are not equal. PPM lends itself to far more tweaking and refinement and tricks and specialization, while with block sorting you're stuck in a dead end where the general solution is all you can do. Of course you see the same thing in speed and optimization.
9/26/2008
09-26-08 - 2
Of course it's all sort of pointless because if you actually are doing something where sort speed really matters you should use a sort that's customized for your application. For string suffixes I'm currently using Michael Maniscalco's . For floats you can use Pierre Terdiman's radix sort. Hmm.. I just searched for the link and Google tells me the site will harm me. Better wear an e-condom if you go to coder corner. Oh yeah, apparently Michael Herf has an improved version of Pierre's radix, so just go there.
Actually it's not pointless. Having the std::sort as a good baseline is extremely valuable. It's a sanity check that lets you know if you're doing something reasonable. Good libraries like this are extremely valuable *even when you don't use them* because you can test your code against them and know you're not off in lala land. Stuff like dlmalloc and tcmalloc are great for the same reason; I've never actually shipped a product with either, but I consider them invaluable because I can easily link with them and make sure my allocator is at least as good as they are.
09-26-08 - 1
Plus "Sunny" is back on. Season 3 was kind of hit & miss, partly because they started using writers other than the original creators, such as in "The Gang Gets Invincible" and "The Gang Gets Taken Hostage" which were very sitcom. Hopefully the new stuff is back on form.
9/25/2008
09-25-08 - 3
09-25-08 - 2
09-25-08 - 1
9/24/2008
09-24-08 - 2
The commute here is really awful. It's almost impossible to avoid traffic. It's trafficy in the morning from about 7 AM to 10 AM. In the evening it's trafficy from 3 PM to 8 PM. The really frustrating thing to me is that it's such a short distance, and there really aren't that many cars on the road. The traffic is 90% just because these people are all fucking retards. They see a car half a mile ahead touch the brakes, so they slam on the brakes, and then everyone behind them slams on the brakes and we get a huge lockup. I hate to be that California guy who's always talking about how it's so much better there, but my god the drivers here are so much worse. Every single merge is a clusterfuck. The people merging don't get up to speed, the people being merged into don't change lanes, and they try to be "courteous" by slamming on the brakes to let the damn slow merger in.
Fuck, this rant tires and bores me.
I'm sick of my energy and vitality being drained by people who are just fucking bores or assholes or morons. I don't want to deal with any of you. I can't teach you, I can't improve you, you fucking suck, get away from me and stop dragging me down. I want a cabin in the woods. Perhaps a yurt on a mining claim. I'll run naked in the grass, feel the sun on my skin, jump in the freezing water, slink through the forest, speak with the deer, and be happy.
The drivers here are like holier-than-thou know-it-all sort of nosey minister's wife think they know best, but also retarded. You get a lot of scenes like when two people pull up to a stop sign and they both sit there and wave for each other to go. That is not courtesy, that's just retarded. And in fact its very discourteous to the people behind you. Somebody just fucking go!
09-24-08 - 1
9/23/2008
09-23-08 - 1
So my screen saver just cycles through your pictures and actually does smooth transitions. And hey, I also made it grab the desktop and use that as the first image to transition from, so it comes on smoothly too. And I wrote a couple little transition effects to start. I'm doing them in software in C so they're slow as hell at the moment (at 1920x1200 or higher). It would be kind of fun to do transitions with D3D but I don't really want to worry about my screensaver using the 3d hardware right in all cases that it can come on.
I was pleased to find that MS actually made it pretty nice and easy to make screensavers. Hey it's just an EXE that you rename to a ".SCR" and you can do whatever you want in there.
Anyhoo, I'll post it in a few days once I get a freaking settings dialog box and such done.
9/22/2008
09-22-08 - 1
You can see my point better perhaps if you contrast with the opposite way of doing things. Something like motion-jpeg2000 might have roughly the same CPU usage, but it's not doing any motion comp and instead spends that CPU time on a better transform & still image coder. The total error rate is the same (pretend), but the motion-jpeg2000 coder is putting error all over the place, and we'll see horrible wiggles in still spots or simple translating spots.
9/21/2008
09-21-08 - 5
Lomont says you can't rely on the behavior of shifts on negative numbers. IMO if you start dealing with really weird platofrms then you need to make wrapper functions for everything and customize them for each platform.
I've been thinking about negatives and two's complement and such. It's one of those things that's common knowledge and all, but it seems new and confusing every time I think about it.
The easiest way for me to understand it is to think about ints as a ring. Unsigned ints are a ring that go from [0,FFFF] and then loop back around. Don't think about the integer line as a straight line, it's a circle like a clock. You count up to 65535 then +1 and you're back at 0. Similarly you can subtract by going backwards on the clock, so 3 - 7 = 65532. Sometimes I get confused when I'm doing subtracts with unsigned values, but of course subtracting unsigned just means walk backwards around the ring.
Now, signed ints are just exactly the same thing, but offset into the middle of the range. Note with signed ints, 0 is part of the "positive" half of the range. It helps to think of 0 as positive, and the actual middle of the range is at -0.5. So the positives are [0,32767] and negative is [-32768,-1]. This is also a ring, it loops around, when you take 32767 and add 1 you get -32768.
Conceptually you could pretend that a signed value is unsigned offset by half, like S = U - 32768. That's not how the bits are actually stored, but it gives you equivalent operations. It makes it obvious that addition and subtraction and so on produce the exact same values, because you're just offset by a constant amount around the ring.
In terms of data compression if you think about the ring, in unsigned ints, 0 and FFFF are actually neighbors, but they look very different bitwise. As an example, we can consider the JPEG-LS or PNG style bytewise delta operations :
D = A - B; A = D + B;These are reversible in 8 bits. That's not completely obvious. The range of D is actually [-255,255] , it would seem we've expanded our 8 bit values to 9 bits. But depending on the value of B, not all of that range is possible. A - B is the delta from B, which is the number of steps on the ring from B in the positive direction. If A is less than B, you'll loop all the way around the ring, eg. 3 - 7 = 252 steps around the ring forward. That number of steps is always in [0,255]. To undo the delta you just take the same number of steps. Obviously you don't want to transmit this because it's offset, but
D = A - B + 128; A = D + B - 128;is nicely centered. Again the ring wraps when A-B = 127 to 128 you get a bitwise discontinuity, but that's okay because in compression that should be very rare, and those values have equal probability.
Speaking of probabilities, I wrote a little thing before about how to shift and preserve average by making half round down and half round up. It's an exercise for the reader to make that work on negative numbers too (aka, I don't want to do it myself).
Writing with Chris it made me realize that division has a very weird property with negatives. Int division goes to zero (in C). Shifts always go "down" (towards minus infinity (on x86 anyway)). Int division basically takes the abs, does a positive divide, then puts the sign back (it's not actually done this way of course, that's just the concept). Shifts are actual bitwise register shifts, so the top sign bit gets in. What this means is that integer division actually makes you switch directions of rounding at zero. That never really seemed like a big deal to me, but man it's really weird. For example, if you have a bunch of signed ints and take the average, now you divide them all by two in ints - take the average of the new numbers and multiply that up by two. With real numbers you'd have the same value. With positive ints, what you get is everything shifted down towards zero a bit, so the result is less by 0.5. If you had a bunch of numbers whose average was zero before, the average is still zero! Because half went down and half went up!
In terms of probabilities, if you have a bunch of ints scattered evenly over the int line, and you divide them all by two - the probabilities of the result are no longer evenly distributed! The probabilty of a zero is 3/2 the probability of any other value. All values in the result come from 2 values in the source, eg. {7,6} -> 3 , except for zero which comes from 3 values! {-1,0,1} -> 0. This makes it obvious why you can never use division of negatives for wavelet transforms or any kind of reversible mapping, because you are mapping too many values to zero. BTW in a "deadzone" style quantizer we actually do this exact thing on purpose, we make the zero more likely than anything else.
09-21-08 - 4
Now, I have a COMPILER_ASSERT thing and love that, but it can't call functions, you can only check constant conditions. What if you could do a compiler assert that was right there, something like :
REQUIRE( ilog2ceil(256) == 8 ); REQUIRE( ilog2ceil(11) == 4 );Now, in the actual compile there would just be a
#define REQUIRE(exp)that makes those expressions do nothing, but you would run your own custom build step that scans the file for "REQUIRE" and grabs all those expressions and puts them in their own little .cpp file, compiles it and runs it with
#define REQUIRE(exp) if ( ! (exp) ) { make bad error report, send people email, whatever }
The stuff in REQUIRE might use things in the file, so the easiest thing would be for the autogen to make the test function at the bottom of each .cpp
Then call them all from main. The autogen would also have to disable the original main somehow to run its own main. In VC you would make the autogen
thing a custom build step as part of your project build, so that a unit test failure would result in a failure to build.
Actually I can do something without autogen that's not bad. With cblib right now I can do :
#define REQUIRE(exp) AT_STARTUP( ASSERT_RELEASE(exp) )That executes at startup so it can't make the build fail, but hey I can do it right now without any autogen code madness.
For longer unit tests you can just put them in a function, like :
static bool CheckLotsOfStuff()
{
.. blah blah ... do unit testy stuff here ...
return true;
}
REQUIRE( CheckLotsOfStuff() );
09-21-08 - 2
09-21-08 - 1
9/20/2008
09-20-08 - 2
I really fucking hate unsigned types. I've got unsigned types all over for no good reason at the moment and they keep giving me bugs where I try to count backwards or something, like
for( ; pos>=0 ; pos--)
{
}
Hey I just wrote an infinite loop cuz freaking pos is unsigned and I wasn't thinking about that and I don't want to have to worry about that all
the time. IMO it's best to just use signed types unless you specifically need to allow two billion - and as Won points out if you actually are
trying to handle two billion in an unsigned int you have to be super careful (eg. never add two numbers that can cause a carry ,etc.).
I really really like parameter checking, but people mostly do it wrong and don't get the point. Checking pointers for NULL, for example, is pretty worthless (unless it's a release-mode check that throws; I'll assume your parameter checking is just an assert). When someone passes in a NULL they will get a very obvious crash, the system provides a nice message for that already, and you can trivially see what caused the problem by going "hey that pointer is NULL". The good parameter checking is for conditions that will cause non-obvious failures that may be hard to debug and may not cause a crash at all, just incorrect operation. For example, things like :
U32 ShiftRight(U32 value, U32 shift)
{
ASSERT( shift >= 0 && shift < 32 ); // !! 32 is not valid !
return ( value >> shift );
}
or
void memcpy256(void * dest,void * src,int count)
{
ASSERT( abs( (int)dest - (int) src ) >= 32 ); // must be >= 256 bits apart !!
// ... do 256 bit memcpy
}
The other thing people often do wrong in parameter checking is go the opposite way and check too strictly on conditions that are just how they think things should be, but aren't actually necessary for the function to get right. Stuff like :
void Matrix::Transpose()
{
ASSERT( IsOrthonormal() );
// ...
}
where some jackass put too strict an assert because he was thinking you wanted the inverse of the matrix, but that's not the only reason to
use transpose. Asserts should check for cases that will cause failure, not checks for being in the expected bounds of operation. Another
common kind of bogus parameter check I see stuff like :
float lerp(float from,float to,float t)
{
ASSERT( t >= 0 && t <= 1 );
return from + (to - from) * t;
}
Umm, no, this parameter check is all wrong. It doesn't prevent any bugs, extrapolating is perfectly valid and will return a reasonable and correct
answer.
I also think that result checking is perhaps more important than parameter checking. Especially in cases where a function does something really complex that you're not sure you got right, but is easy to check. For example :
uint ilog2ceilasm(uint val)
{
__asm
{
FILD val
FSTP val
mov eax,val
add eax,0x7FFFFF // 1<<23 - 1
shr eax,23
sub eax,127
}
}
uint ilog2ceil(uint val)
{
ASSERT( val > 0 );
uint ret = ilog2ceilasm(val);
ASSERT( (1 << ret) >= val && val > (1 << (ret-1)) );
return ret;
}
The result check does more than just make sure I wrote my code right. It also implicitly checks the parameters because invalid parameters
will cause the result verification to fail. It also provides a constraint for modifications. It makes it clear what contract this function
has made with its clients, so that if you try to optimize in the future you know exactly what you must satisfy, and the assert tells you if
you screw up. (obviously we'd love compile time automatic dynamic analysis instead of relying on the actual bad case to run and trigger the assert)
Of course asserts and parameter checking and such should never be used at all on stuff that comes from users or artists. That stuff should be manually checked with code and they should be informed about bounds violation etc. We had a pretty good rule at Oddworld I think - any condition violation that can only be generated by coders errors is an assert and you can halt the game if necessary, any condition violation that can be caused by bad data or artist error should be a log/throw/warning & you try to keep running despite the error.
09-20-08 - 1
The reality is that powerful complex devices like a multipurpose computer don't work very well. It's hard to make good UI's that are right for all those different functions. It's hard to make absolutely sure that they work right all the time and never crash, because they're so complex. And it's hard to teach your grandma how to use them.
The exact opposite of that old prediction now seems to be the trend of the future - rather than 1 powerful computer, you will still have tons of single-function devices - they will just all be computers! A computer to watch TV, a computer for video games, a computer to browse the web, a computer for business software. They'll be very powerful, but very cheap, with special single-function software that's easy to use and reliable. This is of course already happening.
9/18/2008
09-18-08 - 1
And no I won't use any web mail ever, or any type of mail where they aren't just raw text files on my own computer. I tried going up to Eudora 7 a while ago and there were some things I didn't like but I forget what they were, and I don't recall if they fixed the dumb personality behavior either.
How fucking hard is it to write a decent mail client? It's one of those things I should just do myself. Right after I write a fucking music library browser and a fucking image editor. (yes, I'm looking at you, iTunes and Photoshop).
9/17/2008
09-17-08 - 3
dot = ax * bx + ay * by + az * bz; or dot = x + y + z;since all that really matters here is the accumulation. I'm going to assume that this is done like :
dot = x + y; dot += z;though the exact order of summation depends on the language convention and compiler and whatnot. The really bad case happens when 'x' is small, 'y' is large, and 'z' is large and nearly exactly negative 'y'. When you compute (x+y) the precision in x is killed, then you add on 'z' and it cancels y, but you get something that's nowhere near where it should be (in a relative sense), eg. (x+y)+z is way off from x+(y+z).
This may seem rather obscure and arcane but in fact it happens all the time in very common cases. When 'a' and 'b' are two vectors that are mostly in the YZ plane (small x component) and are nearly 90 degrees off each other, this is what happens.
Its also super trivial to fix and the performance cost is near zero on platforms where the native float is double anyway like the PC.
I've had so many problems in the last few years with mathematical code failing due to precision, that I now just use doubles everywhere until I see some very good reason to go to floats. Stuff like the SVD and PCA can very easily lead to total garbage if you don't have enough precision in your temporaries. I learn the lessons of coding over and over. Don't prematurely optimize. Always be extra safe and robust until you have a good reason not to be. And check your results even when you're sure you don't have to. I only know that I have had precision problems because I do things like after I get the singular vectors out of the SVD, I go back through and dot them all against each other and assert that they're actually orthogonal to each other.
09-17-08 - 2
One thing this means is that the block based coders that are super tuned (like JPEG,MPEG,H264) are actually dialed in for a certain era and a certain typical image resolution.
09-17-08 - 1
A few nights ago a girl came on my front lawn. I had just gotten home from work around 9:30 and was setting up my laptop and stuff when I hear this loud moaning coming from somewhere. I listened to try to hear where it was coming from, and it seemed like it was coming from the window, so I walked over and looked outside. There was a couple on the sidewalk, arms around each other, and the guy had his hand up the girls skirt, vigorously finger-fucking her. She literally collapsed onto the lawn and he fell down with her and kept going at it for a minute or so with her wailing. My front lawn is not large and in no way is it at all private, there are apartments all around and no bushes or any kind of shelter. It was quite surprising.
Tonight I watched my upstairs neighbor buy drugs. I was standing near the front window lifting weights and watching TV. I hear neighbor come pounding down the stairs, and just as he does a car pulls up in front of the building and drives right up onto the curb (our street is only one lane wide, there's nowhere to pull over). Upstairs neighbor gets in the passenger side of the car and they just sit there. I can see very clearly into the car whenever the door is opened because the light comes on, but it's less clear once he closes the door. I see the driver go into the center console and pull out a ziploc bag. Upstairs neighbor rustles around in his pockets. After 30 seconds or so, the passenger door opens again and I see upstairs neighbor making sure what's in his pocket is definitely inside. He gets out and comes back in, car drives off. This explains the coming home at 3 or 4 AM and clomping around on the floor above my brain. I just wish he would get on heroin instead of coke.
9/16/2008
09-16-08 - 1
H.264 : the new video codec; a very simple 4x4 transform used with highly tuned quantizers and predictive coders, and very sophisticated motion estimation and bit allocation. Many different profiles support different bit rates. There are *TONS* of papers on this beast and every aspect has been highly tuned.
PPMii / PPMd / PPMonstr / Durilca : stuff by Dmitry Shkarin. This was a while ago, it's the successor to PPMZ and very good. Some code and papers are available. One of the things that he does that everyone does now is detect data types and do some special munging to make the data more compressible.
PAQ by Matt Mahoney : bitwise coder with lots of context models and weighting, sort of like CTW, but really more like PPMZ. Uses SSE similar to the SEE in PPMZ. Model weighting is well done, uses many types of models such as models with skips. The best compressor at the moment, very slow. Bitwise compressors like PAQ are somewhat easier to write than byte-wise, because you don't have the issue of escapes. Escapes and sparse contexts are really hard to get perfectly right, with bits you always have a 0 and 1, you just need to put a probability on each.
Preprocessors & dictionaries are super common now. Lots of people use a variant of an "LZP" preprocessor, Shkarin and Mahoney both have one. Most of the top compressors will decompress already packed data so they can recompress it, they do an E8E9 transform on exes, and most are conditioned with some training model. (E8E9 transform changes relative jumps to absolute jumps; that way function calls to "strcpy" or whatever always look the same in bytes, so they're much more compressible). Most compressors also detect binary data semi-automatically and use skip-contexts.
ROLZ : aka "reduced offset LZ" ; apparently this is in RKive and many of the Russians are using it, eg. I believe it's in 7-Zip. I haven't found any actual papers or good descriptions of how they do it exactly, but the basic idea is to do something like an LZP to build a list of only a few possible match indexes using context prediction, write a code to select one of those with context modeling.
There are a bunch of big packages with source code, such as 7-Zip, LZMA, FreeArc, BALZ, and I'm sure many more. While this is cool, they seem to be huge messes with little or poor documentation. Holy crap the Russians write really god awful unreadable code. A lot of them look they were intentionally written for an obfuscated C coding contest, stuff like :
m = H(++*p, i & ( -i >> DS )) && *++p | ( dec(m.[(i>>MS)&0xFF) + t );These are some actual lines of code from Shkarin's PPMD (and seriously, he is definitely one of the cleanest Russian coders) :
SubRange.HighCount=(SubRange.scale += (SubRange.LowCount=LoCnt));
psee2c += 2*(2*NumStats < t+NumMasked)+Flags;
(FoundState=p)->Freq=(HiCnt += 4); SummFreq += 4;
In the super high lossless compression area, there's a bunch of talk in the Russian forums such as encode.ru . It's super hard to actually find the good information in there though.
Lapped image transforms. Malvar's page at MS is a good place to start on this. Basically improves quality & compression of block based image coding by using basis functions that extend past the blocks. An alternative viewpoint is that a Lifting based smearing filter is run on the blocks at decompress, and the inverse filter is run at compress.
There's some new interesting stuff on blocksort. Jurgen Abel after BWT , and Michael Maniscalco has a bunch of stuff.
For lossless image compression, it seems to be stuff like GLICBALS and MRP , basically still DPCM but with rigorous linear predictor training.
There are some new arithmetic coders. One I found is "rc.c" in the MRP distribution which is called the "Russian Range coder". It's slightly different than the standard Schindler implementation that I use, but seems to preform almost identically. Another one is coder used by PAQ, the best one seems to be "fpaq0p". Again it's a very slightly different binary arithcoder implementation than the one I use but seems to perform almost identically. The big new one is the strange paper by Jarek Duda called "Asymmetric Binary System". The paper is totally unreadable IMO, it's more useful to try to find people's notes on it around the web, such as this .
9/13/2008
09-13-08 - 2
safeprintf type error; wanted (int), found (float) safeprintf err in fmt "got option : quantizer = %d
but then it just keeps rolling like nothing happened!
09-13-08 - 1
I'm doing a little lapped image transform. To do a 4x4 DCT you need an 8x8 lap window, but by symmetry and reversibility it's separable to 8 taps, 4 are just mirrored, and half of those are constrained by the mirrored sum = 1 rule, so there are only 2 free floats. So your lap window can be a kaiser or a sin(sin^2) window, or you can just parameterize by two floats and tweak them to maximize PSNR and subjective quality.
9/11/2008
09-11-08 - 3
09-11-08 - 2
A comprehensive list of (decent) color conversions for image compression :
General note : color conversions which involve floats should & can be used for lossy image compression, but require a whole pipeline made for floats; many color conversion scale the components so again your quantization system must be aware of that. Lossless color conversion that go int->int are easier to analyze. They generally use lifting of some sort.
YUV or YCbCbr like in JPEG ; this is lossy and crappy.
KLT : matrix multiply by the PCA of the color planes of the image.
fixed KLT : constant KLT matrix optimized on the Kodak image set; you can find this in the FastVDO H.264 paper.
DCT3 : 3-sample DCT. Whoah, when I thought of this it gave me a huge "duh" moment. It's actually a very very good float->float matrix color conversion. Implemented as just a 3x3 matrix multiply. In fact there's a theoretical result that the DCT optimally decorrelates any data which is just Gaussian noise with order-1 correlation, and in fact hey color is very close to that.
YCoCg : both lossless and lossy versions. See Malvar papers at microsoft research or his H.264 submission. BTW this is equivalent to doing Haar[R,B] then Haar[G,B] (the Haar acts in place like {x,y} <- { (x+y)/2, y-x })
CREW aka RCT aka J2K-lossless : older lossless transform; pretty much always worse than YCoCg
FastVDO : another lossless transform proposed for H.264 ; matches the KLT slightly better than YCoCg , but actually usually codes worse than YCoCg.
This last one leads me to a general issue that was somewhat confounding :
Decorrelation is not the same as real world coding performance. That is, the most decorrelating color transform (the KLT) is not the best for coding in most cases. In fact, the KLT was quite poor. I did come up with some heuristic tricks to make a pseudo-KLT that does code quite well.
There's a theoretical measure of "coding gain" and the KLT maximizes that, but when run through real coders it falls down. I'm not sure at this point exactly what's happening. I have some theories; one issue is that the original RGB is not a Gaussian float, it's integers, so things don't behave smoothly; for example, long long ago I wrote on here about how D(Q) is not smooth in the real world, that is the distortion for a given quantizer does not increase monotonically with Q; it has special peaks when Q hits rational numbers, because those values map ints to ints better. All the theoretical literature on rate-distortion is almost garbage because D(Q) and R(Q) are so non-smooth in the real world. My other theories are that the oblique rotations the KLT sometimes takes is essentially making the bottom bit random which is hurting the spatial prediction of later coding stages.
One interesting case for games is compressing images with an alpha channel. In that case, the alpha channel can be losslessly predicted from a linear combination of RGB, which is a very good model of many alpha channels, which leads to them being packed in only a few bytes.
09-11-08 - 1
9/10/2008
09-10-08 - 1
9/09/2008
09-09-08 - 1
-1/4 == 0 -1>>2 == -1 -5/4 == -1 -5>>2 == -2
To divide an int by 2 and preserve exact average of a bunch of ints, you need to push the odd values up half the time and down half the time; this is :
( val + ((val>>1)&1) ) >> 1
One way to get rid of signs is to interleave the positive and negative halves of the lines : (0,1,2) -> (0,2,4) and (-1,-2,-3) -> (1,3,5)
if ( val >= 0 ) return val*2; else return -val*2 - 1;I wouldn't actually code from this directly, however, what you can do is first code the 0 or not-0 flag, then code the sign bit, which is just (val&1) , then code the magnitude :
if ( positive ) mag = val>>1; else mag = (val+1)>>1;
To code a value with a Laplacian probability distribution ( P(x) = c * r^x ), simply using a Golomb code with W = average ; this code is :
top = value / W bottom = value mod W = value - top * W send top with unary send bottom raw
9/08/2008
09-08-08 - 3
One way is this magic sort of trick with where the LSB goes. You form
L = (A+B)/2 H = B - A
(all divides floored on integers); how do I get back my original values? The trick is getting the bottom bit right, and the key is that when H is odd it means you need to set a bottom bit on either A or B; if H is positive you should put it on B, if negative put it on A. This means :
A = L + (1-H)/2 B = L + (H+1)/2
An easier way to see it is via the generic lifting/predictor mechanism. You can always create a transform that is reversible by taking a series of predictor steps. One predictor step is :
A -= P(B)You predict A only using B, and subtract it off; you can invert it of course just by doing +=. To make the Haar transform this way you do :
B -= A; A += B/2;To invert you flip the signs and reverse the order :
A -= B/2; B += A;
P() the lifting predictor can actually be any function, it need not be linear to be reversible, but if we restrict ourselves to linear transforms, then it can be written as a 2x2 matrix multiply :
A = [ 1 X ] [ A ] B [ 0 1 ] [ B ]Any transform like this with 1's on the diagonal and half the matrix zero is a valid lifting transform. This is a "shear". One useful fact is that any rotation can be written as a shear. In particular :
A) [ c -s ] = [ 1 0 ] [ 1 -sc ] [ c 0 ] [ s c ] [ s/c 1 ] [ 0 1 ] [ 0 1/c ] B) t = (c-1)/s [ c -s ] = [ 1 t ] [ 1 0 ] [ 1 t ] [ s c ] [ 0 1 ] [ s 1 ] [ 0 1 ]Form A uses a scale first which is not reversible, form B is pure shears. Form B is good except when the sine "s" becomes small, in which case the temp value "t" becomes very large. Even ignoring divergence, a large "t" value is very bad for accuracy and bit expansion. You don't really want to be doing lifting shears with multipliers much bigger than 2.0 ; fortunately, when "s" is small, "c" is very close to 1 (or -1), which means the scale term in form A is nearly identity. In that case we can just approximate it as exactly one and not do the scale step at all. If we drop the scale a more accurate lifting rotation is just :
near s = 0 [ c -s ] ~= [ 1 0 ] [ 1 -s ] (* -1 if c < 0) [ s c ] [ s 1 ] [ 0 1 ]Using this, any matrix that can be written as rotations (eg. any orthonormal matrix) can be decomposed into lossless integer lifting steps.
There's a beautiful paper by Srinivasan called Modulo Transforms - An Alternative to Lifting . I don't think it's actually very practical because division and mod are so expensive, but it is a very nice explanation of integer transforms. The description of the quantization lattice and volume expansion is the best I've seen.
Let's look at the Haar again. It starts as a 45 degree rotation :
[ sqrt(1/2) -sqrt(1/2) ] = sqrt(1/2) * [ 1 -1 ] [ sqrt(1/2) sqrt(1/2) ] [ 1 1 ]This is orthonormal, which means it's L2 preserving, and BTW it's what the KLT would tell you to do if your data is 100% correlated; that is, the fit is through x = y.
To do this on ints, the scaling factor sqrt(1/2) ruins int->int , so the easiest thing to do is just don't do the scale :
[ 1 -1 ] [ 1 1 ]BTW this is called the "Hadamard transform". It's int->int, but it's expanding. It uniformly expands the L2 norm by *2 (the determinant of the matrix is the expansion factor). On ints, this expansion means that half of the output values in the valid range are impossible. In particular the two outputs always have the same bottom bit. In the Srinivasan paper he has some nice pictures of how int transforms give you a lattice of valid values with lots of blanks. Obviously we can get rid of the volume expansion by quantizing the output space. This just means dividing one of the two outputs by 2.
This is where it gets subtle. We can divide either of the two outputs by 2. Both choices eliminate the volume expansion, but neither choice is L2 preserving, and the L2 norm of the two choices is different.
V(x) = < x^2 > is the variance of x assuming zero mean < y > means average of y The Hadamard expansion is : V(a+b) + V(a-b) = 2 * ( V(a) + V(b) ) The normal Haar (quantizing the sum) gives variance : V( (a+b)/2 ) + V( a-b ) = 5/4 * ( V(a) + V(b) ) - 3/2 * < a*b > Quantizing the difference gives : V(a+b) + V( (a-b)/2 ) = 5/4 * ( V(a) + V(b) ) + 3/2 * < a*b >Note that quantizing the difference is the same as negating one of the values before doing the Haar.
You can see the Haar transform is not variance preserving. It reduces the variance if the correlation between the values is high enough, in particular :
Haar reduces variance if : < a*b > >= (1/6) ( V(a) + V(b) )
This is interesting; the Haar may actually be better than the original 45 degree rotation it approximates. There is some implicit data modelling going on in the Haar; like most wavelets it is not norm preserving, not orthonormal, and it behaves well (reduces norm) when the input data fits the predicted pattern.
Addendum : I found another nice paper ( PL Haar PDF ), on something called the "PL Haar" (Piecewise Linear Haar) ; the PLHaar transform is itself interesting, but aside from that they go over the same points I'm talking about here re: the Haar and range expansion, etc. , and they have some nice pictures.
BTW you can see the implicit modeling I'm talking about in the PL Haar very neatly as well. By design, the PLHaar preserves Linf norm, not L2 norm. L2 norm is (roughly) what matters for coding. In the square domain, stuff in the corners has the max L2 norm, and stuff along the sides has the lower L2 norm. By doing the PL 45 degree rotation, PLHaar takes the large L2 norm stuff along the x=y lines and rotates them to the sides of the matrix, giving them the lower norm.
09-08-08 - 2
1. Extra truncations that just throw away data. One way to say this is they basically do math in floats and then convert to int, and do it multiple times. Often they do their math "in ints" and right shift away extra bits, which is the same mistake. The easiest way to get this right is just to read in your input image to floats, do your color transform, your wavelet transform, all in floats. Then when you quantize, quantize float->int and do your coding on ints. (of course you could also just code the floats, instead of doing bit masks for bit plane coding you check if value >= thresh).
2. Quantizing wrong. Quantizing is pretty simple if you think about it in terms of buckets and medians. You're taking a certain range and labeling that with an int. When you dequantize back to floats, you should restore to the middle of the range. (actually you should restore to the most likely value in the range, which might not be the middle in a typical skewed distribution, but that's a very minor detail). Many wavelet quantizers intentionally use an ESZZ (Extra Size Zero Zone) quantizer, which is an artificial way of getting more zeros, which is generally a win because the coder likes lots of zeros - but an ESZZ is wrong on the DC LL band !. When you have zeros you don't need to send sign bits. One thing people frequently get wrong is handling negatives wrong; you have to be careful about trying to divide negatives and do float-to-ints and all that; I find it easiest to take the sign off and do my math then put the sign back.
3. Not accounting for norm scaling. If all your transforms were orthonormal you wouldn't have to do this, but most wavelet transforms are not. What that means is that a value of 1 in the LL and a value of 1 in the HH do not have the same L2 norm after untransforming. That means you can't quantize them with the same value, or use the same bitplane truncation. Quantizers need to be scaled, as do any rate-distortion functions.
4. Not sending the most important bits first (assuming you want an "embedded" aka truncatable stream, which you do). Obviously when you are planning to chop off the end of your data you need to sort it by importance. To really get this right you need to be able to shuffle in different bits. You want to send the Y LL first, then the U LL, V LL, then the Y LH, etc. but not necessarilly in a fixed order; in some images you might want to send the whole Y band before you send any U. That's global level stuff that even value-based coders should do. In the extreme case you can do things like the bit-plane coder EBCOT that splits the bits of a row into chunks to try to get the most important bits first in the stream. Note that each bit of the compressed stream always has the same amount of information; what we want to do is put the bits that have the biggest L2 norm significance first; that is to say, bits that wind up affecting a large region of the image with large value changes.
09-08-08 - 1
The primary principal component is just a linear fit to the data; then the other axes are perp and point in the next primary axes of variance. This is why a lot of people heuristically think of it as the "best fit" to the data.
For doing 2-means or BSP splitting, the primary principal component of course tells you the axis to split along. To find the best BSP plane, you can search along that axis in O(NlogN) by sorting all the data by their dot along that axis and trying the plane between each consecutive pair of points. Since you're incrementally stepping you can just increment to count the number of points on each side, as well as the mean & sdev of points of each side.
You should remove the means to calculate the covariance, but of course you don't have to remove the means to transform the data. The mean is just multiplied by the basis transform and becomes the mean in the new coordinate system. If you do remove the mean, then of course the mean in the new space is also zero.
You should *NOT* divide by sdevs in the covariance. The true "correlation" does divide by sdevs to give you a relative correlation measure that's scaled to [-1,1], where 1 = 100% correlation. The "covariance" without the sdevs divided out will have units of [value]^2 , just like the variance. A good thing to display is the *root* of the covariance, since that's in [value] units. I actually find root-covariance a more useful measure than the true correlation, because it tells you how significant the value correlation is in the original scale, and it won't exaggerate up small wiggles; obviously this depends on the problem you are solving. eg. [4040404] [1010101] has a root-covariance of 2, a correlation of 1.0 , while [808080808] [80808080] has a root covariance of 8.
9/06/2008
09-06-08 - 1
I miss the creature comforts of home at the office. The home is really a lovely place to be, I've got natural light, wood floors, windows that actually open for fresh air, and also crucially a kitchen. I miss being able to stop coding and go cook myself something. I've found myself feeling miserable at the office, and I think it's largely the light and air. I'm very sensitive to environments; of course it's my own fault for staying inside there too much.
I think the ideal work situation for me would be lounging in a lush courtyard like some greek philosopher, snacking on olives and almonds in the warmth and fresh air, chatting with other wise men, tended by attractive youths, and occasionally doling out words of wisdom to customers.
Universities have pretty sweet offices in the older buildings, courtyards or lawns around, lots of trees and windows, cozy rooms for the offices with lots of brick and wood and windows that open; they feel warm and cozy, where you could hole up for hours and puff on a pipe and think about general relativity.
9/05/2008
09-05-08 - 2
Ah, Won pointed out to me that Google has a filetype thing in image search so you can ask for big PNG's.
09-05-08 - 1
9/04/2008
09-04-08 - 3
09-04-08 - 2
Known Problems
(Bug #14194)
Error adding files already under source control to project:
"chmod: :
The system cannot find the file specified."
Workaround: Quit and relaunch the IDE, open the project,
and revert the file, then retry.
Translation : "our shit is totally fucking broken, but we're gonna release it anyway. Deal with it."
09-04-08 - 1
MediaPortal has a really nice crossfade when you fuck around with music; it's so obviously the right thing to do, like when you skip or mute or anything it's not abrupt. The default setting is like 4 seconds which is retarded awful laggy, but you can turn that down to like 0.5 seconds and it's nice.
The LEDs are retardedly bright; I need to just unplug them. All this makes me appreciate my old analog amp even more. It's got nice subtle LEDs. It's got a heavy smooth big knob for the volume. All the buttons are switches that actually flip so you can tell which way they're set and they feel solid. And there's never any incompatibility problem. Audio comes in on wires. Audio goes out on wires. That's it.
9/03/2008
09-03-08 - 3
"I have some good ideas of things to do" ... at work.
You get excited for a second, hey he has some good ideas, oh, it's at work, bah, boring.
"Something funny happened today" ... at work.
Oh dear god please no more work stories they are not funny.
09-03-08 - 2
09-03-08 - 1
9/02/2008
09-02-08 - 1
As usual the Wikipedia pages are good. I'm going to be a bit sloppy with my maths and just try to hand wave a bit so you get the concept.
After I posted this Ignacio pointed me at a very cool tutorial by Jon Shlens : his page contains that and some other very cool stuff; it's a good tutorial, similar to mine but more correct and it's much more readable since it's PDF and all that.
There are two ways of thinking about this. One is you have a bunch of K-element vectors, N of them, it's like a bunch of samples of an experiment; you can of course put them together in a KxN matrix; in this mode you are usually looking for K-element basis vectors. The other way to think about it is just that you have some KxN matrix and you wish to find an approximation of it. In either case we'll call this matrix of KxN things M.
You need K basis vectors to span your K element vectors, of course. You will also get K N-element "cobasis" vectors. Your basis transform on your vectors will be a KxK matrix. Obviously the identity matrix is no transform and it corresponds to using a basis like {1000},{0100},{0010},{0001}. That is, instead of thinking of your data element as {a,b,c} think of it as a*(100) + b*(010) + c*(001). That original basis is not necessarily the best, or even the "right" one.
We can think about this in terms of data compression. For example maybe you have a sound file with K tracks and N samples in each track. If you just compress the tracks one by one in that space, there's a lot of correlation between the tracks you're wasting. What you want is to find a different basis in which all the same parts of the tracks are mashed together and each track only has the new unique bits of that track. Then, for example if two tracks are exactly the same, a whole track will be zeros. In general you can model any linear relationship between tracks with a linear transform like a matrix multiply, eg. stuff like D = A + B - C can be completely modeled out.
The best possible linear basis transform is the PCA. The PCA finds the "principal components" of the set. These basis functions are orthogonal and completely decorrelate the dimensions (in the strict linear correlation sense; more on this later). We can figure out the PCA from this :
The "covariance" of dimension i with dimension j is :
Cij = Sum(n to N) { M'in * M'jn }
(M' = M with the mean of each dimension subtracted off, note this is almost the "correlation" but the sdevs are not divided out). I'm going to drop the prime on the M and just assume that you took out the mean from now, but make sure you do that.
Of course we can just write this same thing in matrix notation :
C = M'^T * M'
^T = Tranpose
The diagonal elegents of C are the variances of the elements (the true correlation with yourself is 1). C is symmetric, and the off-axis elements tell you the correlations of one dimension with another. We want a basis transform that makes the cross-correlations zero. That is we want to find :
D = B * C * B^T
where D is some diagonal matrix and B is a basis transform.
For those who know something about eigenvectors you can see that the basis transform we want is just the eigenvectors of C.
The eigenvector has the property (by definition) that
C * e_i = k_i * e_i
That is, multiplying by C gives you the same vector scaled by some number. That is, C does not rotate the eigenvector at all, it stays in the same direction. That means if the eigenvectors were our basis vectors, then multiplying by C on any vector would only scale the coordinates in each dimension, it would not ever munge one coordinate into another.
Any matrix can always be written as the outer product of a spanning set of vectors. That is :
C = Sum{ij} Cij * |i> *
If we write it in terms of the eigenvectors, then C must be diagonal because we know C turns e_i into e_i , thus all the off-diagonal terms must be zero, so
C = Sum{i} k_i * |e_i> *
So, we see that the eigenvectors of C are a diagonal representation of C, which means that the non-self correlations are zero, which is exactly what we want,
which means this B is exactly the basis transform we want for PCA.
Now, the eigenvalues also tell us how much energy is in each axis. If you look back at the definition of the correlation matrix, you see the diagonal terms are
just the L2 norm of that dimension. Now if we look at C in terms of eigenvectors we can see the eigenvalues of C are just equal to the L2 norms of the data in
that direction. If we sort the eigenvalues so k0 is the biggest, k1 is the next biggest, etc. this is the PCA.
Now let me look at the exact same thing a completely different way in terms of successive approximation.
You want to construct basis vectors which are orthogonal and decorrelating. You want the 0th to have most of the signal energy, then the 1st the next most, etc.
You can do this just by learning each one one by one.
First find a basis vector b such that the linear correlation of b to the vectors in M is maximized. That's actually the same as minimizing the least square error of a fit with
arbitrary coefficient :
Err = Sum{i in K,n in N} ( (Min - an * bi)^2 }
for some vector a ; b is a basis vector that has K element, a is a "cobasis" that has N elements. Either you can think about this as trying to find b and the a's
are just arbitrary coefficients, or you can think of it as trying to approximate the matrix with an outer product of two vectors.
This gives your first b. Then take |a>
The result that we get is a representation of M in terms of a sequence of outer products :
M = Sum{i} |ai> *
M = Sum{i} si * |a'i> *
or M = A * S * B
Where S is a diagonal matrix of si values, A is NxK and B is KxK. Now C the correlation matrix = M^T * M = B^T * S * A^T*A * A * B = B^T * S*S * B
This is exactly like our eigenvectors decomposition, except we have S*S instead of the eigenvalues. Well they must be the same thing. So these B that
we found by successive fitting are in fact exactly the same as the eigenvectors of the correlation.
Well this successive fitting is just the SVD, the Singular Value Decomposition. The S are the singular values - and we see now that the S are the
square root of the eigenvalues of the correlation matrix. Or rather, if you square the singular values you get the L2 norm of the data in each basis,
and they are sorted from largest to smallest.
What we have found is a K -> K orthogonal basis transform that takes us to a coordinate space where the dimensions are linearly decorrelated and as much
energy (in an L2) sense is in the lowest dimensions.
This is often done as a way of reducing the number of dimensions you have to deal with. You can see how much L2 energy is in each dimension, and just say
I'll use enough dimensions to capture 99% of the energy and just ignore the rest. It also lets you see when you have redundant dimensions. If one dimension
can be written as a linear combo of another dimension, that's equivalent to saying you actually have lower dimensional data embedded in higher dimension space,
like your data is actually planar but it's embedded in 3d space. The PCA vectors will be two primary vectors in the plane of the data, and one vector with
very low energy orthogonal to them.
When this is done in data compression, it's often called the Karhunen-Lo�ve transform (KLT). PCA/SVD/KLT is obviously good for data compression not just
because it docorrelates (linearly), but also because it puts as much energy as possible up front and then you can just truncate the stream.
The KLT is orthonormal, which means it's just a rotation. Orthonormal transforms are not required for reversibility; obviously any matrix
that has an inverse is a reversible transform. The nice thing about Orthonormal is that the L2 norm of the data is preserved, which means
that the magnitudes of the transformed data is in a space you can understand and compare to untransformed norms. That means you can measure
the distortion in transformed space, you can quantize in transformed space, etc. and not have to worry about how that scales to the original
space. In fact Wavelet transforms are not orthogonal, but are reversible, and the "short time fourier
transform" that's sometimes used for audio compression is not even reversible the way it's often implemented. I should also note that if your
input data is in integers, an arbitrary matrix may not map ints to ints and thus not be reversible. However, since the KLT is orthonormal
it can be turned into a lifting transform that does map ints to ints, at the cost of not exactly decorrelating.
(actually, most Wavelet transform implementations are not norm-preserving, which is fine, but the vast majority of implementations
that I've seen screw up and fail to account for that when doing quantization and measuring error).
It's interesting to compare the KLT to something like the Fourier transform.
The KLT has the property that the best subset of the data is the prefix. That is, the best
possible 4 value representation is just the first 4 values, not ever the 5th or the 6th. (this is true over the whole set that the KLT was made for, not
necessarilly true on any one given sample).
The Fourier Transform, or more correctly the DFT, is also a basis transform that can be written as a matrix transform. (we don't actually implement the DFT
as a matrix transform because a matrix multiply is N^2 and the DFT can be done in NlogN due to the magic of the FFT). The Fourier Transform also has the
optimal prefix property (called the "least squares property") - that is the first K terms of the DFT are the best possible K terms of any trigonometric
transform (that is, no later terms could be substituted). This is very useful because it means we can truncate the terms correctly, the first ones are always
the most important. However, unlike the KLT, the DFT is not necessarilly fully decorrelating, and that means that the energy in the lower dimensions is not
maximized.
You could imagine taking a ton of images and gathering all the runs of 8 pixels into samples, so you have K=8 and N samples. Do the PCA to find the best
spanning 8-vectors and use these instead of the DCT in something like JPEG. What you would find is that your PCA vectors would look very much like the DCT
basis vectors. And you broke the magic NlogN fastness of the FFT.
BTW people in video games have long used the eigenvectors of the correlation matrix (aka the PCA) as basis vectors to find bounding volumes around points.
This is complete crap. The PCA has absolutely zero connection to what axes would make good bounding volume axes. There are established very good algorithms
for fitting bounding volumes that work in totally different ways.
Why don't people use KLT transforms more in data compression? Well, for one thing, you have to transmit your basis. If
you use something like the DCT the basis is implicit, you're assuming some knowledge about the type of data you're working
on which saves you bits in the stream. Also obviously it's slower, though the slowness on encode isn't as big a deal as the
slowness on decode (there are lots of applications where slow encode would be fine). The main reason is that getting your
transform perfect is actually not that important to modern data compression. If you are using a fancy quantizer and a fancy
context coder on the back end, then mistakes in the transform can largely be hidden. For example, if your transform is not
perfectly decorrelating, your entropy coder can largely model any correlation that's left over and hide that mistake. The
simpler your back end is, the more your transform matters.
09-02-08 - 2
There seem to be a bunch of remote control apps that are sort of intended for this. GoToMyPC is the old craptastic one; the newer ones are LogMeIn, WebEx PCNow, and BeAnywhere.
Really I don't want a custom web interface; I want to hook up to my home machines over regular Windows file sharing. I guess I could have my home machines dial in to the VPN before I leave then I should be able to get to them, but that messes up the network at home and relies on me making sure I put them in the VPN whenever I want them.
9/01/2008
09-01-08 - 2
Holy fucking crap. g-lick-balls is not only one of the sweetest and most obviously right little compression algorithms ever, with the best acronym, it also has the absolute coolest source code . The original web page and PDF are gone from the net now, but fortunately these pieces remain. The hint text that went with the source code also contains some tidbits that weren't in the PDF article.
One of the other major lossless image compression algorithms is RkIm, part of the Rkive suite by my old compadre Malcolm Taylor. Malcolm was always a bit of a troubled weird character, and he seems to have disappeared off the face of the earth. His page is gone and I can't find any information about him or his work around on the net any more. Malcolm made the original Rkive right around the same time I wrote the original PPMZ; we were working in very similar territory, we had the top two text compressors in the world for a long time, he helped me work out many of the details of PPMZ and he made the best implementation of it. He was just a hobbyist, not employed, and I invited him to come out to Wild Tangent to interview when I worked there. He came out and was kind of weird, and then disappeared; he never replied to any followup emails or calls. I wonder what's become of him.
09-01-08 - 1
I set up a VPN to RAD on my laptop, and I've been having this problem that the fucking VPN connection dialog randomly pops up when I'm home browsing. Fucking Windows. This happens because VPN is treated like a dial up adapter, and Windows has junk in it to automatically dial your dialup connection whenever you're disconnected. The best fix is to go to Control Panel -> Internet Options -> Connections and select "Never Dial a Connection".
old rants
-
▼
2008
(697)
-
▼
September
(49)
- 09-29-08 - 1
- 09-28-08 - 1
- 09-27-08 - 3
- 09-27-08 - 2
- 09-27-08 - 1
- 09-26-08 - 2
- 09-26-08 - 1
- 09-25-08 - 3
- 09-25-08 - 2
- 09-25-08 - 1
- 09-24-08 - 2
- 09-24-08 - 1
- 09-23-08 - 1
- 09-22-08 - 1
- 09-21-08 - 5
- 09-21-08 - 4
- 09-21-08 - 3
- 09-21-08 - 2
- 09-21-08 - 1
- 09-20-08 - 2
- 09-20-08 - 1
- 09-18-08 - 1
- 09-17-08 - 3
- 09-17-08 - 2
- 09-17-08 - 1
- 09-16-08 - 1
- 09-13-08 - 2
- 09-13-08 - 1
- 09-11-08 - 3
- 09-11-08 - 2
- 09-11-08 - 1
- 09-10-08 - 1
- 09-09-08 - 1
- 09-08-08 - 3
- 09-08-08 - 2
- 09-08-08 - 1
- 09-06-08 - 1
- 09-05-08 - 2
- 09-05-08 - 1
- 09-04-08 - 3
- 09-04-08 - 2
- 09-04-08 - 1
- 09-03-08 - 3
- 09-03-08 - 2
- 09-03-08 - 1
- 09-02-08 - 1
- 09-02-08 - 2
- 09-01-08 - 2
- 09-01-08 - 1
-
▼
September
(49)