2/28/2011

02-28-11 - Some Car Shit

I wrote about differentials before. Since then I have learned a bit more about LSD's in practical use.

LSD's have a lot of advantages beyond just making your corner exit faster. When I first learned about them I was told they help corner exit speed, and that's like kind of "meh", I don't care about my lap time, so it's hard to get too excited about that. But they do much more.

They make power oversteer much more predictable and controllable. Your two rear wheels are semi-locked together, so you know how fast they are spinning and thus how much grip you have back there. With an open diff, when you push the throttle you might lose grip at just one wheel; then the power spins that wheel while the other wheel still has grip. If you want to fool around with drifts or donuts or whatever, an LSD is much more fun. It is also safer in the sense that your car is predictable; it's less random whether you oversteer or not.

Without an LSD you can get a nasty chirping/hopping through tight turns. What happens is one wheel lifts, and you speed it up, then it connects to pavement again and you get a sudden jerk and chirp. Through a tight turn (especially with some elevation change) you can get several lift-fall events so you go jerk-chirping around the corner. Not cool.

There are two main types of diff used in sports cars. One is the Quaife/Torsen gear type (torque-biasing), the other is the clutch-pack friction disk type. Some random things I have read about these diffs :

Clutch-type diffs can have variable amounts of "preload" and "lockup" or "dynamic" locking. Usually this is expressed as the % of lockup (eg. 0% = open diff, 100% = solid axle), but sometimes it's expressed as the amount of force through the clutch pack. "Preload" is the amount of lockup with no force coming through the drive shaft. If you lift the wheels up and turn one side and see how much the other side turns - that is the "preload". Preload can make the car hard to turn tightly - it eliminates the ability to turn one wheel without turning the other - so street cars often have 0% - 20% preload. The housing of the LSD has a ramp built into it, and when force comes through the drive shaft it pushes a pin against the ramp, which forces the clutch plates together, creating more lockup. This is the "dynamic" number. You will see LSD's described as 20/40 or something, which means 20% preload and 40% under force. Sometimes this is also described as the "ramp angle", because the angle of that ramp determines how quickly the LSD adds more pressure.

Race cars use LSD's with 40/60 or 50/80 settings. This is lots of lockup. This works for races because you are never actually making super tight turns on race tracks. For Autocross people generally use less preload or a Torsen diff. If you have a high lockup, you can only take tight turns by drifting the rear end. Preload also greatly increases low speed understeer. Most OEM LSD's max out at 20-40% (Porsche Cayman/911 non-GT3 LSD maxes at 30%).

Torsen diffs act like zero-preload, and also don't provide much lockup under braking. Clutch type LSD's provide stability under braking - they keep the rear end straight, because a wiggling rear end can only happen if the rear wheels are spinning at different speeds.

Clutch type LSD's wear out and have to be serviced like any other clutch plate. Torsen-type diffs (TBD's) generally don't need much servicing unless they are beat up hard, roughly like transmission gears. Many OEM LSD's have very low settings so you would want to replace them anyway. It may still be wise to tick the OEM LSD option because it can be easier to fit an aftermarket LSD if you had the OEM one in the transmission (details depend on the car). OEM clutch-type LSD's also often wear very quickly; when buying a used car "with LSD" it is often actually an open diff because the LSD is shot.

Gordon Glasgow LSD tech
Porsche GT3 LSD Buster Thread
Early 911 LSD setup

BTW Lotus doesn't fit LSD's, and the new McLaren has no LSD. If your goal is sharp steering and fast lap times, then an LSD is not a clear win. If your goal is the "driving pleasure" of being in good control of your rear end slip angle, an LSD is more obviously good. See Lotus Evora with no LSD for example.

In non-slip scenarios, LSD's increase understeer by binding the sides of the car to each other. (this is why Lotus prefers not to fit them). In some cases, it may be that ticking the optional LSD box actually makes a car worse. For example this may be the case with the Cayman S, where the LSD added to the standard suspension increases understeer too much ; in contrast, the Cayman R has LSD standard and the suspension was set up for that, which involves more camber and stiffer sways, both of which are anti-understeer moves.

Some commentators have been knocking the MP4-12C for not having an LSD, but those commentators "don't get it". The MP4 is a computer-controlled car, like the Nissan GTR or the Ferrari 599. It does not rely on mechanical bits for power transfer, stability under braking, control of over/under-steer / etc. Traditional mechanical bits like LSD or sway bars or spring rates or whatever just don't apply. You don't need an LSD to keep you straight under braking when the car is doing single-wheel braking based on the difference in steering angle and actual yaw rate. Normal car dynamics is a balance between twitchy vs. safe, or turn-in vs. stability; Lotus and McLaren have bent the rules and avoided this trade-off, however that means less hoon-ability. See for example : Lewis Hamilton fails to do donuts in the MP4 .


It's much easier to kick a drift and countersteer through it correctly if you don't have to think through it rationally. It's almost impossible to catch the countersteer fast enough and with the right angle if you are doing it with your conscious mind going through the steps. What you have to do is get your body to do it for you subconsciously. Fortunately there's an easy way to do that :

It just requires using the old adage "look where you want to drive, and drive where you look".

Step by step :

    Start steering into the corner.

    Choose a spot out in the distance that is in the direction you want to go after the corner.

    Stare at that target point and keep your eyes on it the whole time.

    Kick your drift (power-on oversteer, clutch kick, whatever you want to do)

    Point your steering at the target point - keep looking at the target point and just keep the wheel centered on it.

    You will automatically transition from steering into the corner to countersteer as the tail comes around.

This helps you countersteer quicker, and also helps you to not over-do your countersteer (ala Corvette club), because you aren't thinking "countersteer", you're just thinking "point the wheel where I want to go". The standard mistake is to catch it too late, and then over-do it because you are thinking too hard about "countersteer", so then you fish-tail in the opposite direction. You do not have to "countersteer", all you have to do is point the front wheels in the direction you want to go.

(* note : this is not actually true, the correct steering angle is slightly past pointing the wheels where you want to go, but I find if I think this way, then my hands automatically do the right thing).

It really is true that you should "look where you want to drive, and drive where you look" ; it's one of those things they tell you in race school and you're like "yeah yeah duh I know, let me out on the fucking track already", but it doesn't sink in for a while. 99% of the time that I make a mistake driving, I realize after the fact that I wasn't looking where I wanted the car to go. You want to be looking far ahead, not ahead on the ground right near you. When I really fuck up I realize after the fact that I was looking at my steering wheel or my shifter, eg. looking down inside the cabin.

One thing that does help is to put a piece of tape on the top of your steering wheel (if you don't have a mark built in). Again this seems totally unnecessary, you will think "I know where fucking straight ahead is, I don't need a mark", but it does help. What it does is give you a *visual* cue of where straight ahead is, so it connects the eyes to what the hands are doing. You want that connection of visual to hand motion to be subconscious, to avoid the rational mind. Your hands perform better when they are out of conscious control.


I got a lot of bad information when I was shopping for cars. You have to ignore all the mainstream press, Edmunds, Motortrend, all that kind of shit. The professional automotive writers are 1. lazy (they don't actually research the cars they write about, or how cars work in general), 2. corrupt (they praise cars so that they can get free testers), and 3. just generally stupid. They don't write about what you actually want to know or what would actually be useful. Furthermore, the idea that you can tell a lot from a test drive is, I believe, not true. One thing I have to get over is I'm afraid to really thrash cars in the test drive, and without a really hard thrashing you can't tell how they behave at the limit. For example, you'd like to know things like : what does the car do if you mash the brake while cornering near the limit of grip? The auto writers won't tell you and you can't find out in a test drive.

A good source of information is web forums. Not your normal "I love my car"/"check out my bling" web forums, but the actual racer's forums, like the SCCA and other groups (spec miata, spec boxster, etc, lots of racer forums out there). There you can read about the things that really matter in a sports car, how it acts on the edge, how much prep it needs to be track-worthy, whether the engines blow up on the track, etc. Even if you don't track and have no intention of tracking, the guys who do tend to be a better class of humans than the normal web forum car people, who care more about "brand history" and rims and who has a faster quarter mile.

The vast majority of modern sports cars are set up for understeer (eg. Porsches, BMW's, etc.). The only ones I know of that aren't set up for understeer are the RX8 and S2000. Of course most of the retarded mainstream press says absolutely nothing about the intentional understeer (which you can fix pretty easily) and they say that the RX8 and S2000 have bad handling because they are "twitchy" or "bad in the rain". That's retarded. Those cars have great handling in the sense that it is predictable - if you go around a corner fast in the rain and jab the brakes, you will likely spin. That's not bad handling, that's a bad driver.

To some extent, "sharpness" or "liveliness" inherently go together with "twitchiness" or "dangerousness". For example cars that give you a nice little bit of oversteer will generally also give you lift-off oversteer, which you probably don't want. The ideal thing would be a car that you could press a button to make it milder and safer for normal use, or twitchy and sharp for fun times. This is sort of impossible with traditional mechanical suspension, you can only get a good compromise. This is why the new crazy cars from Ferrari and McLaren use computers to control the handling. The Ferrari approach is a bit like the "Euro Fighter" - they make a car that is mechanically inherently unstable, and then use computers to keep it under control. The McLaren MP4-12C approach uses crazy new techniques that nobody else is using.

02-28-11 - Game Branching and Internal Publishing

Darrin West has a nice post on Running branches for continuous publishing. Read the post, but basically the idea is you have a mainline for devs and a branch for releases.

At OW we didn't do this. Whenever we had to push out a milestone or a demo or whatever, we would go into "code lockdown" mode. Only approved bugfixes could get checked in - if you allow dev work to keep getting checked in, you risk destabilizing and making new bugs.

This was all fine; the problem is you can lose some dev productivity during this time. Part of the team is working on bug fixes, but some aren't, and they should be proceeding with features which you want after the release. Sure, you can have them just keep files checked out on their local machines and do work there, and that works to some extent, but if the release lockdown stretches out for days or weeks, that's not viable, and it doesn't work if people need to share code with each other, etc.

If I had it to do over again I would use the dev_branch/release_branch method.

To be clear : generally coders are working on dev_branch ; when you get close to a release, you integ from dev_branch to release_branch. Now the artists & testers are getting builds from release_branch ; you do all bug-fixes to release_branch. The lead coder and the people who are focused on the release get on that branch, but other devs who are doing future fixes can stay on dev_branch and are unaffected by the lockdown.

The other question is what build is given to artists & designers all the time during normal development. I call this "internal publication" ; normal publication is when the whole team gives the game to an external client (the publisher, demo at a show, whatever), internal publication is when the code team gives a build to the content team. It's very rare to see a game company actually think carefully about internal publication.

I have always believed that giving artists & designers "hot" builds (the latest build from whatever the coders have checked in) is a mistake - it leads to way too much art team down time as they deal with bugs in the hot code. I worked on way too many teams that were dominated by programmer arrogance that "we don't write bugs" or "the hot build is fine" or "we'll fix it quickly" ; basically the belief that artists' time is not as important as coders' time, so it's no big deal if the artists lose hours out of their day waiting for the build to be fixed.

It's much better to have at least a known semi-stable build before publishing it internally. This might be once a day or once a week or so. I believe it's wise to have one on-staff full time tester whose sole job is to test the build before it goes from code to internally published. You also need to have a very simple automatic rollback process if you do accidentally get a bad build out, so that artists lose 15 minutes waiting for the rollback, not hours waiting for a bug fix. (part of being able to rollback means never writing out non-recreatable files that old versions can't load).

Obviously you do want pretty quick turnaround to get new features out; in some cases your artists/designers don't want to wait for the next stable internal publication. I believe this is best accomplished by having a mini working team where the coder making a feature and the designer implementing it just pass builds directly between each other. That way if the coder makes bugs in their new feature it only affects the one designer who needs that feature immediately. The rest of the team can wait for the official internal publication to get those features.

2/27/2011

02-27-11 - Web Shows

My picks :

The old classic is Clark and Michael. It gets a bit repetitive, but not bad.

Wainy Days is pretty great. The super fast paced insanity of it is hard to take at first, but once you watch a few you get into the groove. It's really like a sitcom for the web generation; everything is super compressed, all the unfunny talky bits in a normal sitcom are squeezed out, and any pretense of logical consistency is abandoned, but otherwise it's standard sitcom fare. I've watched some other Wain products before and found them intolerable (eg. Wet Hot and The Ten are just awful), I think he's much better in 5 minute doses.

I really enjoyed Thumbs Up! Thumbs up America!

In related news, I now watch Youtube with no comments, thanks to CommentSnob. This saves me from going into a spiral of despair about how fucking awful all of humanity is every time I visit Youtube.

The next step is blocking comments from just about everywhere. I found these but haven't tried them yet :

YouTube Comment Snob Add-ons for Firefox
Stylish Add-ons for Firefox
Steven Frank shutup.css
CommentBlocker - Google Chrome extension gallery

The ideal thing would be for all web pages to have their comments initially hidden, with a "show comments" button I can click.

I would also like an "ignore user" button that I can apply myself to web forums, comments, ebay sellers, etc. that will make it so I never have to see the web contribution from certain sources ever again.

02-27-11 - Nowhere to invest

I believe there is basically nowhere good to invest any more. Bonds, commodities, stocks, savings, etc.

I believe the fundamental reason is because of hedge funds, large automatic quant funds, cheap fed capital, and the easy flow of huge amounts of money. Basically if there is an edge anywhere in the market, the large funds will immediately pump tons of money into that sector, and that edge will go away. They move funds very quickly and they move enough to shift the profit margin, and they keep doing it until it's no longer desirable.

Put another way, why the hell would a company want your capital? The reason why investments return a profit is because somebody needs your money for their company. In exchange for your $10k they pay you back some interest. But when the fed funds rate is near 0%, the big banks can suck out any amount of money they want and give it to whoever needs capital. So why the hell would any company deal with paying a return to small individual investors when they can just get capital so easily from the big banks? The answer of course is that they don't. When capital is cheap, the rate of return on investments rapidly goes to zero.

I believe that the last 100 years or so (and mainly from 1950-2000) has been an anomaly - a period when individuals could easily get a nice return from investments, and there's no reason to think we will have this in the future. In the last 100 years stocks have returned 3-5% after inflation, which roughly tracks GDP growth (it's no surprise; total stock market value and dividend yield both directly track GDP growth).

Now, obviously if you look at the market recently it doesn't look weak. But I believe that we are in a very large bubble. It's an unusual bubble because it appears to be affecting almost every type of investment - including things that should normally track opposite of each other. That's very strange. A quick rundown of what I think is happening :

Stocks : I believe the current stock bubble is driven by these factors : 1. massive free money from the Fed has to go somewhere; when it turns into buy orders, stocks have to rise (obviously in theory the traders could say "no I don't want the free money because I don't see anywhere profitable to invest it", but LOL that doesn't actually happen). 2. massive free money makes the economic numbers look better than they really are. 3. under-counting of inflation and unemployment makes the economic numbers look better than they are. 4. individual investors and such have tons of money in 401ks and such and don't know what else to do with it, so it goes back into the market. I think everyone sane should realize that stock valuations right before the Great Recession were way out of whack - and values are right back to that level, which makes no sense since nothing good has happened in the economy, in fact we're fundamentally much worse now than before the GR.

Gold : historically, from 1850-2000, gold returned slightly *below* inflation, eg. real return was negative. The only period when it returned well was during the massive inflation era of the 70's (and the last 10 years). Gold should generally track against other economic trends. Yet gold has done very well for the past 10 years, even during times that stocks have gone up and inflation is nominally low. I don't think this makes any sense and I believe that gold is in a massive bubble. If nothing else, when you have people like Glenn Beck touting gold, and all the "Buy Gold Now!" web sites pushing it, there's obviously some bubble aspect, and it's all very reminiscent of real estate 5 years ago.

Real Estate : residential & commercial are both bad. Oddly, REIT funds are up to levels matching before the GR, to me this clearly indicates they are at bubble levels. Residential values in most of the US never fell as much as they should have, and tons of the MBS's that the Fed continues to buy up are not worth what they claim. Some people think commercial real estate is due for a big crash; I'm not sure a crash will actually happen, but if it doesn't it will mean a weak market for many years.

Bonds - oddly bond values have gone up at the same time as stocks. Treasuries in particular have been hovering around zero yield, which only makes sense if you think there is a real risk of massive corporate failure or currency crisis or other reasons why you wouldn't just keep your money in a bank returning 1% or a AAA corporate bond returning 4%. (BTW I was always a little confused about the whole idea that bond values rise when yields drop, but really it's very simple. Yields are the return on newly issued bonds. If you have an old bond that returns 4% and the newly issued yield is 1%, then people who want a 1% return will bid up the price of your old bond until it too only returns 1% ; that is, prices move to make the final return at maturity the same). Bond prices should generally track against stocks, but they've both been doing quite well since the GR.
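
To make that concrete, here's a toy example of my own (ignoring compounding conventions and accrued interest) : a 10 year bond with a 4% coupon gets repriced so that a buyer at today's 1% market yield earns the same ~1% to maturity :

#include <stdio.h>
#include <math.h>

// price = present value of the coupons plus the principal, discounted at the market yield
static double BondPrice(double face, double couponRate, int years, double marketYield)
{
    double price = 0;
    for(int t=1;t<=years;t++)
        price += face*couponRate / pow(1.0 + marketYield, t);  // each annual coupon
    price += face / pow(1.0 + marketYield, years);             // principal at maturity
    return price;
}

int main()
{
    printf("old 4%% coupon bond at 4%% market yield : %.2f\n", BondPrice(100, 0.04, 10, 0.04)); // ~100.00
    printf("old 4%% coupon bond at 1%% market yield : %.2f\n", BondPrice(100, 0.04, 10, 0.01)); // ~128.41
    return 0;
}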

I don't see anywhere good to put money. And furthermore I do think there is some risk of heavy inflation, so sitting with cash in the bank isn't great either. (a few ideas : I believe gold is so inflated it might actually be a good short; it's also possible that real estate in some areas might be a good investment, at least real estate is a hedge against inflation even if it doesn't return anything; unfortunately Seattle is one of the places where real estate hasn't fallen as much as it should which will lead to long term price stagnation, or even declines in real dollars).

2/24/2011

02-24-11 - RRZ On 16 bit Images

Jan Wassenberg sent me a 16 bit test image, so I got my old PNG-alike called RRZ working on 16 bit. (many old posts on RRZ, search for PNG).

The predictors all work on a ring, that is, they wrap around [0,uint_max] so you need to use the right uint size for your pixel type. To make this work I just took my 8-bit code and made it a template, and now I work on 8,16, and 32 bit pixels.
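
Roughly what that looks like (a sketch of my own to illustrate the idea, not the actual RRZ code) :

#include <stdint.h>
#include <algorithm>

// the predictor is a template on the pixel type, and the residual just uses
// unsigned wraparound, so the same code handles 8, 16 and 32 bit pixels

template <typename t_pix>   // uint8_t, uint16_t or uint32_t
struct RingPredictor
{
    // "ClampedGrad" : clamp the gradient predictor N+W-NW into [min(N,W),max(N,W)]
    static t_pix ClampedGrad(t_pix W, t_pix N, t_pix NW)
    {
        int64_t g  = (int64_t)N + (int64_t)W - (int64_t)NW;
        int64_t lo = std::min<int64_t>(N,W);
        int64_t hi = std::max<int64_t>(N,W);
        return (t_pix) std::max(lo, std::min(hi,g));
    }

    // residual & reconstruction wrap around [0,uint_max] ; always losslessly invertible
    static t_pix Residual(t_pix actual, t_pix pred) { return (t_pix)(actual - pred); }
    static t_pix Restore (t_pix resid,  t_pix pred) { return (t_pix)(resid  + pred); }
};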

RRZ without any changes does pretty well on 16 bit data :


Original (3735x2230x4x2)                    : 66,632,400
Zip                                         : 38,126,464
PNG (*1)                                    : 26,002,734
JPEG-2000                                   : 22,404,146
JPEG-XR                                     : 19,783,184
RRZ default (-m5 -z3 -fa -l0) (*2)          : 24,169,080
My filter 4 + Zip (*3)                      : 21,880,451
RRZ with zip-like options (-m3 -z4 -f4 -l0) : 20,907,541
RRZ optimized (-m3 -z5 -f4 -l1)             : 17,626,222
My filter 4 + LZMA                          : 16,011,226

*1 : I ran pngcrush but couldn't run advpng or pngout because they fail on 16 bit data.

*2 : min match len of 5 is the default (-m5) because I found in previous testing that this was best most often. In this case, -m3 is much better. My auto-optimizer finds -m3 successfully. Also note that seekChunkReset is *off* for all these RRZ's.

*3 : filter 4 = ClampedGrad, which is best here; default RRZ filter is "adaptive" because that amortizes against really being way off the best choice, but is usually slightly worse than whatever the best is. Even when adaptive actually minimizes the L2 norm of prediction residuals, it usually has worse compression (than a uniform single filter) after LZH because it ruins repeated patterns, since it is choosing different predictors on different scan lines.

Note that I didn't do anything special in the back-end for the 16 bit data, the LZH still just works on bytes, which means for example that the Huffman gets rather confused; the most minimal change you could do to make it better would be to make your LZ matches always be even numbers - so you don't send the bottom bit of match len, and to use two huffmans for literals - one for odd positions and one for even positions. LZMA for example uses 2 bits of position as context for its literal coding, so it knows what byte position you are in. Actually it's surprising to me how close RRZ (single huffman, small window) gets to LZMA (arithmetic, position context, large window) in this case. It's possible that some transpose might help compression, like doing all the MSB's first, then all the LSB's, but maybe not.

ADDENDUM : another thing that would probably help is to turn the residual into a variable-byte code. If the prediction residual is in [-127,127] send it in one byte, else send 0xFF and send a two byte delta. This has the disadvantage of de-aligning pixels (eg. they aren't all 6 or 8 bytes now) but for small window LZ it means you get to fit a lot more data in the window. That is, the window is a much larger percentage of the uncompressed file size, which is good.
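
Something like this (just a sketch of the idea, the exact details are my guess) :

#include <stdint.h>
#include <vector>

// residuals in [-127,127] go in one byte ; anything bigger sends an 0xFF escape
// byte followed by the full 16-bit residual

static void PutResidual(std::vector<uint8_t> & out, int16_t resid)
{
    if ( resid >= -127 && resid <= 127 )
    {
        out.push_back( (uint8_t)(resid + 127) );            // maps [-127,127] -> [0,254], never 0xFF
    }
    else
    {
        out.push_back( 0xFF );                              // escape
        out.push_back( (uint8_t)( resid        & 0xFF) );   // low byte
        out.push_back( (uint8_t)(((uint16_t)resid) >> 8) ); // high byte
    }
}

static int16_t GetResidual(const uint8_t * & ptr)
{
    uint8_t b = *ptr++;
    if ( b != 0xFF )
        return (int16_t)(b - 127);
    uint16_t lo = *ptr++;
    uint16_t hi = *ptr++;
    return (int16_t)(lo | (hi << 8));
}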

As part of this I got 16-bit PNG reading & writing working, which was pretty trivial. You have to swap your endian on Intel machines. It seems to be a decent format for interchanging 16 bit data, in the sense that Photoshop works with it and it's easy to do with libPNG.

I also got my compressor working on float data. The way it handles floats is via lossless conversion of floats to ints in an E.M fixed point format, previously discussed here and here. This then lets you do normal integer math for the prediction filters, losslessly. As noted in those previous posts, normal floats have too much gap around zero, so in most cases you would be better off using what I call the "normal form", which treats everything below 1.0 as denorm (eg. no negative exponents are preserved), though obviously this is lossy.

Anyway, the compressor on floats seems to work fine but I don't have any real float/HDR image source data, and I don't know of any compressors to test against, so there you go.

ADDENDUM: I just found that OpenEXR has some sample images, so maybe I'll try those.

ADDENDUM 2 : holy crap OpenEXR is a retarded distribution. It's 22 MB just for the source code. It comes with their own big math and threading library. WTF WTF. If you're serious about trying to introduce a new interchange format, it should be STB style - one C header. There's no need for image formats to be so complex. PNG is over-complex and this is 100X worse. OpenEXR has various tile and multi-resolution streams possible, various compressors, the fucking kitchen sink and pot of soup, WTF.

2/23/2011

02-23-11 - Some little coder things - Tweakable Vars

So a while ago I did the Casey "tweakable C" thing. The basic idea is that you have some vars in your code, like :

static float s_tweakFactor = 1.5f;

or whatever, and your app is running and you want to tweak that. Rather than write some UI or whatever, you just have your app scan its own source code and look for "s_tweakFactor =" (or some other trigger string) and reparse the value from there.

So I put this in cblib a few years ago; I use ReadDirChanges to only do the reparse when I see a .c file is changed, and I actually use a C++ constructor to register the tweakable vars at startup, so you have to use something like :


static TWEAK(float,s_tweakFactor,1.5f);

which is a little ugly but safer than the prettier alternatives. (the parser looks for TWEAK to find the vars).
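
For reference, the guts of such a thing can be very small. This is just a sketch of how the registration might work (float-only, and not the actual cblib implementation) :

#include <map>
#include <string>

// each TWEAK registers its name & address in a global map at static-init time ;
// the source-reparse (or anything else) can then poke the value by name

struct TweakVarReg
{
    static std::map<std::string,float *> & Registry()
    {
        static std::map<std::string,float *> s_reg;
        return s_reg;
    }
    TweakVarReg(const char * name, float * ptr) { Registry()[name] = ptr; }
};

#define TWEAK(type,name,val) \
    type name = (val); \
    static TweakVarReg name##_tweakreg( #name, &name )

used exactly as above.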

I thought Casey's idea was very cool, but then I proceeded to never actually use it.

Part of the issue is that I already had a text Prefs system which already had auto-reloading from file changes, so any time I wanted to tweak things, I would make a pref file and tweak in there. That has the advantage that it's not baked into the code, eg. I can redistribute the pref with the exe and continue to tweak. In general for game tweaking I think the pref is preferable.

But I just recently realized there is a neat usage for the tweak vars that I didn't think of. They basically provide a way to set any value in my codebase by name programmatically.

So, for example, I can now set tweak vars from command line. You just use something like :


app -ts_tweakFactor=2.f fromfile tofile arg arg

and it lets you do runs of your app and play with any variable that has TWEAK() around it.
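
The command line hookup is trivial; something like this (a sketch - Tweak_SetByName is a hypothetical helper that looks the name up in the TWEAK registry) :

#include <stdlib.h>
#include <string.h>

// hypothetical : finds the registered var by name and assigns it ; returns false if not found
bool Tweak_SetByName(const char * name, double value);

// returns true if the arg was a -tname=value tweak and was consumed
static bool HandleTweakArg(const char * arg)
{
    if ( arg[0] != '-' || arg[1] != 't' ) return false;

    const char * eq = strchr(arg,'=');
    if ( ! eq ) return false;

    char name[256];
    size_t len = (size_t)(eq - (arg+2));
    if ( len >= sizeof(name) ) return false;
    memcpy(name,arg+2,len);
    name[len] = 0;

    return Tweak_SetByName(name, atof(eq+1));
}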

The other thing it lets me do is optimize any variable. I can now use the generic Search1d thing I posted earlier and point it at anything I have registered for TWEAK, and it can search on that variable to maximize some score.

02-23-11 - Some little coder things - Loop

We talked a while ago about how annoying and error-prone for loops are in C. Well at first I was hesitant, but lately I have started using "for LOOP" in earnest and I can now say that I like it very much.

#define LOOP(var,count) (int var=0;(var) < (count);var++)
#define LOOPBACK(var,count) (int var=(count)-1;(var)>=0;var--)
#define LOOPVEC(var,vec)    (int var=0, loopvec_size = (int)vec.size();(var) < (loopvec_size);var++)

so for example, to iterate pixels on an image I now do :

for LOOP(y,height)
{
    for LOOP(x,width)
    {
        // do stuff
    }
}

the way I can tell that this is good is because I find myself being annoyed that I don't have it in my RAD code.

There are tons of advantages to this that I didn't anticipate. The obvious advantages were : fewer bugs due to mistakes in backwards iteration with unsigned types, less typing (hence fewer typo bugs), and making it visually more clear what's happening (you don't have to parse the for(;;) line to make sure it really is a simple counting iteration with nothing funny snuck in).

The surprising advantages were : much easier to change LOOP to LOOPBACK and vice versa, much easier to use a descriptive variable name for the iterator so I'm no longer tempted to make everything for(i).

One thing I'm not sure about is whether I like LOOPVEC pre-loading the vector size. That could cause unexpected behavior if the vector size changes during the iteration.

ADDENDUM :

Drew rightly points out that LOOPVEC should be :


#define LOOPVEC(var,vec)    (int var=0, var##size = (int)vec.size();(var) < (var##size);var++)

to avoid variable name collisions when you nest them. But I think it should probably just be

#define LOOPVEC(var,vec)    (int var=0; (var) < (int)vec.size(); var++)

Though that generates much slower code, when you really care about the speed of your iteration you can pull the size of the vec out yourself and may do other types of iterations anyway.

02-23-11 - Some little coder things - Error cleanup with break

I hate code that does error cleanup in multiple places, eg :

    FILE * fp = fopen(fileName,"wb");

    if ( ! stuff1() )
    {
        fclose(fp);
        return false;
    }
    
    if ( ! stuff2() )
    {
        fclose(fp);
        return false;
    }

    // ok!
    fclose(fp);
    return true;

the error cleanup has been duplicated and this leads to bugs.

In the olden days we fixed this by putting the error return at the very end (after the return true) and using a goto to get there. But gotos don't play nice with C++ and are just generally deprecated. (don't get me started on setjmp, WTF is libpng thinking using that archaic error handling system? just because you think it's okay doesn't mean your users do)

Obviously the preferred way is to always use C++ classes that clean themselves up. In fact whenever someone gives me code that doesn't clean itself up, I should just immediately make a wrapper class that cleans itself up. I find myself getting annoyed and having bugs whenever I don't do this.
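
For example, a trivial self-cleaning FILE wrapper (just a sketch ; stuff1/stuff2 stand in for whatever work you're doing) :

#include <stdio.h>

struct ScopedFile
{
    FILE * fp;
    ScopedFile(const char * name, const char * mode) : fp( fopen(name,mode) ) { }
    ~ScopedFile() { if ( fp ) fclose(fp); }
    operator FILE * () const { return fp; }
private:
    ScopedFile(const ScopedFile &);             // non-copyable
    ScopedFile & operator = (const ScopedFile &);
};

bool stuff1(FILE * fp);   // placeholders for the real work
bool stuff2(FILE * fp);

bool SaveStuff(const char * fileName)
{
    ScopedFile fp(fileName,"wb");
    if ( ! fp ) return false;

    if ( ! stuff1(fp) ) return false;   // no explicit fclose anywhere -
    if ( ! stuff2(fp) ) return false;   // the destructor handles every exit path

    return true;
}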

There is, however, a cleanup pattern that works just fine. This is well known, but I basically never ever see anyone use this, which is a little odd. If you can't use C++ self-cleaners for some reason, the next best alternative is using "break" in a scope that will only execute once.

For example :


rrbool rrSurface_SaveRRSFile(const rrSurface * surf, const char* fileName)
{
    FILE * fp = fopen(fileName,"wb");
    if ( ! fp )
        return false;
    
    for(;;)
    {
        rrSurface_RRS_Header header;
        
        if ( ! rrSurface_FillHeader(surf,&header) )
            break;
    
        if ( ! rrSurface_WriteHeader(fp,&header) )
            break;
        
        if ( ! rrSurface_WriteData(fp,surf,&header) )
            break;
        
        // success :
        
        fclose(fp);
        return true;
    }
    
    // failure :
    
    fclose(fp); 
    return false;
}

Really the break is just a simple form of goto that works with C++. When you have multiple things to clean up, obviously you have to check each of them against an uninitialized state.

(BTW this example is not ideal because it doesn't give you any info about the failure. Generally I think all code should either assert or log about errors immediately at the site where the error is detected, not pass error codes up the chain. eg. even if this code was "good" and had a different error return value for each type of error, I hate that shit, because it doesn't help me debug and get a breakpoint right at the point where the error is happening.)

ADDENDUM :

Another common style of error cleanup is the "deep nest with partial cleanup in each scope". Something like this :


  bool success = false;
  if ( A = thing1() )
  {
    if ( B = thing2() )
    {
      if ( C = thing3() )
      {
        success = true;
        cleanup C;
      }
      cleanup B;
    }
    cleanup A;
  }

I really hate this style. While it doesn't suffer from duplication of the cleanups, it does break them into pieces. But worst, it makes the linear code flow very unclear and introduces a deep branching structure that's totally unnecessary. Good code should be a linear sequence of imperatives as much as possible. (eg. do X, now do Y, now do Z).

I think this must have been an approved Microsoft style at some point because you see it a lot in MSDN samples; often the success code path winds up indented so far to the right that it's off the page!

02-23-11 - Some little coder things - Clip

I wrote a little app called "clip" that pastes its args to the clipboard. It turns out to be very handy. For example it's a nice way to get a file name from my DOS box into some other place, because DOS does arg completion - I can just type "clip f" + tab and get the name.

The other big place its been useful is copying command lines to the MSVC debugger property sheet, and turning command lines into batch files.

Clip is obviously trivial, the entire code is :


void CopyStringToClipboard(const char * str);

int main(int argc,const char *argv[])
{
    String out;
    
    for(int argi=1;argi < argc;argi++)
    {
        if ( argi > 1 ) out += " ";
        out += argv[argi];
    }
    
    lprintf("clip : \"%s\"\n",out);
    
    CopyStringToClipboard(out.CStr());
                
    return 0;
}

void CopyStringToClipboard(const char * str)
{

    // test to see if we can open the clipboard first before
    // wasting any cycles with the memory allocation
    if ( ! OpenClipboard(NULL))
        return;
        
    // Empty the Clipboard. This also has the effect
    // of allowing Windows to free the memory associated
    // with any data that is in the Clipboard
    EmptyClipboard();

    // Ok. We have the Clipboard locked and it's empty. 
    // Now let's allocate the global memory for our data.

    // Allocate a global memory block big enough for the string
    // plus one character for the terminating null required when
    // sending ANSI text to the Clipboard.
    HGLOBAL hClipboardData;
    hClipboardData = GlobalAlloc(GMEM_DDESHARE,strlen(str)+1);
    if ( hClipboardData == NULL )
    {
        CloseClipboard();
        return;
    }

    // Calling GlobalLock returns to me a pointer to the 
    // data associated with the handle returned from 
    // GlobalAlloc
    char * pchData;
    pchData = (char*)GlobalLock(hClipboardData);
    if ( pchData == NULL )
    {
        GlobalFree(hClipboardData);
        CloseClipboard();
        return;
    }
            
    // At this point, all I need to do is use the standard 
    // C/C++ strcpy function to copy the data from the local 
    // variable to the global memory.
    strcpy(pchData, str);
            
    // Once done, I unlock the memory - remember you 
    // don't call GlobalFree because Windows will free the 
    // memory automatically when EmptyClipboard is next 
    // called. 
    GlobalUnlock(hClipboardData);
            
    // Now, set the Clipboard data by specifying that 
    // ANSI text is being used and passing the handle to
    // the global memory.
    SetClipboardData(CF_TEXT,hClipboardData);
            
    // Finally, when finished I simply close the Clipboard
    // which has the effect of unlocking it so that other
    // applications can examine or modify its contents.
    CloseClipboard();
}

(BTW note that the lprintf of my string class in main is not a bug - that's an autoprintf which handles everything magically and fantastically)

(I didn't remember where I got that clipboard code, but a quick Google indicates it came from Tom Archer at CodeProject )

2/13/2011

02-13-11 - JPEG Decoding

I'm working on a JPEG decoder sort of as a side project. It's sort of a nice small way for me to test a bunch of ideas on perceptual metrics and decode post-filters in a constrained scenario (the constraint is baseline JPEG encoding).

I also think it's sort of a travesty that there is no mainstream good JPEG decoder. This stuff has been in the research literature since 1995 (correction : actually, much earlier, but there's been very modern good stuff since 95 ; eg. the original deblocker suggestion in the JPEG standard is no good by modern standards).

There are a few levels for good JPEG decoding :

  • 0. Before even getting into any post-filtering you can do things like laplacian-expected dequantization instead of dequantization to center (see the sketch just after this list).

  • 1. Realtime post-filtering. eg. for typical display in viewers, web browsers, etc. Here at the very least you should be doing some simple deblocking filter. H264 and its derivatives all use one, so it's only fair.

  • 2. Improved post-filtering and deringing. Various better filters exist, most are variants of bilateral filters with selective strengths (stronger at block boundaries and ringing-likely areas (the ringing-likely areas are the pixels which are a few steps away from a very strong edge)).

  • 3. Maximum-a-posteriori image reconstruction given the knowledge of the JPEG-compressed data stream. This is the ultimate, and I believe that in the next 10 years all image processing will move towards this technique (eg. for super-resolution, deconvolution (aka unblur), bayer de-mosaicing, etc etc). Basically the idea is you have a probability model of what images are likely a-priori P(I) and you simply find the I that maximizes P(I) given that jpeg_compress(I) = known_data. This is a very large modern topic that I have only begun to scratch the surface of.
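
To illustrate #0 : a quantized DCT coefficient tells you which bucket the true value fell in; restoring to the center of the bucket is only right if values are uniformly distributed within it, but AC coefficients are roughly laplacian, so the expected value is biased toward zero. A sketch (the closed form is my own derivation; lambda would be estimated per subband from the decoded data) :

#include <math.h>

// restore quantized index n (quantizer Q) to the expected value of a laplacian
// exp(-lambda*|x|) restricted to n's quantization bucket, instead of to n*Q
static double DequantizeLaplacian(int n, double Q, double lambda)
{
    if ( n == 0 ) return 0.0;

    double a = (fabs((double)n) - 0.5) * Q;                      // bucket edge nearest zero
    double offset = 1.0/lambda - Q / ( exp(lambda*Q) - 1.0 );    // E[x-a | x in bucket] , in [0,Q/2]
    double recon  = a + offset;                                  // center restoration would be a + Q/2

    return ( n < 0 ) ? -recon : recon;
}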

It's shameful and sort of bizarre that we don't even have #1 (*). Obviously you want different levels of processing for different applications. For viewers (eg. web browsers) you might do #1, but for loading to edit (eg. in Photoshop or whatever) you should obviously spend a lot of time doing the best decompress you can. For example if I get a JPEG out of my digital camera and I want to adjust levels and print it, you better give me a #2 or #3 decoder!

(* : an aside : I believe you can blame this on the success of the IJG project. There's sort of an unfortunate thing that happens where there is a good open source library available to do a certain task - everybody just uses that library and doesn't solve the problem themselves. Generally that's great, it saves developers a lot of time, but when that library stagnates or fails to adopt the latest techniques, it means that entire branch of code development can stall. Of course the other problem is the market dominance of Photoshop, which has long been the pariah of all who care about image quality and well implemented basic loaders and filters)

So I've read a ton of papers on this topic over the last few weeks. A few notes :

"Blocking Artifact Detection and Reduction in Compressed Data". They work to minimize the MSDS difference, that is to equalize the average pixel steps across block edges and inside blocks. They do a bunch of good math, and come up with a formula for how to smooth each DCT coefficient given its neighbors in the same subband. Unfortunately all this work is total shit, because their fundamental idea - forming a linear combination using only neighbors within the same subband - is completely bogus. If you think about only the most basic situation, which is you have zero AC's, so you have flat DC blocks everywhere, the right thing to do is to compute the AC(0,1) and AC(1,0) coefficients from the delta of neighboring DC levels. That is, you correct one subband from the neighbors in *other* subbands - not in the same subband.

Another common and obviously wrong fault that I've seen in several papers is using non-quantizer-scaled thresholds. eg. many of the filters are basically bilateral filters. It's manifestly obvious that the bilateral pixel sigma should be proportional to the quantizer. The errors that are created by quantization are proportional to the quantizer, therefore the pixel steps that you should correct with your filter should be proportional to the quantizer. One paper uses a pixel sigma of 15 , which is obviously tweaked for a certain quality level, and will over-smooth high quality images and under-smooth very low quality images.

The most intriguing paper from a purely mathematical curiosity perspective is "Enhancement of JPEG-compressed images by re-application of JPEG" by Aria Nosratinia.

Nosratinia's method is beautifully simple to describe :


Take your base decoded image

For all 64 shifts of 0-7 pixels in X & Y directions :

  At all 8x8 grid positions that start at that shift :

    Apply the DCT, JPEG quantization matrix, dequantize, and IDCT

Average the 64 images

That's it. The results are good but not great. But it's sort of weird and amazing that it does as well as it does. It's not as good at smoothing blocking artifacts as a dedicated deblocker, and it doesn't totally remove ringing artifacts, but it does a decent job of both. On the plus side, it does preserve contrast better than some more agressive filters.
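
In code the loop is only a few lines. This is a sketch; Dct8x8 / Idct8x8 / QuantDequant8x8 are placeholders for your own 8x8 DCT and for requantizing with the image's quant table from the JPEG header :

#include <vector>

void Dct8x8 (const float * in, float * out);                // placeholders, assumed to exist
void Idct8x8(const float * in, float * out);
void QuantDequant8x8(float * dct, const int * quantTable);  // quantize & dequantize with the file's table

void NosratiniaFilter(float * image, int w, int h, const int * quantTable)
{
    std::vector<float> accum(w*h, 0.f);

    for(int sy=0;sy<8;sy++)
    for(int sx=0;sx<8;sx++)
    {
        // re-apply JPEG on the 8x8 grid that starts at shift (sx,sy)
        std::vector<float> shifted(image, image + w*h);

        for(int by=sy; by+8<=h; by+=8)
        for(int bx=sx; bx+8<=w; bx+=8)
        {
            float block[64], dct[64];
            for(int y=0;y<8;y++) for(int x=0;x<8;x++)
                block[y*8+x] = shifted[(by+y)*w + (bx+x)];

            Dct8x8(block,dct);
            QuantDequant8x8(dct,quantTable);
            Idct8x8(dct,block);

            for(int y=0;y<8;y++) for(int x=0;x<8;x++)
                shifted[(by+y)*w + (bx+x)] = block[y*8+x];
        }

        // border strips that don't fit a full block at this shift just pass through
        for(int i=0;i<w*h;i++) accum[i] += shifted[i];
    }

    for(int i=0;i<w*h;i++) image[i] = accum[i] / 64.f;
}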

Why does Nosratinia work? My intuition says that what it's doing is equalizing the AC quantization at all lattice-shifts. That is, in normal JPEG if you look at the 8x8 grid at shift (0,0) you will find the AC's are quantized in a certain way - there's very little high frequency energy, and what there is only occurs in certain big steps - but if you step off to a different lattice shift (like 2,3), you will see unquantized frequencies, and you will see a lot more low frequency AC energy due to picking up the DC steps. What Nosratinia does is remove that difference, so that all lattice shifts of the output image have the same AC histogram. It's quite an amusing thing.

One classic paper that was way ahead of its time implemented a type 3 (MAP) decoder back in 1995 : "Improved image decompression for reduced transform coding artifacts" by O'Rourke & Stevenson. Unfortunately I can't get this paper because it is only available behind IEEE pay walls.

I refuse to give the IEEE or ACM any money, and I call on all of you to do the same. Furthermore, if you are an author I encourage you to make your papers available for free, and what's more, to refuse to publish in any journal which does not give you all rights to your own work. I encourage everyone to boycott the IEEE, the ACM, and all universities which do not support the freedom of research.

2/11/2011

02-11-11 - Some notes on EZ-trees

I realized a few weeks ago that there is an equivalence between EZ-tree coding and NOSB unary less-than-parent coding. Let me explain what that means.

EZ-tree coding means coding values in bitplanes, with tree-structured flagging of significance and insignificance. "NOSB" means "number of significant bits". "Significant" at bit level b means the value is >= 2^b . (if you like, this is just countlz ; it's the position of the top bit, 0 means there is no top bit, 1 means the top bit is the bottom bit, etc)

"NOSB" encoding is a way of sending variable length values. You take the number, find the number of signficant bits, then you send that number using some scheme (such as unary), and then send the bits. So, eg. the value 30 (30 = 11110) needs 5 bits, so first you send 5 (using unary that would be 111110), then you send the bottom 4 bits = 1110.

A few unary-NOSB encoded values for example :


0 : 0
1 : 10
2 : 110,0
3 : 110,1
4 : 1110,00
5 : 1110,01
6 : 1110,10
7 : 1110,11
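
In code the encoder is tiny (a sketch ; the bit writer is whatever you already have) :

struct BitWriter { void PutBit(int bit); };   // assumed to exist

static int NumSigBits(unsigned v)             // "NOSB" , ie. the countlz-style top bit position
{
    int n = 0;
    while ( v ) { n++; v >>= 1; }
    return n;
}

static void PutUnaryNOSB(BitWriter & bw, unsigned val)
{
    int nosb = NumSigBits(val);

    for(int i=0;i<nosb;i++) bw.PutBit(1);     // unary : nosb ones, then a terminating zero
    bw.PutBit(0);

    for(int i=nosb-2;i>=0;i--)                // then the bits below the top bit
        bw.PutBit( (val >> i) & 1 );
}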

Okay, now about EZ-trees. To be concrete I'll talk about a 4x4 image :


+++++++++
|A|B|b|b|
+++++++++
|C|D|b|b|
+++++++++
|c|c|d|d|
+++++++++
|c|c|d|d|
+++++++++

The traditional EZ-tree uses a parent-child relationship where the lower case quartets (bbbb,cccc,dddd) are children of the upper case letters (B,C,D). The spot B has its own value, and it also acts as the parent of the b quartet. In a larger image, each of the b's would have 4 kids, and so on.

In all cases we are talking about the absolute value (the magnitude) and we will send the sign bit separately (unless the value is zero).

EZ-tree encoding goes like this :


1. At each value, set 
significance_level(me) = NOSB(me)
tree_significance_level(me) = MAX( significance_level(me), tree_significance_level(children) )

so tree_significance_level of any parent is >= that of its kids (and its own value)

2. Send the max tree_significance_level of B,C, and D
  this value tells us where to start our bitplane iteration

3. Count down from level = max significance_level down to 0
  For each level you must rewalk the whole tree

4. Walk the values in tree order

5. If the value has already been marked significant, then transmit the bit at that level

5.B. If this is the first on bit, send the sign

6. If the value has not already been sent as significant, send tree_significant ? 1 : 0

6.B. If tree not significant, done

6.C. If tree significant, send my bit (and sign if on) and proceed to children 
    ( (**) see note later)

In the terms of the crazy EZW terminology, if your tree is significant but the value is not significant, that's called an "isolated zero". When you and your children are all not significant, that's called a "zerotree". etc.

Let's assume for the moment that we don't truncate the stream, that is we repeat this for all significance levels down to zero, so it is a lossless encoder. We get compression because significance level (that is, log2 magnitude) is well correlated between parent-child, and also between spatial neighbors within a subband (the quartets). In particular, we're making very heavy use of the fact that significance_level(child) <= significance_level(parent) usually.

The thing I realized is that this encoding scheme is exactly equivalent to NOSB coding as a delta from parent :


1. At each value, set NOSB(me) = number of significant bits of my value, then
NOSB(me) = MAX( NOSB(me) , NOSB(children) )

2. Send the maximum NOSB of B,C, and D 
  this value will be used as the "parent" of B,C,D

3. Walk down the tree from parents to children in one pass

4. At each value :
    Send ( NOSB(parent) - NOSB(me) ) in unary
   note that NOSB(parent) >= NOSB(me) is guaranteed

5. If NOSB(me) is zero, then send no bits and don't descend to any of my children

6. If NOSB(me) is not zero, send my bits plus my sign and continue to my children
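
As a sketch, the whole one-pass encoder is something like this (tree layout and bit writer are simplified stand-ins, the unary convention is zeros-then-one to match the worked example below, and the small optimizations in the notes further down - dropping the final unary bit when the delta is maximal, dropping the top bit at leaves - are skipped) :

#include <vector>
#include <stdlib.h>

struct BitWriter { void PutBit(int bit); };   // assumed to exist

struct Node
{
    int value;               // the signed coefficient at this node
    int nosb;                // step 1 : max( NOSB(|value|) , nosb of children ) , precomputed
    std::vector<Node> kids;  // empty at the leaves
};

static void PutUnary(BitWriter & bw, int n)   // n zeros then a one
{
    for(int i=0;i<n;i++) bw.PutBit(0);
    bw.PutBit(1);
}

// the root call passes the transmitted max NOSB of B,C,D as parentNosb
static void EncodeNode(BitWriter & bw, const Node & node, int parentNosb)
{
    // step 4 : NOSB as an unsigned delta from the parent (parent >= me is guaranteed)
    PutUnary(bw, parentNosb - node.nosb);

    // step 5 : a zero tree-NOSB kills this whole subtree
    if ( node.nosb == 0 ) return;

    // step 6 : my bits from the (tree) top bit down, plus a sign if I'm nonzero
    unsigned mag = (unsigned) abs(node.value);
    for(int b=node.nosb-1;b>=0;b--)
        bw.PutBit( (mag >> b) & 1 );
    if ( mag ) bw.PutBit( node.value < 0 ? 1 : 0 );

    for(size_t i=0;i<node.kids.size();i++)
        EncodeNode(bw, node.kids[i], node.nosb);
}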

This is not just similar, it is exactly the same. It produces the exact same output bits, just permuted.

In particular, let's do a small example of just one tree branch :


B = 6 (= 110)

bbbb = {3,0,1,2}

Significance(B) = 3

Significance(bbbb) = {2,0,1,2}

And the EZ-tree encoding :

Send 3 to indicate the top bit level.

Level 3 :

send 1  (B is on)
  send 1 (a bit of B)
  send a sign here as well which I will ignore

Go to children
send 0000 (no children on)

Level 2 :

B is on already, send a bit = 1

Go to children
send significance flags :
1001

For significant values, send their bits :
1  1

Level 1 :

B is on already, send a bit = 0

Go to children
send significance flags for those not sent :
 01

send bits for those already significant :
1 10

------------

Bits sent :

1
1
 0000
1
 1001
 1  1
0
  01
 1 10

but if we simply transpose the bits sent (rows<->columns) we get :

1110

0111
000
0011
0110

Which is clearly unary + values :

1 + 110

01 + 11
000 (*)
001 + 1
01 + 10

* = unary for 3 would be 0001, but there's no need to send the last 1
because we know the delta can be at most 3

exactly the same !

(**) = actually at the bottom level (leaves) when you send a significance flag you don't need to send the top bit. The examples worked here treat the b,c,d groups as nodes, not final leaves. If they were leaves, the top bits should be omitted.

So, that's pretty interesting to me. Lots of modern coders (like ADCTC) use NOSB encoding, because it gives you a nice small value (the log2) with most of the compressibility, and then the bits under the top bit are very incompressible, and generally follow a simple falloff model which you can context-code using the NOSB as the context. That is, in modern coders the NOSB of a value is first arithmetic coded using lots of neighbor and parent information as context, and then bits under the top bit are coded using some kind of laplacian or gaussian simple curve using the NOSB to select the curve.

We can see that EZW is just a NOSB coder where it does two crucial things : set NOSB(parent) >= NOSB(child) , and transmit NOSB as | NOSB(parent) - NOSB(child) |. This relies on the assumption that parents are generally larger than kids, and that magnitude levels are correlated between parents and kids.

Forcing parent >= child means we can send the delta unsigned. It also helps efficiency a lot because it lets us stop an entire tree descent when you hit a zero. In the more general case, you would not force parent to be >= child, you would simply use the correlation by coding ( NOSB(parent) - NOSB(child) ) as a signed value, and arithmetic-code / model that delta ( using at least NOSB(parent) as context, because NOSB(parent) = 0 should very strongly predict NOSB(child) = 0 as well). The big disadvantage of this is that because you can send child > parent, you can never stop processing, you must walk to all values.

Of course we can use any parent-child relationship, we don't have to use the standard square quartets.

The NOSB method is vastly preferable to the traditional EZ-Tree method for speed, because it involves only one walk over all the data - none of this repeated scanning at various bit plane levels.

A few more notes on the EZ-Tree encoding :

At step 6, when you send the flag that a tree is significant, there are some options in the encoding. If your own value is on, then it's possible that all your children are off. So you could send another flag bit indicating if all your children are 0 or not. Generally off is more likely than on, so you could also send the number of children that are on in unary, and then an identifier of which ones are on; really this is just a fixed encoding for the 4 bit flags so maybe you just want to Huffman them.

The more interesting case is if you send that your tree is significant, but your own value is *off*. In that case you know at least one of your children must be significant, and in fact the case 0000 (all kids insignificant) is impossible. What this suggests is that a 5 bit value should be made - one bit from the parent + the four child flags, and it should be Huffman'ed together. Then the 5 bit value 00000 should never occur.

It's a little unclear how to get this kind of action in the NOSB formulation. In particular, the fact that if parent is sigificant, but parents bits so far are zero, then one of the kids must be on - that requires coding of the children together as a unit. That could be done thusly : rather than using unary, take the delta of NOSB from parent for all of the children. Take the first two bits or so of that value and put them together to make an 8 bit value. Use parent bit = 0 as a 1 bit context to select two different huffmans and use that to encode the 8 bits.

Finally, a few notes on the "embedded" property of EZ-trees ; that is, the ability to truncate and get lower bitrate encodings of the image.

Naively it appears that the NOSB formulation of the encoding is not truncatable in the same way, but in fact it is. First of all, if you truncate entire bit levels off the bottom, you can simply send the number of bit levels to truncate off, and then you effectively just shift everything down by that number of bits and proceed as normal. If you wish to truncate in the middle of a bit level, that means only sending the bottom bit for the first N values in that bit level, and then storing 0 implicitly for the remaining values. So you just have to send N and then check in the decoder; in the decoder for the first N values it reads all the bits, and then for remaining values it reads NOSB-1 bits and puts a zero in the bottom. Now you may say "that's an extra value you have to send" ; well, not really. In the EZ-tree if you just truncate the file you are effectively sending N in the file size - that is, you're cheating and using the file size as an extra information channel to send your truncation point.

One thing I don't see discussed much is that EZ-tree truncated values should not just be restored with zeros. In particular, truncation is not the same as quantization at a coarser level, because you should sometimes round up and set a higher bit. eg. say you have the value 7 and you decided to cut off the bottom 3 bits. You should not send 0, you should send 8 >> 3 = 1.
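
In other words, round to nearest at the truncation point rather than flooring; something like :

// send the top bits rounded, not floored :
static unsigned TruncateRounded(unsigned val, int k)
{
    if ( k <= 0 ) return val;
    return ( val + (1u << (k-1)) ) >> k;   // eg. k=3 : (7 + 4) >> 3 = 1 , not 7 >> 3 = 0
}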

A related issue is how you restore missing bottom bits when you have some top bits. Say you got 110 and then 3 bits are cut off the bottom so you have [110???] you should not just make zeros - in fact you know your value is in the range 110000 - 110111 ; filling zeros puts you at the bottom of the range which is clearly wrong. You could go to the middle of the range, but that's also slightly wrong because image residuals have laplacian distribution, so the expected value is somewhere below the middle of the range. I have more complex solutions for this, but one very simple bit-shifty method is like this :


To fill the missing bits
Add one 0 bit
Then repeat the top bits

So 110??? -> 110 + 0 + 11 , 101???? -> 1010101 , etc.
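
The same thing in code (my reading of the scheme) :

// fill numMissing truncated bits : one 0 bit, then repeat the known top bits below it
static unsigned FillMissingBits(unsigned topBits, int numTopBits, int numMissing)
{
    if ( numTopBits <= 0 ) return 0;

    unsigned v = topBits << numMissing;     // eg. 110 -> 110000
    int fill = numMissing - 1;              // leave one 0 bit just below the known bits
    while ( fill > 0 )
    {
        int take = ( fill < numTopBits ) ? fill : numTopBits;
        v |= ( topBits >> (numTopBits - take) ) << (fill - take);
        fill -= take;
    }
    return v;
}

// FillMissingBits(0x6,3,3) == 0x33 : 110??? -> 110011
// FillMissingBits(0x5,3,4) == 0x55 : 101???? -> 1010101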

Of course the big deal in EZ-trees is to sort the way you send data so that the most important bits come first. This is like R/D optimizing your truncation. See SPIHT, EBCOT, etc. Modern implementations of JPEG2000 like Kakadu have some perceptual D heuristic so that they do more truncation where bits are less important *perceptually* instead of just by MSE.
