10/31/2008

10-31-08 Junk

The new blog makes me feel like I can't just write drivel and nonsense like I normally do. I feel more aware that I'm being watched and the personal nonsense seems more inappropriate than ever.

My lappy's been acting kind of weird so I popped open the Event Viewer. WTF there's a ton of bizarre errors. Something about NetBIOS browser masters having elections!? Some stuff about NAT and internet connection sharing; WTF I have ICS disabled. Ugh. God damn I hate computers. The fucking HTPC keeps having retarded problems that I find so damn frustrating. Every time I look at it, just sitting there all smug lording over my living room, I want to just beat it up. Fuck you for not showing media info on the LCD. Fuck you for randomly deciding to un-maximize the video player window. Fuck you for skipping ahead 4 fucking minutes when I hit the jump ahead button on video.

I'm so fucking sick of software and all its problems. My email is full of spam.

My favorite things have been shit recently.

The last Good Eats about meat pies : the good thing about Good Eats is when Alton drops some interesting science or techniques, the horrible part is the cheezy fucking skits and his cringe-worthy idea of humor. I know, why don't we make a whole episode that's 100% skit! Urg!!

Anthony Bourdain : No Reservations. The great part about the show is seeing the real cuisine of different countries, the travel, and in fact it's 90% about his guides, some of the episodes suck when he has a lame guide (like that Russian guy) but some of them rock when he has a really good guide (Greece, Korea). The worst part is Tony's pretentious pseudo-intellectual grandiose voice over. You just have to roll your eyes and laugh and try to ignore his philosophizing. I know, let's make a whole episode called "At the Table" where we remove all the good parts (the travel) and just have Tony sit and spew retarded cocktail party conversation for an hour.

And then the fucking Buzzing Fly show is fucking talk radio with Justin Martin. I like Justin's music quite well and I understand Ben wants to pimp his label but this is disgusting. The questions are like a Playboy Playmate interview. What are your turn ons and turn offs? How did you get into making music? What do you like to do other than make music? And his answers are even sadder, apparently he's a complete moron. I almost always hate seeing interviews with musicians because their total retardation ruins my illusion that they're these interesting cool smart people locked away in a dungeon creating great works. I don't like reading about movie directors or actors or writers. I want to enjoy the art that they create without my head being filled with their own thoughts on it, or the hollywood gossip about them, or anything except the art product itself.

10-31-08 Pirates

Yarrrr. Shiver me timbers. I just went pee and there was a pirate in the bathroom. Then I came back to work and was doing some searching for things and found this forum with an amusing discussion of UT3 Todos - the most awesome thing about it is that all the posters write like they're pirates! WTF!?

10/30/2008

10-30-08 Oodle Rev Tracking

I'd like Oodle to work really nicely with running the "compiled" version all the time, and server compiles and local changes, but I'm starting to think it's impossible.

The way I would like it to work :

A server somewhere builds nice optimized bundles of the levels from checked in content

The server puts those on a share somewhere

Clients (artists, etc.) sync to the server bundles

When a client runs the game, by default it loads from the server bundles

That has many advantages - they get fast loads, and the game runs like it will on ship, so the artists can
see paging problems, memory use, all that good stuff.  I'm a big believer that having your developers run as
similar to the real shipping product as possible is a huge win.

When clients have newer local copies of content, the game loads those individual files as patches
on top of the server bundles.

Thus clients can make changes and see them instantly in the game without rebaking the whole level,
but they still get most of the advantage of the compile server's bundling.

That's all great in theory, but it falls apart when you start looking at this question of which files go in the patch. That is, which files should I load from the compiled bundles and which should I load singly as local overrides.

The first idea is to use modtimes. That falls down quickly even in very simple cases. Just using the modtimes of the bundles works fine on your local machine if you never get stuff from a server, but once you share files across machines the modtimes are worthless (particularly if you use something like perforce and have server modtime turned off, which you probably do; the perforce modtime option is strongly discouraged by perforce and by me).

One option is just to punt and not automatically load the client versions. Make the client take an extra step to get the local overrides to load. One way to do that is to just go ahead and use modtime, which will fail sometimes, but tell them to "touch" a file to get it in the game. You could provide helpers to make that easier, like for example looking at the p4 open for edit list and making sure those all come in as overrides.
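
For example, the punt version might be something like this sketch (all the names and the p4 helper here are hypothetical, not actual Oodle API; it's just to show the shape of the decision) :

#include <set>
#include <string>
#include <stdint.h>

// Sketch of the "punt" policy : prefer the baked bundle unless the loose file is
// explicitly flagged as a local override (it's open for edit in p4, or the artist
// touched it after the bundle's recorded bake time).
struct OverridePolicy
{
	std::set<std::string>	m_openForEdit;	// filled from a "p4 opened" query (assumed helper)

	bool ShouldLoadLooseFile(const std::string & fileName,
							uint64_t looseModTime,
							uint64_t bundleBakeTime) const
	{
		if ( m_openForEdit.count(fileName) )	// explicit local edit -> always override
			return true;
		// modtime fallback : unreliable across machines, but fine for "touch the file to get it in the game"
		return looseModTime > bundleBakeTime;
	}
};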

10-30-08 C# - 1

C# is obviously a copy of Java, MS wanted complete control and didn't like the open standard (and Sun is a bunch of bitches so I don't really blame MS on this one). Anyway, there is a certain simplicity and cleanliness to both C# and Java, and I think it almost all comes from the fact that they force you to stop doing silly micro-optimizations and worry more about your higher level code.

Everything that C# does that makes it clean you could do in C++. Initialize every value. Use garbage collection. Make everything an allocated GC'ed object so that you can put it in containers. Make everything derive from Object so you can put them in super generic containers. Dynamic cast between objects & throw exceptions. I think all of that is quite sensible. It makes code clean and safe, easy to write and easy to use.

But the vast majority of C programmers resist those things. They insist on using things like objects on the stack or char [] buffers for strings for the sake of efficiency. They don't initialize values when they "know" they don't need to, for efficiency. Once in a while that's actually important, but most of the time it's a mistake.

The nice thing about C# is you no longer have those options - and best of all, you stop thinking about it ! Even when I'm writing C++ I find myself thinking "is this ok? is this fast enough? am I using too much overhead?". In something like C# you have no option so you just stop worrying about it and that frees up your mind to focus on getting things done.

10/29/2008

10-29-08 Blog Ho!

Sweet! Somebody at Google flipped my switch and my blog is now going up. (URG; update : it looks like my limit is now 1000 posts a day and I just ran out). I did some fixes in TextBlog and it's running pretty nicely now. One fix was to try-catch every network op and retry them a few times, because the server can just randomly drop connection once in a while. The other fix was to only query the list of entries once instead of doing a stupid N^2 loop.
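
The retry wrapper is nothing fancy, roughly this kind of thing (sketched in C++ here rather than the C# that TextBlog actually uses; the names in the usage comment are made up) :

#include <exception>

// Retry a flaky network op a few times before letting the exception propagate.
template <typename t_op>
auto RetryNetworkOp( t_op op, int maxTries = 3 ) -> decltype( op() )
{
	for(int tries = 1; ; tries++)
	{
		try { return op(); }
		catch(const std::exception &)
		{
			if ( tries >= maxTries ) throw;	// out of retries, give up
		}
	}
}

// usage (hypothetical names) : RetryNetworkOp( [&]{ return service.Query(feedUri); } );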

It's still a bit slow to just upload one post. The problem is that I have to get the list of all the blog entries to see if you're doing an update or a new post. Getting the list is slow because I can only get 25 at a time and I have to get 25, follow the next page link, wash rinse repeat. Just getting the list of all the posts takes something like 10 seconds. I guess if I was confident that my local disk cache of the posts was accurate, then I could just use the local cache and skip that completely, but I'm trying to be extra safe for now.

Speaking of extra safety, it's something I'm a little worried about in Oodle. The issue arises with "baked" content and original file syncing, and different versions of the game, different versions of the baker, and different versions of the original file. There are a huge host of possible problems.

The basic bad situation goes like this :

Artist updates a file on their machine

Oodle does an automatic local Bake of that file to get it in the game

They see the file in the game and are happy

At the same time, a server is running an auto-bake of the whole game using an older rev of the source file

The server checks in the bundles from the older file, but the bundle itself from the server is newer

Artist sync to the server, the newer bundle comes down

The game prefers the newer bundle, and the artist no longer sees their edit

The fix is for the bundles to track a lot of information about how they were made. The mod time of the files they were made from, maybe the Perforce revision #, who made them, what version of the code made them, etc. The disadvantage is that I can't be minimal about loading bundles then, I have to just go ahead and load all of them to look inside and see who's got the newest version of everything.

I think during development it's better to be safe and have a slightly slower load that does scan everything and give you the best content.
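
The kind of header I have in mind is roughly this (the field names are hypothetical, just to give the flavor of the tracking, not the real Oodle format) :

#include <stdint.h>
#include <string>
#include <vector>

// Per-bundle provenance so the loader can pick the newest content per source file
// instead of trusting the bundle file's own timestamp.
struct BundleSourceRecord
{
	std::string	sourceFileName;
	uint64_t	sourceModTime;		// modtime of the source file at bake time
	uint32_t	perforceRevision;	// 0 if unknown
};

struct BundleHeader
{
	uint32_t	bakerCodeVersion;	// version of the baking code
	std::string	bakedBy;			// who / which machine baked it
	uint64_t	bakeTime;
	std::vector<BundleSourceRecord> sources;
};

// At load time : scan all bundle headers, and for each source file prefer whichever
// bundle (or loose local file) has the newest recorded source revision / modtime.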

10/28/2008

10-28-08 - 1

We need to get away to somewhere sunny and warm this winter. Hawaii makes the most sense from here, but Hawaii weather in the winter is unreliable, and maybe a bit expensive and of course full of damn Americans (big advantage : nonstop flights, no passport control, dear god I hate airports). The Riviera Maya is pretty convenient and cheap and the weather is great there in winter. It's nice and all but I've been there a lot and it's not amazing (plus : nonstop flights to Cancun, though very few of them and expensive).

I'd like to go to the British Virgin Islands maybe or something but it seems like a huge pain in the ass to get there. Not really worth it unless I can stay a week. Or a month. Or maybe a few years.

10-28-08 - 3

TextBlog : .NET C# app to edit your Blogger blog as Text : TextBlog.zip (64k)

Hopefully I didn't leave my password in there or anything.

The Simple Blogger layout I've got is pretty butt ugly, but it's the only one without a ton of shit in the side pane. I hate that fucking side pane, it takes up like half the screen and makes every blog all narrow and turns a tiny bit of text into five hundred pages.

10-28-08 - 2

I think the realtime raytracing guys are rather too optimistic. The thing is, first hit raytracing against solid objects is just not interesting. It gives you absolutely zero win over rasterization, and rasterization is way way faster. They brag about 100 M rays a second, but rasterization is doing many Billions of pixels per second now.

I should back up and say they are doing very fast raytracing for coherent rays using packets that they run through SIMD, and they can do stuff like break the screen into tiles for good parallelism on multi-core or multi-proc systems. Those kind of things are the exact same optimizations we do in rasterization, and they only work for plain old rendering a big image from the camera. Also the raytracing stuff is only really fast on static scenes which is a huge problem; we need to be moving towards everything-is-dynamic worlds.

In fact, if you like, you can think of rasterization as a very clever optimization for first hit ray-tracing. They share the exact same strengths and weaknesses - they both only work well when you're shooting a ton of "rays" from the same spot with lots of coherence. They both exploit parallelism by making quads and doing them at the same time down a SIMD path, and they both fall apart if you do lots of different objects, or rays in random directions etc.

Raytracing is of course very interesting, but the raytracing that you really *want* to do is also the exact thing that's very slow. You want N bounce raytracing, and for it to really be a win over current rasterization techniques you need to be able to do good sampling at the really hard edge-on cases where your rays diverge badly and you need to shoot lots of rays and do monte carlo integration or something, eg. to get better fresnel reflections on the edge-on surfaces of objects.

It also totally tilts me the way raytracing guys grossly overstate the importance of refraction and caustics. Improving the realism of our wine glass rendering is about 1000th on the list of things we need to make games look better.

Things that would actually help :

1. Better basic lighting & shadowing. This is something that hybrid raytracing could help in the near term. I mean eventually we'd like full realtime GI, solving the rendering equation every frame, but in the near term even just better shadow maps that don't have any sampling problems would be awesome, which is something hybrid raytracing could do. If you could render your scene with rasterization and also cast 100M rays per frame for shadows, you could probably get some semi-soft shadow area light source effects, which would be cool.

2. Volumetric effects, and light & shadow through objects. Dust clouds with lighting attenuation and scattering. Light through cloth and skin and leaves.

You don't really get the cool benefits from raytracing until you can do something like 1000 non-coherent rays per pixel (to cast around and get lighting and reflections and scattering and so on). At 1000x1000 at 100 fps, that's 100 Billion rays per second (and non-coherent). We're very far away from that. Even on something like Larrabee 3 you'll be better off rasterizing and maybe just using rays for shadows.

10/27/2008

10-27-08 - 4

God damn casts are the fucking devil. I just found a huge bug in Galaxy3 where I was assuming that one struct was a prefix of another struct. In particular I was assuming that a vertex with Diffuse, 2 UVs, and a Normal, had as its prefix a vert with just Diffuse and a UV. Eight years ago or so when I wrote it, that was true. Then four years ago I changed some of the structs and it became broken, and I only just found it now. Yuck.

Obviously you want to avoid casts, but sometimes you have to do them. One thing I've started doing is making little helper functions instead of just casting. Like :


gVertexDUV & gVertPrefixDUV( gVertexDUV2N & vertex )
{
	return (gVertexDUV &) vertex;
}

then use it like :

FuncOnDUV( gVertPrefixDUV( m_vertexDUV2N ) );

instead of just doing :

FuncOnDUV( (gVertexDUV &) ( m_vertexDUV2N ) );

Now, that on its own already does a little bit for you, because it means that if you change the type of m_vertexDUV2N you will get a compile error, unlike the normal cast which will just keep casting anything. That is, the function more narrowly specifies, I want to cast X to Y, not just a cast of any type to any other type (you could get the same constraint by using static_cast and specifying both template arguments manually instead of letting one be inferred).

But the real win comes if you start checking the cast in the function :


// MEMBER_OFFSET tells you the offset of a member in a type
#define MEMBER_OFFSET(type,member)	( (size_t) &(((type *)0)->member) )

gVertexDUV & gVertPrefixDUV( gVertexDUV2N & vertex )
{
	COMPILER_ASSERT( sizeof(gVertexDUV) <= sizeof(gVertexDUV2N) );
	COMPILER_ASSERT( MEMBER_OFFSET(gVertexDUV,diffuse) == MEMBER_OFFSET(gVertexDUV2N,diffuse) );
	COMPILER_ASSERT( MEMBER_OFFSET(gVertexDUV,uv1) == MEMBER_OFFSET(gVertexDUV2N,uv1) );
	return (gVertexDUV &) vertex;
}

Stuff like that. Now you have a semi-safe cast that will warn you if you change the structs and mess up the cast.

I've also been using a little utility that looks like static_cast<> but ensures the value is of the same size :


// same_size_cast like static_cast or whatever
template < typename t_to, typename t_fm >
t_to same_size_cast( const t_fm & from )
{
	RR_COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
	return *( (const t_to *) &from );
}

That's okay. I really hate the compiler warning about assigning signed to unsigned and all that, because it makes me put casts in to make the warning go away, and that actually makes the code much less safe, because now it's casting god knows what if I change the type.
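
For example, the kind of place I'd use it (a self-contained illustration; this version uses memcpy and static_assert so it compiles on its own, but it's the same idea as the template above) :

#include <stdint.h>
#include <string.h>

template <typename t_to, typename t_fm>
t_to same_size_cast( const t_fm & from )
{
	static_assert( sizeof(t_to) == sizeof(t_fm), "same_size_cast size mismatch" );
	t_to to;
	memcpy(&to, &from, sizeof(to));
	return to;
}

int main()
{
	uint32_t u = 0xFFFFFFFFu;
	// instead of (int)u just to shut up the signed/unsigned warning :
	int i = same_size_cast<int>(u);			// breaks at compile time if u's type changes size
	float f = 1.0f;
	uint32_t bits = same_size_cast<uint32_t>(f);	// reinterpret float bits without a pointer pun
	return (i != 0) + (int)(bits != 0);
}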

10-27-08 - 3

So, I wrote a while ago about figuring out a way to VPN from work to home through my router and DHCP and all that. It seemed like a real pain in the ass. Then someone pointed me at Hamachi, and BLAMMO it just works and takes like 2 seconds to set up. In fact the slowest part was that I went through their fucking tutorial cuz I thought it might have something useful in it. No, it does not, skip it. Just create a new network, give yourself a password, then log all your machines into that network. Super blammo it works fantastique! formidable!

10-27-08 - 2

The new cbloom rants at blogspot is almost working. If one of my Google buddies who reads this can hook me up with someone at Blogger to get past this 50 post a day limit I'll be up and running. It's not up to date yet because I can only put up 50 posts a day >:|

I'm going to post the source code to my TextBlog uploader. Even if you don't like blogging from Text it's a pretty handy way to back up your blog, because it can download the whole blog to files, then you could edit some of those if you want, and it can restore your whole blog from those files. Would be a very handy way to do something like a mass find-replace on all your blog entries.

It's been a good day. The Hamachi stuff works, and I got to listen to music at work finally. I've been thinking this HTPC project has been a huge pain in the butt and I wish I was still just listening to CD's, but then something like that comes together and it's all worth it.

I got my Oodle space shooter game sort of working. It's a little fly around in Galaxy, but it loads models and textures async through Oodle. As usual it's hard enough to actually find enough content to stress the async at all. It loads like 20 megs of models in less than a second. It's still just loading single-file packages; next up : some logical packaging, tracking how the app loads data, defining load group markers, compressing bundles, bundle baking processes, etc. etc.

10-27-08 - 1

Oh duh. I just realized there's a much better way to cycle through the images in cbsaver. I should just enumerate all the images at startup, then do a random permutation of the whole set, and then just step through that list in a straight line. That guarantees that you see every image once before you repeat, which is pretty much exactly what you want for this kind of random slideshow.

The only nagging imperfection with that is that it rebuilds the list each time it runs. I'd like the permutation to be persistent across runs. One cheezy way to accomplish that would be using the last access time.
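
The startup part is trivial, something like this (a sketch, not the actual cbsaver code) :

#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Shuffle the whole image list once at startup, then just walk it in order.
// Every image gets shown exactly once per cycle before anything repeats.
struct SlideshowOrder
{
	std::vector<std::string>	m_images;
	size_t						m_next;

	explicit SlideshowOrder( std::vector<std::string> images )
		: m_images( std::move(images) ), m_next(0)
	{
		std::mt19937 rng( std::random_device{}() );
		std::shuffle( m_images.begin(), m_images.end(), rng );	// Fisher-Yates
	}

	const std::string & Next()
	{
		if ( m_next >= m_images.size() ) m_next = 0;	// could reshuffle here too
		return m_images[ m_next++ ];
	}
};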

10/26/2008

10-26-08 - 3

I found a really nice page on all the Windows Services, so that you can turn off all the rotten fucking nonsense that MS wants you to run.

10-26-08 - 2

The NFL officiating is really fucking impressive. The guys conference and try to get the call right, they have people all over the field, and if all that fails the replay system is really good. They actually seem to care about letting the play decide the game. Contrast to basketball and soccer which are both so retardedly horribly officiated (and the result of games depends so strongly on those bogus calls) that they're just unwatchable.

The only sport that maybe has it even better than the NFL is Rugby, where the head referee is almost an active player and has a huge amount of freedom to call or not call things. That's necessary because the rules of rugby are sort of vague and if you played by the letter of The Law the games would be retarded with penalties being called constantly. Rugby refs get to use their judgement and let a lot of things slide but try to keep things fair overall. They also will get right in there and yell at players instead of stopping play, like "let go of him!" "roll away!" "let it go!" (talking about the ball), etc. It's really a wonderful thing when the head ref is good. The bad thing is that it does give the ref a lot of power in affecting games, and it works horribly if the ref is not very good.

10-26-08 - 1

So I'm finally writing the Blogger API app to post my blog for me. It's going pretty well. It works like this :

download all the posts from Blogger

parse rants.html and break it into separate posts and try to give them unique logical names

compare all the posts from parsing with what I downloaded

update any changed posts, upload any new posts

and that's all working. I have run into a few problems :

1. Blogger puts a limit of 50 posts per day on you. For my initial upload of the old rants that is really fucking annoying. It also doesn't error at all when the limit kicks in, the posts just stop showing up, so I wasted some time going "WTF is happening".

2. Fucking Blogger by default converts slash-n (LF) into a BR tag. Ugh. Even when I tell it my content-type is HTML it still does the conversion. There's a "convertLineBreaks" tag in the feed but you can't put it, only get it. Again I was pulling my hair out on that one until I found that you can disable it from the web interface.

3. The documentation is actually pretty balls. It looks good at first because the "Getting Started" guide is really nice and I'm humming along all happy, and then I start actually wanting to know things, and there's just zero detailed documentation about what all these fields are and what values are okay and what they do to the server.

4. C# is a fucking pain in the ass. Sarcastic example of how to make int i = 1;


ValueHolder< int > i = System.Int.Clone( System.ConstantIntegerValue.1 );

10/25/2008

10-25-08 - 2

You can't use floating points in compression generally because it makes you non-platform-independent due to little differences in the internal precision and so on. But if you were okay with being locked to one platform and one particular FPU (for example, say on Xenon or Cell), I wonder if you could do a very fast arithmetic coder using floating points. A normal arithmetic coder does an integer divide which is usually slower than an FPU divide by quite a bit. And you might even be able to use the fast reciprocal estimate instead of the divide, as the standard kind of trade, giving up some compression for speed. Of course you have to turn your float back into an int to get bits in & out of the code stream and such, but you may be able to do that quickly with a magic number addition or something like that, since you know your floats will be in a limited range. I think the hardest part would be making sure all the float->int parts were fast, not the range update.
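
Just to make the idea concrete, the two bits I'm talking about look something like this (sketch only; a real coder would have to guarantee the scaled result stays in a valid range, and the decoder has to do the bit-identical thing, which is exactly why this locks you to one FPU) :

#include <stdint.h>
#include <string.h>

// The range-scaling step of a range coder with the integer divide replaced by a
// float reciprocal (or a reciprocal-estimate intrinsic on platforms that have one).
static inline uint32_t RangeScaleFloat( uint32_t range, uint32_t totalFreq )
{
	float recip = 1.0f / (float) totalFreq;
	return (uint32_t) ( (float) range * recip );
}

// The "magic number addition" float->int trick : add 2^52 so the (rounded) value
// lands in the low mantissa bits.  Assumes IEEE-754 doubles, little-endian,
// round-to-nearest, and a value well below 2^31.
static inline uint32_t DoubleToU32( double d )
{
	d += 4503599627370496.0;	// 2^52
	uint64_t bits;
	memcpy( &bits, &d, sizeof(bits) );
	return (uint32_t) bits;
}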

10-25-08 - 1

The Maximum Compression guy has some bizarro efficiency measure. The right way to measure a compressor is the total time to compress, transmit & decompress over a channel of a certain speed. In my case I mainly only care about decompression, so I measure the quality in terms of the time to transmit the compressed data & the time to decompress.

Here's a CSV of his data with a reasonable efficiency rating. I also make a "True Decomp" time by subtracting off 3 seconds to remove the executable load overhead and the time to write the uncompressed data to disk. Copy-paste this to a file and load with excel as a CSV. The numbers that matter are Decomp+Load at 6 MB/sec and Decomp+Load at 1 MB/sec

number,archiver,switches,solid,size,ratio (%),Comp time,Decomp time,Efficiency (lower is better),,,True Decomp,,Load 6MB/sec,,Load 1MB/sec,,Decomp+Load 6Ms,,D+L 1MBs
99,CABARC 1.00.0106,LZX:17,N,102561882,67.58,175,7,14394,,,4, ,17.093647,,102.561882,,21.093647,,106.561882
45,THOR 0.96,e5,Y,112275112,64.51,24,6,5779,,,3, ,18.71251867,,112.275112,,21.71251867,,115.275112
18,WinRAR 3.71,Normal solid 4096Kb,N,90540300,71.38,152,10,3309,,,7, ,15.09005,,90.5403,,22.09005,,97.5403
73,WINIMP 1.21,1 BEST MM 16Mb B=1020Kb,N,97241095,69.26,204,9,9316,,,6, ,16.20684917,,97.241095,,22.20684917,,103.241095
59,WINIMP 1.21,1 BEST MM,N,99025199,68.7,137,9,7621,,,6, ,16.50419983,,99.025199,,22.50419983,,105.025199
36,WINIMP 1.21,Normal MM,N,100399936,68.26,82,9,5310,,,6, ,16.73332267,,100.399936,,22.73332267,,106.399936
76,GZIP 1.2.4,(none),Y,115442964,63.51,28,7,9570,,,4, ,19.240494,,115.442964,,23.240494,,119.442964
66,AIN 2.32,(none),N,115576936,63.47,25,7,8672,,,4, ,19.26282267,,115.576936,,23.26282267,,119.576936
98,BIX 1.00 b7,(none),N,99694432,68.49,237,10,14196,,,7, ,16.61573867,,99.694432,,23.61573867,,106.694432
80,LHARK 0.4d,(none),N,113439063,64.14,37,8,10132,,,5, ,18.9065105,,113.439063,,23.9065105,,118.439063
63,PKZIP 2.50,#NAME?,N,114022347,63.96,28,8,8178,,,5, ,19.0037245,,114.022347,,24.0037245,,119.022347
52,PKZIP 2.50,(none),N,114501263,63.81,22,8,6775,,,5, ,19.08354383,,114.501263,,24.08354383,,119.501263
85,ESP 1.92,(none),N,115164577,63.6,32,8,10605,,,5, ,19.19409617,,115.164577,,24.19409617,,120.164577
87,DZip 2.90,(none),N,115269882,63.56,33,8,11065,,,5, ,19.211647,,115.269882,,24.211647,,120.269882
86,File2Pack 2.0,(none),N,115305901,63.55,32,8,10772,,,5, ,19.21765017,,115.305901,,24.21765017,,120.305901
41,THOR 0.96,e3,Y,124588656,60.62,6,7,5638,,,4, ,20.764776,,124.588656,,24.764776,,128.588656

Don't put too much stock in these numbers, his timing is very rough (as is my "true decomp" time estimate). Also stuff like RAR and WINIMP is taking advantage of special multimedia recognition to do so well. Anyway, what you can see is that on modern DVD media that does between 1MB/sec and 6 MB/sec, you really cannot afford to spend much time in the decompressor. The slower the channel, the more important it is to spend a lot of time compressing really well, if the channel is infinitely fast then memcpy is the best option. Note that loading from a hard drive you really can't afford to decompress at all (and you shouldn't really be streaming or pointer fixing either, flat load baby!).

If you prefer algebra, I ran these numbers :


6 MB/sec compressed data rate


4.0 bpb at 120 MB/sec

( 256 * 4.0 / 8 ) / 6 + ( 256 / 120 ) = 23.47 seconds

3.75 bpb at 80 MB/sec

( 256 * 3.75 / 8 ) / 6 + ( 256 / 80 ) = 23.20 seconds (** winner)

3.6 bpb at 40 MB/sec

( 256 * 3.60 / 8 ) / 6 + ( 256 / 40 ) = 25.60 seconds

3.5 bpb at 20 MB/sec

( 256 * 3.50 / 8 ) / 6 + ( 256 / 20 ) = 31.50 seconds

//--------------------------------------------------

2 MB/sec compressed data rate


4.0 bpb at 120 MB/sec

( 256 * 4.0 / 8 ) / 2 + ( 256 / 120 ) = 66.13 seconds

3.75 bpb at 80 MB/sec

( 256 * 3.75 / 8 ) / 2 + ( 256 / 80 ) = 63.20 seconds (** winner)

3.6 bpb at 40 MB/sec

( 256 * 3.60 / 8 ) / 2 + ( 256 / 40 ) = 64.00 seconds

3.5 bpb at 20 MB/sec

( 256 * 3.50 / 8 ) / 2 + ( 256 / 20 ) = 68.80 seconds

//--------------------------------------------------

4.0 bpb at 120 MB/sec is stuff like THOR,quicklz,LZRW,LZO
3.75 bpb at 80 MB/sec is stuff like LZX, and my Oodle-LZH
3.6 bpb at 40 MB/sec is stuff like RAR & Quantum
3.5 bpb at 20 MB/sec is stuff like LZMA/7-zip

Even though those efficiency numbers are very rough, I was surprised to see Zip was up there doing quite well. A lot of people claim to be "better than zip", but they measure that in the useless metric of how much compression they get. The real measure is the full time to load compressed data into memory uncompressed.
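
Written out, the measure I mean is just this (the 23.20 for the 3.75 bpb / 80 MB/sec case above falls right out of it) :

// Total time to get uncompressed data into memory : transmit the compressed bits
// over the channel, plus the CPU time to decompress (ignoring async overlap,
// see the addendum below).
double TotalLoadSeconds( double rawMB, double bitsPerByte,
						 double channelMBperSec, double decompMBperSec )
{
	double compressedMB = rawMB * bitsPerByte / 8.0;
	return compressedMB / channelMBperSec + rawMB / decompMBperSec;
}

// eg. TotalLoadSeconds( 256, 3.75, 6, 80 ) = 23.2 seconds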

I've been thinking about making my LZ optimal parser able to output Zip code streams. I think it would be the best Zip recompressor (7-zip also has a very good zip recompressor, I'd like to test myself against that). Unfortunately the source code for both 7-zip and InfoZip is a total mess and untangling that to be able to output a Zip code stream seems like an awful lot of work for a pointless curiosity project.

ADDENDUM : I should note that the total time is actually not really the best measure unless you are completely stalling on the load. Hopefully the actual IO is async and you do other work, which means actually that the decoders which spend more time in IO and less time on CPU work are preferred. The slower decoder with higher compression may look good in a pure race on blocked IO, but it's preventing other stuff from using the CPU. The only time a slow decoder really makes sense is when the CPU is very fast and the IO channel is very slow.

10/24/2008

10-24-08 - 2

Urg, I'm trying to get my home machine set up as a VPN server so I can get to it from work and I can't get it to accept a PPTP connection.

I did find the MS Support Tools which has a pptpsrv and pptpclnt to at least be able to test the ports.

you need to make sure TCP port 1723 (for PPTP) or UDP port 500 (for IPsec) is open for the VPN to communicate through the firewall. You also need to open up IP port 47 (for PPTP) or IP port 50 (for IPsec). The IP port is not TCP/UDP or ICMP it's IP.

WTF does that mean? IP Protocol 47 (GRE) ? WTF. Urg. Networking is such complete balls.

10-24-08 - 1

I've been daydreaming about Everquest lately. I vividly remember places in the EQ world as if they were real places where I spent time in my life. The giant wood bridge to the Karanas, the Aviak towers in South Karana with Elephants wandering around; the burrows full of gnolls; the desert south of freeport with the big giant who stomped me, and the desert next to that with all the crocodiles, the swamps full of frogloks and the city of the trolls, the dungeon of Guk, etc.

I played most of the races just to see their different areas and feel what it was like to grow up there. Barbarians in the frozen north, wood elves in the misty forest of giant trees. It was such a huge world, especially for the time, you could just explore for days. And it was really well designed with interesting hidden stuff that kept you finding new things all the time if you wandered about. Each area had cute little decorations and context and NPC's related to the stuff going on in that area, with lots of hints of stuff you would see at higher levels and hints about the history of the game world.

It was a whole alternate universe you could dive into and run around in for hours. A lot of the magic of that was because it was so open. There were basically no keys or doors in the entire game. You could go anywhere, and players of various levels did mix quite often. It's the most perfect execution I've ever seen of a game where your level is the "key" to various areas. You could go to the hardest dungeons right away if you want, you're just going to die quickly. Even at low levels you could cast invisibility and go on a risky raid into high level dungeons to check things out and it was fascinating and terrifying hoping your invisibility wouldn't wear off. One of the great hurdles early on was trying to make the run from Freeport to Qeynos at low level, you'd beg someone to give you fleet feet and invisibility so you could get past the minotaurs and beholders in the mountain pass.

Of course a lot of the fun came from having other people in it. Even if you were solo'ing you could discover new things and then talk to your friends and be like "man have you seen this!?". I haven't really played WoW but I feel like it has ruined the things I really loved about the game; I liked not knowing the right place to be or the right item to get, I didn't really like the sitting around power leveling; I liked the fear of dying and having it be a really significant disaster; I liked that it took some effort to explore and get away from other players who were overcrowding the most ideal power leveling spots and all camping for the same item.

There are a few games in my life that really stick with me to this day. "Faery Tale" on the Amiga was one. It was very similar in that it was a huge open world you could explore, with neat special places, and odd magical fairy tale stuff, like blowing the shell to ride the turtle. Another was "Out of this World" / "Another World". The gameplay was pretty shitty, but the art style was just so cool, and the universe was strange and gorgeous, it was like the Half Life 2 of that era.

10/23/2008

10-23-08 - 2

I used to think about getting some kind of wacky physics or math related tattoo. OMG I'm glad I didn't. These Equation Tattoos are hilarious! The best quote :

I'm just happy that the artist now understands what a Swartzchild Radius is.

Then there's this girl who starts out promising ...

Girl with Schrodinger equation

Someday i hope to be a wacky, flannel-sportin' physicist.

Um, cool. Wait, what? Flannel?

my tattoo is schroedinger's equation for the wave function of a particle. i chose this equation because its elegance & symmetry reflect that of our multiverse, & also because it describes the fundamental source of quantum weirdness.

Um, okay, not really, quantum weirdness comes from the fact that the probability is the square of the Schrodinger wave function, the actual wave function is a pretty ordinary wave function. But okay, close enough.

time travel, quantum computers no matter what happens in my life, there is an infinitely Glorious Plan swirling all about us

Alright, now, see, that's your problem. That um, Glorious Plan you got swirling all around you is why you ain't ever going to see inside the ivory tower of physics.

I would be honored to be included among the ranks of badass scientists all over the world. oh, & if you have any pull with any preeminent physicists, tell Brian Greene to return my fan mail! :]

I'm afraid my dear the best you could ever be for Brian Greene is a cum dumpster.

And of course the hot chick is a marine biologist, and she's dumb as nails. Umm, I got it cuz like, jellyfish are cool, or something.

These are actually pretty rad :

1

2

3

4

10-23-08 - 1

I bought an M-Audio Audiophile 2496 for my Media PC because the motherboard audio was ass quality. I wanted something that had direct connections to RCA out, none of that headphone jack bullshit, those connectors are so shitty, I hate that they've become standard for computer audio output (of course if you're on digital you should use SPDIF or HDMI or whatever and you're fine). The sound quality of this thing is very good, I can turn the speakers way up and don't hear any noise. It was kind of a pain getting the driver to work right because as usual with all this high end audio shit they have to do everything some weird nonstandard way that doesn't play well with normal windows apps.

Anyway, I have a weird complaint about it. It's really fucking loud! The signal level coming out of it is super high, I have to turn the computer's software volume level way down and also turn my amp way down to get a reasonable volume level. I don't like turning the software volume down because it worries me that the driver is doing something awful like just compressing the dynamic range into a tiny portion of the 16 bit wave (I have no idea if that's actually true). I generally prefer to have the computer output full volume and just turn it down at the amplifier, but this card is too loud to do that.

Oh, most of you would probably hate this card, it does no 3d and no surround sound processing. Just stereo. Yum.

10/22/2008

10-22-08 - 7

The new Google features for research papers are very awesome. The "All N Versions" lets you find a PDF you can actually download; they've had it for a while but now there's the green triangle thing that points you straight at an actual copy you can get.

10-22-08 - 6

Some compression ideas I'll probably never work on :

Lossless image coder. See for prior art GLICBAWLS, TMW, MRP, CALIC, MEC. Lossless image coding has not really gotten the heavy work that other types of compression have, and there's a lot of interesting stuff still to do. There are several fun issues :

When to use DPCM (LP, Linear Prediction) or not. For smooth parts of the image you want to use an LP, but there are digital parts and pattern parts as well that you could detect and do something better there. Eg. for text laid on top of a natural image, you want to adaptively per pixel identify the type of predictor and switch between a smooth/LP and a pattern/context type of coder. For example BMF has two separate models for smooth or digital images, you really should be picking the right one per pixel.

How to do the LP. There are a few pieces of this. For each pixel, you want the right LP for that environment, eg. some weighting of what neighbors to predict from and what shape of slope to predict with. Then you also need an error estimate. This is really just a machine learning problem. You are predicting a variable from a symmetric source and you need to predict the average and sdev, and your prediction should minimize entropy. There are lots of interesting pieces here. GLICBAWLS is very simple and close to the best, it trains an LP for each pixel based on the local neighborhood, but that is both very slow and also obviously can be beat in terms of compression by doing something like CALIC to characterize the local neighborhood and use different weights (GLICBAWLS always uses the same weights just based on distance). eg. in some areas you might only want a horizontal neighborhood because you detect that the vertical neighbor pixels are just random and not useful.

My idea is roughly to build a piecewise-regression model from the ML literature, using something like DT -> LP. The rough algorithm goes like this :


1. Start with some categories of predictors.  The fixed function CALIC gradient predictors might be good, something like :
	{N , W , NW , (N+W)/2 , N+W-NW , a larger neighborhood smooth predictor }

2. Classify each pixel by which predictor has the lowest error on it

3. Build a decision tree which maps each neighborhood to the desired prediction classification
	(this is a bit tricky because the error should not be classification mistakes, it should be entropy)

4. Now classify each neighborhood with the DT you made.  Construct optimal LP for each class.  Go to #2

This is rough and there are some missing pieces; for one thing you need to estimate the confidence in each neighborhood; again the confidence could be estimated by a function on the neighboring pixels, and also as a Markov model of the neighboring errors. You might also still want to do shape-context correction ala CALIC.
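
Steps 1 and 2 are simple enough to sketch. This is just the fixed predictor set and the per-pixel classification by lowest absolute error (the real thing should be judged by entropy, as noted above, and this does nothing about the DT or the confidence estimate) :

#include <stdint.h>
#include <stdlib.h>
#include <limits.h>
#include <vector>

enum { PRED_N, PRED_W, PRED_NW, PRED_AVG_NW, PRED_GRAD, PRED_COUNT };

static int Predict( int pred, int N, int W, int NW )
{
	switch(pred)
	{
	case PRED_N:		return N;
	case PRED_W:		return W;
	case PRED_NW:		return NW;
	case PRED_AVG_NW:	return (N + W)/2;
	default:			return N + W - NW;	// PRED_GRAD
	}
}

// image is 8-bit grayscale, row-major; border pixels are skipped for brevity
std::vector<uint8_t> ClassifyPixels( const uint8_t * image, int width, int height )
{
	std::vector<uint8_t> classes( width*height, (uint8_t)PRED_AVG_NW );
	for(int y=1; y < height; y++)
	for(int x=1; x < width;  x++)
	{
		int cur = image[ y*width + x ];
		int N   = image[ (y-1)*width + x ];
		int W   = image[ y*width + x - 1 ];
		int NW  = image[ (y-1)*width + x - 1 ];
		int best = 0, bestErr = INT_MAX;
		for(int p=0; p < PRED_COUNT; p++)
		{
			int err = abs( cur - Predict(p,N,W,NW) );
			if ( err < bestErr ) { bestErr = err; best = p; }
		}
		classes[ y*width + x ] = (uint8_t) best;
	}
	return classes;
}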

There are obvious things that current LP coders are not doing, but they're already super slow, so the question is how to do better and also be faster.

ADDENDUM : another interesting thing to explore would be using the other channels better. Linear correlation is not the place to look. I already did some work on finding optimal transforms for image coding, but in general you could do a lot more than just transforms. Even after a transform, the channels are still highly spatially correlated; that is, the places that are non-linear in one channel are very likely to be non-linear in another channel, that is LP errors occur in the same places.

High Compression LZ ; I'd like to take another shot at doing an LZ coder to beat LZMA on compression. The basics are obvious, arithmetic coding, context coding, forward optimal parse with "flexible parsing". The main new thing I'd like to try is an ROLZ with offset ordering. That is, rather than offsets just being 1 to window_size, make them 1= most likely, 2 = second most likely, etc. using some ordering that the decoder can reproduce pretty easily. PMR's make sense there, as does a simple ordering with context modeling. eg. contexts predict likely matches. A big deal is that a good flexible parse with the ROLZ can be a lot smarter about taking advantage of those special offsets that code cheaply. You could also code lengths as deltas from 2nd best though I'm not sure that's worth the computation.
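
The ROLZ offset-ranking bit, very roughly (a sketch with a dumb linear list per order-2 context, most-recent-first; real code would want the context modeling and a smarter update) :

#include <stdint.h>
#include <string.h>

// For each order-2 context, keep a small list of recent match positions,
// most-recent-first.  Offsets are coded as an index into this list (0 = most
// likely), which the decoder can reproduce exactly.  Allocate this on the heap,
// it's CONTEXTS*SLOTS*4 bytes.
struct RolzTable
{
	enum { CONTEXTS = 65536, SLOTS = 16 };
	uint32_t positions[CONTEXTS][SLOTS];

	RolzTable() { memset( positions, 0, sizeof(positions) ); }

	static uint32_t Context( const uint8_t * buf, uint32_t pos )	// requires pos >= 2
	{
		return buf[pos-2] | ((uint32_t)buf[pos-1] << 8);
	}

	// slot index of matchPos in this context's list, or -1 if it's not in the window
	int FindSlot( uint32_t context, uint32_t matchPos ) const
	{
		for(int i=0; i < SLOTS; i++)
			if ( positions[context][i] == matchPos ) return i;
		return -1;
	}

	// after coding/decoding at 'pos', push it to the front of its context's list
	void Update( uint32_t context, uint32_t pos )
	{
		for(int i=SLOTS-1; i > 0; i--)
			positions[context][i] = positions[context][i-1];
		positions[context][0] = pos;
	}
};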

One idea is to not send big offsets ever. Long matches can be coded very efficiently by waiting for a context to predict them. That is, if you code from order-2 contexts with ROLZ, that means you give up 2 match len to get the prediction, so instead of sending {offset,Len} you send two literals and then {context,Len-2}. That's good for long matches. For short matches, you could have something special and local, like maybe only code up to offset 1024 or something. Short matches are only really important for binary files, which do a lot of 3 & 4 long matches, and those have very regular patterns.

This idea is not fleshed out, one problem with HCLZ is you can easily fall into the trap of just making it a context coder basically. Even LZMA has basically fallen into that trap, it's not much faster than Shkarin's PPMd.

Blocksort without MTF ; I've had this idea for a long time, and I know people have done it before and it's also sort of pointless just like HCLZ. The blocksort + MTF works well, but it destroys all your good extra information that you could use to get more compression. You'd like to get compression closer to PPM but keep the better speed of blocksort. Generally as you play with blocksort to try to get closer to PPM, you also become slower than PPM, but never get as good compression, so it's totally pointless. Anyhoo.

The idea is just to try to do the Blocksort without MTF. If you just look back at the previous symbols in the post-sort stream, those are the symbols predicted by your current context. You can almost recreate your context by looking backwards and looking for an inflection point where the local statistics rapidly change. Once you have that you have a local context and you can do "PPM". Obviously you can't do that exactly but maybe you can get close.

One bad thing about the MTF is that it keeps the tail ordering around too much. That is, the ordering of symbols past #16 or so is pretty much junk. You really want to just use the MTF to code the first 16 or 32 symbols, and then after that send an escape and code the remainder with their order-0 statistics, which doesn't change. When you transition from one major context group to another, you basically want to throw away your whole MTF order and restart with the order-0 order.

My basic idea is to use the power of "SE" (Secondary Estimation). It goes like this :


As you scan over the post-sort blocksort symbols, track the count of the last 8 symbols in one table using a sliding window,
in another table track the last 16, in another the last 32, etc.  (this is very fast with a sliding window update)

To code the current symbol, you code from the last-8 , if its not there you escape and code from the last-16, etc.
(with exclusion of course)

To code from the last-8 you do NOT just use the counts.  Instead you use Secondary Estimation using the Markov *state* information of
where the symbols were seen.  That's basically the 64-bit word of what the last 8 symbols were.

Obviously 64-bits is too big to use in an SE table, but you can almost always pack that down.

if there was only 1 char in the last 8 -> that's one table entry
if there were 2 chars, then it's a binary pattern, and you know how many chars of each which gives you a permutation index.
eg. one char was seen 7 times, one char was seen 1 time, so you have
  00000001 = 0
  00000010 = 1
  00000100 = 2
  ...

it's obvious from those that they should have very different prediction characteristics, eg.

  10000000

should predict the 0 very strongly and have a tiny tiny weight for the 1, while

  00000001

should have a decent weight on the 1 since it might be the beginning of a context transition, while

  01010101

is actually clearly an alternating state and should strongly predict a 0
(though the adaptive SE will tell us if this is a fluke or not)

and

  00101101

should basically predict 50/50, it says we're in a context where 0 and 1 are both likely.

Once you get more than something like 4 different chars in the last 8 you stop doing this so carefully and just use something more like a plain order-0 coder because the number of permutations gets too large for the SE to have good statistics. But 2 and 3 chars in the last 8 is the most important case, as we know from PPM most of our compression comes from the very deterministic contexts.
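
The state packing for the deterministic cases is tiny; something like this (a sketch; for two distinct chars this is exactly the 8-bit pattern above, with the top bit always clear since the pattern is relative to the most recent symbol, so the SE table only needs a couple hundred adaptive entries) :

#include <stdint.h>

// Pack the last 8 post-sort symbols into an SE state :
//   -1  : more than 2 distinct symbols, fall back to a plainer order-0 style coder
//  else : 8-bit pattern, bit i set if window[i] != window[7] (the most recent symbol);
//         0 means all 8 symbols were the same (the fully deterministic case).
int PackLast8State( const uint8_t window[8] )
{
	uint8_t recent = window[7];
	uint8_t other  = recent;
	int state = 0;
	for(int i=0; i < 8; i++)
	{
		if ( window[i] == recent ) continue;
		if ( other == recent ) other = window[i];		// first non-recent symbol seen
		else if ( window[i] != other ) return -1;		// a third distinct symbol
		state |= (1 << i);
	}
	return state;	// index into the adaptive SE table for P( next symbol == recent )
}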

10-22-08 - 5

Ugh. My fucking upstairs neighbors are killing me. They were dragging furniture across the floor again at 3 in the morning. I got no sleep yet again and feel like total ass. When I'm home I'm a tense ball of quivering nerves. I haven't really relaxed in weeks. All I want is peace and no stress. Normally if someone is making my life unpleasant I just cut them out, avoid them, but I can't do that with the neighbors. I'm kind of tempted to just move, but ugh moving is such a pain in the ass, and no reasonable amount of throwing money at moving seems to make it any easier. I've talked to the upstairs neighbor, but I don't think it does any good. Maybe someone sweet and cute could talk to them and get them to change, but I'm no good at talking to people, I try to be nice and nonconfrontational but I always just come off as an asshole and they have no motivation to help me. People only change for two reasons : 1. the threat of some consequences, 2. if they like you. I have no stick to use against them and I can't charm them. I hate feeling powerless. Ugh fuck bleck.

10-22-08 - 4

I keep being told RefCounting is bad for threads. Really? I mean obviously if you do something retarded like mutex to touch the refcount that's bad, but you don't need to do that, you can just InterlockedIncrement. Also if you pass them around by value all the time so that the ref count goes up and down all time, that would be bad, but you don't need to do that either. I just want a way to have objects count the # of people that want them to stick around.
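
That is, all I want is something like this (Win32 flavor since that's the Interlocked API in question) :

#include <windows.h>

// Thread-safe refcount with no mutex : just interlocked ops on the count.
class RefCounted
{
public:
	RefCounted() : m_refCount(1) { }
	void AddRef()	{ InterlockedIncrement(&m_refCount); }
	void Release()	{ if ( InterlockedDecrement(&m_refCount) == 0 ) delete this; }
protected:
	virtual ~RefCounted() { }	// delete only through Release
private:
	volatile LONG m_refCount;
};

Pass them around by pointer or reference and only AddRef when ownership actually changes hands, and the count traffic stays tiny.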

I'm thinking about making the strings in Oodle refcounted. Currently I just use char buffers everywhere, but that's awfully fat and I spend time strcpy'ing them around. So I could make them COW refcounted thingies and I think it just works. Another option would be to have a big static dictionary of every filename ever, and then I can just pass around read-only char *'s to that dictionary. That's actually a good solution *if* you are using most of those strings at any time, but could be very bad if you are generally only using a few of them at any one time.

10-22-08 - 3

I've been trying to write the Oodle APIs nicely as I go, in the simple RAD way that's very close to plain C, but it's really slowing me down to figure out the right API as I go. I think I've just decided to stop doing that. I'm going to just get it all working whatever way is easiest for me, and then once it's all working I'll rewrite the external APIs from scratch. That should result in much better APIs since I'll have the functionality done and really understand it, and I think it will be faster overall too since I can rough out the functionality more comfortably if I just stop worrying about what the API looks like.

10-22-08 - 2

We went hiking last saturday. Meh it was pretty but wet and cold, not really enjoyable. Anyway, I pulled over at some spot on the dirt forest service road to the trail so I could pee. I got out and stepped off the road a bit, and saw on the ground at my feet : 1) a bunch of shotgun shells on the ground, 2) some crushed cans of Miller Lite, 3) a used condom.

10-22-08 - 1

I'm not a chronic insomniac. Like many folks, I am prone to occasional nights of bad sleeplessness maybe once a month. These are awful. Nothing is worse than lying awake in bed for six hours, mind racing, all the while knowing you'll be useless the next day.

It's funny to me how some people's "awful, nothing worse" is other people's every day. I think most people who complain a lot just have no idea how many people suffer through great trials constantly.

10/21/2008

10-21-08 - 6

Holy crap Pushing Daisies is so awful. It's sickly sweet and cheesy, which would actually be okay if it had any heart or realism to it. It's very obviously trying to copy Amelie and other similar movies that copy the style of fables in modern settings with pumped up saturation and magical realism, but it's all saccharine with no body behind it. It's disgusting. I find garbage like that offensive, worse than nothing, because it makes it harder for me to appreciate the sweet fables that I do love.

10-21-08 - 5

So I benchmarked khash from Attractive Chaos ; it's a little bit slow, but the memory use is very low. It's basically a plain old flat mod-prime hash table like the old Knuth days. It uses a form of linear rehash reprobing. It has a separate flag array for empties which is a poor decision IMO, that means separate cache misses, and you pretty much always have key values that you can use for EMPTY and DELETED. It's better to just have the key array. He also has the data in a separate array which is again bad for cache, but I'm not using that since I'm just doing a "set" not a "map".

On the plus side, the memory use is quite low. One thing I hadn't really thought of is that using mod-prime hashes lets you size your hash much closer to the size that you really want. Like in one test I have 8950 words, which with a pow2 means I jump up to 16384 and waste a lot of space. With mod-prime I can have a table of primes and get close to the fill size I want, like for 75% I can use 11933 which is almost perfect.

The hash for strings in khash is a variant of a "Bernstein" hash which is very special-cased to ASCII. It's actually a very poor hash, but it works totally fine when you mod-prime it. mod-prime just like magically fixes crappy hashes and makes them okay. I tried putting some better hash functions in khash and they don't really make a difference. I saw the same thing with STLport hash which also uses a mod-prime - it's very insensitive to the hash function.

Anyway khash performs similarly to the STLport hash but with lower memory use, so that's cool. Basically a good old fashioned reprobing hash is totally still a good thing. Sean's "stb.h" has a very similar hash implementation that is better in some ways. STB doesn't have separate flags, it uses the EMPTY and DELETED keys. STB uses pow-2 sizes, but I'm not sure that's a win, the mod-prime is pretty nice for these hashes.
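
Boiled down, the style of table I mean is just this (a sketch of a string set, not khash or stb itself; no resizing, and keys must outlive the set) :

#include <stdint.h>
#include <string.h>
#include <vector>

// Old-fashioned flat reprobing hash set of C strings.  Special key values mark
// empty and deleted slots, so there's no separate flag array to cache-miss on.
class FlatStringSet
{
	static const char * EMPTY()   { return (const char *)0; }
	static const char * DELETED() { return (const char *)1; }

	std::vector<const char *> m_table;	// mod-prime sized, never allowed to fill completely

	static uint32_t Hash( const char * s )
	{
		uint32_t h = 0;
		while(*s) h = h*31 + (uint8_t)*s++;	// Bernstein-ish; the mod-prime forgives it
		return h;
	}
public:
	explicit FlatStringSet( size_t primeSize ) : m_table( primeSize, EMPTY() ) { }

	bool Insert( const char * s )
	{
		size_t start = Hash(s) % m_table.size();
		size_t firstFree = m_table.size();	// no free slot seen yet
		for(size_t n=0; n < m_table.size(); n++)
		{
			size_t i = (start + n) % m_table.size();	// linear reprobe
			const char * k = m_table[i];
			if ( k == EMPTY() )
			{
				if ( firstFree == m_table.size() ) firstFree = i;
				m_table[firstFree] = s;
				return true;
			}
			if ( k == DELETED() ) { if ( firstFree == m_table.size() ) firstFree = i; continue; }
			if ( strcmp(k,s) == 0 ) return false;	// already present
		}
		if ( firstFree < m_table.size() ) { m_table[firstFree] = s; return true; }
		return false;	// table completely full
	}

	bool Contains( const char * s ) const
	{
		size_t start = Hash(s) % m_table.size();
		for(size_t n=0; n < m_table.size(); n++)
		{
			size_t i = (start + n) % m_table.size();
			const char * k = m_table[i];
			if ( k == EMPTY() ) return false;
			if ( k != DELETED() && strcmp(k,s) == 0 ) return true;
		}
		return false;
	}

	bool Erase( const char * s )
	{
		size_t start = Hash(s) % m_table.size();
		for(size_t n=0; n < m_table.size(); n++)
		{
			size_t i = (start + n) % m_table.size();
			const char * k = m_table[i];
			if ( k == EMPTY() ) return false;
			if ( k != DELETED() && strcmp(k,s) == 0 ) { m_table[i] = DELETED(); return true; }
		}
		return false;
	}
};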


Very large benchmark :

num words : 775481
num queries : 1602134


jumplist             : 0.13 seconds, 249.35 clocks, rate= 12.04 M/s
mem use: 10398156 = 13.41/per

strhash              : 0.14 seconds, 255.68 clocks, rate= 11.75 M/s
mem use: 12582912 = 16.23/per

std::hashset         : 0.17 seconds, 311.93 clocks, rate= 9.63 M/s

khash                : 0.16 seconds, 303.12 clocks, rate= 9.91 M/s
khash mem use : 6684696 = 8.62/per

sorted               : 0.43 seconds, 809.12 clocks, rate= 3.71 M/s
mem use: 6203848 = 8.00/per

implicit bt          : 0.33 seconds, 621.21 clocks, rate= 4.83 M/s
mem use: 8388608 = 10.82/per



Small benchmark :

num words : 8960
num queries : 1602134


jumplist             : 0.06 seconds, 113.15 clocks, rate= 26.54 M/s
mem use: 137220 = 15.31/per

strhash              : 0.07 seconds, 138.33 clocks, rate= 21.71 M/s
mem use: 196608 = 21.94/per

std::hashset         : 0.08 seconds, 150.36 clocks, rate= 19.97 M/s

khash                : 0.11 seconds, 197.32 clocks, rate= 15.22 M/s
khash mem use : 52232 = 5.83/per

sorted               : 0.20 seconds, 367.69 clocks, rate= 8.17 M/s
mem use: 71680 = 8.00/per

implicit bt          : 0.16 seconds, 294.28 clocks, rate= 10.21 M/s
mem use: 131072 = 14.63/per

Very Large benchmark is all the unique words of 4 chars or more in the 10^8 byte enwik8 data set.

10-21-08 - 4

A few more hashing notes and then I have to move on :

1. Sean pointed out there are cases where the static jumplist thing totally kicks ass. For example, if your data element is a large object and you have them in a flat array anyway, like eg. a 1024 byte file information record. In this case you don't want a normal hash to hold the data by value because it's large and you don't want to waste a bunch of space on the empties. The jump table can index directly into the flat array of elements, so the only memory use is for the index table itself, which is very small. In fact, the number of indexes needed is around (N/4) and the size of the index is 4 bytes, so for N elements the index is just N bytes. Note that you can only index the array on one key this way, because it requires reordering the original array for the jump index.

2. Classic style hashes like khash are really bad as "multimaps" in the STL lingo. When you put in a bunch of identical values it makes them locally N^2 because even fancy reprobing can't get identical values to separate. The STLport style of hash handles being "multi" a lot better. Basically just don't try to make classic hashes be "multi" ever. (to some extent any multi-hash is locally N^2 inherently, but the STLport style can insert quickly, give you an equal_range easily, and then once you have it it's just a linear walk over all the equal key elements, not a bunch of jumping around the table).

3. For the sorted list you could do an interpolation search instead of a binary search. The hashes, which I store, are actually nicely linearly distributed, so you can get a very good estimate of where to jump in. A pure Interpolation search is very slow and kind of silly actually. Once you do your first interpolation to get an initial index, further interpolation doesn't make much sense. You can almost just do a first interpolation and them linear search from there. In fact I just tried that and it works very well :


Binary search on sorted list : 370 clocks

Full interpolation search : 450 clocks

One interpolation then linear search : 300 clocks

Implicit binary tree search : 290 clocks

STLport hash : 150 clocks

jump list : 114 clocks

So obviously there's something of value there, but the one interpolation is sometimes pretty far off and then it linear searches a whole lot. What it could do is do one interpolation, and then do a binary growing search, like +1,+2,+4 to find the range that the value lies in, and then once it finds the range do a binary or linear search to find the value, but I tried that and it was slower. Also note the fast version above is using >> 32 to divide by the hash range and I assume that the hash is using almost all of the 32 bit range. Divides are way too slow to be in the search.
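
The "one interpolation then linear" searcher is tiny (a sketch; it assumes the 32-bit hashes are sorted and spread over most of the 32-bit range, which is what makes the first guess land close) :

#include <stdint.h>
#include <vector>

// One interpolation step to guess the index, then a plain linear walk from there.
bool InterpolationThenLinearFind( const std::vector<uint32_t> & sortedHashes, uint32_t key )
{
	if ( sortedHashes.empty() ) return false;
	// first guess : key/2^32 of the way through the array ; multiply + >>32, no divide
	size_t i = (size_t)( ( (uint64_t)key * sortedHashes.size() ) >> 32 );
	if ( i >= sortedHashes.size() ) i = sortedHashes.size() - 1;
	// walk linearly toward the key
	while( i > 0 && sortedHashes[i] > key ) i--;
	while( i < sortedHashes.size() && sortedHashes[i] < key ) i++;
	return i < sortedHashes.size() && sortedHashes[i] == key;
}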

10-21-08 - 3

I've lost my zeal for Obama. For a while there I was pretty high on Obama, because he seemed to be actually honestly addressing issues and speaking directly and intelligently about what's happening in the world. Obviously he's still better than McCain or W. but a few things are disturbing me. For one Obama seems to be promising free money to everyone these days. I know it's the campaign pressure, but it's sad to see him cave in to that and offer stimulus and tax cuts and money money hey vote for me I'll give you money!

The reason for America being fucked up and the positive change that's happening is going to be obscured. Obama is going to inherit a complete clusterfuck. He will have no budget to be able to do anything significant. In 4 years we're going to be in the shitter and Republicans will be able to say "Dems have been in control for 4 years and look at the shit we're in". Democrats will become complacent because "they won".

We're missing our chance to actually do something significant to fix the US government. Crises are one of the few chances that we have to actually get major changes through, when enough people are upset and the usual interests that stop any change are sidelined. Lately the Shadow Powers have been very slick about using crises to improve their positions. After 9/11 they quickly got the Patriot Act and all kinds of other nonsense through to be able to spy on americans and give more power to the executive and so on. With this financial crisis they get to give out masses of money. Getting Obama in will kill the momentum for real change.

A sampling of things we actually need :

1. Require transparency of the executive. Everyone in the executive branch except the president must answer congressional subpoenas. That's like required by the constitution or something, but I'd love to see a "Prime Minister's Minutes" thing for the President but we know that won't happen.

2. Require the executive to enforce the laws as intended. That means allocating budget to laws even if the President doesn't like them. Put an end to this idea of signing statements.

3. Require transparency of the Pentagon budget. No more multi-billion dollar secret budget for the Pentagon. The claim that it's crucial to national security is just nonsense these days. Our security from Al Qaeda is not going to be compromised if we disclose that an extra $6 billion went to Lockheed for "research".

4. Eliminate the Electoral College. The allocation of votes based on the senate gives unfair extra votes to tiny states with microscopic populations that provide very little to America. Each person's vote should have equal importance, this system gives much greater value to votes from tiny states. A vote in Alaska is worth 3* a vote in California. (Pop 670000 in Alaska = 3 electoral votes, Pop 36458000 in CA = 55 electoral votes). WTF, how are we the fucking majority of people tolerating this - hey you Californians, your vote is worth 1/3 as much as a vote in Alaska, step the fuck up and change that! This is the reason the Republicans have been able to win with the "small town" strategy for the last few elections.

5. Eliminate the Senate. Similar to the previous. Also, greatly reduce the number of people in the House; maybe 150 is reasonable. The current number is too large; it's impossible for voters to keep track of who their congressman is and what they're doing.

6. Make lobbying illegal. All lobbying. Yes, there are some benefits to lobbying but the costs have grown to far outweigh the benefits.

7. Require all candidates to only use public financing. Similar to the previous.

8. Restore the media ownership laws we had until Clinton & Bush tore them up. Basically don't allow one company to control all the news media in a single market.

9. Restore the bank laws Clinton repealed. Give government insurance only to plain banks with lots of supervision and plenty of cash cushion. Banks with insurance are not allowed to be connected to any other financial institution.

10. Eliminate all the government-corporate complexes in which the government enforces monopolies and guarantees pricing for companies that are weakly regulated and get to keep their profits. In some cases this would mean nationalizing more; in many cases it would mean removing government intervention in fixing markets. For example, airwaves and cable lines should be changed to an open bidding system. Transmission on the power grid should be opened up more, but management of the grid itself should be nationalized more. Get rid of the Medicare Prescription plan and instead require pharma companies to sell pills to the government at cost. If you want to have the privilege of selling pills for profit in America, you have to sell at cost to people on Medicare.

11. Require the power companies to fix their plants and get up to current environmental standards; eliminate the "old source" loopholes. The idea that this would be a massive cost is a complete fallacy. In most cases the improvements in efficiency would actually decrease long term cost to the consumer; they just don't want to do it because it's a big hurt to short term profit, which is how executives make their paychecks.

12. Stop subsidizing the automobile by pumping money into roads, gas companies, cheap oil leases, etc. Make drivers pay for their cars, AND tax them the cost of their pollution.

13. Spend more on the things that are going to save us. Technology infrastructure, public transit, education, etc. And by education I mean real substantive education for the brightest who really matter, not rubber stamp colleges and primary schools teaching memorization for standardized tests.

Anyway, there's a lot more. You can read my political rant from the 2000 election and pretty much all that still applies because we never fucking do anything significant that would actually make a big difference and structurally change how broken the system is.

Let me say that I do not dismiss the importance of competent bureaucrats. I think under W we got to see just how important it is to have good people even in a horrible bureaucracy, and a bad president can really chase away the competent technocrats, while a good one can give people hope and bring a lot of good people into the horrible jobs in Washington.

10-21-08 - 2

Widespread stock ownership is destroying American corporations. The idea of corporate governance is that the executives work for the benefit of the stock-holders. That works well when the stock is held by a few magnates, like JP Morgan or Carl Icahn or Warren Buffett; they actually know what's going on, and they can see when the CEO is raping the company and taking the profit for himself. But when the stock is owned by a million grannies with 401k's, they have no idea WTF is going on, and the idea that they are holding the executives responsible is nonsense. It allows companies to be less transparent, to get away with more, and lets incompetent executives keep bouncing from job to job. Obviously there are advantages to having a liquid market where you can be evaluated by your merits rather than by the whims of someone like Morgan, but I contend that the market would function better if the minimum stock purchase was $10,000 or so. Stock analysts actually did honest due diligence back when they were primarily doing analysis to make purchases themselves (with their bank's capital). When they started just pimping stocks for the public to buy, all connection to reality went out the window.

10-21-08 - 1

Sometimes these days I start to feel like one of those nutters who says things like "it doesn't matter who I vote for, the whole system's broken, they're all crooks". I don't believe that at all, and in fact I think that's quite foolish. There's a huge difference between the crooks. Just because the system is horrible doesn't mean you can't still make a big difference. There's a huge huge difference between the numbers 1 and 10, even though they're both really far away from 100 where you'd like them to be, you'd be retarded to not exercise your right to choose between the 1 and the 10.

10/19/2008

10-19-08 - 2

I had like 8 transitions in cbsaver but I've disabled them all, and now it only does a slow gamma-correct cross fade. Gamma correction makes the brightness change linear so it's less distracting. IMO lots of flash and sizzle in screen savers is very bad, particularly for a living room HTPC screen saver; I want it to be very mild and unobtrusive. The latest version is in the exe : cbsaver zip .

10-19-08 - 1

Hashing scares me. Any time I see a hash function I feel great skepticism. I have no confidence that a hash is actually working well unless I test it in the real situation it's being used in. Hashing is fast *if* everything is working perfectly: if the hash table is actually sized right for the data, if the hash function is actually randomly distributing your actual keys, if the reprobe is not walking all over unnecessarily, etc.

Preface : What I want to do is hash strings for lookup. I want low memory use. I'm going to store a (char *) and a 32 bit hash value in each element. I only do a full strcmp if the 32-bit hash values are equal. The time that I want to minimize is the combined time of the hashing function and the lookup using the hash. So the hash computation should be fast and also good enough so that the lookup performs well. Hashes that don't randomize well will result in table collisions and also unnecessary strcmp's being done which is slow. The 32-bit hash value is used in the elements, but generally a much smaller hash size is used for the table. Ok, now let's get on with it...
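
To be concrete, the element layout and compare are basically this (names made up, just a sketch of what I described) :

#include <stdint.h>
#include <string.h>

// each element is just the full 32-bit hash plus a pointer to the string;
// the (slow) strcmp only runs when the 32-bit hashes already match
typedef struct StrHashEntry
{
	uint32_t     hash;
	const char * str;
} StrHashEntry;

static int EntryMatches( const StrHashEntry * e, uint32_t queryHash, const char * queryStr )
{
	return e->hash == queryHash && strcmp( e->str, queryStr ) == 0;
}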

The STLport hash is a linked list of all the elements, and the hash table gives you jump pointers into that list. When you make a query with a given key, it computes hash(key) and then you know that your element is between

begin = jump_table[ hash(key) ]
end = jump_table[ hash(key) + 1 ]
where begin and end are pointers into the big linked list. This lets it easily make an iterator that can walk over all the data in a deterministic order (though not sorted order, unless the hash function is lexicographic which it probably isn't). It also makes insert and remove very fast because you just find yourself then unlink the node. STLport grows the hash table pretty aggressively, and just uses a linear list within each segment. You could grow the jump table more slowly, and then use a sorted list on each segment and do a binary search. This is equivalent to the conceptual model of a hash table where each bucket points at a linked list, ala PkZip.

A few things of note :

1. STLport grows the hash table based on *total occupancy*, not the max occupancy of any bucket. That means if your hash function is retardedly broken, it can perform really badly; lookups can become O(N). That's really your fault. On the plus side, because it's not a reprober, if it gets all screwed up in some part of the hash, the other parts will still function fine, eg. screwups don't leak all over the way they do in reprobers.

2. The STLport hash is both faster and uses less memory than map<> or almost any self-balancing binary tree type thing. Pretty much any time your key is something that can be easily hashed, you should use a hash.

3. The STLport hash always makes the table a prime number in size and uses a mod to convert your hash() into a table index. Basically they don't trust you to be making decent hashes (wisely so). For example if you just have an integer index, you can return that as your hash and it will work fine.

4. For static data you could of course do something neater by just keeping the elements in a vector<> and making a jump table that indexes into the vector. Also if you know you can make good hashes you could use a pow two size and a mask instead of a mod.

A normal table-based hash with reprobing on collision can be faster. I tried making one that's minimum size. I want to use a pow2 to avoid a divide, so I just find the next pow2 up and use that, meaning I have very little empty space. A normal reprobing hash can perform very badly when there's not much empty space because you have to keep walking until you see empty to know you've visited everything that goes to your bucket.
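
The sizing is just next-pow2-up and a mask; something like this (a sketch, made-up names) :

#include <stdint.h>

// round count up to the next power of two (for count >= 1), then index with
// a mask so there's no divide in the lookup
static uint32_t NextPow2( uint32_t x )
{
	x--;
	x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
	x |= x >> 8;  x |= x >> 16;
	return x + 1;
}

// uint32_t tableSize = NextPow2( numElements );
// uint32_t index     = hash & ( tableSize - 1 );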

The first thing I tried was reprobing with a step of 1. There's a lot of fancy stuff on reprobing with rehashing and whatnot, but I've read that those things are terrible for cache and it's better just to reprobe by stepping linearly, so I'm trying that. ADDENDUM : more on this later.

The next thing is to count how many elements really belong in each bucket. When you insert something in a bucket, you bump its count, and then if it's full you reprobe and add it to the actual first empty spot you find (and don't bump that count). Then when you lookup you only need to keep reprobing until you've seen count elements that hash the same as your query. For very full hash tables this means much less reprobing; it was about 2X faster for me (though it makes the hash table bigger because you have to store that count).
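
A rough sketch of the lookup side of that (made-up names, and it cheats by abusing hash == 0 as "empty" to keep it short) :

#include <stdint.h>
#include <string.h>

// each slot stores the full hash, the string, and a count of how many
// elements have this slot as their *home* bucket
typedef struct CountedSlot
{
	uint32_t     hash;
	uint32_t     count;
	const char * str;
} CountedSlot;

static const char * CountedLookup( const CountedSlot * table, uint32_t mask,
				uint32_t h, const char * key )
{
	uint32_t home = h & mask;
	uint32_t left = table[home].count;	// how many homed elements we must see
	for ( uint32_t i = home; left > 0; i = ( i + 1 ) & mask )	// +1 reprobe
	{
		if ( table[i].hash == 0 )	// hit empty : ran off the end of the cluster
			return NULL;
		if ( ( table[i].hash & mask ) == home )	// this element's home bucket is ours
		{
			if ( table[i].hash == h && strcmp( table[i].str, key ) == 0 )
				return table[i].str;
			left--;
		}
	}
	return NULL;
}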

Test of finding all the words in "book1" and "book2"

sorted list                   : 537.19 clocks
reorg as implicit binary tree : 348.14 clocks

std::set                      : 664.89 clocks
std::hashset                  : 253.01 clocks

hash with 1-reprobe & count   : 212.33 clocks
static vector jump table hash : 152.29 clocks

The winner for me is the "static vector jump table hash" , which is basically the STLport hash modification #4 that I described above; the elements are just stored flat in an array, and the hash bucket tells you the start index of the data that goes in that bucket. I found that a linear search is much faster than a binary search within the bucket, but I do still sort within the bucket so that I could early return as soon as I see a value larger than the query in the linear search walk. Note that the jump table is actually *smaller* than the hash because it doesn't need any empty space.
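
The lookup for that looks something like this (a sketch with made-up names; I'm assuming the bucket comes from the top bits of a well-mixed 32-bit hash) :

#include <stdint.h>
#include <string.h>

// elements live flat in one array, grouped by bucket and sorted by hash
// within each bucket; jumpTable[b] is where bucket b starts, so bucket b is
// the range [ jumpTable[b], jumpTable[b+1] ) and needs no empty space
typedef struct JumpEntry
{
	uint32_t     hash;
	const char * str;
} JumpEntry;

static const char * JumpTableLookup( const JumpEntry * elements, const uint32_t * jumpTable,
				int tableBits, uint32_t h, const char * key )
{
	uint32_t bucket = h >> ( 32 - tableBits );	// top bits of the 32-bit hash
	uint32_t end    = jumpTable[ bucket + 1 ];	// jumpTable has (1<<tableBits)+1 entries
	for ( uint32_t i = jumpTable[ bucket ]; i < end; i++ )	// linear walk beat binary search
	{
		if ( elements[i].hash > h )	// sorted within the bucket : early out
			return NULL;
		if ( elements[i].hash == h && strcmp( elements[i].str, key ) == 0 )
			return elements[i].str;
	}
	return NULL;
}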

Memory use :

num words : 41509

sorted list lookup:
mem use: 332072 = 8.00/per

	sorted list = 8 bytes per element (zero overhead)
	element = 4 byte hash + 4 byte pointer

implicit btree lookup:
mem use: 524288 = 12.63/per

	implicit btree = sorted list with empty spaces to make perfect pow 2 tree
	worst case almost doubles memory use when size = pow2 + 1
	best case is zero overhead when size is exactly a pow2

jumplist lookup:
mem use: 594220 = 14.32/per

	jumplist = 8 bytes per element + 4 bytes per hash entry; 16 bit hash

strhash lookup:
mem use: 786432 = 18.95/per

	hash is 12 bytes per element + overhead for empty space

I should note that I'm hashing words here, which are quite short, which means the overhead of the hash is crucial. Hashes with expensive rollups don't work. If you were hashing something very very long, it might be wise to use a hash like Bob Jenkins' which rolls up to a loop that works on 3 32-bit ints at a time.

Links : Paul Hsieh hash ; I found this to be slow. It's certainly not magic.

Bob Jenkins' page ; nice page with a summary of tons of hash functions and discussion of what makes good hash functions. His "lookup3" is very complex and slow on small blocks but apparently fast and good on large blocks.

Murmur Hash ; good simple hash, works on 32 bits at a time so probably fast for big blocks, but was slow for me (though a lot faster than Paul Hsieh's).

FNV Hash is the winner. This is the most basic hash and it's also the best for this application. It's just multiply by a prime and xor in the next byte. Apparently the primes he lists here are especially good, and you should take some care in how you make your final hash value from the 32-bit hash. In my experiments it really doesn't matter what prime I use, any prime bigger than 256 seems to work fine.
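
For reference, the whole hash is just this (the textbook FNV-1 with the standard 32-bit constants; not necessarily the exact variant or prime I used) :

#include <stdint.h>

// multiply by a prime, xor in the next byte; the constants are the standard
// 32-bit FNV offset basis and FNV prime
static uint32_t FNV1HashStr( const char * str )
{
	uint32_t h = 2166136261u;
	while ( *str )
	{
		h *= 16777619u;
		h ^= (uint8_t)(*str++);
	}
	return h;
}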

I think hashes where you do a bunch of shifts are probably pointless. The reason is that one multiply by a good magic number is equivalent to a whole mess of shifts ! eg. x*5 = (x << 2 ) + x , etc. On almost all modern chips a multiply and shift are very nearly the same speed, so hashes with one multiply are faster than hashes that do a bunch of shifts. (sadly there are some major new architectures where that isn't true; multiply and shift by a variable are both slow, while shift by a constant is much faster)

Okay, I counted collisions too.


	// 220 clocks :
	// StringHash collisions : 4620821
	//return rrCRC_Block((const U8 *)string,(S32)strlen(string));
	
	// 190 clocks :
	// StringHash collisions : 5520089
	//return PaulHsiehHash(string,(S32)strlen(string));

	// 180 clocks :
	// StringHash collisions : 4372337
	//return MurmurHash2(string,(int)strlen(string),0x12345678);

	// 171 :
	// StringHash collisions : 4815568
	//return oneAtATimeHashStr(string);
	
	// 152 clocks :
	// StringHash collisions : 4192267
	//return 4001 * stb_hash((char *)string);
	
	// 146 :
	// StringHash collisions : 6157277
	//return bernstein(string);

	// 144 :
	// StringHash collisions : 6603083
	return alphaNumHashStr(string);
	
	// 142 :
	// StringHash collisions : 5892286
	//return FNVHashStr(string);
	
The best hash in terms of avoiding collisions is the STB hash, which is :

unsigned int stb_hash(char *str)
{
   unsigned int hash = 0;
   while (*str)
      hash = (hash << 7) + (hash >> 25) + *str++;
   return hash + (hash >> 16);
}

The top three best hashes are some of the worst in terms of avoiding collisions. But they're also super simple. The time I'm measuring is the total time to compute the hash & use it for lookup. The lookup is pretty damn fast so any extra work in computing the hash has got to be worth it.

BTW conventional wisdom says the hash value in stb_hash should not be initialized to zero, because then all-zero inputs of any length all map to the hash value zero. Of course that is irrelevant if you're hashing strings. In fact, using anything but 0 there seems to hurt the number of collisions. Also note the shift by 7 & 25 is a rotate around the 32 bits, which can be done in one instruction on x86, which would make it very fast. This hash has some bad theoretical degeneracies, and Bob Jenkins actually lists this exact hash as an example of a bad hash, but on my test data it has the fewest collisions of any of them! (I haven't tested Bob's full-strength hard-core lookup3).

Again the murmur looks very good; if you're just going to take a hash and drop it in, which I strongly discourage, murmur might be the way to go. Also I should note that the big flat hash table with reprobing can be very fast if you let it be very big and have lots of empty space. In my experiments it needs around 50% empty space to be fast, even at 25% empty space it gets very slow, beyond 50% empty space doesn't help much more.

I also generally don't like checking performance in ways like this. The real application is queries that will happen randomly in the middle of other application processing. A hash function with a table lookup might perform very well in an isolated test, but when you drop that in the middle of a real app, the table is not in cache and your hash function also dirties the cache, which slows down the rest of your app.

ADDENDUM on collisions : So handling collisions well is actually very important. In fact, with good collision handling the reprobing hash can be almost as fast as the static-jump-list hash. In the latest test static jump list is 114 clocks and reprobe hashing is 119 clocks. That's with a 75% full hash table. (you may see authors around the net using 25-50% full hash tables; those people are memory wasters and their speed numbers are lies!).

So the +1 reprobing is in fact great for cache but it does suffer badly in areas of degeneracy. Areas of degeneracy arise where a bunch of values get put in the same bucket. This can happen even if your hash is a perfect hash in the sense that random input = random output, but you might just give it input that has many values that hash to the same spot. When you do +1 probing in those spots it's locally N^2, because for each value you walk over all the other values trying to find empty space.

One good way to reprobe that I got from Sean is to use bits of the hash that were not used in the hash indexing. That is, you computed a 32 bit hash value, but then you only used 16 bits to index the hash table, usually using something like xor folding :


index = (h >> 16) ^ h;

So, try to make a reprobe that uses the *other* 16 bits of the hash. For example :

reprobe = (h>>8) * 173 + 1;

or

reprobe = h + (h>>6) + (h>>19);

or something from the FNV page. What this does is take two elements that hash to the same bucket and make them reprobe to different buckets (hopefully) so they no longer collide. Yay. But just using that as a linear reprobe is not the best thing.

Another thing you can do is a quadratic reprobe. Your reprobe step starts at 1 and then grows quadratically. Warning : some quadratic steps do not ever cover all values, so they might be an infinite loop. The Google Sparse Hash guy claims that triangular steps do cover all values but I haven't seen anyone else claim that so YMMV. (ADDENDUM : apparently it's in Knuth and also on Wikipedia - yes, triangular numbers do in fact cover all slots when the table size is a power of two). A triangular number is n*(n+1)/2 , that's {1,3,6,10,15,...}. Anyway, what the quadratic step does for you is it makes neighbors not stomp on each other. That is, if you have one bucket at hash "h" with 10 elements in it and a bucket at hash "h+1" with 10 elements, if you just do +1 reprobing they'll be all up in each other's business and make 20 solid elements with no space. The quadratic step puts big spaces between them so they don't run into each other.
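
In code the triangular reprobe is just this (a sketch with made-up names; power-of-two table size assumed) :

#include <stdint.h>

// probe offsets from home are the triangular numbers 1,3,6,10,... which
// visit every slot exactly once when tableSize is a power of two
static int FindSlotTriangular( const uint32_t * table, uint32_t tableSize,
				uint32_t queryHash, uint32_t emptyValue )
{
	uint32_t mask  = tableSize - 1;
	uint32_t index = queryHash & mask;
	for ( uint32_t step = 1; step <= tableSize; step++ )
	{
		if ( table[index] == queryHash || table[index] == emptyValue )
			return (int)index;
		index = ( index + step ) & mask;	// step grows 1,2,3,... => triangular offsets
	}
	return -1;	// full table, no match
}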

The best reprobe that I found is a hybrid, basically it goes like this :

reprobe with +1 twice

then reprobe with a rehash like (h>>8) * 173 + 1;

then reprobe with a linearly growing step : 1 then 2,3,4

This is faster than any other reprobe that I tried by a few clocks. Annotated, what we get is :
reprobe with +1 twice
	-> very good for cache in the cases where we get lucky and have empty space right next to us
	-> basically this is so fast its free if we got lucky, helps more for more empty tables

then reprobe with a rehash like (h>>8) * 173 + 1;
	-> breaks the original hash degeneracy, but it's a very evil random cache miss
	-> (btw you could prefetch this address before doing the two +1's which would probably make us even faster)

then reprobe with a linearly growing step : 1 then 2,3,4
	-> that's a triangular number from the original index

That's the reprobe that makes us almost as fast as the static jump list, even at 75% fill; there's a rough sketch of the probe order after the numbers below. BTW the rough numbers were :

Reprobe +1 : 170 clocks
Reprobe with rehash : 130
Reprobe with quadratic : 127
Reprobe with rehash then quadratic : 125
Reprobe with +1 then rehash then quadratic : 119

Static jump list : 115
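
Roughly, that hybrid order written as a "where is probe number k" helper (made up for illustration, not the actual code) :

#include <stdint.h>

// probe 0 is the home slot; probes 1,2 are the +1 steps, probe 3 is the
// rehash-derived jump, and probes 4,5,... step by 1,2,3,... from that jump
// (so the offsets from the jump are triangular numbers)
static uint32_t HybridProbe( uint32_t h, uint32_t home, uint32_t probe, uint32_t mask )
{
	if ( probe <= 2 )
		return ( home + probe ) & mask;
	uint32_t jump = home + ( ( h >> 8 ) * 173 + 1 );
	uint32_t n = probe - 3;
	return ( jump + n * ( n + 1 ) / 2 ) & mask;
}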

ADDENDUM more links : Google Sparse Hash Implementation Details is a nice document; it's a cool idea. Basically it's a big flat put the values in the table and reprobe kind of hash, but he uses a really neat sparse table thing instead of an array, which lets you have more empty space in your hash without wasting memory on the empties. Obviously it's not the fastest thing in the world but if you're in a memory-constrained situation it's hot (not memory constrained like small memory, but like hey my search cache machine is using 3.9 GB already).

Cuckoo Hashing looks pretty interesting. I definitely think this could be faster than regular reprobe hashing, and in fact it's ideal for my case of put-rare-get-common.

Python dictionary implementation notes ; some good stuff including notes on cache locality in reprobing.

little page with numbers of reprobes for different load factors

Attractive Chaos has a big hash table comparison. His hash appears to be very ordinary, but he claims it's the best, so I'll benchmark his hash tomorrow for completeness...

10/17/2008

10-17-08 - 4

Links about Inigo Quilez and real time Ambient Occlusion. Some of the demo videos are quite beautiful, I like the art style.

Realtime Ambient Occlusion, part 2 - GameDev.Net Discussion Forums
Real-Time Rendering 2006 - DEMO
meshula.net - Screen Space Ambient Occlusion
Inyigo Quilez - home page
Inyigo Quilez - fractals, computer graphics, mathematics, demoscene and more
Inigo Quilez's videos on Vimeo
3d geometry + pixel effects cube field on Vimeo
WaveSlice on Vimeo

10-17-08 - 3

I've been feeling a little lost the last few days working on Oodle. The low level stuff is coming along well, but the high level APIs are a big scary void of possibilities. I feel like I need a more realistic test app to really stress the system and to see how easy the APIs are to work with and all that. I've got unit tests and such but that doesn't really do it, I want a mini game that loads data the way games do and pages data the way games do, with stuff streaming as you walk around, and then big loads at area boundaries with a prefetch and a transition area. I'm uncomfortable writing libraries without a user; I at least like to have *me* as a user, but more would be even better.

One of the tricky things about the high level is how easy should I make it for games to do the broken stuff they do now, like for example the Unreal style of having everything in giant packages. I generally want to make the APIs encourage people to do things the "right way", and make that as easy as possible, but at the same time I do have to provide some reasonable fallbacks for more broken behavior.

Basically the "right way" is having static resource usage information and letting me make bundles of resources however I see fit, and then letting me load the bundles in and out of memory based on the game's usage and prefetch hints (or, you know, if you want to define bundles in logical units that are reasonably small and atomic that's okay too). There are two main kinds of "wrong way". One is a system that just says "okay load everything" at the start of the game. The other "wrong way" is just randomly requesting content synchronously in the middle of frames with no static resource analysis or prefetch to help me out, like just randomly going "texture * cur = load( rand() ); int a = cur->bytes[3];"

I've got a few ideas of little fake games to make. One idea is to make a "Sparse Texture Buffer" (STB) (aka SVT) test app that does the whole texture-page loading thing. I could cook up 1 GB of texture and put it on a quad and fly around on that and page texture. That might be amusing and make people say "oo" but it's not really a very good test of general game content loading. I might do it anyway just because it's easy and sounds fun.

Another idea is to try to do some geometry loading for an actual walkthrough type of thing. One problem with that is getting enough geometry to actually be too big for memory and stress the paging. I'd have to somehow find a hell of a lot of geometry, or just fake it by duping a few files and calling them different names, or do some procedural geometry generation which I guess could be fun.

One idea that popped in my head was doing a little real-time raytracer. The fun thing about doing a realtime raytracer for paging is that I can do the infinite geometry load-on-demand thing. The basic idea goes like this :


Fire all the rays from the camera
	(group in packets of 4 or 16 or something)
	(process in a coherent tile order for locality)

Intersect rays with objects in front-to-back order

When a ray collides with a resident object it's done

If a ray hits a non-resident object, request that object to page in,
	and push that ray to the deferred queue

After you process all the rays you can, see if any of the objects we requested became resident
	if so, collide them
	if not, wait a little if we have enough frame time

If we're running out of frame time and the objects still aren't resident,
	then collide the lower LOD version that's in memory, or some other fallback

this gives you perfect object residency which is just-in-time (if the IO returns fast enough and you don't stress it too hard). Because the rays tell you what to load you get perfect occlusion residency information. There's other paging stuff to make it work better, such as prefetching objects that are very likely to be hit at the start of the frame and also prefetching in the direction of camera translation to anticipate.

I could do the Id oct-tree thing, or just do a more normal objects that are made of kd-trees and triangles kind of thing. Writing a little packet kdtree raytracer is actually pretty easy, but I still might not do this because it seems like getting off track a bit too much. And I still have the problem of getting enough geometry to actually stress the pager.

ADDENDUM : hmm , yeah I should probably just find some reasonably simple open source game to plug into and replace its IO. Anybody have a good idea of a game I can adapt whose code base won't break my brain?

10-17-08 - 2

I really think the poor should rise up, and I encourage them to do so. Fancy cars should be stolen. Ostentatious mansions should be broken into. People walking down the street in fancy garb should be robbed. It is illogical to not do so. It's a necessary check on society.

The upper class simply does the logical thing which gives them the most immediate benefit - stomp all over the lower class and all those weaker than them. They will continue to do so until they have a reason not to.

One of the things I learned in poker over time is that you should never stop running over someone. That is, your gut will tell you to slow down. You think there's no way you should be able to get away with the absolute obvious larceny that you're setting up. Your gut is wrong. When someone lets you take advantage of it, you just do it, and do it again, and again and again; your gut will be screaming "they must see what we're doing, slow down! try to hide it more, ease up a bit" , no, your gut is wrong, just keep robbing them. Until they actually give you some clear sign that they are going to respond correctly to your overplaying, just keep on overplaying. (specifically in poker this goes for being super tight just as much as it does for doing lots of stealing; when you're playing super tight waiting to trap some maniac, you start thinking "god there's no way he's going to give me action, I'm playing so tight, I need to pretend to be looser to trick him", no, no you don't, until he actually shows you that he can slow down and stop being a maniac because you're so tight, you just keep on waiting to trap him).

The upper class is like "um, we're gonna just cut all the taxes for the rich" and the lower class is like "um, shucks, okay, sounds good". WTF, you retards, you deserve it. When somebody overplays a certain style, it's only wrong because it opens up a possible correction. If that correction is not taken, then the overplay is not wrong.

10-17-08 - 1

For static data (or semi-static) and very low memory use, one of the best data structures for lookup is just a sorted array. I've written about this a bunch in the past long ago, so you can look back if you like, but hey you basically sort & then you binary search, and it has zero memory use for the lookup because you're just holding the data you had to hold anyway. Part of the reason it's good is that in the common case of small lookups, hey all your memory is right there in a row and it all comes in to cache, and you never have to jump around pointer chasing, and you can easily do things like once you get down to a range of 8 elements or so in your binary search you switch to linear.
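
eg. the lookup ends up looking something like this (a sketch with made-up names, searching an array of 32-bit keys) :

#include <stdint.h>

// binary search the big ranges, then switch to a linear scan once the range
// is down to a handful of elements; returns the index or -1
static int SortedLookup( const uint32_t * a, int count, uint32_t key )
{
	int lo = 0, hi = count;		// key, if present, is in [lo,hi)
	while ( hi - lo > 8 )
	{
		int mid = ( lo + hi ) / 2;
		if ( a[mid] <= key ) lo = mid;
		else                 hi = mid;
	}
	for ( ; lo < hi; lo++ )
		if ( a[lo] == key ) return lo;
	return -1;
}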

While that lookup is log(N) it can be a little slow for big trees, partly because logN gets big, but mainly because you jump all over in memory and cache miss at every branch. Rather than using a plain sorted list, you could do a binary tree style of reshuffle, eg. first sort, and then reorder the list to {N/2, N/4,3N/4, N/8,3N/8,5N/8,7N/8 , ... } so that as you do a binary search the data you want to look up is right near you in cache; of course this is just a binary tree with implicit child indexes, it's just a reshuffling of the array, not creating any new nodes. Actually I guess the best layout is something like a van Emde Boas tree; it's intuitively obvious that you don't just want all the elements at the same depth next to each other, rather you want to gather subtrees near the leaves so they are consecutive chunks in memory. One problem I'm having is that I don't want to store indexes to kids, I want to implicitly index the tree, and it's not clear to me how to implicitly index a vEB tree.
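
Going back to the simple implicit layout (not vEB) : the reshuffle is easy if you think of it as filling the implicit tree in order. A minimal sketch (made-up names; node i has its kids at 2i+1 and 2i+2) :

#include <stdint.h>

// fill the implicit tree by walking it in-order and handing out the sorted
// values one at a time; the result is a valid search tree laid out flat
static void FillImplicit( const uint32_t * sorted, int count,
				uint32_t * tree, int node, int * next )
{
	if ( node >= count ) return;
	FillImplicit( sorted, count, tree, 2*node + 1, next );	// left = smaller keys
	tree[node] = sorted[ (*next)++ ];
	FillImplicit( sorted, count, tree, 2*node + 2, next );	// right = larger keys
}

static int SearchImplicit( const uint32_t * tree, int count, uint32_t key )
{
	int node = 0;
	while ( node < count )
	{
		if ( tree[node] == key ) return node;
		node = ( key < tree[node] ) ? 2*node + 1 : 2*node + 2;
	}
	return -1;
}

You'd build it with : int next = 0; FillImplicit( sorted, count, tree, 0, &next );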

BTW this is sort of like doing a B-tree but having 16 elements in each level of the B-tree and using a binary search to find your element in each step of the B-tree. So really you still just have a binary tree overall, but it's organized differently in memory. This is simple and all, and in fact pretty good, but the problem is that choice of # of elements at each level of the B-tree is very strongly tied to your architecture's cache line size, which is evil for libraries. See Cache Oblivious Binary Trees .

There must be lots of research on this but I'm having trouble finding it. What I'm interested in is speeding this up simply, or adding a small acceleration structure, something that's O( sqrt(N) ) in size, not O( N ) like a full hash.

For example it seems obvious that you should be able to do a jump-ahead structure to get past half the depth. One obvious thing would be to hash the lookup key down to log(N)/2 bits, then make a separate sorted list for each of the hash values.

10/16/2008

10-16-08 - 3

I don't really understand why people like the MacBook. They're way overpriced (like the ThinkPad), about 50% over the price of a similar Dell and double the price of a similar no-name. They're fucking widescreen which I hate. Hello !? WTF, text pages are fucking vertical. 4:3 is about the widest usable aspect, and in fact I actually like 3:4 better; when I had a monitor that would rotate vertical I would do it a lot, and it's super nice for web pages or reading PDF's or something. Widescreen on a laptop is useless and retarded, it means I get like 50 lines of text vertical. I'd like a 16" laptop that does 1600x1200. Then they do the fucking retarded awful thing of having a wide form factor, but not using all that width for the keyboard ! WTF ! The tiny laptop keyboard is one of the worst usability problems and you're making the thing fucking widescreen, make the keyboard wide! Oh no, we need one and a half inches of worthless aluminum on each side. No comprendo.

10-16-08 - 2

I've seen a lot of bad movies recently (such as "That Obscure Object of Desire" and "A Scene at the Sea" , wow, huge wastes of time). Fortunately that streak ended last night with "Ratcatcher". It's a simple contemplation of growing up in squalor. It's full of quiet moments of painful beauty.

It follows in the line of "Kes" by Ken Loach or the early TV stuff by Mike Leigh; there are quite a few movies about poor kids in the north of the UK; another good one is "Small Faces". But the imagery of "Ratcatcher" is very poetic.

10-16-08 - 1

I've been reading through the papers from Interactive Raytracing 2008 . The talk slides posted there are all pretty worthless, but you can copy-paste the names of the talks into Google and find the actual papers. The one on BSP's is pretty good, there's a lot of solid stuff. ADDENDUM : Here are all the papers

I like the paper by Andrew Kensler on tree rotations to optimize trees with simulated annealing. (there's a lot of good stuff at his site if you click his name). It reminded me of this old talk by Sedgewick on Left Leaning Red Black Trees ; I can't remember if I linked that up when I first read it, it's very elegant and a nice clean way to understand self-balancing binary tree algorithms. Not really super related to the Kensler paper, just fired that neuron for me.

The simulated annealing thing is a fun approach. It can be used on any kind of problem that you can quickly find an approximate greedy solution to, but where the true global optimum search is very hard, but you can locally measure how a change affects the global cost. It should now be obvious to the reader that the LZ optimal parse we just talked about is one of these problems. If you think of it as a shortest path search, you can see that you can seed the path with a greedy normal parse to get an initial decent solution, then you could consider making local changes and see if they improve it, like replacing any match with a shorter match and then filling in the blank with the best match there. If you just make all the local improvements that improve data rate, you will get closer to the true optimum but you won't reach it because you'll be stuck in a local minimum, so simulated annealing to the rescue to let you agitate out of the trough and temporarily screw up your LZ code stream before you settle back down.
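
The guts of simulated annealing are just the accept/reject rule; a generic sketch (nothing LZ-specific here, made-up names) :

#include <math.h>
#include <stdlib.h>

// standard Metropolis acceptance : always take improvements, sometimes take a
// worse move, less often as the temperature cools toward zero
static int AcceptMove( double deltaCost, double temperature )
{
	if ( deltaCost <= 0.0 ) return 1;
	double p = exp( -deltaCost / temperature );
	return ( (double)rand() / (double)RAND_MAX ) < p;
}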

10/14/2008

10-14-08 - 2

Urg; Nifty Perforce looks pretty cool, but I can't use it because it's VS 2005 and I'm stuck on 2003 because the new version of Visual Assist is like completely different than the one I'm used to. I really fucking hate relying on anybody else's programs because they always fucking suck. The only decent things in the world are the things I make myself. Or something less extreme and crotchety.

Anyway, if you're on VC 2005 then Nifty Perforce looks like it fixes a lot of the P4SCC awfulness, basically by just being much less obtrusive. It has the only integration feature that I actually want, which is the automatic checkout on edit of readonly files.

10-14-08 - 1

You need youth to innovate. I used to think about all the many things that computers could easily do that they didn't; I would rant and rave about how we could so easily be doing this and that if only people would get their heads out of their asses. Now I just wish I could do the few simple things I like to do without it being a frustrating ordeal.

10/13/2008

10-13-08 - 3

"bool" in C++ fucking sucks. For one thing, its size is undefined, so you can't ever store it, and yet tons of people do, which makes huge bugs if you ever switch compilers and the size of bool changes. (that "size of bool is undefined" was a stroke of genius by the standards committee, assuming their goal was to cause maximum heart damage). Another problem is that it silently converts into "int" which is pointless and evil so far as I can tell. The last big problem is that if you're trying to make code that works in C++ and C there appears to be no good way to set up a "bool-alike" that works in C.

I want a fricking thing that is either on or off, and can only be converted to or from ints with an actual boolean operation, like

int x = b ? 1:0;

or

bool b = (x >= 3);

If I was only in C++ I could fix the size thing. The best way to do it is to make your own class bool, like :


class Bool
{
public:
	Bool( bool b ) : m_val(b) { }
	
	operator bool () { return m_val ? true : false; }
	
private:
	S32 m_val;
};

which makes you act like bool but guarantees a fixed size. But you can't integrate that with C.

To integrate with C I see only two options: either typedef Bool to an int, or typedef Bool to an enum. The enum is different but not really much better; enums still silently convert to int, and while it does at least prevent int conversion to the enum, it also doesn't convert actual booleans to the enum.

Example of bool horribleness :


enum EBool
{
	eFalse,eTrue
};

void testb(bool x,int y);
void testi(int x,int y);
void teste(EBool x,int y);

static void test()
{
	testb(true,3); // good
	testi(true,3); // silently bad
	teste(true,3); // error

	testb(4,7); // warning
	testb(1,7); // silently bad
	testi(4,7); // good
	teste(4,7); // error

	testb(eTrue,3); // silently bad
	testi(eTrue,3); // silently bad
	teste(eTrue,3); // good
}

It's easy enough to say "don't use bool" but I don't see a way around it. Obviously I could not ever store bools in structs (though that's hard to enforce and just leads to evil things like BOOL). The worse situation is function args like in our example above. I want to be able to make a function that takes a bool arg, and actually *requires* the client to pass in either "true" or "false", not some random int (or 0 or 1). In particular, when I change a function signature, I want compile errors when people call with the old signature.

In C++ I can do this :


class Bool
{
public:
	explicit Bool( bool b ) : m_val(b) { }
	
	static const Bool T;
	static const Bool F;
		
	bool operator == ( const Bool & b ) const { return m_val == b.m_val; }
	bool operator != ( const Bool & b ) const { return m_val != b.m_val; }
	
	bool operator ! () const { return ! m_val; }
	
private:
	S32 m_val;
};

const Bool Bool::T( true  );
const Bool Bool::F( false );

#define True	Bool::T
#define False	Bool::F

Which actually works pretty well and accomplishes most of what I want, but it's a little clumsy and doesn't work with C at all.

10-13-08 - 2

OMG the VC perforce integration is such a fucking disaster. Any time I open an old project from Oddworld it's like 15 minutes of clicking dialog boxes that randomly pop up then waiting while it stalls out trying to find the fucking perforce server for those files. ARG.
