10-09-09 - Misc

I just rebuilt my old LZP exes with VC 2005. No code changes at all. (the last build was 98-99 , I think that was VC6) I was a little shocked at the file size difference :

w:\exe>ziplist lzpexes_old.zip
Archive:  lzpexes_old.zip
 Length    Date    Time    Name
 ------    ----    ----    ----
  53248  10-20-99  11:18   lzp1asm.exe
  45056  12-08-98  13:24   lzp1.exe
  65536  12-08-98  13:25   lzp2.exe
  45056  12-08-98  13:29   lzp3.exe
  41472  10-10-98  18:13   lzp3o2d.exe
  60857  10-10-98  18:13   lzp3o2.exe
 ------                    -------
 311225                    6 files

w:\exe>ziplist lzpexes.zip
Archive:  lzpexes.zip
 Length    Date    Time    Name
 ------    ----    ----    ----
  90112  10-12-09  13:07   lzp1.exe
 102400  10-12-09  13:07   lzp2.exe
 102400  10-12-09  13:07   lzp2d.exe
 217088  10-12-09  13:15   lzp3.exe
 217088  10-12-09  13:15   lzp3d.exe
 192512  10-12-09  13:15   lzp3o2.exe
 192512  10-12-09  13:15   lzp3o2d.exe
 ------                    -------
1114112                    7 files

One of the first ads in the IEEE magazine this month is for NVidia Tesla rack-mount server blades. I'm not sure who exactly is buying these GPU computers but apparently somebody is. They're obviously awesome in terms of flops per watt, which is the real important metric, but there aren't that many applications that can get a good win from them currently (almost all of those applications are some kind of finite element simulation).

It is interesting the way parallelism and GPU computing have brought us back to brute force. It's sort of like how doing custom clever hardware keeps getting beat by cheap off the shelf cobbled together parts, similarly doing custom clever software can now be easily beat by just doing brute force and going wide. For example, I've been looking at some fancy high-dimensional nearest neighbor acceleration structures such as approximate KD tree searches, BD trees (box decomposition) and VP trees (vantage point) (see the nice free packages ANN or Ranger ). They in fact do work great on single cores, but you can beat them easily by just doing dumb brute force and going super wide.

People are so selfish and unashamed. There's this weird intersection where Rainier Ave hits Jackson and 14th and turns into Boren that I go through most days on my way home. Yes, it's kind of weird there, I grant that. But I've seen three different times now some jackass gets confused and is in the wrong lane, like he's in the left lane and suddenly realizes he needs to turn right - and the jackass either slams on his brakes and puts on his turn signal and stops in the middle of flowing traffic, or just cuts right across dense traffic to make the turn anyway.

WTF. You fucked up, you got confused, you were in the wrong lane - you don't get to turn now ! Fucking go past and pull out of traffic and get your shit together and come back when you can make the turn without fucking stopping in the road. I just don't understand how someone can think that's okay. You fucked up, now you have to eat the pain yourself. You don't inflict your fuckup on the rest of the world. I miss turns myself from time to time, but I fucking keep going and circle back around, I don't make other people suffer for my mistake. It's fucking selfish and inconsiderate.

Similarly, if you are a fucking tard who sucks at parallel parking, don't try to do it on a busy street where you jam up a million people waiting for your dumb ass to pull in and out over and over while you fail to park. You should be ashamed of your suckitude and you should pull off the arterial route so you can park in privacy and not inconvenience us all.


Jeff said...

Link with radlibc.c, and they will be small again...

Rubanets Myroslav said...

>you can beat them easily by just doing dumb brute force and going super wide
i think its not quite true for 2D and may be for 3D. I did recently similar task for electronics industry. It was a spacing check for 2D rectangles with distances that depend on neighbor count. It has all-nearest-neighbor inside with limited distance. It performs almost linear without trees because data already binned and sorted :) But with polygon soup a brute force will not fly. It will have to carry a lot of state around each interaction. Or today's brute force is much smarter that I can imagine.
just my 2 cents.

ryg said...

Certainly, just throwing around O(n^2) algorithms everywhere won't work. But the point is that with GPUs, you don't have to go very far past that to get good results: since GPUs love predictable linear access patterns and have tons of compute throughput but very high memory latency, linked data structures are very much at a disadvantage. A GPU will still be slow if you just test everything against everything. But you can for example put everything into a coarse uniform grid first, then do the N^2 tests between whole grid cells (limited distance means there's a limited number of adjacent grid cells to test).

If you look at GPU-based algorithms, a lot of them do exactly this. Having no acceleration structure is still slow, but the "low-tech" approaches with their predictable dataflows work very well on GPUs, while "tree-style" dataflows with lots of conditional branching and jumping through memory suffer.

old rants