12/30/2008

12-30-08 - More Technical Junk

SIMD Aware Raycasting is a really retarded name for this idea of using a polygon hull to accelerate volume ray casting. It's a good idea though. Basically you put a simple convservative outer hull around your octtree or whatever volume data. Then do your GPU raycast, you render the outer hull front & back and you rasterize out the ray start and end positions, then you read those to cast. This means casting many fewer pixels that just hit nothing, and means you don't have to march across empty space so much. It also is a good way to do the object-space transform. A lot of the volume rendering / GPU raytracing research is targetted to having one big giant rigid world that you just fly around in, which is silly. You at least need objects that can be rigid body transformed. This polygon hull method provides a good way to do that - when you generate the ray start/end points you can transform them into object space so that they are ready to go for the ray cast (you can also scale them to be inside the axial box of the object in object space so that they are ready to go as 3d texture coordinates).

I enjoyed the little "game" Reset by Roburky ; I use "game" in quotes because it's really an "experience". I don't really play many games any more, but I would enjoy playing little music-sync'ed experiences that have some interesting vision to them. (I'm also enjoying listening to the music of Trash80 as I work today). Sort of like the old Orisinal stuff in that it's really just an "interactive artistic moment". There's some interesting techniques in the Linger in Shadows demo; I'd love to see more creative and unusual render styles in video games.

I finally played a little Braid over the holidays. I didn't play enough to really feel like I have any opinion of it, but two things struck me : 1. I'm really going blind and need new glasses, because on a non-HD TV I was having trouble seeing WTF was going on (though my brother could see it just fine), and 2. it's really got a coherent artistic vision, and everything in the game, from the music to the render shaders to the art style, and the story and gameplay, all work together very well. It's something you almost never see any more. So many games these days are just a random hodgepodge of art and play elements and controls and GUIs that aren't coherent, don't match the universe, and don't contribute to an overall feeling.

I almost always write threaded code in a master/slave paradigm. That is, there is some master thread which is in charge of owning object lifetimes and creating slave threads and so on. I've never written anything significant which really does massively parallel cooperative multithreading, and I hope I never have to! Anyway, with almost all the Oodle stuff my threading paradigm is to make the library interfaces non thread safe, and then if you want to do things across threads you need to protect them for thread safety yourself; for example Files assume that only one thread is talking to them at a time. This is almost always the way I write threaded code - I don't like library calls that automatically do mutexes and such for me on every single call, I want to do it myself. I just realized today that I could actually use the single thread CRT in my multi threaded app. The MT CRT is really really really slow. For example, the MT CRT version of fgetc is 170 clocks instead of 10 clocks in the single threaded version. I can just use the single threaded CRT and protect it from bad threading manually. (I probably won't actually do this, instead I'll just avoid using the CRT altogether, but if for some reason you wanted to write a really high performance threaded app and still use the CRT, then I would recommend not using the MT CRT).

12-29-08 - Junk

Bill Clinton was perhaps the best Republican president this country has ever had.

Israel's overreactions to nuisance attacks from Hamas has made them a liability to US global interests. It is no longer a tenable political position to stand by them. This is not a question of who's right or wrong or morality, it's simply realpolitik.

Any reasonable investment these days returns far less than inflation (eg. a CD is around 3% which is not enough). Any money you have sitting around is losing value in real dollars. We may well be in a period similar to 1964 to 1984 where we could go 10-20 years with investments either showing no gain or a loss in real dollars. The only good option is to spend immediately on hookers and blow before the money loses more value.

Capitol Hill Snow Slideshow at Seattle Weekly ; most of these pictures are from right outside our house. I love the way people go nuts here when the weather's wacky. When it dumps snow nobody drives, everything is cancelled, and people just play. (people go nuts here in the summer too when the days are so long and it's too hot to sleep, everyone stays up late and wanders outside). We were flying that day, but I just wanted to cancel it and stay home and have snowball fights and drink cocoa. It's funny walking around and seeing the cars abandoned on the steep hills where people realized they couldn't make it (why wouldn't you just go back down to the bottom of the hill and park properly?).

I hate video games that change my resolution to go full screen. My laptop runs on a 1920x1200 external LCD. Any res except native looks like pure ass (the exception is 1600x1200 or 800x600 in center mode, not stretched). Furthermore, because I run over analog VGA, changing res causes the monitor to go through its autoadjust which takes a solid 30 seconds or os. That means absolutely any resolution change pisses me off (and it has to adjust when the game starts, then again when I quit, and when it crashes, which is like inevitable, it always leaves me in fucking 800x600 desktop mode and screws up all my icons; URG).

For my own 3d work I've always used the method of just grabbing the desktop res and going fullscreen with that res (then let the user down-res if they want, but never change the res unless they ask for it). But my GPU is too slow to run most games at full 1920x1200. The way that I'd like to see more games handle this is to let me stay in 1920x1200 but render only a portion of the screen. eg. 1600x1200 centered would be okay, but hell you could also just let me do a 1600x1000 wide aspect centered window and that would be fine too. Actually while I'm at it, I just wish most games and 3d apps didn't go fullscreen AT ALL without me telling them to. The default initial startup should always be windowed mode (since I'm going to have to watch it load for half an hour, I'd like to keep the use of my web browser thank you very much).

One of the weird things about Vegas is the way it's just a complete shambles as soon as you step off the strip. In fact, they don't even try to hide it much on the strip, like even at the Bellagio if you look behind a bush you'll see the walls are some kind of paper-mache plaster kind of thing and there are just gaps and dented spots all over. If you look out the back windows there will be the heating/cooling plants, piles of construction equipment, etc. Off the strip it really reminds me of Cancun or Mexico in general - piles of dirt, construction equipment everywhere, tons of stuff half built that nobody seems to be working on actively.

I hate that I'm writing all this nonsense. I hate sitting at the computer all the time, it gives me no happiness. I just don't know what else to do with myself. It's miserable outside. My shoulders are wrecked, probably forever, I can't work out and I'm fat. I feel like I'm not doing good work right now and I'm racking my brain trying to fix my coding process. When my work suffers I get depressed since it's about the only thing that gives me any sense of self worth.

I'm trying to avoid doing allocations in CINIT ; we've made our own allocator now just fail if you call it at cinit, but that's not catching everything for me because of the difficulty of making sure you've replaced every bit of code that tries to use "new" or "malloc" or whatever. One annoying thing I found : almost every STL container can be declared static in a file and it doesn't do any initialization, the constructor just zeros out some members - but hash_map actually does an alloc :


static std::hash_map<int,int> s_hash;

will allocate a 193 member table. That sucks. Empty containers should not do any allocations IMO since there are many usage patterns where you might declare a container but never actually put anything in it. (note that of course in real Oodle I don't use any STL, this was just in some test code).

Does the gmail "Report Spam" actually do anything? In any case I'm pretty happy with my gmail email pass-through.

When I see a "baby on board" sticker, it makes me really want to ram my car into the person. Is that wrong? Also, 1980 called, they want their sticker back.

I mentioned before how I think the bilateral filter is a pretty mediocre way of denoising or super-res'ing images, because it basically gives you a piecewise constant model. On the other hand, it's a great way to deal with things that actually ARE piecewise constant - and lighting in graphics is pretty close to piecewise constant. (a symmetric bilateral filter can do piecewise linear ; really what you want is piecewise quadratic). There are some good new papers around about new realtime GI techniques.

The new Fog Creek Office is pretty nice, and as usual Joel is right on about spending money to make programmers more productive. I disagree with some of the details, they don't really need to spend so much on fancy design and the common areas, but stuff like the mechanical height adjustable desks is super awesome. Of course you've got to offer a lot of perks to get people to work on FogBugz. ZOMG that's got to be some of the more boring programming work in the universe. Also, WTF how many people does it take to make fucking FogBugz !? We evaluated a bunch of bug trackers at OddWorld, and FogBugz was okay, but it's very simple and very restrictive, it's not overloaded with crazy features the way the competitors are; it literally looks like something written by 1-3 people (1 Casey or 3 normal people). I guess it's because if you're a programmer working on FogBugz you only spend 1 hour a day actually working and you spend the rest of the day dreaming about quitting or killing yourself.

12/28/2008

12-28-08 - Capitalism Reform

A lot of the problems of the corporate system can be traced back to the fact that the individual actors cannot be held responsible. Executives basically have carte blanche to commit crimes behind the corporate veil, and the liability is subsumed by the corporation. Corporations can be spun off with debts and legal liabilities, corporations can be dissolved and reformed, moved to other countries or states where the laws treat them better, etc.

The basic idea of capitalism is that if individuals make decisions in their own best interest, the total value of the system is maximized. That does not hold true when you have a government structure which allows individuals to take the profit and pass on the risk (such as all the mortgage brokers who got commisions on loans but passed on the actual loan risk to others ; if they actually had to offer the loan capital themselves they would have been more careful and this never would have happened).

I've been thinking about this a while, and something that occurs to me is that there's really no need for the whole idea of a "corporation", nor is a there a need for "stock". The corporation is an artificial construct which acts like an individual but actually has many more rights than an individual.

Who needs corporations? Instead the companies should just be owned by individuals. In the end, that individual owner is fully responsible for the actions of the company. The company can be sold to another, but the legal liability for actions during that time sticks with the single owner. Corporations can no longer do things like pay out dividends when they have massive debt. Sure, they can still pay their owner, but there is no "corporate veil" - the owner is liable for all the corporation's debts, so the money is not being hidden away in a place it can't be retreived.

Who need stocks? Stockholders (and brokers) don't have the ability to really analyze companies and pick stocks well, they're just gambling. What do companies need stocks for? To raise cash for expansion. Well they can do that just by issuing bonds. Individuals can buy the bonds if they have faith in the company. The company gets cash, and the individuals get a return on their investment and capital flows to where it's needed just fine with no need for stock. The good thing about this is the company has a straightforward debt to the bondholders, and the bondholders are offered a clear risk vs return in buying the bond.

Furthermore, bond issues should go through a rating company, and that rating company should be required to insure those bonds! We got rid of stocks so there's no longer a problem of all the bogus stock hawkers, but you still might have a problem of bogus risk ratings on bonds. That's easy to fix - make the bond rater take the penalty for mis-rating the bonds. Boom rating qualities will be far more accurate and more conservative. The insurance goes like this : when you buy a bond you can choose to just buy it normally without insurance, but if you choose, the bond rater is required to give you insurance at a fixed fee based on the rating that they gave, eg. AAA= 1% fee, Baa = 3% fee or whatever; the point is that by assigning a rating they are basically picking how much vig they would need to profitably insure that bond.

You can still have things like "hedge funds" - the hedge fund manger just personally owns the funds, various people give him big loans, he runs the money and pays back the profit.

Now I'd also like to get rid of the Fed entirely and get rid of the FDIC but there may be more complaints about that. Getting rid of the FDIC is relatively easy, you just require banks to purchase insurance at market rate from various insurers instead of giving it away for basically free. Also getting rid of corporations means the owners are personally responsible for debts if the bank defaults so they would be less likely to default.

ADDENDUM : yes I know this is not practical or realistic, but I do think it's interesting as a thought experiment. I believe most of the reasons that people cite for why "we need stocks" are bogus.

1. "We need stocks to create enough aggregate wealth to fund big projects" ; not true, in fact even today those kinds of big capital-intensive projects don't raise money through stocks, they raise money from hedge funds and other large private investors; it's always been that way.

2. "We need stocks to reward entrepreneurs and early employees" ; if you mean that you need stocks to randomly massive over-reward some people and not others, then perhaps, but really this is just a big randomizer. Without stocks entrepreneurs can still get a huge payday by selling their company, or if it's actually a profitable company, they can just keep ownership of it and make a profit with it! NO WAI. In fact, the real benefit of stocks here is for people who create bogus non-profitable companies and somehow manage to convince others that it has good prospects and the stock should be worth more than it is. Early employees could easily be given profit-sharing agreements or conditional bonds which would reward them handsomely. In fact, making all these rewards more accurately tied to the real value of the company would improve the efficiency of capital markets.

3. "Stocks provide a way for the average individual to invest and grow their wealth" ; Individual selective stock ownership has been pretty widely shown to be a bad thing. People don't have the time or skill to invest wisely, and so are best off just investing in large funds. You could just as easily invest in large funds without stock ownership.

In fact, so far as I can tell, the biggest benefit of stocks and public corporations is in fact the exact thing I'm trying to eliminate - that is, responsibility for actions. If the individual running the company really had to take responsibility, they would be far less willing to take risks. Lots of exciting companies might never have happened because they were too risky. The ability to create a diffuse corporate veil is in fact very beneficial in some ways, because some amount of illogical risk is actually good for the economy. (it's an old saying that if people really understood how hard and risky it was to start a company, nobody would ever do it).


Let me illustrate foolish financial risk taking through an analogy to gambling. (Gambling with an edge is just "investing").

Consider a simple thought experiment. You are given the opportunity to bet on a coin flip where you an edge (in this example you win 53% of the time). You start with 1.0 monies. You get to bet a fixed fraction of your bankroll on each flip. eg. you could bet 1/4 of your bankroll on every flip, or 1/5 on each flip, but you aren't allowed to change the fraction. Note that as long as you pick a fraction < 100% you can never go broke. You must always bet all 100 flips.

What is the best fraction to bet ? Well, if you only care about the *average* return, the best fraction is 100%. That should be obvious if you think about it a second. Every extra dollar that you can put in this bet means more profit on average, so you should put as much as possible. But at the same time, if you bet 100% you almost always go broke. In fact only once in 2^100 do you make any profit at all.

It's a little unclear exactly what metric to use to decide which strategy is best, so lets go ahead and look some numbers :

winPercent : 53% , num bets = 100, num trials = 262144
fraction average sdev profit% median
1/ 2 19.22 1051.56 1.69% 0.00
1/ 3 7.24 422.89 13.48% 0.02
1/ 4 4.43 58.10 24.17% 0.18
1/ 5 3.30 20.66 30.92% 0.44
1/ 6 2.70 10.43 38.30% 0.67
1/ 7 2.35 5.79 46.10% 0.85
1/ 8 2.11 3.94 46.03% 0.97
1/10 1.82 2.30 54.18% 1.10
1/12 1.64 1.61 53.96% 1.17
1/14 1.53 1.24 61.69% 1.19
1/16 1.45 0.99 61.75% 1.20
1/19 1.37 0.77 61.94% 1.19
1/22 1.31 0.62 61.76% 1.18
1/25 1.27 0.53 62.10% 1.17
1/29 1.23 0.43 69.36% 1.16
1/33 1.20 0.37 69.22% 1.15
1/38 1.17 0.31 69.21% 1.13
1/43 1.15 0.27 69.40% 1.12
1/49 1.13 0.23 69.25% 1.11
fraction = fraction of bankroll placed on each bet
average = average final bankroll
sdev = standard deviation
profit = %% of people in the trial who had any profit at all
median = median average bankroll

(note the sdev for the 1/2 and 1/3 bet cases is very inaccurate ; I foolishly did a monte carlo simulation rather than
just directly counting it which actually is easy to do with the binomial theorem; the average return is exact)

It's easy to see in the table that the average profit is maximize for very large (risky) bets. But if you look at the profit% to see what portion of the population is benefiting, you see that in the risky bet cases only a tiny part of the population is benefiting and the high average is just because a very few people got very lucky.

It seems to me just intuitively that maximizing the median seems to produce a pretty solid strategy. For a win percent of 53% that corresponds to a bet fraction of 1/16. Another sort of reasonable approach might be to choose the point where > 50% of the population shows a profit, which would be at 1/10. (these both also occur right around the area where the sdev becomes smaller than the mean, which might be another good heuristic for picking a strategy). Using much smaller fractions doesn't really increase the profit% very much, while using larger fractions very rapidly increases the variance and descreases the profit%.

In any case I believe the analogy to finance is amusing. In a simple experiment like this it's trivial to see that when everyone is taking too much risk, it might be very good for the *whole*, but it's bad for almost every individual. It can be very profitable for a tiny portion of the population. It's also trivial to see here that any measure of the sum of the population (such as the GDP) is pretty useless when really measuring how well your strategy for the economy is working.

ASIDE: I used the fast median code from here . There are basically two sane fast ways to find a median. In most cases the fastest is the histogram method, which is basically just a radix sort, assuming you have 32 bit values or something reasonable like that. Basically you do a radix sort, but rather than read the values back out to a sorted array, you just step through the histogram until you've seen half the values and that's your median. If you can't do a radix for some reason, the other sane way is basically like quicksort. You pick a pivot and shuffle values, and then rather than descending to both sides you only need to descend to one side. This is actually still O(N) , it's not N log N, because N + N/2 + N/4 ... = 2*N which is of course O(N). Both of these methods can find the Kth largest element, the median is just a special case.

12/27/2008

12-27-08 - Temper

Temper is one of the stranger words in English, since it can mean both something and its opposite. This struck me while reading Lonesome Dove, which uses the expression "out of temper" to mean that someone's patience was exhausted, as in "dern, this bronc been trying to throw me all day and I'm plum out of temper".

In this usage "temper" means something like "composure". But when someone is said to "have a temper" it means they are often "out of temper" or often "lose their temper". (this obviously suggests a riddle, something like - "what can a person have even when they lose it?").

There seem to be a variety of unrelated usages : (only giving examples of usages still common today)

Temperate climate
Temperance movement
The Well Tempered Clavier
Tempering Steel or Chocolate
Hold your temper / lose your temper / out of temper / have a temper / be in good temper / temper tantrum

Interestingly, temper the verb used to be more common than temper than noun, and the primary meaning was to "mix" or "moderate" or even "compromise", such as in tempering the sweet with a little sour. This usage is now rare.

Here are some modern dictinary definitions of the archaic usage :

9.  to moderate or mitigate: to temper justice with mercy.
10. to soften or tone down.
11. to bring to a proper, suitable, or desirable state by or as by blending or admixture.

And some from the 1828 Webster's :

    1. To mix so that one part qualifies the other; to bring to a moderate state; as, to temper justice with mercy.

    2. To compound; to form by mixture; to qualify, as by an ingredient; or in general, to mix, unite or combine two or more things so as to reduce the excess of the qualities of either, and bring the whole to the desired consistence or state.

    Thou shalt make it a perfume, a confection after the art of the apothecary, tempered together, pure and holy. Ex.30.

    3. To unite in due proportion; to render symmetrical; to adjust, as parts to each other.

    God hath tempered the body together. 1 Cor.12.

    4. To accommodate; to modify.

The "Tempered Clavier" of Bach seems to stem from this usage; it's a way of mixing the ideal tuning for the different keys to create a single tuning that's not quite right for any of them, but is okay enough for all of them; it's a "well mixed tuning" if you will.

Tempering Chocolate obviously comes from analogy to tempering metal, and in fact they are sort of similar, in both cases you are controlling the crystal formation by raising and lowering the temperature carefully through a small range. Understanding the origin of tempering metal is not obvious, but maybe it comes from the "compromise" origin like Bach's usage. Tempering metal is a way to acheive a good mix of hardness and softness that keeps it from being too brittle. (BTW many of the standard dictionary definitions for temper in the metallurgy usage are just wrong; you'll see definitions like "the degree of hardness" ; The Barbarian Keep has a nice little thing on tempering and a rant about misuse).

We're still left with the problem of the meaning of "temper" in reference to moods. The Webster's 1828 definition of temper has the two opposing meanings for the noun :

3. Calmness of mind; moderation.
4. Heat of mind or passion; irritation.

For laughs Wordia has the same two meanings but in opposite order :

3) noun,  a tendency to exhibit uncontrolled anger; irritability
4) noun,  a mental condition of moderation and calm

It seems to me that the meaning with respect to moods was originally "moderation", and perhaps just misunderstanding of the expression made it flip.

Another posibility involves another meaning of "temper". Temper can also just mean "mood or state of mind", thus you could have an ill temper, a good temper, a magnanimous temper, a generous temper, etc. Over time temper may have been mainly used in the form "bad temper" and "ill temper" and thus simply become "temper" meaning "bad mood". ( some people still use temper with other adjectives, but this is no longer standard English; presumably phrases like The Heroic Temper would now mainly be rendered as The Heroic Temperament, at least in America; I find a lot of usage in Australia of "Temper" in the more general sense).

Anyway, this leads to funny possibilities, such as : To temper his image of having a temper, Giulani shows good temper . When confronted with intemperance, to show you don't have a temper you must keep your temper.

12-27-08 - Financial Quackery

I know very little about finance. It's time to play Financial Quackery. Put on your tin foil hat.

I believe the US economy is still in far worse shape than the mainstream media or politicians are saying. The reality is that we have been sustaining a period of artificial growth by taking out massive loans against our future. We have not been spending on education or research or infrastructure, and we now have a huge population that thinks it deserves to own a house and drive a nice car with absolutely no skills or hard work. There is hardly any sector of the US economy that can be said to be a real healthy engine of profit and economic growth. The biggest parts of our economy are phantoms : 1. Finance has been propping up the S&P for the last ten years and it's all an illusion. 2. Real Estate was just a bubble funded by inflationary dollars printed by the Fed. 3. Health Care is a huge part of GDP but is really just a drain. 4. Consumer Spending is just sending borrowed dollars back to China with interest, and consumer spending can collapse. 5. Service jobs in general rely on big spending from a small part of the population and high-paid service industry is not sustainable.

To get out of a recession you need real productive industry that makes something the world wants, and we don't have it (or only < 1% of us do). As I've said before, the long-term low interests rates which the Fed has used for the past ten years has put us in a trap where Stagflation is just about the only possible outcome. Our economy is already pumped full of free cash which has nothing productive to do, we've been running on empty and borrowing to keep up standard of living, and now we have tons of worthless paper.

Basically every financial number that's reported in the mainstream media these days is cooked to make it look better than it really is. Everybody knows the unemployment number has been bogus for some time; it's not 6%, it's really more like 12%. All the investment return numbers look way better than they really are because they don't include inflation, fees, & taxes. Furthermore, the widely reported inflation number is bogus. The other big thing they do is fail to adjust for population changes; so you'll see things like the number of jobs increasing, but it actually descreased as a percentage of population.

I've written about how the standard inflation measure is bogus here before. Anybody who has been alive for the past 10 years and has a brain should know that it's nonsense. The nominal 3% inflation would be 34% over the last 10 years. I think it's more like 100% over the last 10 years if you actually compare apples to apples. The normal CPI inflation measure assumes that you gave up the nice house you had in Travis Heights 10 years ago for which you paid $500 a month and you moved to Detroit in order to keep the inflation measure low.

(BTW 100% over 10 years = 7% annual, which is pretty close to the measure under the "pre-Clinton CPI")

Some reference :
SafeHaven article on the CPI by John Mauldin - see also his other newsletters ,
ShadowStats article on the CPI by WJ Williams - also other good articles by the same guy on other misleading stats

Even the people who do show inflation adjusted numbers tend to do so against the CPI for lack of a better accepted measure. Some graphs of "real returns" (sort of - some of these don't count dividends, and they all use the CPI which is bogus, and most don't count taxes or fees) :

Simple chart of DJIA
Intelligent Bear detailed chart
More optimistic log scale chart - where he accounts for reinvestment of dividends.

There have basically been two primary sources of the false growth in the last 10 years. One is just lies. Distorted figures and misreporting make the growth look good even though it's not really there. The Fed basically prints a bunch of money, the financial industry books it as profit, and then they lie about the inflation and we all think we're getting richer. The other are risky leveraged bets.

Basically the corporate system is completely broken. I've written a bit about this before and will probably again, but if you think about it in a game theory sense from the standpoint of an executive - why in the world would I ever do what's good for the company or the country? I can get huge bonuses based on short term profit, and then I can leave the company or sell it or just let it go bankrupt, I get to keep my profit, what do I care if I wrecked the company? My only motivation is to maximize short term returns, and of course that is exactly what they do. We've always known this, and executives have screwed up many companies by not having long term vision, cutting R&D, laying off too much staff to cut costs, etc. What happened in the last 10 years is they just got much more clever about it. When your core business can't make money because the entire economy is in the shitter, what do you do? You take out loans and make massive leveraged bets in risky markets to possibly get a big profit. The actual EV of these moves is maybe zero or negative, but the thing is for the executive they get a big payout if they win the bet, and they get almost zero penalty of they lose the bet. It's gambling for free, or rather gambling where they get the profit and the shareholders and taxpayers pick up the loss. I can't blame them for taking that bet, obviously it's a winning proposition for them.

12-26-08 - In Defense of Less Typing

In the C++ rant we got a bit into the discussion of "that just reduces typing". It's become a common wisdom these days that "anything that just reduces typing is not of significant importance in programming". This is sort of a reaction to the bad old days where developers put too much emphasis on reducing the time it took to make a first draft of the code. Now people like to repeat the plathitudes that "first draft is only 10% of dev time, debuggability and readability and maintainability are what really matter".

Yes, yes, that is all true, but I think people miss the forest for the trees here in some cases.

Every character you type is a chance for a bug or a mistake. For one thing there are typos, but there are also just brain farts. The less you type when writing a piece of code, the more likely it is to be correct.

(I should emphasize the fact that reducing code duplication is very good for many reasons that I don't go into detail much in this rant, and those are mainly the main reason to merge duplicate code; I'm talking about cases where the code is not exactly duplicated, but is similar, or where you have a choice between making a very simple API which requires a lot of typing by the client, or a more complex API which has very simple usage for the client).

A lot of good programmers now are adopting the idea of exposing simple minimal C-style APIs that leave usage up to the client. There are a lot of things to like about that (for example, Sean's stb.h style thing for simple functionality is in fact wonderfully easy to integrate and steal snippets from), but there are also bad things about it. I think good programmers overestimate their ability to write simple usage code without mistakes. For example, you might think that you don't need a class to encapsulate a 32-bit Color, you can easily just use a 4-byte array or pass around a dword and do the shifts by hand - but I have seen bug after bug from small mistakes in that kind of code, because if you write the same thing over and over, or copy-paste and try to change a bunch of code by hand, there is some small chance of mistake each time you do it.

It's funny to me that good programmers in game dev are going in two directions at the same time. One direction is to make code very easy to follow by simple visual inspection. Many major people in game dev are pretty high on this these days. The basic idea is to use C-style imperative programming, make the action of each line obvious to simple visual inspection, reduce segmentation of code into function calls and prefer simple long linear code blocks (rather than factored out functions, conditionals, objects, etc). There are certainly merits to that. The other direction people are going is custom metaprogramming and local language redefinition. Many of these people want coding languages where you can locally redefine the rules of the language and do custom markup for things like network mirroring, thread fiber spawning, local-global state memory variable differentiation, etc. This kind of stuff would make code completely impossible to understand by simple visual inspection without intimate undestanding of how all the extra meta-language rules work. These ideas also have a lot of merit, because writing micro-threaded massively parallel code in plain C-style C++ is really difficult and ugly, and having custom features would make it much better - but these two directions are totally conflicting.

While I'm ranting about opposite directions, let me also rant about the idea that something is a good idea for "library design" but not for within your app (or vice versa). IMO Coding = Library Design. Most of the complex code that I write is in "libraries" for myself. Libraries are just chunks of functionality that you want to expose in a compact interface. Well, that's what you should be doing all the time. Coding is just writing a bunch of libraries, then the "app" is just tying together the "libraries".

So, for example, Casey's excellent talk about good library design (things like exposing multiple levels of interface from very simple to nitty gritty, and not forcing a usage pattern on the client) are just good ways to write code *always*.

I don't trust the me of one year ago, nor do I trust the me of one year in the future. I need to write API's for myself that make me write the right code. Part of that is all the things I've often written about before (self-validating API's, API's that are impossible to use wrong), but part of it is just plain less typing. If the API makes me (the client) write a whole bunch of code to do the simple things I often want to do - that makes it far more likely I will do it wrong.

Also I believe the illusion of choice is a horrible thing. If there's really only one or two reasonable ways to use a system, then just expose that to me. Don't give me what looks like a bunch of flexible components, but they only really work right if you do one specific thing.


Addendum : okay I'm bored of this topic and I'm sure you are too, but I feel like I started it so I should wrap it up a bit more.

Paul Graham has this thing "Succinctness is Power" that's sort of similar to this rant. As usual he writes it well, but I think he's a little bit wrong. The issue that I believe is important, which is what I'm trying to talk about here is :

Reducing the number of choices that the programmer has to make in order to write correct code.

Part of that is reducing typing - but the crucial thing is reducing typing when there is actual choice in the alternatives. That is, if it's something you *have* to type, that's not bad. For example a very verbose language like COBOL is not inherently bad due to its verbosity (cobol is horrible for other reasons).

Making code that works correctly with minimal typing (and makes compile errors if you use it wrong) is the goal. So part of what I'm getting at here is using short common words that it's easy for the programmer to get right, using highly constrained classes instead of very general ones, etc.

Part of the credo goes like this :


remove the option to do things that you never want to do

make it hard to do things that you rarely want to do

make it easy to do the right thing

As an example - iterators are cool even when they save almost no work. Say for example you have something like a doubly linked list class. Many of the simple C guys would say "hey you can just write code to traverse the linked list", and you write client code like :

for(Node * n = list.head; n != list.head; n = n->next)
{
    ...
}

That's easy right, you're a good programmer, you can just write that loop. No. You can't, that's what I'm trying to say with this rant. I mean, yes, you can, but you had a 1% chance of introducing a bug because you had to write code.

Writing code is bad.

Instead you could make your Node look like an iterator. You could give it standard names like begin() and end(). Now instead of writing code you can just use for_each , or even just copy-paste or spit out a loop that you've done a million times :

for(Node::iterator it = list.begin(); it != list.end(); ++it)
{
    ...
}

Is safer because it's standard. On a similar note, using a constrained object like an iterator is safer than using an int, because every time you use an int people get tempted to do evil things. How many bugs have I seen because people try to get clever with their loop iteration? Maybe they count backwards for efficiency and use and unsigned type by mistake. Or they pull the ++i out of the for() and then forget to do it due to a continue. Or they use the "i" outside of the for loop and bugs get introduced.

Lots of people are anti-OOP these days. I love OOP ; no, not deep inheritance trees and lots of data inheritance, and whatnot, but the basic idea of coding in terms of objects that impose constraints and conditions on their use. The best way for me to program is to build components and helpers which make expressing the program easy.

12/18/2008

12-18-08 - Open Source Just Doesn't Work

There's a Netflix Plugin for Media Portal . I dare you to even try to figure out how to install it and get it to run. And then there are some big problems with control and integration once you get it running. I'm trying to get the developer to fix it; fingers crossed, it would be nice to have that. I keep hearing the Xbox Netflix integration is really awesome. I might have to get an Xbox just for that. Yay.

12-18-08 - Christmas

It dumped snow last night; it's very pretty. Everyone is staying home and playing in the snow.

We're flying to see Alissa's parents tonight. Hopefully the snow doesn't screw up the airports. I'll have no computer. We can't get a cab because of the weather so I guess I'm going to have to try to drive in this. We're definitely going to die on the way.

Bleck I want to just make snowmen and throw snowballs! God I hate Christmas travel.


In calendar year 2008 my portfolio has done about -50%. That sounds bad, but it's actually worse than that. To get back its value it needs to go +100%. The loss amount looks smaller because it's measured relative to the larger initial value, not the small ending value. In fact, the percents are a bad way to show finance changes. A better way is the log (base two) of the multiplier (and then x100 to make it look nice).

So 0% = 1.0 multipier, log2( 1.0 ) = 0 ; no change

-50% = 0.5 multiplier , log2( 0.5 )*100 = -100

+100% = 2.0 multiplier , log2( 2.0 )*100 = +100

Under this counting, if you have a -100 year then a +100 year you're back where you started, which is more intuitive. (obviously, appreciation amounts combine by multiplication, so taking the log changes it to addition).

Addendum : in case you weren't sold on the log scale, it also gives you the easy intuitive way to do compounding (obviously). Say you have something that makes 10% one year. What does it do over two years? Well, even though we know it's wrong, our brain intuitively jumps to "20%". The real answer is of course 1.10*1.10 = 1.21 = 21%. In log scale 1.1 = 100*l2(1.10) = +13.75. What does it do over two years? 2*13.75 = 27.5

That may not seem like a help, but it's more significant if you look at monthly compounding or something. Say you have something that does 10% annually, what's the monthly compound? It's 1.10^(1/12) ; in log scale it's just 13.75/12. The daily compound is just 13.75/365. Nice. The problem is that getting between log scale and normal scale is annoying and non-intuitive, so as long as most reporting is in normal scale, log scale is hard to use.

BTW also one of the things finance people do which messes everyone up is they don't usually report inflation-adjusted numbers. Inflation adjusting over time is annoying in normal percents, but of course in log scale you just do -5 (3.5% inflation is about a 5 in log-scale).


So the boiler room in our apartment is usually locked, but the landlord (being the irresponsible screwup that he is) accidentally left it unlocked a few days ago. Alissa discovered it and checked out the settings on the boiler. It's programmable and all that where it comes on with different time schedules. It was set to 63 degrees.

12-17-08 - Manifesto

1. The State only listens to violence and destruction of property. Peaceful protest accomplishes nothing without the threat of action. (Supposed successful peaceful protesters like Ghandi or MLK were in fact empowered by the state's fear of violent civil unrest).

2. The State is primarily concerned with enriching and maintaining the wealth and power of the Elite. The State only changes when something threatens the Status Quo which benefits The Few.

3. The Rich are in collusion with The State. They have power over The State, they benefit from its actions. They are complicit in all acts of evil performed by The State.

4. It is the right of those being trampled upon to fight back with the means available to them. They may fight or protest to secure their ability to survive, or their freedom.

5. Theft and destruction of the property of The Rich is one of the valid ways for The Trampled to take action against The State.

6. The Rich earn the right to keep their posessions and power by making the system function well enough for the poor. If the poor become desperate enough to risk jail or injury, the rich have broken their obligation and the poor are morally justified in taking action.

12/17/2008

12-17-08 - Junk

tmpnam() and tmpfile() are making files in the root of my machine. According to the docs they're supposed to put files in your temp dir as set by "TMP" , but all I ever see is stuff like "\s38c.1" . The windows functions GetTempPath / GetTempFileName seem to work fine.


Is this legal ?


static int x = func(&x);

I did the Casey-style tweakable var thing where you watch your C files, and I want to be able to initialize a variable and also grab its address in just one statement. It works perfectly fine in MSVC, I just wonder if it will break on some platform; I can't seem to find anything in the C standard about whether the address of a variable is defined before the variable is finished initializing. (I know for example that you aren't "supposed" to use "this" during a constructor, but everybody does it and it works fine in every compiler I've ever seen).


It sucks that MSVC doesn't have fmemopen(). It would let me page data into a memory buffer and then fmemopen on that and give it back to clients who want to just read bits with stdio because they have functions already written that use stdio.

More generally, I wish stdio FILE had actually been defined to use function pointers the way it does in some GCC POSIX implementations . Then you could plug in anything and FILE would be a true virtual file. That way people could just write code to talk to stdio, and I could secretly pass them Oodle file handles and it would all just work.

Instead I have to make my own virtual file layer, and then maybe some #defines or something if you want to just stomp your stdio calls to Oodle.

The function pointer indirection is really not a performance cost, because getc will still be a macro that goes straight to a buffer, and the function pointer only needs to get called to fill the buffer. The stdio buffer these days is not actually a file system buffer - it's way too small, that's just not its job. The file system is buffering the disk in 256k chunks, stdio has a little 4k buffer whose purpose is to reduce the number of times you need to talk to the OS, just to cut down on function call overhead. I'm planning on using the same method for the Oodle virtual file, but with just a 256 byte or maybe 1k buffer, which is just there so you can do fast macro getc() and only rarely jump through a function pointer to do big reads or refill the little buffer.


I'm working on my laptop, which is way way slower than my awesome work machine. My Oodle test games runs at 10 fps on the laptop and 200 fps at work. It's letting me find lots of little paging bugs. Yikes. For real release testing I'm probably going to have to put in a mechanism to run at artificial slow framerates, and maybe also some randomized frame durations to really try to stress all possible orders of things occuring.


I'm annoying by having "Oodle" and "rad" in front of all my function names. It's nice to expose it to clients that way because it makes it so you never have possible conflicts with their definitions, but it's annoying for me during dev. It ruins typing autocomplete and browse info. I think "oh, I want to open this file" and start typing "ood.." and it gets me nothing, whereas I could just be typing "open.." and it would complete for me.

I might have to do the thing Casey did where I use nice short names internally for myself and then run it through a bunch of wrappers to expose long names to the outside world.

Back on Galaxy and all my old libraries, and back when I did C-style OOP at Eclipse I would always make the names of things show what system they're in. So all the Galaxy stuff is gFile, gVec, etc. I now realize that's bad. For one thing it's bad for the autocomplete and so on. It's bad for file name browsing. It's bad for pronouncability. But more importantly it doesn't help. In C++ you can just wrap all your junk in a namespace to prevent conflicts, and then within your namespace you can rock on with short names.

Furthermore, short generic names is better for metaprogramming and templates. If your functions are named Quat_Length() and Vec_Length() and so on, I can't write generic functions that work on both. But if it's just Length() and Distance() and such, I can write lovely generics. Even without templates this is a win because it lets you copy-paste code, or change your data types. Like say you have a chunk of code that works on Vec3. You decide it needs to work on Vec4 instead. If your functions are Vec3_Add and Vec4_Subtract then you have to change a ton of code. If it's just Add() and Subtract() you change the data types and it just works.

Algorithms are seperate from the data they work on. That's the big epiphany of Stepanov. In fact, Stepanov is a weirdo and thinks the most important thing is to find the absolutely minimal constraint that the algorithm places on the data it works on.

12/16/2008

12-16-08 - Libraries and Cinit

I need a kind of mini class-factory for Oodle. This is for when I load a paging bundle that's full of various resources, I need to make each one. (BTW there will be a "low level" usage pattern for Oodle where you just load a paging bundle and the game gets a handle to the bundle and does whatever it wants with it. This is for the "high level" layer that automates a lot of things for you but is optional).

I guess what I'm going to have to do is require the game to give me a creator function pointer that knows how to make all the objects. eg. it would give me something like


void * (*fp_factory) (int objectType);

and fp_factory would point to something like

void * GameObjectFactory(int type)
{
    switch(type)
    {
    case OBJECT_PLAYER : return (void *) new Player;
    case OBJECT_ENEMY: ...
    }
}

Or something. As a game developer I hate that kind of global registry, because when you add a new object type you have to remember to go update this global registry file, which becomes a frequent merge disaster for source control, etc. I really like having everything you need to make a game object in a single CPP file. That means objects should be self-registering.

The way I usually do self-registering objects is with a little class that runs at cinit. Something like :


#define IMPLEMENT_FACTORY_PRODUCT(classname)    namespace { ClassFactory::Registrar classname##registrar( classname::Factory , typeid(classname) ); }

then in Player.cpp you have :

IMPLEMENT_FACTORY_PRODUCT(Player);

That's all fine and dandy, but it's not so cool for me as a library maker.

For one thing, doing work during CINIT is kind of naughty as a library. I've been trying to make it a rule for myself that I don't use object constructors to do cinit work the way I sometimes like to do. It's a perfectly normal thing to do in C++, and if you are writing C++ code you should make all your systems instantiate-on-use like proper singletons so that CINIT objects work - BUT as a library maker I can't require the clients to have done all that right. In particular if I do something like allocations during CINIT, they might run before the client does its startup code to install custom allocators or whatever.

For another thing, there are problems with the linker and cinit in libraries. The linker can drop objects even though they are doing cinit calls that register them in global factory databases. There are various tricks to prevent this, but they are platform dependent and it is a little evil bit of spaghetti to get the client involved in.

I guess I probably also shouldn't rely on "typeid" or "dynamic_cast" or any other runtime type information existing either since people like to turn that off in the compiler options for no good reason (it has basically zero cost if you don't use it). So without that stuff I pretty much just have to rely on the game to give me type info manually anyway.

Bleh, I'm just rambling now...

12/15/2008

12-15-08 - Monday Randoms

I want Windows to always automatically change its resolution to the native display resolution of the LCD that I plug in. I just never ever want to run an LCD at anything but its natural resolution. Nobody should. It seems like this is an obvious standard feature everyone would want.

Insurance is by design minus EV. That doesn't mean it's by definition -EV. In fact, you can "win" by being reckless and driving like a lunatic or just crashing on purpose.

C is the fucking devil. I had this in my code :


static s_frameTimeAccumulator = 0.f;
s_frameTimeAccumulator += dt;

No warning. WTF. (You may notice there's a word missing after "static").

12-15-08 - Denoising

I've been playing around with denoising images a tiny bit. There's a ton of papers on this and I've barely only scratched the surface, but it strikes me that the majority of the researches seem to be doing silly things that are ill-conceived.

Almost all of them work in the same basic way. They create a prediction of what the current pixel should be with a local smooth predictor, let's call that 'S'. Then they take the difference from the actual pixel value 'A'. If the difference is greater than a certain threshold, |S - A| > T , they replace the pixel value with 'S'.

That's just very wrong. It assumes that images can be modeled with a single-lobe Gaussian probability distribution. In fact, images are better modeled as a blend of several lobes with different means. That is, there is not one single smooth generator, but many, which are switched or blended between based on some state.

Any single-lobe predictor will incorrectly identify some valid image detail as noise.

I like to make it clear that the problem has two parts : deciding if a pixel is noise or not noise, and then filling in a replacement value if you decide that the pixel is noise.

My feeling is that the second part is actually not the important or difficult part. Something like a median filter or a bilateral filter is probably an okay way to fill in a value once you decide that a pixel is noise. But the first part is tricky and as I've said any simple weighted linear predictor is no good.

Now, ideally we would have a rich model for the statistical generation of images. But I wrote about that before when talking about Image Doubling (aka Super-Resolution), and we're still very far from that.

In the mean time, the best thing we have at the moment, I believe, is the heuristic modes of something like CALIC, or the Augural Image Zooming paper, or Pyramid Workshop or TMW. Basically these methods have 6 - 100 simple models of local image neighborhoods. For example the basic CALIC models are : {locally smooth}, {vertical edge}, {horizontal edge}, {45 degree edge}, {135 degree edge}, {local digital}, {pattern/texture}. The neighborhood is first classified to one of the heuristic models, and then a prediction is made using that model.

We can thus propose a simple heuristic noise detection algorithm :

Bayesian Noise Detection :

N = current local neighborhood
A = actual pixel value

P(M|N) = probability of model M given neighborhood N
P(A|M) = probability of pixel A was generated by model M

let

P(A|N) = argmax{M} P(A|M) * P(M|N)

then classify A as noise if

P(A|N) < T

for some threshold T

(I don't specify how the P's are normalized because it just changes the scaling of T,
but they should be normalized in the same way for the whole image)

Note that a very crucial thing is that we are using the argmax on models, NOT the average on models. What we're saying is that if *any* of our heuristic local models had a high likelihood of generating the value A, then we do not consider it noise. The only values that are noise are ones which are unlikely under *all* models.

In a totally hand-wavey heuristic way, this is just saying that if a pixel is within threshold of being a locally smooth value, or an edge value, or a texture, etc. then it's not noise. If it fits none of those models within threshold, it's noise.

I started by looking at the Median Filter and the Bilateral Filter. There have been some cool recent papers on fast Median Filtering :
Constant Time Median Filter
Weiss Log(r) Median Filter
Fast Bilateral Filter ; Sylvain Paris and Fr�do Durand + good references
Siggraph Course on Bilateral Filtering

Those are all very worth reading even though I don't think it's actually super useful. The fast median filter approaches use cool ways of turning an operation over a sliding window into incremental operations that only process values getting added in and removed as you step the window. Median filter is a weird thing that works surprisingly well actually, but it does create a weird sort of Nagel-drawing type of look, with nothing but smooth gradients and hard edges. In fact it's a pretty good toon-shading process.

BTW the fast median filters seem useless for denoising, since they really only matter for large r (radius of the filter), and for denoising you really only want something like a 5x5 window, at which size a plain brute force median is faster.

Bilateral filter actually sort of magically does some of the heuristic cases that I've talked about it. Basically it makes a prediction using a filter weighted by distance and also value difference. So similar values contribute and disparate values don't. That actually does a sort of "feature selection". That is, if your pixel value is close to other pixel values in a vertical edge, then the bilateral filter will weight strongly on those other similar pixel values and ignore the other off-edge pixels. That's pretty good, and the results are in fact decent, but if you think about it more you see the bilateral filter is just a ghetto approximation of what you really want. Weighting based on pixel value difference is not the right way to blend models, it makes no distinction about the context of that value difference - eg. it doesn't care if the value difference comes from a smooth gradient or a sudden step. As others have noted, the Bilateral Filter makes the image converge towards piecewise-constant, which is obviously wrong. Getting towards piecewise linear would be better, piecewise bicubic would be better still - but even that is just the very easiest beginning of what the heuristic estimator can do.

NL Means is a denoising algorithm which is a bit closer to the right idea; he's definitely aware of the issues. However, the actual NL means method is poor. It relies on closely matching neighborhoods to form good predictions, which anyone who's worked in image compression or super-resolution knows is not a good approach. The problem is there are simply too many possible values in reasonably sized neighborhoods. That is, even for a moderately sized neighborhood like 5x5, you have 2^8^25 possible values = 2^200. No matter how much you train, the space is too sparse. It may seem from the NL Means formulation that you're weighting in various neighbors, but in reality in practice you only find a few neighbors that are reasonably close and they get almost all of the weight, and they might not even be very close. It's like doing K-means with 2^200 possible values - not good.

There's a lot of work on Wavelet Denoising which I haven't really read. There are some obvious appealing aspects of that. With wavelets you can almost turn an image into a sum of {smooth}+{edges}+{texture}+{noise} and then just kill the noise. But I don't really like the idea of working in wavelet domain, because you wind up affecting a lot of pixels. Most of the noise I care about comes from cameras, which means the noise is in fact isolated to 1 or 2 adjacent pixels. I also don't like all the research papers that want to use 9x9 or larger local windows. Real images are very local, their statistics change very fast, and pulling in shit from a 9x9 window is just going to mess them up. IMO a 5x5 window is the reasonable size for typical image resolutions of today.

BTW one thing I've noticed with my camera noise images is that the fucking camera JPEG makes the noise so much harder to remove. The noise looks like it's originally just true single pixel noise, but when it goes through the camera JPEG, that single-pixel peak is really unfriendly to the DCT, so it gets smeared out, and you wind up having noise lumps that look like little DCT shapes. To specifically denoise photos that have gone through JPEG you probably have to work on 8x8 blocks and work directly on the DCT coefficients. (also the Bayer pattern demosaic obviously spreads noise as well; ideally you'd get to work on the raw taps before jpeg, before the camera denoise, and before the Bayer demosaic).

ADDENDUM : a lot of the denoise people seem to be trying to perfect the Playboy Centerfold algorithm, that makes photos look extremely smoothed and airbrushed. Often if you're not sure a pixel is noise it's best to leave it alone. Also, all the methods that use a pixel-magnitude threshold value for noise are wrong. The threshold for noise needs to be context sensitive. That is, in smooth parts of the image, you might be able to say that a pixel is probably noise when it's off from expectation by only 1 or 2 pixel values. In chaotic textures parts of the image, a pixel might be off by 20 values or more and you still might not be able to say it's noise. The correct parameter to expose to the user is a *confidence*. That is, I want to do something like replace all pixels which the algorithm is >= 90% confident it can improve.

Another problem I've seen with the majority of the denoisers is that they create structures from noise. If you run them on just staticy junk, they will form local flat junks, or linear bits or weird feathery patterns. This is because even in random noise there will be little bits that have similar values so they become seeds to create structures. This is very bad, the weird structures that are formed by this "denoising" are much worse than the original static, which is pretty inoffensive to the eye.

Marc sent me the link to GREYCstoration a free denoiser based on the image manifold PDE research. I don't like that this technique is becoming known as "PDE" - PDE just means partial differential equation; in this case it's a diffusion equation, in particular a variant of anisotropic diffusion with different diffusion rates based on the local curvature - eg. it diffuses across smooth areas and not across edges. (that's actually an old basic technique, the new thing he does is the diffusion follows contour lines (but is actually 2d, just weighted) and works on all components). It looks pretty decent. It's actually more exciting to me for super-resolution, it looks like it does a pretty good job of image super-resolution.

12-14-08 - Sunday Randoms

My iPods have been locked into certain songs sets for a while now because I just don't want to run iTunes. I moved the location of my mp3 dir a while ago when I got the HTPC, and iTunes is pointing at all the old stuff. There are some hacks around the net about ways to move the dir without reimporting everything, but I tried some and it didn't work, so I just gave up and deleted the iTunes database and am reimporting everything. I'm currently about 30 hours in to the import; it seems near done now...

Having iTunes running and slowing the machine down to a crawl lets you see weird bugs in Explorer. For example, when Explorer redraws the desktop, it actually redraws the whole thing top-left justified with no task bar. Then if your task bar happens to be in the top left, it clears the screen, draws the taskbar, and redraws the icons shifted over by the task bar amount. That is some crazy shitty programming and the kind of thing I always try to be careful about in games. It's a good trick in game development to test your game at 5 fps so that you can see all the one-frame glitches. People often have one-frame bugs in the whole input->action->rendering chain.

It snowed a lot last night. It's bitter cold inside my apartment (around 62 degrees, but it feels colder cuz of the drafts). Apparently neither my girlfriend nor the other tenants think this is a big deal, which has killed my momentum in trying to get the landlord to do anything. On the plus side, I'm surprised at how good the elecriticity in this old building is; I'm running space heaters and have yet to blow a fuse. My place in San Francisco would blow a fuse if the refrigerator happened to start a cooling cycle while I was vacuuming.

Maybe someone can help me find a new digital camera. What I want is :

  • Small point & shoot ; I really can't be bothered to carry around a DSLR , though I want most of the features of one.

  • RAW files. No jpeg. No in-camera noise correction. I want the original.

  • 10 or 12 bit intensity. So I can do software post exposure.

  • Manual controls for focus, exposure, etc. As close to plain old analog cameras as possible.

  • Very fast response time from button pressed to picture recorded; I hate the cameras that take forever (yes I know a lot of that is the autofocus and face recognition and so on).

  • Things I'd like : optical viewfinder (I hate looking at the LCD), very simple single-button switch controls, not a menu system you have to navigate through, long battery life (no LCD is huge for extending battery).

The Panasonic FX 150 looks okay, but it's 15 MP which is worrisome. The LX3 looks semi-ideal.

Also apparently there's this CHDK thing (see also ) which lets you hack any Canon Point and Shoot. Apparently the Canon P&S and the DSLR actually have the same chip, but for the consumer level stuff Canon just disables those features (like 10-bit RAW). Apparently the best for this was the Canon S70 which is not made any more.

I've read at many of the fancy photo places that you should get a physically large CCD (2.3" not 1.8" or 1.4") and that lower megapixels = less noise = better quality photos. That's sort of interesting to me theoretically.

Obviously larger CCD = better because more photons are coming in. Subdividing the CCD into more pixels gives you more resolution, but fewer real photons per pixel. I'm not sure what the right model for noise is, but I think it's pretty close to just constant intensity random noise that's independent from pixel size. That is, the signal measured in each pixel is :


Signal per pixel = # Photons in pixel + Random Noise of intensity N
S = P + N

Your signal-to-noise-ratio per pixel obviously gets better and better when you have fewer pixels, all the way down to the case of just one giant pixel which would be very accurate. However, obviously that needlessly decreases your spatial resolution. On the other hand, if you divide the CCD into too many megapixels you have too little signal per pixel and it becomes hard to tell real signal from the noise.

If we try to maximize the amount of real total information captured, there must be some sweet spot. The total information is like (# pixels) * (information per pixel). Information per pixel is maximum when # of pixels = 1, and it goes down to near zero when N >= P. So as # of pixels increases this must be some curve that goes up then comes back down and has a maximum at some # intermediate of pixels.

Anyhoo, supposedly the 10+ MP cameras make ugly noise. I think a lot of that perception is due to crappy denoising software.

I've read that the "RAW" coming out cameras isn't really raw; they still run denoise and stuff on it. What you really want is a floating point file that just gives you the voltage at each pixel. I'm sure the camera companies don't want to do that because places like "dpreview" would put up sample screens of what the raws coming out without processing would look like and they would look awful. Then some jackass would build in some denoising and their raw would look much better and retarded consumers would buy it because of that.

12-12-08 - Friday Randoms

It's really fucking annoying how macros are not namespaced. I've got the same macros in cblib, Galaxy, Oodle, etc. etc. and sometimes they conflict. Even worse is macros that have the same name but aren't identical; it's standard practice to either just undef the old one or not define the new one and god knows what the side effects of that are. The macro processor is one of the things that would actually be useful to fix in programming languages, instead we get a lot of shitty languages where they just get rid of the preprocessor completely, which is a huge step backward IMO. *Yes* you should make it so that I don't have much reason to use the preprocessor (by making inline functions actually work, by improving metaprogramming, by improving switches on constants and such to not even generate unreachable code) but that doesn't mean you should kill something the user likes.

Using "#ifdef" for toggles is really bad. I've always known that, but I keep doing it. The big problem with it is if you put in the wrong name to check for ifdef, it just returns false with no error. That hits you in two ways, one is just if you put in a typo in your ifdef it will never be true. Perhaps worse is when you change the name of the thing that you are toggling, all the old checks just fail. Of course what you should do is just "#if " for the toggle.

Musicians cover the wrong thing. Listen up musicians, let me straighten you out. They tend to do covers of songs that are good songs to begin with, or that are popular. This is the wrong criterion; you should cover songs that you can make *better*. In particular, songs where doing a different version would show the original song in amusing new light (the semi-comedy cover is okay, like Macha's "Believe" or Luna's "Sweet Child of Mine") , but mainly songs where the original execution was flawed and the true beauty of the song was not expressed through the first performance. Some good examples would be pretty much any Bob Dylan song or any Steve Merritt song because god those guys are awful singers, but they write great songs, so there's a lot of potential for covers. In any case, the new MGMT covers album is the epitome of what not to do. The original versions of the MGMT songs are the perfect expression of those songs. The best a cover can do is come close to the original, but mainly they just leave you unsatisfied and frustrated that you're not just listening to the better original version. (yes, I know that this is mainly due to stupid record companies; the cover albums are just created because the original sold well and the producer is trying to make some money off that)

I can easily be guilted or pressured into doing things. Not because I actually feel guilty or want to please the person (in fact, quite the opposite, when somebody tries to make me feel like I "should" so something it makes me want to *displease* them or just ignore them). Rather it's because I just find it so unpleasant I want them to stop and go away, and usually the easiest way is just to give them what they want.

I don't believe that people who claim to like super-snobby things actually like them. Examples - noise techno like DJ Spooky or Matmos. In movies, for example Igmar Bergman or Akira Kurosawa. With all of those I can recognize the high level of execution of what they're doing, but it's just so unpleasant, I can't imagine actually enjoying it. Anyway, I just assume that when someone says something like "oh, I adore Kurosawa" they're just lying to try to seem cultured.

I just came up with the perfect analogy for Popcap games today. They're like Television. Mildly amusing, somewhat engaging, not at all challenging or surprising, cute with lots of stimulus, but they let you turn off your brain and zone out. Most people love television. Most people love Popcap games.

One of the big problems with the Netflix ratings in practice is still selection bias. For example, the more obscure a movie is, the higher it gets rated, because the only people who rent it are people who probably know it or love it. Really old movies pretty much all have 4-5 star ratings, but that's only because the people who wouldn't like them don't rent them! In fact this isn't exactly a flaw in the ratings, because the ratings do not actually predict what you would think if you saw a movie - they predict how you would rate a movie *if you rented it*. It's a Bayesian kind of problem - their prediction is for the rating that you would make, given that you chose to rent it and rate it.

12/12/2008

12-11-08 - Thursday Randoms

It's supposed to snow over this weekend in a rare freak cold snap. I'm fully expecting total chaos, like this . ADDENDUM : ZOMG WTF I just noticed that video is actually from Seattle (addendum : or not...) ; it's an internet classic, I had no idea it was actually here. LOL Seattle bad-driveaments.


I really want to boycott fucking asshole websites that make you register just to view their content (such as Beeradvocate or Gamasutra or etc etc.) but I usually cave in. To get back at them I usually give a fake name and a junk email address.

Which reminds me, I've started just randomly giving fake info to people. When some fucking bank or stock trader or whatever wants my date of birth for me to open an account, I just tell them a random wrong year close to the right one. You don't need my info, and it's a lot faster to just give them fake bogus info than to refuse. It's satisfying. Even just random times, like when some fucking retailer like Bed Bath or Radio Crap asks you for your zip code, don't bother refusing, just make something up.


Beeradvocate could be really sweet if it let you do a Collaborative Filter, but as is with just the grades, I find it near useless, because I very much disagree with the average rankings. It's biased towards what the beer snobs love, which is shit like the Russian Imperial Stouts and Strong Ales and other such weird foul nonsense that's generally way too high in alcohol and way too fruity and too carbonated. They all secretely love Barleywine and cider and wish they were making. Just make beer. Deschutes knows what I'm talking about. BTW "Dogfish Head" has never made a single good beer.

I hate the beers that sneak-attack drunk your ass (mainly aforementioned snob beers). Like I have two or maybe three and then I feel trashed and I'm like WTF ? And then I look at the bottle and it's like 12% ABV and I'm like "oh snap" I basically just drank a whole bottle of wine unintentionally. Curse you Mr. Sneak Attack Beer.


One of the big differences between me now and me when I was doing "research" is just the patience and ability to stick on one subject for a long time. Back then I would stick on something like arithmetic coding for *months* ; I would gather and read every single paper on the topic. Back then "Teh Interweb" did not really exist so you actually had to go to libraries to photocopy papers; often I would have to request books to be transferred through the interlibrary system so I could get rare stuff like Frank Rubin's arithmetic coder paper (which presaged the modern "range coder"). I might spend weeks just waiting for a paper to come in. I would sit in the garden and write notes about ideas on different algorithms.

Now I just grab all the papers I can easily grab on the web. I read them all in a few hours, mostly just skimming and thinking "junk" or "too obscure" or "ldo, obvious" or "bad writing, bleck I can't bother with this". I work something up really quick that's close enough to state of the art, and then I have to move on, because it's just irresponsible to keep working on some little aspect of my work that isn't driving me towards shipping.


I'm super out of shape right now; I went jogging the other day and coughed up some blood. I went to the gym today and got nauseous and felt ill for an hour afterward. Yeah yeah, I know I'm pushing it too hard too fast, but it's hard for me to comprehend how far I have to dial it back; I was like a ultimate-fighting ex-navy-seal lumberjack, and now I'm like a 70 year old with polio. Working out absolutely sucks when you're out of shape. I can see why all you weak fatties hate it so much. You have to get over a certain hump and then it starts actually feeling good, but until then it's just miserable. When I run right now I picture an elephant running in slow motion. Each step is like a ton of bricks slamming into the earth, and then all my weight sinks down onto that leg and my joints and tendons howl in protest.


Guy LaLiberte' dropped around $3M online in the last few days (around $20M total in 2008). It makes me want to boycott Cirque de Soleil. $150 for a ticket? Outrageous! But I do love me some circus.

Burlesque shows would be pretty damn great if it wasn't always the fat Rosie-O'Donnell-esque women that wanted to be in them. No, I'm sorry, your "sass" does not make up for the fact that you are gross. I'm not asking for stripper type girls; in fact, the *lack* of stripper type girls is what makes burlesque appealing in the first place. I want to see girls who are actually having fun and enjoying the tease show, not soul-crushed strippers. I'm just asking for quirky, real girls that aren't disgusting.


"Why aren't there transvestite girls?"
"Huh?"
"You know, like how transvestite males dress up girls; there are no tranny girls that dress up as men."
"Sure there are, they're called lesbians."


God the winter here is really depressing. It's cold and grey and wet and always dark. I could easily slip into really bad patterns, not sleeping, eating badly, not going out, just staying home and fucking around on the computer. It takes a real force of will to stay positive and sane through the gloom. I mean, my sanity at any time is like a pendulum standing straight up; the slightest breeze can send it into a sudden fall and wild oscillations. Normally the slightest thing can send me into a funk for days (like doing a bad job of parallel parking for example). The winter here is like a gale trying to blow my pendulum over.


I hate the style of writing that has become standard for blogs all around the net. It's sort of smarmy, it usually involves an anecdote in which you mock the people or the event that you are telling about. It's very superior and often contains a "lesson". Even many of my favorite blog writers that I often read are very guilty of this, it is *not* funny or clever or amusing. Stop it. (I guess this is like, ironic, or something).

12/08/2008

12-08-08 - File Hashes

So I'm sticking a file hash in Oodle and thought I'd test some of the stuff out there. Test candidates :

SHA1 (Sean's stb.h implementation)

MD5 (OpenSSL implementation)

BurtleBurtle Lookup3 (hashlittle2)

Cessu's SSE2 hash (+ extra tail code I added)

CRC32

CRC32+32

In all cases I create a 64 bit hash. Hey, it's plenty of bits, it's easier to pass around cuz I have a 64 bit type, and it makes it a fair competition. SHA1 makes 160 bits (= 5 dwords), MD5 makes 128 bits (= 4 dwords), so I use Bob's Mix method to get that down to 64 bits.

A lot of people think SHA1 or MD5 or something is the "right thing" to do for file hashes. That's not really true. Those hashes were designed for cryptography which is not the same purpose. In particular, they are slow *on purpose* because they want to be hard to invert. They also make tons of bits, not because you need tons of bits to tell files apart, but again to make them hard to invert by brute force attack. I don't care about my file hashes being vulnerable to attack, I just want the chance of accidental collisions to be microscopic.

CRC32+32 means doing CRC32 on alternating bytes and jamming them together to make 64 bits. This is not a true "CRC64" but I might refer to it as CRC64 sometimes. (suggestion by "Joe Smith" ; Joe Smith? Is that a pseudonym?)

Just for background, if the 64 bit hashes are "perfect" - that is the 64 bit words coming out of them are random in every bit, even for input that is very non-random - then the chance of collision is indeed microscopic. (in fact you only need maybe 40 bits). The number of items you can hash in B bits is around 2^(B/2) , so B = 32 is not enough bits since 2^16 = 64k and you may in fact run on 64k files. But even at B = 40, 2^20 = 1 Million is a lot, and certainly B = 64, means 2^32 = 4 Billion items before you expect a collision. So, anyway, the point is to test whether these hashes are actually close enough to perfect on real data that they don't generate an unexpected high number of collisions.

I ran these hashes on every file on my hard drive. I threw out files that were exactly equal to some other file so there would be no false collisions due to the files actually being identical. I have 24963 files. I made 2 variants of every file, one by taking a random byte and changing it to a random value, and another variant by flipping 4 random bits. So in total 74853 arrays were hashed.

First the speed numbers :

sha1                 : 48.218018
md5                  : 19.837351
burtle               : 7.122040
Cessu                : 6.370848
crc32+32             : 15.055287
crc32                : 21.550138

These are in clocks per byte. The CRC numbers are a little funny because the CRC32+32 loop is a bit unrolled, but the CRC32 loop goes byte by byte. In any case, even though CRC is very simple, it is not fast, because even unrolled it still works byte by byte and there's a hard data dependency - you have to completely process each byte before you can work on the next byte.

Cessu's hash is only barely faster than Bob's lookup3 even though it uses SSE2 and works on 16 bytes at a time. Bob's hash is really damn good. When I tested it on strings it did not perform well for me because I'm so close to the bone on strings that the rollup & rolldown overhead killed me, but on larger arrays or even long strings, lookup3 kicks butt. ( Bob's page )

So... how many total collisions in the hashes do you think I had? (only testing collisions versus hashes of the same type of course). Remember I tested on 74853 different arrays, made from 24963 files and 24963+24963 more tiny modifications.

One.

One collision. Of course it was in CRC32. None of the 64 bit hashes had any collisions.

I then tried making 8 variants of each file by 8 different random byte jams, so I was running 199704 arrays. Again zero collisions for any 64 bit hash.

So, in an attempt to actually get a collision, I made 10,000,000 test arrays by sprintf'ing the digits from 1 to 10,000,000 , and then tacked on 2 random bytes. (note : you should not test hashes by making random arrays, because any decent hash will return random output bits from random input bits; the test I am interested in is how close the hash output is to random on highly *nonrandom* input). I ran the hashes on all those strings and got :

collisions on 10,000,000 tests :

sha1                 : 0
md5                  : 0
burtle               : 0
Cessu                : 0
crc64                : 0
rand32               : 11,530
crc32                : 11,576

Again none of the 64 bit hashes has any collisions. CRC32 had quite a few of course - but only as many as a 32 bit random number generator !! That means the CRC is in fact performing as a perfect hash. It is perfectly randomizing the nonrandom input.

So, I have no idea which of the 64 bit hashes is "best" in terms of randomizing bits and detecting errors. Obviously if they are actually perfectly making 64 bits, the chance of me ever seeing a collision is tiny. I thought maybe the "crc32+32" might not have 64 real bits of quality and might fail sooner, but it's not bad enough to fail in any kind of real world scenario apparently.

So, anyway, I'm gonna use "lookup3" because it's both super fast, plenty good, and it has the Bob Jenkins seal of approval which means it should actually be "robust".

HOWEVER : SSE4.2 has a CRC32 instruction. If you were in some application where you could rely on having that, then that would definitely be the fastest way to go, and a CRC32+32 appears to be plenty high quality for file identification.

BTW I keep hearing that CRC32 has degeneracy failures on real world data, but I have yet to see it.

12-08-08 - DXTC Summary

I thought I should fix some things that were wrong or badly said in my original DXTC posts :

DXTC Part 1
DXTC Part 2
DXTC Part 3
DXTC Part 4

On the "ryg" coder : there was a bug/typo in the implementation I was using which gave bogus results, so you should just ignore the numbers in those tables. See for correction : Molly Rocket Forum on Ryg DXTC coder . Also I should note he does have some neat code in there. The color finder is indeed very fast; it is an approximation (not 100% right) but the quality is very close to perfect. Also his "RefineBlock" which does the Simon Brown style optimize-end-from-indices is a very clever fast implementation that collapses a lot of work. I like the way he uses the portions of one 32 bit word to accumulate three counters at a time.

I also mentioned in those posts that the optimized version of the Id "real time DXTC" bit math was "wrong". Well, yes, it is wrong in the sense that it doesn't give you exactly the same indeces, but apparently that was an intentional approximation by JMP, and in fact it's a very good one. While it does pick different indeces than the exact method, it only does so in cases where the error is zero or close to zero. On most real images the actual measured error of this approximation is exactly zero, and it is faster.

So, here are some numbers on a hard test set for different index finders :


    exact : err:  31.466375 , clocks: 1422.256522

    id    : err:  31.466377 , clocks: 1290.232239
            diff:  0.000002

    ryg   : err:  31.466939 , clocks:  723.051241
            diff:  0.000564

    ryg-t : err:  31.466939 , clocks:  645.445860
            diff:  0.000564

You can see the errors for all of them are very small. "ryg-t" is a new one I made which uses a table to turn the dot product checks into indexes, so that I can eliminate the branches. Start with the "ryg" dot product code and change the inner loop to :

    const int remap[8] = { 0 << 30,2 << 30,0 << 30,2 << 30,3 << 30,3 << 30,1 << 30,1 << 30 };

    for(int i=0;i < 16;i++)
    {
        int dot = block.colors[i].r*dirr + block.colors[i].g*dirg + block.colors[i].b*dirb;

        int bits =( (dot < halfPoint) ? 4 : 0 )
                | ( (dot < c0Point) ? 2 : 0 )
                | ( (dot < c3Point) ? 1 : 0 );

        mask >>= 2;
        mask |= remap[bits];
    }

I should note that these speed numbers are for pure C obvious implementations and if you really cared about speed you should use SSE and who knows what would win there.


Now this last bit is a little half baked but I thought I'd toss it up. It's a little bit magical to me that Ryg's Mul8Bit (which is actually Jim Blinn's) also happens to produce perfect quantization to 565 *including the MSB shifting into LSB reproduction*.

I mentioned before that the MSB shifting into LSB thing is actually "wrong" in that it would hurt RMSE on purely random data, because it is making poor use of the quantization bins. That is, for random data, to quantize [0,255] -> 32 values (5 bits) you should have quantization bins that each hold 8 values, and whose reconstruction is at the middle of each bin. That is, you should reconstruct to {4,12,20,...} Instead we reconstruct to {0,8,...247,255} - the two buckets at the edges only get 4 values, and there are some other ones that get 9 values. Now in practice this is a good thing because your original data is *not* random - it's far more likely to have exact 0 and 255 values in the input, so you want to reproduce those exactly. So anyway, it's not a uniform quantizer on the range [0,255]. In fact, it's closer to a uniform quantizer on the range [-4,259].


I think it might actually just be a numerical coincidence in the range [0,255].

The correct straight-forward quantizer for the DXTC style colors is

    return (32*(val+4))/(256+8);

for R.  Each quantization bin gets 8 values except the top and bottom which only get 4.  That's equivalent to quantizing the range [-4,256+4) with a uniform 8-bin quantizer.

Now

1/(256 + 8) = 1/256 * 1/(1 + 8/256)

We can do the Taylor series expansion of 1/(1+x) for small x on the second term and we get ( 1 - 8/256 + 64/256/256) up to powers of x^2

So we have

    ( (32*val+128) * ( 1 - 8/256 + 64/256/256) ) >> 8

And if we do a bunch of reduction we get 

    return ( 31*val+124 + ((8*val+32)>>8) ) >> 8

If we compare this to Mul8bit :

        return ( 31*val+128 + ((val*31 + 128)>>8)) >> 8;

it's not exactly the same math, but they are the same for all val in [0,255]

But I dunno. BTW another way to think about all this is to imagine your pixels are an 8.inf fixed point rep of an original floating point pixel, and you should replicate the 8 bit value continuously. So 0 = 0, 255 = FF.FFFFFF.... = 1.0 exactly , 127 = 7F.7F7F7F..

BTW this reminds me : When I do Bmp -> FloatImage conversions I used to do (int + 0.5)/256 , that is 0 -> 0.5/256 , 255 -> 255.5/256 . I no longer do that, I do 0->0, and 255 -> 1.0 , with a 1/255 quantizer.

12/05/2008

12-05-08 - lrotl

Well I found one x86 ASM widget. I've always known you could do nice fast barrel rotations on x86 but thought they were inaccessible from C. Huzzah! Stdlib has a function "_lrotl()" which is exactly what you want, and happily it is one of the magic functions the MSVC recognizes in their compiler and turns into assembly with all goodness. (They also have custom handling for strcmp, memcpy, etc.)

Oh, I noticed lrotl in OpenSSL which seems to have a ton of good code for different hashes/checksums/digests/whatever-the-fuck-you-call-them's.

As a test I tried it on Sean's hash, which is quite good and fast for C strings :


RADINLINE U32 stb_hash(const char *str)
{
   U32 hash = 0;
   while (*str)
   {
      hash = (hash << 7) + (hash >> 25) + *str++;
   }
   return hash;
}

RADINLINE U32 stb_hash_rot(const char *str)
{
   U32 hash = 0;
   while (*str)
   {
      hash = _lrotl(hash,7) + *str++;
   }
   return hash;
}

stb_hash : 6.43 clocks per char
stb_hash_rot : 3.24 clocks per char

Woot! Then I also remembered something neat I saw today at Paul Hsieh's Assembly Lab . A quick check for whether a 32 bit word has any null byte in it :

#define has_nullbyte(x) (((x) - 0x01010101) & ( ~(x) & 0x80808080 ) )

Which can of course be used to make an unrolled stb_hash :

RADINLINE U32 stb_hash_rot_dw(const char *str)
{
   U32 hash = 0;
   
   while ( ! has_nullbyte( *((U32 *)str) ) )
   {
      hash = _lrotl(hash,7) + str[0];
      hash = _lrotl(hash,7) + str[1];
      hash = _lrotl(hash,7) + str[2];
      hash = _lrotl(hash,7) + str[3];
      str += 4;
   }
   while (*str)
   {
      hash = _lrotl(hash,7) + *str++;
   }
   return hash;
}

stb_hash_rot_dw : 2.50 clocks

So anyway, I'm getting distracted by pointless nonsense, but it's nice to know lrotl works. (and yes, yes, you could be faster still by switching the hash algorithm to something that works directly on dwords)

12-05-08 - 64 Bit Multiply

Something that I've found myself wanting to do a lot recently is multiply two 32 bit numbers, and then take the top 32 bit dword from the 64 bit result. In C this looks like :

U32 mul32by32returnHi(U32 a,U32 b)
{
    U64 product = (U64) a * b;
    U32 ret = (U32)(product >> 32);
    return ret;
}

That works fine and all, but the C compiler doesn't understand that you're doing something very simple. It generates absolutely disgusting code. In particular, it actually promotes a & b to 64 bit, and then calls a function called "_allmul" in the Windows NT RTL. This allmul function does a bunch of carries and such to make 64-by-64 bit multiply work via the 32+32->64 multiply instruction in x86. You wind up with a function that takes 60 clocks when it could take 6 clocks :

U32 mul32by32returnHi(U32 a,U32 b)
{
    __asm
    {
        mov eax, a
        mul b
        mov eax,edx
    }
}

Now, that's nice and all, the problem is that tiny inline asm functions just don't work in MSVC these days. Calling a big chunk of ASM is fine, but using a tiny bit of ASM causes the optimizer to disable all its really fancy optimizations, and you wind up with a function that's slower overall.

What I'd like is a way to get at some of the x86 capabilities that you can't easily get from C. The other thing I often find myself wanting is a way to get at the carry flag, or something like "sbb" or "adc". The main uses of those are Terje's cool tricks to get the sign of an int or add with saturation

It would be awesome if you could define your own little mini assembly functions that acted just like the built-in "intrinsics". It would be totally fine if you had to add a bunch of extra data to tell the compiler about dependencies or side effects or whatever, if you could just make little assembly widgets that worked as neatly as their own intrinsics it would be totally awesome for little ops.

ADDENDUM : urg ; so I tested Won's suggestion of Int32x32To64 and found it was doing the right thing (actually UInt32x32To64 - and BTW that's just a macro that's identical to my first code snippet - it doesn't seem to be a magic intrinsic, though it does have platform switches so you should probably use that macro). That confused me and didn't match what I was seeing, so I did some more tests...

It turns out the first code snippet of mul32by32returnHi actually *will* compile to the right thing - but only if you call it from simple functions. If you call it from a complex function the compiler gets confused, but if you call it from a little tight test loop function it does the right thing. URG.

Here's what it does if I try to use it in my StringHash interpolation search test code :


            int start = mul32by32returnHi( query.hash, indexrange );
004087A5  xor         eax,eax 
004087A7  push        eax  
004087A8  mov         eax,dword ptr [esp+38h] 
004087AC  push        eax  
004087AD  push        0    
004087AF  push        ebx  
004087B0  mov         dword ptr [esi],ebx 
004087B2  call        _allmul (40EC00h) 
004087B7  mov         ecx,dword ptr [esp+14h] 

You can see it's extending the dwords to 64 bits, pushing them all, calling the function, then grabbing just one dword from the results. Ugh.

And here's what it does in a tight little test loop :


      U32 dw = *dwp++;
      hash = mul32by32returnHi(hash,dw) ^ dw;
00401120  mov         eax,ecx 
00401122  mul         eax,edx 
00401124  add         esi,4 
00401127  xor         edx,ecx 
00401129  mov         ecx,dword ptr [esi] 

Which not only is correct use of 'mul' but also has got crazy reordering of the loop which makes it a lot faster than my inline ASM version.

old rants