4/30/2009

04-30-09 - Sex Roles

I certainly don't romanticize the historical oppression of women (neither the near-slavery of long ago nor the social repression of 1950's America). However, there is something romantic and appealing about complementarity. Complementarity is the idea that you and your lover should be very different, have different skills, and together make a whole. It makes you value the other person, it makes you need them, and it lets you be better as a whole than you could be as individuals, because you can specialize and spend more time on the things each of you is good at. Plus there are many personality traits that are very useful but hard to have at the same time. For example, one of you could be very intellectual and rational and serious, the other could be very emotional and carefree. Those are both wonderful, useful traits, but if you try to combine them you just sort of water them down and make them worse. One of you could be tough and demanding and aggressive, the other could be sweet and friendly.

In the past lots of people were forced into marriage by their sheer inability to survive alone. Women would have a hell of a hard time supporting themselves, and the men were largely incapable of doing basic things like dressing themselves or making a sandwich. That's a very strong bond; you see it still with old couples who despise each other, but have to stick together because the man doesn't know how to run a washing machine and the woman doesn't know how to drive.

Of course I think of people now as being pretty independent, but a lot of them are still totally incompetent. I'm amazed how many men still can't do basic cleaning or cooking (I'm sure they quite willfully keep themselves ignorant because they don't want to have to do that stuff). Lots of girls still couldn't get a decent job if they had to, because they were Communications majors or something and have zero skills. And of course all those suburban princess girls are just completely worthless all around; they can neither cook nor clean nor do any decent work. The only thing they know how to do is put on make up, wait tables, and give blow jobs.

Relationships without artificial glue are very difficult to sustain. As I've already mentioned, one form of strong glue is if you're incompetent in some way and simply can't live apart (or societal repression as still exists in much of the world). Another form of strong glue is children. Even if you aren't fully in the mode of "we hate each other but we'll stay together for the children", children are still strong glue in that it simply gives you something that you both care about and have to take care of all the time; it gives you a common mission, and it also gives you a focus of grief that's outside of each other.

Another form of strong glue is being a loser. By being a loser I mean the fear of being single or the belief that you can't do better, or inexperience dating. If you marry your highschool sweetheart, you're terrified of having to get back out there and date. You believe maybe this is the only person you can ever find, you're afraid of what single life would be like. That's a strong glue.

Modern society celebrates independence and consciously choosing a mate - that is, being in a relationship without all that glue. But with none of that glue your only reason to be together is because you think the life with this person is better than any other life you could have, and that is a very hard bar to meet all the time.

So, my advice to someone who wants to have a strong relationship : 1. make yourself incompetent, if you're a man never learn how to cook or clean, have your mom always pick your outfits for you. 2. marry the first girl you date or have sex with. 3. have kids right away. 4. don't let your wife go to college or learn any skills.

4/29/2009

04-29-09 - QCD

Someone made me think briefly about QCD (Quantum Chromo Dynamics).

I never really got my head around the standard model in grad school. I think I understood QED pretty well, and Weak isn't really that bad either, but then you get into QCD and the maths gets really tough and there's this sea of particles and I had no idea what's going on. Part of the problem is that a lot of the texts go through a historical perspective and teach you the stages of understanding and the experiments that led to modern QCD. I think that's a big mistake and I discourage anyone from reading that. I was always really confused by all the talk of the various mesons and baryons. Often the classes would start with talking about K+ transitions or pion decay or scattering coefficients for "Omegas" and I'd be like "WTF are these particles and who cares what they do?".

I think it's way better just to say "we have quarks and gluons". And yes, the quarks can combine together into these various things, but we don't even really need to talk about them because nobody fucking cares about what exactly the meson made from (strange-antistrange) is called.

I much prefer a purely modern approach to QFT based on symmetry. In particular I really like Weinberg's approach in his textbook which is basically - we expect to observe every phenomenon in the universe which is *possible* to exist. If something is possible but doesn't ever happen, that is quite strange and we should wonder why. In particular with QFT - every possible Lagrangian which leads to a consistent theory should correspond to something in nature. When you start to write these down it turns out that very few are actually possible (given a few constraints, such as the postulate that relativity is required, etc.).

Anyway, I was never really happy with my intuition for QFT. Part of the problem is the math is just so hard, you can't do a lot of problems and get really comfortable with it. (David Politzer at Caltech once gave me a standard model homework problem to actually compute some real scattering coefficients that had been experimentally tested. It took me about 50 pages and I got it horribly wrong).

The whole gauge-field symmetry-group idea seems like it should be very elegant and lead to some intuition, but I just don't see it. You can say hand wavey things, like : electromagnetism is the presence of an extra U(1) symmetry; you can think of this as an extra circular dimension that's rolled up tiny so it has no spatial size, or if you like you can do the Feynman way and say that everything flying around is a clock that is pointing in some direction (that's the U(1) angle). In this picture, the coupling of a "charge" to the field is the fact that the charge distorts the U(1) dimension. If you're familiar with the idea of general relativity where masses distort spacetime and thus create the gravity force, it's the same sort of thing, but instead of distorting spacetime, charge distorts the U(1) fiber. As charges move around in this higher-D space, if they are pushed by variation of the U(1) fiber clock angle, that pushes them in real space, which is how they get force. Charges are a pole in the curvature of the fiber angle; in a spacetime sense it's a pinched spot that can't be worked out by any stretching of the space fabric. Okay this is sort of giving us a picture, but it's super hand wavey and sort of wrong, and it's hard to reconcile with the real maths.

Anyway, the thing I wanted to write about QCD is the real problem of non-perturbative analysis.

When you're taught QED, the thing people latch onto are the simple Feynman diagrams where two electrons fly along and exchange a photon. This is appealingly classical and easy to understand. The problem is, it's sort of a lie. For one thing, the idea that the photon is "thrown" between the electrons and thus exchanges momentum and forces them apart is a very appealing picture, but kind of wrong, since the photon can actually have negative momentum (eg. for an electron and positron, the photon exchanged between them pulls them together, so the sort of spacemen playing catch kind of picture just doesn't work).

First of all, let's back up a bit. QFT is formulated with the path-integral mechanism : you sum complex exponentials of the action over all possible histories. Classically this would reduce to "least action" paths, which is equivalent to Lagrangian classical mechanics. There's a great book which teaches ordinary Quantum Mechanics using this formulation : Quantum Mechanics and Path Integrals by Feynman & Hibbs (this is a serious textbook for physics undergrads who already know standard QM ; it's a great bridge from standard QM to QFT, because it introduces the sum-on-action formalism in the more familiar old QM). Anyway, the math winds up as a sum of all possible ways for a given interaction to happen. The Feynman diagram is a nice way to write down these various ways, and then you still integrate over all possible ways each diagram can happen.

Now let's go back to the simple QED diagram that I mentioned. This is often shown as your first diagram, and you can do the integral easily, and you get a nice answer that's simple and cute. But what happened? We're supposed to sum on *all* ways that the interaction can happen, and we only did one. In fact, there are tons of other possibilities that produce the same outcome, and we really need to either sum them all, or show that they are small.

One thing we need to add is all the ways that you can add vacuum -> vacuum graphs. You can make side graphs that start from nothing, particles pop out of the vacuum, interact, then go back to the vacuum. These are conveniently not mentioned because if you add them all up they have an infinite contribution, which would freak out early students. Fortunately we have the renormalization mechanism that sweeps this under the rug just fine, but it's quite complex.

The other issue is that you can add more and more complex graphs; instead of just one photon exchange, what about two? The more complex graphs have higher powers of the coupling constant (e in this case). If the coupling constant is small, this is like a Taylor expansion, each term is higher powers of e, and e is small, so we can just go up to 3rd order accuracy or whatever we want. The problem with this is that even when e is small, as the graphs get more complex there are *more* of them. As you allow more couplings, there are more and more ways to make a graph of N couplings. In order for this kind of Taylor expansion to be right, the number of graphs at order N must grow more slowly than (1/e)^N. Again it's quite complex to prove that.

Starting with a simple problem that we can solve exactly, and then adding terms that make us progressively more accurate is the standard modus operandi in physics. Usually the full system is too hard to solve analytically, and too hard to get intuition for, so we rely on what's called a perturbation expansion. Take your complex system that you can't solve, and expand it into Simple + C * Complex1 + C^2 * Complex2 + ... - higher and higher powers of C, which should be small.
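In symbols (just my schematic notation, nothing rigorous about it) : an amplitude gets expanded as a power series in the coupling g,

$$ A \;=\; \sum_n c_n \, g^n \;, \qquad c_n \;=\; \text{(sum of all diagrams with } n \text{ couplings)} $$

and truncating at low order only makes sense if the terms $c_n g^n$ actually shrink as n grows - that is, if the growth in the number of diagrams loses to the extra powers of the small coupling.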

And with QCD we get a real problem. Again you can start with a simple graph of quarks flying along passing gluons. First of all, unlike photons, there are gluon-gluon couplings which means we need to add a bunch more graphs where gluons interact with other gluons. Now when we start adding these higher order terms, we have a problem. In QCD, the coupling constant is not small enough, and the number of graphs that are possible for each order of the coupling constant is too high - the more complex terms are not less important. In fact in some cases, they're *more* important than the simpler terms.

This makes QCD unlike any other field theory. Our sort of classical intuition of particles flying around exchanging bosons completely breaks down. Instead the quarks live in a foaming soup of gluons. I don't really even want to describe it in hand wavey terms like that because any kind of picture you might have like that is going to be wrong and misleading. Even the most basic of QCD problems is too hard to do analytically; in practice people do "lattice QCD" numerical computations (in some simple cases you can do the summations analytically and then take the limit of the lattice size going to zero).

The result is that even when I was doing QFT I never really understood QCD.

4/28/2009

04-28-09 - Quadratic

I'm doing a little refinement of my old cubic interpolator ("Smooth Driver") thing. (see also : here and here and here ).

One thing I'm trying to do is fix up all the nasty epsilon robustness issues. A small part of that is solving a quadratic. "Easy!" I hear you say. Everyone knows how to solve a quadratic, right? Not so.

I found this page which has a nice summary of the issues, written by a sour old curmudgeon who just whines about how retarded we all are but doesn't actually provide us with a solution.

You can also find the Wikipedia page or the Numerical Recipes (5.6) snippet about the more robust numerical way to find the roots that avoids subtracting two nearly identical numbers. Okay, that's all well and good but there's a lot more code to write to deal with all the degenerate cases.

This is what I have so far : (I'm providing the case where the coefficients are real but the solutions may be complex; you can obviously modify to complex coefficients or only real solutions)


// A t^2 + B t + C = 0;
// returns number of solutions
int SolveQuadratic(const double A,const double B,const double C,
                    ComplexDouble * pT0,ComplexDouble * pT1)
{
    // first invalidate :
    *pT0 = FLT_MAX;
    *pT1 = FLT_MAX;
        
    if ( A == 0.0 )
    {
        if ( B == 0.0 )
        {
            if ( C == 0.0 )
            {
                // degenerate - any value of t is a solution
                *pT0 = 0.0;
                *pT1 = 0.0;
                return -1;
            }
            else
            {       
                // no solution
                return 0;
            }
        }
        
        double t = - C / B;
        *pT0 = t;
        *pT1 = t;
        return 1;
    }
    else if ( B == 0.0 )
    {
        if ( C == 0.0 )
        {
            // A t^2 = 0;
            *pT0 = 0.0;
            *pT1 = 0.0;
            return 1;
        }
        
        // B is 0 but A isn't
        double discriminant = -C / A;
        ComplexDouble t = ComplexSqrt(discriminant);
        *pT0 = t;
        *pT1 = - t;
        return 2;
    }
    else if ( C == 0.0 )
    {
        // A and B are not zero
        // t = 0 is one solution
        *pT0 = 0.0;
        // A t + B = 0;
        *pT1 = -B / A;
        return 2;
    }

    // Numerical Recipes 5.6 : 

    double discriminant = ( B*B - 4.0 * A * C );
    
    if ( discriminant == 0.0 )
    {
        double t = - 0.5 * B / A;
        *pT0 = t;
        *pT1 = t;
        return 1;
    }
    
    ComplexDouble sqrtpart = ComplexSqrt( discriminant );
    
    sqrtpart *= - 0.5 * fsign(B);
    
    ComplexDouble Q = sqrtpart + (- 0.5 * B);
    
    // Q cannot be zero
    
    *pT0 = Q / A;
    *pT1 = C / Q;
        
    return 2;
}

One thing that is missing is refinement of roots by Newton-Raphson. The roots computed this way can still have large error, but a Newton step or two can polish them up.
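Something like this is what I have in mind for the polish step (just a sketch, and only for the real-root case; the complex case is the same idea with complex arithmetic) :


// polish a real root of A t^2 + B t + C = 0 with a couple of Newton-Raphson steps
double RefineQuadraticRootReal(const double A,const double B,const double C,double t)
{
    for(int iter=0;iter<2;iter++)
    {
        double f  = (A*t + B)*t + C;    // residual
        double df = 2.0*A*t + B;        // derivative
        if ( df == 0.0 )
            break;  // at the vertex of the parabola ; can't improve from here
        t -= f/df;
    }
    return t;
}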

4/27/2009

04-27-09 - Getting Old

Last weekend we went to the arboretum, and I was being retarded as usual and jumping around. I jumped right in a mud puddle and slipped and landed with my knee on a rock. It hurt a bit at the time, not a huge deal, but my knee has been bruised and swollen for the whole last week. Even minor injuries seem to last so long and be so crippling now.

I can't stand the music the kids listen to these days. There have always been youthy genres I've hated (like rap-rock and all the mainstream pop), but one that really blows my mind these days is the "pop-punk" that the kiddies seem to love. Stuff like : crap or ass or dear god .

I think all the piercings and tattoos and such are ugly, unsanitary, and completely non-unique and not rebellious. Kids seem to love it.

I think twitter and facebook and myspace and cell phones and texting and all that is just an awful waste of time that never conveys any real information or interesting conversation. It's just a masturbatory attempt to reassure each other and feel connected, but it's not a real connection to anything natural.

I still get a paper newspaper, think that blogs are an awful place to get news, and that reading on monitors is extremely unpleasant.

Girls my age just look really old to me, all wrinkly and saggy and dried up, like a bunch of bones flopping around inside an uninflated balloon. But girls that are young seem like retarded aliens.

I think text-message abbreviations should be used only when texting, and even then used as little as possible. I think eloquent speech and good grammar are to be strived for at all times; though I often fail, I would never write "srsly" or "imo" unless I was being ironic.

I'm getting old and it fucking sucks.

4/24/2009

04-24-09 - Convex Hulls and OBB's

I wrote last month a bit about OBB fitting. I mentioned at the time that it would be nice to have an implementation of the exact optimal OBB code, and also the bounded-best OBB in reasonable time. I found the Barequet & Har-Peled work on this topic but didn't read it at the time.

Well, I finally got through it. Their paper is pretty ugly. Let me briefly explain their method :

They, like my old OBB stuff, take heavy advantage of the fast rotating-calipers method to find the optimal rectangle of a convex hull in 2d (it's O(n)). Also finding the convex hull is O(nlogn). What that means is, given one axis of an OBB, you can find the optimal other two axes in O(nlogn). So the problem just comes down to finding one of the optimal axes.

Now, as I mentioned before, the number of actual axes you must consider to be truly optimal is O(n^2) , making your total run time O(n^3). These axes are the face normals of the convex hull, plus axes where the OBB is supported by two edges and a vert (I don't know an easy way to even enumerate these).

It's now pretty well known around the industry that you can get a very good OBB by just trying a bunch of scattered initial axes instead of doing O(n^2). If you try some fixed number of axes, like say 256, it doesn't count against your big-O at all, so your whole OBB fit is still O(nlogn).

Well this is exactly what the Barequet & Har-Peled method is. They try a fixed number of directions for the seed axis, then do the rectangle fit for the other two axes. The main contribution of the paper is the proof that if you try enough fixed directions, you can get the error of the bbox within whatever tolerance you want. That's sort of intuitively obvious - if you try more and more fixed directions you must get closer and closer to the optimal box. Their construction also provides a specific method for enumerating enough directions.

Their enumeration goes like this :

Start with some seed box S. They use the box that is made from taking one axis to be the "diameter" of the point set (the vector between the two most separated points). Using that box is important to their proof, but I don't think which seed box you use is actually terribly important in practice.

The seed box S has normalized edge vectors S.x , S.y, S.z (the three axes that define the box).

Enumerate all sets of 3 (non-negative) integers whose sum is <= K , that is {i,j,k} such that (i+j+k) <= K

Construct the normal N = i * S.x + j * S.y + k * S.z ; normalize it, and use this as the direction to fit a new OBB. (note that there are a lot of points {ijk} that generate the same normal - any points that are integer multiples of another; those can be skipped).

Barequet & Har-Peled prove that the OBB made this way is within (1/K) to some power or other of optimal, so as you increase K you get ever closer.
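In code the enumeration is something like this. (This is just a sketch; Vec3, OBB, Normalize, SurfaceArea and FitOBBGivenAxis are stand-ins for whatever your own types and caliper-fit routine are, and the only duplicate-skipping I do here is the gcd test, so the exact numBuilds counts may not match my logs below.)


// Barequet & Har-Peled style direction enumeration : all {i,j,k} with (i+j+k) <= K
static int GCD(int a,int b) { while(b) { int t = a%b; a = b; b = t; } return a; }

OBB OBBBarequetHarPeled(const Vec3 * points,int numPoints,const OBB & seed,int K)
{
    OBB best = seed;
    for(int i=0;i<=K;i++)
    for(int j=0;j<=(K-i);j++)
    for(int k=0;k<=(K-i-j);k++)
    {
        if ( i+j+k == 0 ) continue;
        if ( GCD(GCD(i,j),k) > 1 ) continue; // integer multiple of a direction already tried

        Vec3 N = (double)i * seed.axisX + (double)j * seed.axisY + (double)k * seed.axisZ;
        N = Normalize(N);

        // given the axis N , rotating calipers finds the best other two axes :
        OBB cur = FitOBBGivenAxis(points,numPoints,N);

        if ( SurfaceArea(cur) < SurfaceArea(best) )
            best = cur;
    }
    return best;
}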

Now, this is almost identical to my old "OptimalOBBFixedDirections" which tried various static directions and then optimized the box from there. My OptimalOBBFixedDirections always tests 42 directions in the {+,+,+} octant which I made by subdividing an octahedron. I have found that the Barequet & Har-Peled method does in fact find better boxes with fewer tests, but the difference is very very small (thousandths of a percent). I'll show numbers in a second.

First I want to mention two other things.

1. There's a "common wisdom" around that net that while it is bad to make an OBB from the covariance matrix of the *points* , it is good to make an OBB from the covariance matrix of the *volume*. That is, they claim if you have a closed mesh, you can use the Mirtich (or Eberly or Blow/Binstock) method to compute the covariance matrix of the solid body, and use that for your OBB axes.

I have seen no evidence that this is true. Yes, the covariance matrix of the points is highly dependent on the tessellation, while the covariance matrix of the solid is more a property of the actual shape of the object, so that is intuitively pleasing. In practice it appears to be completely random which one is actually better. And you just shouldn't use the covariance matrix method anyway.

2. Barequet & Har-Peled mention the iterative refinement of OBB's using the caliper-fit. This is something I've known a while, but I've never seen it published before; I think it's one of those gems of wisdom that lots of people know but don't consider worth a paper. They mention it almost in passing, but it's actually perhaps the most valuable thing in their whole paper.

Recall if you have one axis of the OBB fixed, you can easily find the optimal directions of the other two axes using rotating calipers to fit a rectangle. The thing is, once you do that, you can then hold one of those new axes fixed, and fit the other two. So like, fix X, then caliper to get YZ, then fix Y, and caliper to get XZ. Each step of the iteration either improves your OBB or does nothing. That means you descend to a local minimum in a finite number of steps. (in practice I find you usually get there in only 3 steps, in fact that might be provable (?)).

Assuming your original seed box is pretty close to optimal, this iteration is kind of like taking your OBB and trying to spin it along one axis and pinch it to see if you get a tighter fit; it's sort of like wiggling your key as you put it into a lock. If your seed OBB is close to being right, but isn't supported by one of the necessary support conditions, this will wiggle it tighter until it is supported by a face or an edge pair.
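A sketch of that iteration, using the same hypothetical FitOBBGivenAxis helper as above (hold one axis of the current box fixed, caliper-fit the other two, repeat) :


// greedy caliper refinement ; terminates because each step either improves the box or does nothing
OBB IterateCaliperRefine(const Vec3 * points,int numPoints,OBB obb)
{
    for(;;)
    {
        OBB next = FitOBBGivenAxis(points,numPoints,obb.axisX);    // hold X , refit Y and Z
        next     = FitOBBGivenAxis(points,numPoints,next.axisY);   // hold the new Y , refit X and Z

        if ( SurfaceArea(next) >= SurfaceArea(obb) )
            return obb; // no improvement -> we're at a local minimum
        obb = next;
    }
}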

The methods shown in the test below are :


True convex hull (within epsilon) :
    note that area optimization is usually what you want
    but volume shows a bigger difference between methods

Hull simplification :

    I simplify the hull by just doing PM on the triangles
    Then convert the triangles to planes
    Push the planes out until all points are behind them
    Then clip the planes against each other to generate new faces
    This is a simpler hull that strictly contains the original hull

k-dop :
    Fits 258 planes in fixed directions on the sphere
    Just pushes each plane to the edge of the point set
    Clips them all against each other
    strictly this is O(n) but in practice it's slower than the true convex hull
        and much worse quality
    seems pointless to me

The rating we show on all the OBB's is surface area

AxialOBB  :
    axis-aligned box

OBBByCovariance :
    vertex covariance matrix sets axes

OBBByCovariance+it :
    OBBByCovariance followed by iterative greedy optimization

OBBByCovarianceOptimized :
    like OBBByCovariance+it, but tries all 3 initial fixed axes

OptimalOBBFixedDirections :
    tries 42 fixed directions

OptimalOBBFixedDirections+it :
    tries 42 fixed directions, takes the best, then optimizes

OptimalOBBFixedDirections opt :
    tries 42 fixed directions, optimizes each one, then takes the best

OBBGoodHeuristic :
    takes the best of OBBByCovarianceOptimized and "OptimalOBBFixedDirections opt"

OBBGivenCOV :
    this is OBBByCovarianceOptimized but using the solid body covariance instead of points  

OptimalOBB :
    tries all face normals of the convex hull (slow)
    I'd like to also try all the edge-support directions here, but haven't figured it out

OptimalOBBBarequetHarPeled 5     
 kLimit : 5 , numBuilds : 19
    BarequetHarPeled method with (i+j+k) <= 5
    causes it to try 19 boxes
    optimizes each one, then picks the best
    very similar to "OptimalOBBFixedDirections opt"

And the results :


-----------------------------------------

dolphin.x :

    Made Hull with 206 faces
    hull1 volume : 1488557 , area : 95330

Making hull from k-dop planes...
    Made Hull with 142 faces
    hull2 volume : 2081732 , area : 104951

Making OBB...
    AxialOBB                         : 193363.109
    OBBByCovariance                  : 190429.594
    OBBByCovariance+it               : 179504.734
    OBBByCovarianceOptimized         : 179504.719
    OptimalOBBFixedDirections        : 181693.297
    OptimalOBBFixedDirections+it     : 181693.297
    OptimalOBBFixedDirections opt    : 176911.750
    OBBGoodHeuristic                 : 179504.719
    OBBGivenCOV                      : 178061.406
    OptimalOBB                       : 176253.359

     kLimit : 3 , numBuilds : 3
    OptimalOBBBarequetHarPeled 3     : 179504.703
     kLimit : 5 , numBuilds : 19
    OptimalOBBBarequetHarPeled 5     : 178266.047
     kLimit : 10 , numBuilds : 160
    OptimalOBBBarequetHarPeled 10    : 176508.109
     kLimit : 20 , numBuilds : 1222
    OptimalOBBBarequetHarPeled 20    : 176218.344
     kLimit : 50 , numBuilds : 18037
    OptimalOBBBarequetHarPeled 50    : 176116.156

-----------------------------------------

teapot.x :

hull1 faces : 612
    hull1 volume : 3284935 , area : 117470

simplified hull2 faces : 366
    hull2 volume : 3384222 , area : 120357

Making hull from k-dop planes...
    Made Hull with 234 faces
    hull2 volume : 3761104 , area : 129271

Making OBB...
    AxialOBB                         : 253079.797
    OBBByCovariance                  : 264091.344
    OBBByCovariance+it               : 222514.219
    OBBByCovarianceOptimized         : 220723.844
    OptimalOBBFixedDirections        : 219071.703
    OptimalOBBFixedDirections+it     : 218968.844
    OBBGoodHeuristic                 : 218968.844
    OptimalOBB                       : 218968.844
    OBBGivenCOV                      : 220762.766

     kLimit : 3 , numBuilds : 3
    OptimalOBBBarequetHarPeled 3     : 220464.766
     kLimit : 5 , numBuilds : 19
    OptimalOBBBarequetHarPeled 5     : 219540.203
     kLimit : 10 , numBuilds : 160
    OptimalOBBBarequetHarPeled 10    : 218968.000
     kLimit : 20 , numBuilds : 1222
    OptimalOBBBarequetHarPeled 20    : 218965.406
     kLimit : 50 , numBuilds : 18037
    OptimalOBBBarequetHarPeled 50    : 218963.109

-----------------------------------------

Some highlights :

OBBByCovariance is quite bad. OptimalOBBFixedDirections is the only other one that doesn't do the iterative optimization, and it can be bad too, though not nearly as bad.

Any of the methods that do the iterative optimization is perfectly fine. The differences are very small.

"OptimalOBBBarequetHarPeled 7" does about the same number of tests as "OptimalOBBFixedDirections opt" , and it's very microscopically better because of the way the directions are distributed.

OBBGivenCOV (the solid mass covariance) is worse than OBBByCovarianceOptimized (point covariance) on teapot.

Also - the Convex Hull simplification thing I did was just pulled out of my ass. I did a quick Google to see if I could find any reference, and I couldn't find any. I'm surprised that's not a solved problem, it seems like something right up the Geometers' alley.

Problem : Find the convex bounding volume made of N faces (or N planes) that strictly encloses the original mesh, and has minimum surface area (or volume).

In general, the optimal N-hull can not be reached by greedy simplification from the full-detail convex hull. In practice I found my hacky PM solution to work fine for moderate simplification levels. To make it more correct, the "push out to enclose" step should be done in each PM collapse to keep the hull valid as you go (instead of at the end). Also the PM collapse metric should be the metric you are trying to optimize - surface area or volume (I just used my old geometric error collapser).

The main thing I was interested in with convex hull simplification was eating away highly tessellated bits. The mesh I mainly tested on was "fandisk" because it's got these big flat surfaces, and then some rounded bits. If you imagine a mesh like a big cube Minkowski summed with a small sphere, you get a cube with rounded edges and corners. If the sphere is highly tessellated, you can get a hull with tons and tons of faces, but they are very unimportant faces. You want to sort of polygonate those corners, replacing the rounded sphere with a less tessellated one that's pushed out.

4/23/2009

04-23-09 - The Prurient Underbelly

Our cities are full of "massage parlors" that offer prostitution with a blatant store front and ads in the paper.

I wrote before about how every girl in LA does porn .

There are some widely spread numbers about $10 billion in porn movies per year, or 800 million rentals per year, or the number of porn movies that hotel guests rent. Unfortunately those numbers are just made up ; (it's funny that even in his mea culpa about making up numbers he goes on to just make up other numbers out of thin air. good reporting, bub).

I was thinking about this because a few Sundays ago there was an article in the NYT Magazine about "SeekingArrangement.com". I didn't think much of it at first. SeekingArrangement is an online dating site like match or whatever, but it's specifically designed for hooking up rich older men (often married) with young girls. It does toe the line of prostitution because it explicitly talks about dollar rates and what the girls would be willing to do. Still, these kinds of arrangements have been around forever; I do find it rather disgusting, but it's not surprising.

But the sheer numbers are a bit surprising. They claim 300,000 girls have posted profiles. I assume they are exaggerating somewhat and some of the accounts are fake. Let's do some fake numbers like last time to get a surprising conclusion.

I assume most of the girls registered are in the age range 20-30. There are around 20M people in the US in the age range 20-30 (this number is correct BTW). About half of those are girls, so around 10 M. If the 300k girls on SeekingArrangement are from the US, that's 3% of the female population (!).

A more accurate number : there are around 200,000 girls in WA state in the target age group. There are around 2000 girls on SeekingArrangement from WA state. That's 1%. Still a rather high number.

Not really related, but also :

1. STOP TATTOOING YOURSELF !! My god, a whole generation of girls is ruining themselves. Fortunately I found a hot girl with no disgusting dark-green graffiti that breaks up the natural lovely smooth lines of the female body, but I still have to look at it on models and such. WTF. Oh yes, I know what would go great with this peach skin, this graceful natural arc of flesh and muscle, this magical body that is firm and yet soft, this curve that speaks right to my gut - a fucking dark green dolphin drawn by some high school dropout. Quit it.

2. Angelina Jolie is fucking revolting. She looks so bizarre, she's had so much plastic surgery, she looks like an alien. I'm so sick of her being used as the modern standard of beauty. She's the Garbo or Marilyn Monroe or Sofia Loren or whoever of today, and that just says volumes about the fucking awful taste of modern man. I hate that so many girls are trying to look like her and get that fucking punched-in-the-mouth lip injection I-cant-even-talk-right-cuz-my-lips-are-so-swollen.

04-23-09 - Telling Time

.. telling time is a huge disaster on windows.

To start see Jon Watte's old summary that's still good .

Basically you have timeGetTime() , QPC, or TSC.

TSC is fast (~ 100 clocks) and high precision. The problems I know of with TSC :

TSC either tracks CPU clocks, or time passing. On older CPUs it actually increments with each cpu cycle, but on newer CPUs it just tracks time (!). The newer "constant rate" TSC on Intel chips runs at some frequency which so far as I can tell you can't query.

If TSC tracks CPU cycles, it will slow down when the CPU speedsteps. If the CPU goes into a full sleep state, the TSC may stop running entirely. These issues are bad on single core, but they're even worse on multi-proc systems where the cores can independently sleep or speedstep. See for example these linux notes or tsc.txt .

Unfortunately, if TSC is constant rate and tracking real time, then it no longer tracks cpu cycles, which is actually what you want for measuring performance (you should always report speeds of micro things in # of clocks, not in time).

Furthermore on some multicore systems, the TSC gets out of sync between cores (even without speedsteps or power downs). If you're trying to use it as a global time, that will hose you. On some systems, it is kept in sync by the hardware, and on some you can get a software patch that makes rdtsc do a kernel interrupt kind of thing which forces the TSC's of the cores to sync.

See this email I wrote about this issue :

Apparently AMD is trying to keep it hush hush that they fucked up and had to release a hotfix. I can't find any admission of it on their web site any more ;

this is the direct download of their old utility that forces the cores to TSC sync : TscSync

they now secretly put this in the "Dual Core Optimizer" : Dual Core Optimizer Oh, really AMD? it's not a bug fix, it's an "optimizer". Okay.

There's also a seperate issue with AMD C&Q (Cool & Quiet) if you have multiple cores/processors that decide to clock up & down. I believe the main fix for that now is just that they are forbidden from selecting different clocks. There's an MS hot fix related to that : MS hotfix 896256

I also believe that the newest version of the "AMD Processor Driver" has the same fixes related to C&Q on multi-core systems : AMD Driver I'm not sure if you need both the AMD "optimizer" and processor driver, or if one is a subset of the other.

Okay, okay, so you decide TSC is too much trouble, you're just going to use QPC, which is what MS tells you to do anyway. You're fine, right?

Nope. First of all, on many systems QPC actually is TSC. Apparently Windows evaluates your system at boot and decides how to implement QPC, and sometimes it picks TSC. If it does that, then QPC is fucked in all the ways that TSC is fucked.

So to fix that you can apply this : MS hotfix 895980 . Basically this just puts /USEPMTIMER in boot.ini which forces QPC to use the PCI clock instead of TSC.

But that's not all. Some old systems had a bug in the PCI clock that would cause it to jump by a big amount once in a while.

Because of that, it's best to advance the clock by taking the delta from previous and clamping that delta to be in valid range. Something like this :


U64 GetAbsoluteQPC()
{
    static U64 s_lastQPC = GetQPC();
    static U64 s_lastAbsolute = 0;

    U64 curQPC = GetQPC();

    U64 delta = curQPC - s_lastQPC;

    s_lastQPC = curQPC;

    if ( delta < HUGE_NUMBER )
        s_lastAbsolute += delta;

    return s_lastAbsolute;
}

(note that "delta" is unsigned, so when QPC jumps backwards, it will show up as as very large positive delta, which is why we compare vs HUGE_NUMBER ; if you're using QPC just to get frame times in a game, then a reasonable thing is to just get the raw delta from the last frame, and if it's way out of reasonable bounds, just force it to be 1/60 or something).

Urg.

BTW while I'm at it I think I'll evangelize a "best practice" I have recently adopted. Both QPC and TSC have problems with wrapping. They're in unsigned integers and as your game runs you can hit the end and wrap around. Now, 64 bits is a lot. Even if your TSC frequency is 1000 GigaHz (1 THz), you won't overflow 64 bits for about 213 days. The problem is they don't start at 0.

Unsigned int wrapping works perfectly when you do subtracts and keep them in unsigned ints. That is :


in 8 bits :

U8 start = 250;
U8 end = 3;

U8 delta = end - start;
// delta is 9 ; the wrap works out correctly

That's cool, but lots of other things don't work with wrapping :


U64 tsc1 = rdtsc();

... some stuff ...

U64 tsc2 = rdtsc();

U64 avg = ( tsc1 + tsc2 ) /2;

This is broken because tsc may have wrapped.

The one that usually gets me is simple compares :


if ( time1 < time2 )
{
    // ... event1 was earlier
}

are broken when time can wrap. In fact with unsigned times that wrap there is no way to tell which one came first (though you could if you put a limit on the maximum time delta that you consider valid - eg. any place that you compare times, you assume they are within 100 days of each other).
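(BTW the standard way to do that parenthetical is to compare through a signed difference - it's correct even across the wrap, as long as the two times really are within half the wrap period of each other) :


// returns true if time1 came before time2 , assuming the true gap is less than 2^63 ticks
// (S64 meaning a signed 64-bit int)
bool TimeIsBefore(U64 time1,U64 time2)
{
    return (S64)(time1 - time2) < 0;
}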

But this is easily fixed. Instead of letting people call rdtsc raw, you bias it :


uint64  Timer::GetAbsoluteTSC()
{
    static uint64 s_first = rdtsc();
    uint64 cur = rdtsc();
    return (cur - s_first);
}

this gives you a TSC that starts at 0 and won't wrap for a few years. This lets you just do normal compares everywhere to know what came before what. (I used the TSC as an example here, but you mainly want QPC to be the time you're passing around).

04-23-09 - iPod Hardware - the harbinger of suck

Is god fucking awful. Stop touting it as the greatest example of product design in this century. Yes, yes, the screen is nice, and the basic shape and weight of it is appealing. If it was just a paperweight I would be pretty pleased with it. But when you actually try to *use* it, it's rubbish. (it's the Angelina Jolie of product design if you will - it's the canonical example that everyone uses of something that's great, but it's actually awful).

Try to actually browse through the menus with that fucking wheel. Scan through a big list of artists, back up, change to album view, scan down, it's awful.

The wheel is a disaster for volume control. The right thing for volume is a knob, or a dial. Something physical that you rotate. And it shouldn't be a fucking digital dial that just spins like all the shit that they're giving us on computers now. It should be an *absolute* dial with an actual zero point, so that I can turn the volume down to zero when the thing is off. Hitting play on an iPod is fucking ear drum roulette, you never know when it's going to explode your head.

You're playing a song, you pause it, you go browse to some other song. You want to just resume the original song. How do you even do that !? I suppose it must be possible, but I don't know. It should just be a button.

Design for music playing devices has been perfect for a long time. You have play, pause, skip, volume. You have those on different buttons that are ALWAYS those buttons. They're physical buttons you can touch, so you can use the device while it's in your pocket or your eyes are closed. They should be rubber (or rubberized) so they're tactile, and each button should have a different shape so you know you're on the right one.

It's just fucking rubbish. It's like so many cars these days, changing user interfaces for no good reason and making them worse. Don't give me fucking digital buttons to increment and decrement the air conditioning, that's awful! Give me a damn dial or a slider that has an absolute scale. I don't want to be hitting plus-plus-plus.

The damn "Start" button that's on so many cars now really pisses me off. You used to stick in a key, then turn it. What's wrong with that? It works perfectly fucking fine. Now I have to stick in a key, make sure I'm pressing the brake or the clutch or whatever, then press a start button. Why !? It's more steps, it's just worse.

The worst of course is the menu shit like iDrive that's basically an iPod style interface with a fucking wheel and menus and context-dependent actions. Context-dependent actions are fucking horrible user interface design, quit it. With consumer electronic devices there should just be a few buttons, and those buttons always execute the same action and do it immediately. I know some jack-hole is going to run into me because he was trying to mate his bluetooth and was browsing around the menus.

4/22/2009

04-22-09 - Spam

WTF gmail, why do you keep letting obvious nonsense through?

I literally get around 10 mails like this a day :


From: Susan Brown <susannjtnpocs@hotmail.com> 

Subject: Authentic discounted prescriptions from Canada. 

Fast delivery and qualitative support- come and see yourself.

www.sentencefallcold.in

How can you not tell that's spam !?

On the plus side, today it delivered one of the more amusing spams I've ever gotten :

I want you to read this message very carefully, and keep the secret with you till further notice, You have no need of knowing who I am, where am from, till I make out a space for us to see, I have being paid me $15000 in advance to terminate you with some reasons listed to me by my employer, it's one I believe you call a friend, I have followed you closely for one week and three days now and have seen that you are innocent of the accusation.

Do not contact the police or security agent or try to send a copy of this to them, because if you do I will know, and might be pushed to do what I have being paid to do,,beside, this is the first time I turned out to be a betrayer in my job. Now, listen, I will arrange for us to see face to face but before that you have to pay me some $20,000. I will be coming to see you in your office or home determine where you wish we meet, do not set any camera to cover us or set up any tape to record our conversation, my employer is in my control now, I will give you the full information of my employer and video and audio tape conversation with him that contains his request for me to terminate you, which will be enough evidence for you to take him to court (if you wish to).

You don’t need my phone contact for now till am assured you are ready to comply positively. Once more I shall remain anonymous, however, you have no alternatives other than to co-operate positively and do as I said or face the brewing storm. You must also trust that as long as you keep your part of the deal, I shall too keep mine and live up to my words. This time I am holding the trump card and shall put it to effective use if you fail to co-operate positively as requested. Any move by you other than co-operation will have catastrophic consequences to you. I wait for your response urgently to settle this and counter my employer directives against you immediately.

Thanks you for your attention Cross

I love the writing style. It's got that high-school-kid-trying-to-sound-smart ring to it. "do as I said or face the brewing storm" - how precious! It reminds me a bit of VCB. "They sat in the cafe and discussed art and romance." Oh yes. I am such a passionate latin lover, I can't be bothered to button my shirt properly, and I love to listen to Spanish guitar and discuss romance.

4/17/2009

04-17-09 - Bleck

We've seen some really awful movies recently.

"The Queen" : wow, WTF is the point of this movie. It's hard to imagine anything less interesting than the royal family, and despite what all the critics say, the portrait of them was completely stereotypical and superficial. In no way was it interesting or realistic or surprising. Plus like 50% of the time is spent on stock footage of the whole Diana death bullshit.

"Vicky Christina Barcelona" : another wowie zowie this was shockingly bad. People talk about this movie and "Match Point" as the reemergence of Woody Allen after a period of making predictable snoozers. Yes his 90's movies were pretty weak, but these are even worse. VCB is literally laugh-out-loud awful. I'm actually a little perplexed by it. I'm not quite sure if the whole movie is supposed to be a "level". The voice over for example is intentionally bad, right? I mean that voice over can't be serious, it's super cheezy bad like a romance novel or a Penthouse Forum letter, I assume/hope that's on purpose. I guess that all the critics that liked this movie are just horny sexually repressed nerds that were aroused by all the hints of sexuality.

I guess I'll go back to watching endless old Top Gear episodes.

The new High Stakes Poker cast is disappointing too. Hachem, Laak and Esfandiari are some of the biggest cocks in poker. They're all cry-baby attention whores with no real game skill or interesting personality. Laak might be the worst; he's a huge nit and just generally sucks; he normally plays $10/20 NL live and just bum-hunts. Last episode he made one of the most retarded plays in poker - raising to "defend" a weak hand. It's a play that tilts me so bad, because dumb "pros" like Laak or Gabe Kaplan will recommend it, and it usually works so they think they did the right thing. Basically if you flop a hand that is decent and probably best, but is not strong, you should almost never be raising with that. Like say you flop a weak middle pair or something like that. You just call, you don't raise. (obviously there are exceptions, everything in poker is situational, there are no formulas or set "moves").

04-17-09 - Oodle File Page Cache

I'm trying to figure something out, maybe someone out there has a good idea.

This is about the Oodle File Page Cache that I mentioned previously. I'm not doing the fully general page cache thing yet (I might not ever, because it's one of those things where you have to buy into my philosophy, which is what I'm trying to avoid).

Anyway, the issue is about how to prioritize pages for reclamation. Assume you're in a strictly limited memory use scenario : you have a fixed size pool of say 50 pages or so.

Now obviously, pages that are currently locked get memory. And the next highest priority is probably sequentially prefetched pages (the pages that immediately follow the currently locked pages) (assuming the file is flagged for sequential prefetching, which it would be by default).

But after that you have to decide how to use the remaining pages. The main spot where you run into a question is : I need to grab a new page to put data into, but all the pages are taken - which one do I drop and recycle? (Or, equivalently : I'm thinking about prefetching page X, the least important page currently in the pool is page Y - should I reclaim page Y to prefetch page X, or should I just wait and not do the prefetch right now).

The main ambiguity comes from "past" pages vs. "prefetched" pages (and there's also the issue of old prefetched pages that were never used).

A "past" page is one that the client has unlocked. It was paged in, the client locked it, did whatever, then unlocked it. There's one simple case, if the client tells me this file is strictly forward-scan streaming, then the page can be dropped immediately. If not, then the past page is kept around for some amount of time. (there's another sequence point when the file containing the page is closed - again optionally you can say "just drop everything when I close the file" or the pages can be kept around for a while after the close to make sure you were serious about closing it).

A "prefetched" page obviously can be prefetched by sequential scan in an open file. It could also be from a file in the prefetch-ahead file list that was generated by watching previous runs.

Prefetched pages create two issues : one is how far ahead do I prefetch. Basically you prefetch ahead until you run out of free pages, but when you have no free pages, the question is do I reclaim past pages to do new prefetches?

The other issue with prefetches is what do you do with prefetched pages that were never actually used. Like I prefetched some pages but then the client never locked them, so they are still sitting around - at what point do I reclaim those to do new prefetches?

To make it more clear here's a sort of example :


Client gives me a prefetch file list - {file A, file B, file C, file D}

Client opens file A and touches a bunch of pages.  So I pull in {A:0,A:1,A:2} (those are the page numbers in the file).

I also start prefetching file B and file C , then I run out of free pages, so I get {B:0,B:1,C:0}.

Client unlocks the pages in file A but doesn't close file A or tell me I can drop the pages.

Client now starts touching pages in file C.  I give him C:0 that I already prefetched.

Now I want to prefetch C:1 for sequential scan, I need to reclaim a page.

Do I reclaim a page from B (prefetched but not yet used) or a page from A (past pages) ?  Or not prefetch at all?

When client actually asks for C:1 to lock then I must reclaim something.

Should I now start prefetching {D:0} ?  I could drop a page from {A} to get it.

Anyway, this issue just seems like a big mess so I'm hoping someone has a clever idea about how to make it not so awful.

There are also very different paradigms for low-page-count vs high-page-count caches. On something like the PS2 or XBox 1 where you are super memory limited, you might in fact run with only 4 pages or something tiny like that. In that case, I really want to make sure that I am using each page for the best purpose at all times. In that scenario, each time I need to reclaim a page, I should reevaluate all the priorities so they are fresh and make the best decision I can.

On something like Windows you might run with 1024 pages. (64k page * 1024 pages = 64 MB page cache). In that case I really don't want to be walking every single page to try to pick the best one all the time. I can't just use a heap or something, because page priorities are not static - they can change just based on time passing (if I put any time-based prioritization in the cache). Currently I'm using a sort of cascaded priority queue where I have different pools of priority groups, and I only reevaluate the current lowest priority group. But that's rather complicated.
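To make the reclaim decision concrete, here's roughly the shape of the thing I have in mind (a pure sketch - the struct fields and the linear scan are placeholders, and in particular the relative order of the middle categories is exactly the open question above, so take the enum order as arbitrary) :


// lower enum value = more important ; locked pages are never reclaimed
enum EPageUse
{
    ePage_Locked = 0,
    ePage_SeqPrefetch,      // prefetch ahead of a locked page in an open sequential file
    ePage_PastOpen,         // unlocked "past" page , file still open
    ePage_PrefetchUnused,   // prefetched but never locked
    ePage_PastClosed        // past page of a file that's been closed
};

struct Page { EPageUse use; double lastTouchTime; /* ... */ };

// pick the least important (then oldest) page ; return NULL if nothing is
//  less important than the use the new page would get
Page * ChooseReclaimVictim(Page * pages,int numPages,EPageUse newUse)
{
    Page * victim = NULL;
    for(int i=0;i<numPages;i++)
    {
        Page * p = &pages[i];
        if ( p->use == ePage_Locked ) continue;
        if ( victim == NULL
            || p->use > victim->use
            || ( p->use == victim->use && p->lastTouchTime < victim->lastTouchTime ) )
        {
            victim = p;
        }
    }
    if ( victim && victim->use <= newUse )
        return NULL;    // everything left is at least as important as the new use
    return victim;
}

The linear scan is obviously the low-page-count version; with 1024 pages this is exactly where the cascaded priority queues come in instead.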

04-17-09 - Waffling

I'm waffling about what car to get. I really *need* a new car because I've realized that the Qualude is just too small for me. I've been sitting hunched over all my life, and I'm trying to correct that and decompress my spine, and I just can't do it.

I'm waffling about whether to have surgery. Some days my shoulders feel sort of okay and I think "I could live with this" but then other days I'm in a lot of pain and it's just all crunchy and grinding and searing pain and awful. And of course any time I try to do a pushup I think "I need surgery".

I always see both sides of every decision, and it's quite paralyzing. In my opinion, most people, even smart people, are very cavalier about ignoring the pros and cons. But they are probably right to do so. It's better to just pick something, go with it, and do your best with that decision. To put it in programming terms - I would say 90% of the smart programmers that I know are far too opinionated about various style issues; they don't rationally give fair credit to the other side's arguments and see that there are good arguments on both sides. But that's fine. Somebody who just says "low level C-style is the one true way" and runs with it might be wrong, but they can be productive because they have just made a decision and are getting work done. Somebody who sits around all the time thinking "hmm would it be better to do this C-style, or C++ OOP? or maybe I need a GC language, or I could use OCaml for this bit.." is wasting way too much time equivocating.

Most successful business men, and just dynamic and fun people that I admire, are quick decision makers that just pick something and stick with it.

A related topic I've been thinking about recently : I've always been very dubious about the idea of learning from people who have been successful. There's this whole cult of worshipping rich people, reading interviews with them, getting their opinions on things, trying to learn what made them successful. I think it's mostly nonsense. The thing is, if you just look at who the biggest earners are, it's almost entirely luck.

Think about it this way - you have a bunch of gamblers. Some of them are very good and will make a steady profit of +10 a year with not much variance. The others suck, but play very wild, so their expected profit is +0, but they have high variance, so it could go between -100 and +100 in any given year.

If you look at the list of who the top winners are in any given year, it will be the people who suck. It will just be the ones that happened to get lucky. If you also look at the biggest losers, they will be using the exact same strategy as the biggest winners.

To make a more concrete example - lets say you have a biased coin you're allowed to flip. It has a 55% chance of being heads and you can bet as much of your bankroll as you want on it. A smart gambler might use a "Kelly" bet size ( see my old blog post ) and make a nice expected profit. A crazy gambler would just bet their whole bankroll on every flip. If you have 1000 people betting smart, and 1000 people betting crazy , after 10 flips all the biggest winners will be people playing the crazy strategy.
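(For reference - this is just the standard Kelly result, not anything specific to that old post - the optimal fraction of bankroll to bet on an even-money flip that you win with probability p is

$$ f^* \;=\; p - (1-p) \;=\; 2p - 1 \;=\; 0.10 \quad \text{for } p = 0.55 $$

so the smart gambler risks 10% per flip, while the crazy gambler bets f = 1 and eventually goes broke with probability 1.)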

Anyway, the point is if you just look at successful business people, they will probably be confident, decisive, risk takers, aggressive at seizing opportunities, aggressive about growing the business quickly, etc. That doesn't mean that those are the right things to do. It just means that those are variance-increasing traits that give them a *chance* to be a big success.

Now, just like in poker, there are times when variance-increasing is the right move. For example, if you believe that you basically suck. If you think you don't have an edge on your opponent, then the best way to beat him is to just play really wild and hope you get lucky. For example, the best way for a novice poker player to have a chance in a tournament is to just do lots of all-in preflop shoving and hope you win the draws. The same thing could certainly be true in capitalism; if you believe you don't really have any great money-making skills, your best way to get rich is to variance it up.

4/16/2009

04-16-09 - The Daily Ramble

Seattle is getting fucked. They're destroying Capitol Hill. It's still sort of charming now, with old brick buildings, lots of greenery and plants, cheap rents and hipsters and artists. But they are tearing down the cool old buildings as fast as they can and putting up awful generic big blocks of condos plus retail space. They are literally turning it into Bellevue as fast as they can.

I despise dealing with the fucking receptionists at all these doctor places. I'm pretty sure the receptionist at MTI Physical Therapy is an alcoholic. I feel like I can identify it with just a minute of contact. There's a certain lazy attitude, a sloppy way of sitting, and a sort of floppy facial muscle nature, the way they talk is like their lips are too big and loose and they have to chew out the words. She's also got the sort of big jaw and brow and large nose with the bad pitted skin that I believe is caused by liver damage (Petechiae).

Anyway, it makes me think it might be nice to have a personal assistant. In theory you don't actually need to be making all that much money for a PA to be worth it. If they could actually remove all these distractions for me and help me focus on work and have a clear mind, it would be a huge boon. I don't want to have to think about bills or car registration or talking to receptionists or shopping or anything. I want only work and pleasure. It would give me time to exercise more, and to do more hobby programming. My concern would be that I couldn't actually trust the PA. Not "trust" as in worry they would steal from me, but trust as in be confident that they are taking care of things correctly. If I can't trust them, then it doesn't free my mind from worrying about all those things; I have to have full confidence that they are on top of everything, and it seems like someone solid enough to take care of things is not going to be a PA.

The tulips have started opening up around the city. I love the way the spring bloom here has such discrete stages. In a lot of places (Texas) it just suddenly happens one day and everything is blooming, but here we've had these steps as different species hit their time one by one. It's also pretty amazing the way the bulbs around here naturalize and just pop up in random parking strips and between bricks and such; we've had crocus, daffodil, and now the tulips.

I found a nice way to get a bike up and down cap hill - Interlaken Drive in Interlaken Park. It's like a tiny bit of woods, and there's nary a car on it. It's also got the really cool Hebrew Academy on it, which looks like some old gothic mansion; it makes me think of Wuthering Heights or Dickens or something, I imagine a rich old man muddling around in its dusty halls with a terribly unhappy niece that he forbids to go outside. I put some pictures on my flickr .

4/15/2009

04-15-09 - Oodle Page Cache

So I'm redoing the low level file IO part of Oodle. Actually I may be retargeting a lot of Oodle. One thing I took from GDC and something I've been thinking about a long time is how to make Oodle simpler. I don't want to be making a big structure that you have to buy into and build your game on. Rather I want to make something "leafy" that's easy for people to plug in at the last minute to solve specific problems.

Pursuant to that, the new idea is to make Oodle a handful of related pieces. You can use one or more of the pieces, and each is easy to plug in at the last minute.

1. Async File IO ; the new idea with this is that it's cross platform, all nice and async, does the LZ decompression on a thread, can do DVD packaging and DVD emulation, can handle the PS3/Xenon console data transfers - but it just looks like regular files to you. This is less ambitious than the old system ; it no longer directly provides things like paging data in & out, or hot-loading artist changes; you could of course still do those things but it leaves it more up to the client to do that.

The idea is that if you just write your game on the PC using lazy loose file loads, boom you pop in Oodle and hardly touch the code at all, and you automatically get your files packed up nice and tight into like an XBLA downloadable pack, or a DVD for PS3, or whatever, and it's all fast and good. Oh, and it also integrates nicely with Bink and Miles and Granny so that you can do things like play a Bink video while loading a level, and the data streamers share the bandwidth and schedule seeks correctly.

2. Texture goodies. We'll provide the most awesome threaded JPEG decoders, and also probably better custom lossy and lossless texture compressors that are specifically designed for modern games (with features like good alpha-channel support, support for various bit depths and strange formats like X16Y16, etc.). Maybe some nice DXTC realtime encoding stuff and quality offline encoding stuff. Maybe also a whole custom texture cache thing, so you can say "Oodle, use 32 MB for textures" and it does all the paging and decompression and such.

3. Threading / Async utilities. You get the threaded work manager, the thread profiler, we'll probably do "the most awesome" multithreaded allocator. We'll try to give these specific functions that address something specific that people will need to ship a game. eg. a customer is near ship and their allocator is too slow and taking too much memory so they don't fit in the 256 MB of the console. Boom plug in Oodle and you can ship your game.

Anyway, that's just the idea, it remains to be worked out a bit. One thing I'm definitely doing is the low level IO is now going through a page cache.

As I'm writing it I've been realizing that the page cache is a super awesome paradigm for games in general these days. Basically the page cache is just like an OS virtual memory manager. There's a certain limited amount of contiguous physical memory. You divide it into pages, and dynamically assign the pages to the content that's wanted at the time.
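
To be concrete, the page cache I'm talking about is basically just this - a minimal sketch with made-up names, not the actual Oodle code :

// minimal page cache sketch (hypothetical names, not the real implementation) :
enum { PAGE_SIZE = 64*1024 };

struct Page
{
    void *  mem;        // points into one big contiguous block, PAGE_SIZE bytes
    int     priority;   // how important the current contents are (idle prefetch vs needed-right-now)
    int     owner;      // what it currently holds : a file chunk, a decomp buffer, a texture, ...
    Page *  lruNext;    // LRU/priority list so the least important pages get recycled first
};

struct PageCache
{
    void *  baseMem;    // one big contiguous allocation, eg. 32 MB
    int     numPages;   // baseMem size / PAGE_SIZE
    Page *  pages;      // one descriptor per page, dynamically mapped to whoever needs memory
};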

Now, the page cache can be used just for file IO, and then it's a lot like memory mapped files. The client can use the stdio look-alike interface, and if they do that, then the page cache just automatically does the "right thing", prefetching pages ahead as they sequentially read through a file, etc.

But since we're doing this all custom and we're in a video game environment where people are willing to get more manual and lower to the bone, we can do a lot more. For example, you can tell me whether a file should sequential prefetch or not. You can manually prefetch at specific spots in the file that you expect to jump to. You can prefetch whole other files that you haven't opened yet. And perhaps most importantly, you can assign priorities to the various pages, so that when you are in a low memory situation (as you always are in games), the pages will be used for the most important thing. For example you can prefetch the whole next file that you expect to need, but you would do that at very low priority so it only uses pages if they aren't needed for anything more urgent.
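
The client-side hints would look vaguely like this (totally hypothetical names, not the real API, just to show the flavor of it) :

// hypothetical hint calls - made-up names, not the actual Oodle interface :
FileHandle f = PageCache_Open("level2.pak");
PageCache_SetSequentialPrefetch(f, false);                       // I'm going to seek around, don't read ahead
PageCache_Prefetch(f, chunkOffset, chunkSize, PRIORITY_HIGH);    // I know I'll jump here soon
PageCache_PrefetchFile("level3.pak", PRIORITY_IDLE);             // whole next file, only uses pages nobody else wants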

The next awesome thing I realized about the page cache is that - hey, that can just be the base for the whole game memory allocator. So maybe you give 32 MB to page cache. That can be used for file IO, or video playback - or maybe you want to use it to pop up your in game "pause menu" GUI. Or say you want to stream in a compressed file - you map pages to read in the packed bits, and then you map pages as you decompress; as you decompress you toss the pages with the packed bits and leave the uncompressed pages in the cache.

Say you have some threaded JPEG decompress or something - it grabs a page to decompress to. The other cool thing is because all this is manual - it can grab the page with a certain priority level. If no page is available, it can do various things depending on game knowledge to decide how to respond to that. For example if it's just decompressing a high res version of a texture you already have in low res, it can just abort the decompress. If it's a low priority prefetch, it can just go to sleep and wait for a page to become available. If it's high priority, like you need this texture right now, that can cause other pages that are less important to get dropped.
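
So the page-grab call wants to report failure instead of deciding for you, and the caller uses its game knowledge to pick the fallback ; something like this (hypothetical sketch, all the names are made up) :

// hypothetical sketch : the decoder asks for a page and handles failure with game knowledge
Page * p = PageCache_GetPage(cache, PRIORITY_LOW);
if ( ! p )
{
    if ( isOptionalHighResMip )
        return;                                              // abort ; the low res mip we already have is fine
    else if ( isLowPriorityPrefetch )
        p = PageCache_GetPageBlocking(cache, PRIORITY_LOW);  // just go to sleep until a page frees up
    else
        p = PageCache_GetPage(cache, PRIORITY_CRITICAL);     // need it right now : evict less important pages
}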

Pages could also cache procedurally generated textures, data from a network, optional sounds, etc. etc.

I think about it this way - in the future of multicore and async and threading and whatnot, you might have 100 jobs to run per frame. Some of those jobs need large temp work memory. You can't just statically assign memory to various purposes, you want it to be used by the jobs that need it in different ways.

04-15-09 - PEPPR

There's a black Porsche Cayenne in Seattle with the custom plate "PEPPR". That on its own is reason enough to punch this guy in the crotch, but he's also a complete douchebag driver. He was speeding up and slowing down, tailgating, cutting across many lanes of traffic. He cut way across the merge triangle from 520 to 405, which kicks up a ton of debris, and he also did it straight into heavy traffic, causing lots of people to have to brake for him. What a cock. If you ever see this guy, please run into him intentionally, or at least key his car.

It is your moral duty as someone alive to punish dicks. We are all investors in the "social economy" that we all play in. If you don't punish dicks, you're just as bad as the pension managers who were asleep at the wheel of corporate oversight.

BTW you may be thinking it's hypocritical of me to criticize crazy drivers. Yes, I am somewhat crazy myself, but I try to be crazy without inconveniencing others. I try to be considerate. I hate people who drive too slow, but I also hate people who drive too crazy. That's not a contradiction - the point is that like most things in life there is a good middle ground, and you are a dick if you go too far in either direction.

04-15-09 - Bodies and Computers

I'm trying desperately to avoid sitting at the computer when I'm not working, but so far I haven't been too successful. I just don't know what else to do with myself. Last weekend I tried to make myself do all kinds of non-computer stuff. I put up drape hold backs, I vacuumed the radiators, I tacked the cable TV line to the floorboards, I went shopping for clothes around the city, I went for walks. I made a pork shoulder ragu with pappardelle (that was quite fantastic, maybe the best ever ; similar to this or this ; some mods I made : use a mix of stock and red wine for braising liquid; when the braise is done, remove the meat and boat-motor the veg to make a sauce, use a bit of fennel seed, and one dried red chile).

And that all only took a few hours. Now what? I can read, watch TV, or get on the computer.

I kinda want to make a little iPhone game, but I'm not letting myself because that would mean a lot more computer sitting. (it would also surely distract me from the work I need to be doing for RAD).

If you are a young man thinking about a career, I strongly encourage you to stay away from computers. They will ruin your life. You will spend your years trapped in cubicles under fluorescent lights, hanging out with a bunch of other nerdy men, not doing anything really dynamic or exciting in your life, and slowly ruining your body through keyboarding and too much sitting. You won't have contact with girls, or fun different people, you won't do things that you can tell stories about, you won't have freedom to develop your other interests, it will take all your time and you'll wind up getting pigeon-holed into some specialty that becomes increasingly tedious to you.

ADDENDUM : I don't think anyone sees my addendums now, because the first draft gets sent out with RSS, and then nobody goes back to the original page to see the updates. Mmm I kinda hate RSS. Anyhoo -

I got a medical massage from this guy named John Bagley at Belltown Healing Arts. It was fantastic, I felt a ton better afterward, and actually went in for a doctor exam later that same day, and my function tests were way improved. He's literally the first person I've seen that I felt like really examined my body and identified the problems. If you have computer/injury body problems, I highly recommend him, though it does mean you will have to be touched by a man, which may be confrousing to some.

If you go in to an orthopedist or a physical therapist and they don't tell you to take your shirt off, and actually look at and touch your muscles and bones to see what's going on - just walk out. They're a fucking hack. How can they diagnose you and treat you without seeing what's going on? Furthermore, a physical therapist should actually have a finger on your muscles as you do the exercises. If they just sit there and look bored and you do your reps, walk the fuck out.

4/13/2009

04-13-09 - Fashion Chicken

I think fashion is kind of like a game of chicken. When I see someone that makes me stop and go "wow that's a cool look" it's almost always because they're wearing something really ridiculous, something that actually just looks goofy and retarded. But by wearing something ridiculous and pretending that it's okay, they are showing that they have enough status and coolness to get away with it. They're daring the world to laugh at them.

In PUA lingo, this is DHV (Demonstration of Higher Value). When someone wears something really outrageous it's like making a poker bluff saying "I am cool enough to get away with this outfit". They put that bet out there, and then the world can either call their bluff (by laughing at them), or if the world lets them get away with it, they have succeeded with the DHV.

Of course most people, like me, chicken out and don't even try to be outrageous. Wearing bland ordinary stuff is a way of being meek, invisible, of not trying, because you're afraid your bluff will fail and the world will laugh at you.

04-13-09 - Anger

My fucking physical therapist lied to me about being a preferred provider for my PPO, so now I get to go through health care grievance arbitration. Yay ! I'm not too optimistic about that working out well, so I probably get to eat the cost. Fuckers.

In other news our fucking cock landlord has the heat in the building turned off and it's been super cold again recently, so we had literally no heat over the weekend and it was miserable, so I got the fun of calling him.

Oh, and the white trash neighbors had a backyard bonfire at midnight and filled our house with smoke. That's the last one I'm tolerating, next time I just tell them no, and if it persists they get reported for fire code violations. I went and talked to them about it once before and they were just like "oh yeah okay". I have no problem confronting them, I'm just concerned about the possible revenge response. (* see later).

And the other side neighbors just had a baby. It's not actually that bad, we hear it cry a bit once in a while, but the side walls in the building are almost half decent (unlike the floors). The baby neighbors are actually incredibly considerate about not making noise, actually too considerate, I find it annoying. It makes me feel uptight just being around them. I sort of imagine that they put a pillow over the crying baby because they're worried about it being too loud.

The whole situation with health care billing pisses me off so much. You don't get to see your bill until weeks after you go in, because the health plan has to approve or not approve various charges. Many providers illegally try to "balance bill" you. Watch out for that. Even if you are careful and specifically try to go to a preferred providers, they will often take you in for xrays or something and the xray tech is not a preferred provider, so blammo suddenly you get a nasty bill. The thing that happened with my physical therapist is that the company is a preferred provider, but the particular therapist that they assigned me to is not.

It's ridiculous how bad the health insurance system is if you think about it. The health insurance companies should want you to go to cheaper, better providers. It would save them money, which should be the whole point of an HMO-type company : to set you up with better care for cheaper. But they don't, they do nothing. The right way to do it would be to provide information and choice to the customer. For example, make you pay a 10% coinsurance, and then show you the average cost of various providers. Also, they should give you quality ratings for various providers. It is in their best interest financially to send you to better quality providers who make you healthier in the minimum number of visits, because that reduces their cost of followups and complications. They are better off if you see surgeons who have a higher improvement rate. But they provide you with none of that information. Instead they literally just give you a provider directory with 1000 names in it. Gee thanks, now what do I do? Eenie meanie miney moe.

Some random less bitter junk :

This is pretty old, but the NYT Buy vs. Rent Calculator is super awesome. It's just an amazingly well designed piece of interactive statistics graphing. You can click on all the spots to get more info, you can drag the sliders on the left and it adjusts the graph. It's just so money.

The crashing real estate market is sort of tempting me, but this actually reminded me that it's not so hot. Yes, things are closer to a decent bargain than they have been, but even undervalued stuff is not likely to go up much at all in the next 5 years, what with finances being generally tight and people being scared of buying. I'm expecting home prices to be almost level for 5 years, so even getting into something that seems like a bargain right now isn't really a great move unless you hold it for 10 years or so.

(*) : I think the fear of revenge response is actually usually wrong. For example in customer service situations, when someone is being totally incompetent. Like the other day I went to the UPS store trying to send something to Lebanon, PA, and the guy starts asking about declared value and contents and whatnot, and I started thinking in my head "wait, are you filling out a customs form to send this to the country of Lebanon?" but I just tried to be nice. The logic is that if you just call them on their stupidness then they will be unhelpful, while if you are polite and nice to them, they will try to help you more and give you better service.

I think that's mostly wrong. Most people just try to incompetently muddle through life and are just constantly fucking up and cutting corners and being lazy and not doing their job. If you let them, they will keep doing that. If you give an inch by being nice, they will just keep fucking up. If you call them on it and say "hey, I'm sending this to Pennsylvania buddy", they get shocked out of it, they realize they can't get away with sneaking their half assed shit past you, because you're going to call them on it. Then they want to get their work done and take care of you because they don't want to be around someone that calls them on shit.

Also : Majesco's stock ticker is COOL !? WTF that is so awesome. And it's perfect if you ever meet the Majesco people. They know absolutely nothing about video game development, and literally talk like The Sopranos; I swear that company is some kind of mob money laundering operation.

4/08/2009

04-08-09 - Link

Right now in Seattle, the Lawrimore Project features "Stability" - two people are living inside a giant teeter totter, so that they have to balance each other all the time.

Bakery Nouveau in West Seattle is probably the best bakery I've found yet here. It seems they only do one early morning baking, so late-day stuff is a bit stale. It's worth going in the morning on the weekend, fantastic pastries, bread, quiche and etc.

Skagit Valley is home to lots of Tulip and Daffodil farms where the fields are full of flowers for commercial production. It's quite a pretty area. The main bloom is coming in the next few weeks . I might try to get up there for a little road trip.

D & M seems to be one of the few places that will ship liquor to WA. Nice scotch selection.

Blackbird in Ballard seems to be the only actual cool men's clothing store in Seattle. Rather pricey.

You may know I've been looking at getting a 135 (damn Ryan beat me to it). One interesting option is Performance Center Delivery. You get the car cheaper, and you get a free track day out of it, which would be awesome. The downside is it's in fucking South Carolina (what kind of moron would ever want to live in SC!) which is an awfully far way to drive back here.

CoreInfo is useful for seeing how your cores map to logical processors and all that nonsense which will become ever more critical. I find all the low level lock free multithreading stuff super amusing to program, but it's just really not practical most of the time, because it does make code much harder to modify and debug and everything.

Kubota Garden is a large Japanese Garden in Seattle that was built almost entirely by a single family over many years. The city now runs it. Will have to check that out.

Good links from other blogs : Meme timeline and radical cartography .

Gus Hansen TV is pretty fucking awesome. He's quite a nutter. I think the jury is actually still out on whether he is any good or not. Most of the top players seem to think he is a huge fish - the high stakes games actually run *around* Gus right now, but Gus is not actually losing money in them yet. Time will tell. It requires Silverlight and a decently fast connection.

Some Puget Sound area touring links. I haven't done any of this yet so I don't know what of this information is actually good :

Trails and trips for Washington hiking, free info and resources...
SummitPost - Mountain Loop Highway -- Climbing, Hiking & Mountaineering
State Scenic Byway Scenic Driving Tours Washington State
National Scenic Byways Washington
Mountain Loop Highway, Glacier Peak Region, Washington
GORP - Washington Scenic Drives
Beat back the forest to exquisite Bedal Basin
Alpinism in the Pacific Northwest

04-08-09 - Knife Dipping

When you're putting condiments on your sandwich, you have to go in the right order. If you want mustard, mayo, and butter (btw butter on french bread is what makes a sandwich yum) you should go butter first, then mayo, then mustard. Getting a little butter in the mayo is no big deal, same with getting a little mayo in the mustard, but getting mustard on the butter would be a major faux-pas.

It's a trickier question with peanut butter and jelly. Do I get the PB in the jelly, or the jelly in the PB? And BTW please don't suggest using two knives, that is right out.


I've been going to a lot of physical therapy and doctor appointments and whatnot recently. It's such a huge distraction. For one thing it just takes a ton of time. The appointment might be only an hour, but you have to get there & park, then wait because they're always off schedule, then drive back. It winds up taking about three hours. And then there's the fact that it breaks up your day. I can't concentrate for hours before an appointment because I know an appointment is coming up. (it's kind of like how if I know I have to wake up early for something, I can't sleep that whole night because I keep waking up every half hour and looking at the clock wondering if I'll sleep through the alarm somehow). The whole hour before the appointment I can't really do any work because I don't want to start digging into something and get on a roll if I'm gonna have to cut it off.

The result is that on these days with doctor shit, I hardly get a lick of work done, and I just can't get my mind back onto focus. I only really do awesome work when I can wake up and just start working and not have any appointments or anything scheduled. I need to just be able to work as long as I want, when I want.

It also gets really pricey. At close to $100/visit, with 3 visits a week to different people, you get into $1000/month very fast. I guess poor people don't get to be healthy. One of the weird things is that even though the physical therapist visit is billed at $200 (and I pay $50-$100 out of pocket that the insurance doesn't cover), the actual therapist is not making much money at all, they get maybe $30/hour ($60k/year). The other $170 is going to overhead, to the owner of the PT business, to the health insurance company and its executives and stock holders.

This is basically the model of all commerce in America these days - the product is too expensive for median-income consumers, and yet the people who actually make the product don't get that money, and they aren't paid enough to buy the very thing they make. The difference is getting skimmed to the super-rich. This is how income inequality grows. As a consumer, you should demand higher quality goods for cheaper prices, which means more of your money is going directly to the people who actually made the goods or provided the services. You should refuse to buy expensive garbage where 50%+ of the cost goes to management and shareholders.


Wright Angle is another nice quality Seattle food blog. Not awesome humor content like Surly Gourmand, but lots of good info and photos. It's helpful to me just to have blog subscriptions related to the things I like to do in life, because it gives me little reminders to get out and do those things. I'd like to find one about cycling around Seattle, maybe one about day trips and driving tours in the Puget Sound area, maybe one about raves, one about S&M and swinging, you know, all my interests.


Drew sent me this : NVidia Ion mini PC . I love these little mini cheap quiet PC's. NVidia might save their company with the cheap integrated market segment, because their integrated controller/graphics part is by far the best on the market right now (my god AMD/ATI should have such a clear advantage in this segment but they just can't get their shit together).

Anyway, the Ion PC made me think about something Butcher's been saying that just finally really rang true to me : the real loser in these PC price wars is MS. I mean obviously that's not true, clearly the hardware guys are the first to take the big hit, like Dell is in trouble because the Micro PC's are driving down prices and profit. But once we get into $300 and cheaper PC's, the problem for MS is that a $50-$100 copy of Windows doesn't make much sense any more.

When a PC is $1000 you can easily hide the $100 price of Windows, it doesn't seem so significant. But if you look at a $200 mini PC running Linux vs. a $300 mini PC with Windows, people will go for the Linux. (MS is aware of this and has recently started selling Windows cheaper to OEM's for use in the mini-PC market).

4/07/2009

04-07-09 - Unbelievable

Fucking doctors are such fucking cock ass mother fucking scum sucking sons of bitches.

I've requested my medical records from every doctor that's been involved with my shoulders. I've gotten them from zero. The offices always treat me like I'm being such a huge pain in the ass. Fuck you it's my fucking records. When I tell them that I just referred myself they give me all this attitude. Fuck you, I have a PPO and I have fucking torn shoulders, I know I should be seeing a shoulder specialist, I don't need to go to some fucking primary care retard just to get referred back to you.

The doctors spend less than 5 minutes actually looking at me. They don't ask about detailed history at all. They immediately send you for MRI's. But then they literally don't even look at the MRI's. They just read the one page summary by the MRI tech. I've read those summaries and they're literally like one or two sentences. The doc doesn't read it before you visit, they just skim the summary once they get handed your chart in the office. Then they refer you to Physical Therapy.

The PT and the doc literally never talk. In fact, the doc doesn't even write any notes or instructions for the PT. When you show up at the PT office, they don't have your medical records or know anything about you really. WTF am I paying all you fucking tards so much for !?

Even at the PT places, the actual trained PT only sees you for maybe half an hour or so, and then you get handed off to the exercise babysitter who doesn't know shit about shit.

4/06/2009

04-06-09 - Multi-threaded Allocators

I've been reading a bit about Multi-threaded Allocators. A quick survey :

The canonical base allocator is Hoard. Hoard is a "slab allocator"; it takes big slabs from the OS and assigns each slab to a single fixed block size. In fact it's extremely similar to my "Fixed Restoring" allocator that's been in Galaxy3 forever and we shipped in Stranger. The basic ideas seem okay, though it appears to use locks when it could easily be lock-free for most ops.

One thing I really don't understand about Hoard is the heuristic for returning slabs to the shared pool. The obvious way to do a multi-threaded allocator is just to have a slab pool for each thread. The problem with that is that if each thread does 1 alloc of size 8, every thread gets a whole slab for itself and the overhead is proportional to the number of threads, which is a bad thing going into the massively multicore future. Hoard's heuristic is that it checks the amount of free space in a slab when you do a free. If the slab is "mostly" free by some measure, it gets returned to the main pool.

My problem with this is it seems to have some very bad cases that IMO are not entirely uncommon. For example a usage like this :


enum { many = 100000 };
static void * keep[many];           // live blocks accumulate across iterations
for( int i = 0; i < many; i++ )
{
    keep[i] = malloc(8);            // net +1 live block per iteration
    void * tmp = malloc(8);
    free(tmp);                      // right after this free, the slab looks "mostly free"
}

will totally break Hoard. From what I can tell, what Hoard will do is allocate you a slab for your thread the first time you go in, then when you free(), the slab will look mostly empty, so it will be returned to the shared pool. Then after that when you allocate you will either get from the shared pool, or pull the slab back. Very messy.

There are two questions : 1. how greedy is new slab creation? That is, when a thread allocates an object of a given size for the first time, does it immediately get a brand new slab, or do you first give it objects from a shared slab and wait to see if it does more allocs before you give it its own slab. 2. how greedy is slab recycling? eg. when a thread frees some objects, when do you give the slab back to the shared pool. Do you wait for the slab to be totally empty, or do you do it right away.
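
In rough code terms the two knobs are something like this (illustrative only ; these are not Hoard's actual names, thresholds, or logic) :

// the two tuning knobs, roughly (made-up names) :

// 1. slab creation : first alloc of this size on this thread - own slab right away, or share?
if ( thread->allocCountInClass[sizeClass] >= kOwnSlabThreshold )
    slab = GiveThreadItsOwnSlab(thread, sizeClass);
else
    block = AllocFromSharedSlab(sizeClass);

// 2. slab recycling : on free, when is the slab "mostly free" enough to give back?
if ( slab->numUsedBlocks < slab->numTotalBlocks * kRecycleFraction )
    ReturnSlabToSharedPool(slab);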

The MS "Low Fragmentation Heap" is sort of oddly misnamed. The interesting bit about it is not really the "low fragmentation", it's that it has better multi-threaded performance. So far as I can tell, the LFH is 99.999% identical to Hoard. See MS PPT on the Low Fragmentation Heap

We did a little test and it appears that the MS LFH is faster than Hoard. My guess is there are two things going on : 1. MS actually uses lock-free linked lists for the free lists in each slab (Hoard uses locks), and 2. MS makes a slab list *per processor*. When a thread does an alloc it gets the heap for the processor it's on. They note that the current processor is only available to the kernel (KeGetCurrentProcessorNumber). Hoard also makes P heaps for P processors (actually 2P), but they can't get the current processor so they use the thread Id and hash it down to P, which leads to occasional contention and collisions.
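
The heap-selection difference is tiny but it's where the contention comes from ; roughly (illustrative, not either allocator's actual code) :

// per-heap selection, roughly (sketch) :
// LFH   : heapIndex = current processor number (only cheap to get in the kernel)
// Hoard : can't get the processor in user mode, so hash the thread id down to the heap count :
int heapIndex = (int)( (GetCurrentThreadId() * 2654435761u) % (unsigned)(2 * numProcessors) );
Heap * heap = &heaps[heapIndex];    // two threads can hash to the same heap -> occasional contention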

The other canonical allocator is tcmalloc . I still haven't been able to test tcmalloc because it doesn't build on VS2003. tcmalloc is a bit different than Hoard or LFH. Instead of slab lists, it uses traditional SGI style free lists. There are free lists for each thread, and they get "garbage collected" back to the shared pool.

One issue I would be concerned about with tcmalloc is false sharing. Because the free lists get shared back to a global pool and then redistributed back to threads, there is no real provision to prevent little items from going to different threads, which is bad. Hoard and LFH don't have this problem because they assign whole slabs to threads.

The Hoard papers make some rather ridiculous claims about avoiding false sharing, however. The fact is, if you are passing objects between threads, then no general purpose allocator can avoid false sharing. The huge question for the allocator is - if I free an object on thread 1 that was allocated on thread 2 , should I put it on the freelist for thread 1, or on the freelist for thread 2 ? One or the other will make false sharing bad, but you can't answer it unless you know the usage pattern. (BTW I think tcmalloc puts it on thread 1 - it uses the freelist of the thread that did the freeing - and Hoard puts it on thread 2 - the freelist of the thread that did the allocation; neither one is strictly better).

Both tcmalloc and the LFH have high memory overhead. See here for example . They do have better scalability of overhead to high thread counts, but the fact remains that they may hold a lot of memory that's not actually in use. That can be bad for consoles.

In fact for video games, what you want in an allocator is a lot of tweakability. You want to be able to tweak the amount of overhead it's allowed to have, you want to be able to tweak how fast it recycles pages for different block types or shares them across threads. If you're trying to ship and you can't fit in memory because your allocator has too much overhead, that's a disaster.

BTW false sharing and threaded allocators are clearly a place that a generational copying garbage collector would be nice. (this does not conflict with my contention that you can always be faster by doing very low level manual allocation - the problem is that you may have to give tons of hints to the allocator for it to do the right thing). With GC it can watch the usage and be smart. For example if a thread does a few allocs, they can come from a global shared pool. It does some more allocs of that type, then it gets its own slab and the previous allocs are moved to that slab. If those objects are passed by a FIFO to another thread, then they can be copied to a slab that's local to that thread. Nice.

It seems clear to me that the way to go is a slab allocator. Nice because it's good for cache-coherence and false sharing. The ops inside a slab can be totally thread-local and so no need to worry about multithread issues. Slabs are large enough that slab management is rare and most allocations only take the time to do inside-slab work.

One big question for me is always how to get from an object to its owning slab (eg. how is free implemented). Obviously it's trivial if you're okay with adding 4 bytes to every allocation. The other obvious way is to have some kind of page table. Each slab is page-aligned, so you take the object pointer and truncate it down to slab alignment and look that up. If for example you have 32 bit pointers (actually 31), and your pages are 4k, then you need 2^19 page table entries (512k). If a PTE is 4 bytes that's 2 MB of overhead. It's more reasonable if you use 64k pages, but that means more waste inside the page. It's also a problem for 64-bit pointers.
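
A sketch of the page table approach, using the numbers above (SlabInfo and FreeBlockInSlab are made-up names, this isn't real code from anywhere) :

// sketch of free() via a page table (4k slabs, 31-bit address space as above) :
#include <stdint.h>

enum { SLAB_SHIFT = 12 };                           // 4k slabs
struct SlabInfo;
void FreeBlockInSlab(SlabInfo * slab, void * ptr);  // hypothetical

SlabInfo * g_pageTable[ 1 << (31 - SLAB_SHIFT) ];   // 2^19 entries ; 2 MB if entries are 4 bytes

void my_free(void * ptr)
{
    uintptr_t page = ((uintptr_t)ptr) >> SLAB_SHIFT;    // truncate down to slab alignment
    FreeBlockInSlab( g_pageTable[page], ptr );
}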

There are some other options that are a bit more hard core. One is to reserve a chunk of virtual address for slabs. Then whenever you see a free() you check if the pointer is in that range, and if so you know you can just round the pointer down to get the slab head. The problem is the reserving of virtual address.

Another option is to try to use pointer alignment. You could do something like make all large allocs be 4k aligned. Then all small allocs are *not* 4k aligned (this happens automatically because they come from inside a 4k page), and to get the base of the page you round down to the next lowest 4k.
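
The free() check for that is trivial ; continuing the sketch above (FreeLargeAlloc etc. are also made up) :

// sketch of the alignment trick : large allocs are 4k aligned, small allocs never are
enum { SLAB_SIZE = 4096 };

void my_free(void * ptr)
{
    if ( ( (uintptr_t)ptr & (SLAB_SIZE-1) ) == 0 )
    {
        FreeLargeAlloc(ptr);        // 4k aligned -> it was a large alloc
    }
    else
    {
        // small alloc : round down to the 4k boundary, the slab header lives at the page base
        SlabInfo * slab = (SlabInfo *)( (uintptr_t)ptr & ~(uintptr_t)(SLAB_SIZE-1) );
        FreeBlockInSlab(slab, ptr);
    }
}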

04-06-09 - Continuing Vista Shit

The Vista lappy keeps having one little quirk after another. The latest one is that it would go to 100% CPU if left idle for a while. This was a little hard to track down, because as soon as you touch it to try to see what's using the CPU - it shuts off that task, so you can't catch it in the task manager.

So I knew it obviously had to be one of those "background optimization" services that Vista so kindly runs for you. So I looked in the Scheduled Tasks thing for anything that was set to run "when machine is idle".

It looks like the culprit was CrawlStartPages.

Eat "disable" motherfucker.

04-06-09 - Washington State Liquor Laws

The Washington State Liquor Laws are literally a direct funnel of money to the big distributors and producers. I knew that intuitively, but I found a good article that lays out the details : here . The law literally prohibits retailers from negotiating prices with distributors, prevents retailers from buying directly from producers, and mandates a 10% markup. This kind of public-private monopoly is always bad for the public, but is quite popular with both Dems and Reps (see, eg. health care, power utilities, telecom, military contractors, etc).

04-06-09 - The Work Dispatcher

I thought I'd write a bit about my multithreaded "Worklet" dispatcher before I forget about it. I call little units of work "worklets" just because I like to be nonstandard and confusing.

The basic idea is that the main thread can at any time fire up a bunch of worklets. The worklets then go to a bunch of threads and get done. The main thread can then wait on the worklets.

There are a few things I do differently in the design from most people. The most common "thread pool" and "job swarm" things are very simple - jobs are just an isolated piece of independent work, and often once a bunch is fired you can only wait on them all being done. I think these are too limiting to be really generally useful, so I added a few things.

1. Worker threads that are not in use should go completely to sleep and take 0 cpu time. There should be a worker thread per processor. There might also be other threads on these processors, and we should play nice with arbitrary other programs running and taking some of the cpu time. Once a worker thread wakes up to do work, it should stay awake as long as possible and do all the work it can, that is, it shouldn't have to go to sleep and wait for more work to get fired so it can wake up again.

2. Worklets can have dependencies on other worklets. That way you can set up a dependency tree and fire it, and it will run in the right order. eg. if I want to run A, then B, then run C only after A & B are both done, you fire worklets {A}, {B} and {C: dependent on AB}. Dependencies can be evaluated by the worker threads. That's crucial because it means they don't need to stall and wait for the main thread to fire new work to them.

3. The main thread can block or check status on any Worklet. The main thread (or other game threads) might keep running along, and the worker threads may be doing the work, and we want to be able to see if they are done or not. In particular we don't want to just support the OpenMP style "parallel for" where we fork a ton of work and then immediately block the main thread on it - we want real asynchronous function calls. Often in games we'll want to make the main thread(s) higher priority than the worker threads, so that the worker threads only run in idle time.
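
Concretely, putting 2 and 3 together, the client-side usage I'm going for is something like this (hypothetical names, the real interface isn't settled) :

// hypothetical usage sketch - just the shape of it, not final names :
WorkletHandle a = Worklet_Fire( DecodeChunk, &chunkA );
WorkletHandle b = Worklet_Fire( DecodeChunk, &chunkB );

WorkletHandle deps[2] = { a, b };
WorkletHandle c = Worklet_FireDependent( MergeChunks, &merged, deps, 2 );   // runs only after a & b complete

// main/game thread keeps going ; later, when you actually need the result :
if ( ! Worklet_IsDone(c) )
    Worklet_Block(c);       // block on just this one worklet, not the whole batch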

The actual worker threads I implemented with a work-stealing scheme. It's not true work-stealing at all, because there is a concept of a "main thread" that runs the game and pushes work, and there's also the dependencies that need to be evaluated. All of the thread-thread communication is lock free. When the main thread adds new work items it just jams them onto queues. When the worker threads pop off work items they just pop them off queues. I do currently use a lock for dependency evaluation.

Traditional work stealing (and the main papers) are designed for operating system threads where the threads themselves are the ones making work. In that environment, the threads push work onto queues and then pop it off for themselves or steal from peers. There are custom special lock-free data structures designed for this kind of operation - they are fast to push & pop at one end, but also support popping at the other end (stealing) but more slowly. What I'm doing is not traditional work stealing. I have external threads (the "game threads") that do not participate in the work doing, but they can push work to the workers. In my world the workers currently can never make new work (that could be added if it's useful).

There are a lot of nice things about work stealing. One is you don't need a separate dispatcher thread running all the time (which would hurt you with more context switches). Another is that workers who have cpu time can just keep jamming along by stealing work. They sort of do their own dispatching to themselves, so the threads that have the cpu do the work of the dispatching. It also offloads the dispatching work from the main thread. In my system, the workers do all work they can that's known to be dependency-okay. Once that work is exhausted they reevaluate dependencies to see if more work can be done, so they do the dependency checking work for themselves off the main thread.
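
The worker loop winds up shaped roughly like this (a sketch, not the actual code ; the helper names are made up) :

// rough shape of the worker thread loop (sketch) :
for(;;)
{
    Worklet * w = PopMyQueue();                     // lock-free pop from this worker's own queue
    if ( ! w ) w = StealFromAnotherWorker();        // try peers' queues, the slower end
    if ( ! w )
    {
        Lock(&g_dependencyLock);
        w = TakeNewlyReadyWorklet();                // reevaluate dependencies, off the main thread
        Unlock(&g_dependencyLock);
    }
    if ( ! w ) { WaitForWorkSignal(); continue; }   // nothing runnable : go completely to sleep, 0 cpu
    RunWorklet(w);
    MarkDoneAndWakeWaiters(w);                      // main thread may be blocked on this worklet
}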

Another nice thing about work stealing is that it's self-balancing in the face of external activity. Anything running on a PC has to face lots of random CPU time being stolen. Even in a console environment you have to deal with the other threads taking variable amounts of time. For example if you have 6 cores, you want 6 worker threads. But you might also have 3 other threads, like a main thread, a gpu-feeding thread, and a sound thread. The main thread might usually take 90% of the cpu, so the worker on that core rarely gets any time, the gpu-feeder might usually take 50% of the cpu time on that thread, but in phases, like it takes 100% of the cpu for half the frame then goes idle for the other half. With work stealing your worker thread will automatically kick in and use that other time.

In order for the self balancing to work as well as possible you need small worklets. In fact, the possible wasted time is equal to the duration of the longest task. The time waste case happens like this :


Fire N work items, each taking time T, to N cores
For some reason one of the cores is busy with something else, so only (N-1) of them do work
Now you block on the work being done and have to wait for that busy core, so total time is 2T

Total time should have been N*T/(N-1)
For N large, the waste approaches T

Basically the smaller T is (the duration of the longest work item), the more granular the stealing self-allocation is. Another easy way to see it is :

You are running N workers and a main thread
The main thread takes most of the time on core 0
You fire (N-1) very tiny work items and the (N-1) non-main cores pick them up
You fire a very large work item and the core 0 worker picks it up

That's a disaster. The way to avoid it is to never fire single large work items - if you would have split that into lots of little work items it would have self-balanced, because the stealing nature means that only worker threads that have time take tasks.

For example, with something like a DXTC encoder, rather than split the image into something like N rectangles for N cores and fire off 1 work item to each core, you should go ahead and split it into lots of tiny blocks. Of course this requires that the per-work-item overhead is extremely low, which of course it is because we are all lock-free and goodness.
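
In code it's just something like this (sketch with made-up names, riffing on the worklet usage above) :

// sketch : fire lots of tiny tile worklets, not one big worklet per core (hypothetical names) :
WorkletHandle handles[MAX_TILES];
int numHandles = 0;

for( int y = 0; y < imageHeight; y += TILE_H )
for( int x = 0; x < imageWidth ; x += TILE_W )
{
    handles[numHandles++] = Worklet_Fire( EncodeDXTCTile, MakeTileJob(image, x, y, TILE_W, TILE_H) );
}

Worklet_BlockAll(handles, numHandles);  // or keep working and check later ; idle cores self-select tiles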

There are some things I haven't done yet that I might. One is to be a bit smarter about the initial work dispatching, try to assign to the CPU that will have the most idle time. If you actually did make nice tiny worklets all the time, that wouldn't be an issue, but that's not always possible. In the case that you do make large work items, you want those to go the cores that aren't being used by other threads in your game.

Another issue is the balance of throughput vs latency. That is, how fast does the system retire work, vs. how long does it take any individual work item to get through. Currently everything is optimized for throughput. Work is done in a roughly FIFO order, but with dependencies it's not guaranteed to be FIFO, and with the work stealing and variations in CPU Time assignment you can have individual work items that take a lot longer to get through the system than is strictly necessary. Usually this isn't a big deal, but sometimes you fire a Worklet and you need it to get done as quickly as possible. Or you might need to get done inside a certain deadline, such as before the end of the frame. For example you might fire a bunch of audio decompression, but set a deadline to ensure it's done before the audio buffers run out of decompressed data. Handling stuff like that in a forward-dispatched system is pretty easy, but in work-stealing it's not so obvious.

Another similar issue is when the main thread decides to block on a given work item. You want that item to get done as soon as possible by the thread that has the highest probability of getting a lot of CPU time. Again not easy with work-stealing since some worker thread may have that item in its queue but not be getting much CPU time for some reason.
