10/18/2010

10-18-10 - Frustum and RadiusInDirection

Ryg has a note on Zeux' Frustum culling series , both are good reads.

The thing that struck me when reading it is that you can get almost directly to the fastest possible code from a very generic C++ framework.

The way I've done frustum culling for a while is like this :


template < class t_Volume >
Cull::EResult Frustum::Cull(const t_Volume & vol) const
{   
    Cull::EResult eRes = Cull::eIn;

    for(int i=0;i < m_count;i++)
    {
        const Plane::ESide eSide = PlaneSide(vol,m_planes[i]);
        
        if ( eSide == Plane::eBack )
            return Cull::eOut;
        else if ( eSide == Plane::eIntersecting )
            eRes = Cull::eCrossing;
        // else Front, leave eRes alone
    }

    return eRes;
}

now any primitive class which implements "PlaneSide" can be culled. (Cull is trinary - it returns all in, all out, or crossing, similarly PlaneSide is trinary, it returns front or back or crossing).

Furthermore, PlaneSide can be overridden for classes that have their own idea of how it should be done, but almost always you can just use the generic PlaneSide :


template < class t_Volume >
Plane::ESide PlaneSide(const t_Volume & vol, const Plane & plane) const
{
    const float dist = plane.DistanceToPoint( vol.GetCenter() );
    const float radius = vol.GetRadiusInDirection( plane.GetNormal() );

    if ( dist > radius )
        return Plane::eFront;
    else if ( dist < - radius )
        return Plane::eBack;
    else
        return Plane::eIntersecting;
}

For a volume to be generically testable against a plane it has to implement GetCenter() and GetRadiusInDirection().

GetRadiusInDirection(dir) tells you the half-width of the span of the volume along direction "dir". The neat thing is that GetRadiusInDirection turns out to be a pretty simple and very useful function for most volumes.

Obviously the implementation for Sphere is the simplest, because RadiusInDirection is the same in all directions :


Sphere::GetCenter() { return m_center; }
Sphere::GetRadiusInDirection() { return m_radius; }

for an AxialBox, if you store your box as {m_center, m_halfExtent} so that min & max are { m_center - m_halfExtent, m_center + m_halfExtent } then the implementation is :

AxialBox::GetCenter() { return m_center; }
inline float AxialBox::GetRadiusInDirection(const Vec3 & dir) const
{
    return 
         fabsf(dir.x) * m_halfExtent.x +
         fabsf(dir.y) * m_halfExtent.y +
         fabsf(dir.z) * m_halfExtent.z;
}

if you now compile our test, everything plugs through - you have Ryg's method 4c. (without the precomputation of absPlane however).

Of course because we are generic and geometric, it's obvious how to extend to more primitives. For example we can make our Frustum work on oriented bounding boxes just like this :


float OrientedBox::GetRadiusInDirection(const Vec3 & dir) const
{
    const float radius = 
        fabsf((dir * m_axes.GetRowX()) * m_radii.x) +
        fabsf((dir * m_axes.GetRowY()) * m_radii.y) +
        fabsf((dir * m_axes.GetRowZ()) * m_radii.z);

    return radius;
}

(I store my OrientedBox as an m_center, an orthonormal rotation matrix m_axes, and the extent of each of the 3 axes in m_radii ; obviously you could speed up this query by storing the axes scaled by their length, but that makes some of the other operations more difficult so YMMV).

Similarly tests against cylinders and lozenges and k-Dops or convex hulls or what-have-you are pretty straightforward.

We can use GetRadiusInDirection for other things too. Say you want to find the AxialBox that wraps the old AxialBox but in a new rotated orientation ? Well with GetRadiusInDirection it's very obvious how to do it - you just take the new basis axes and query for the slab spans along them :


AxialBox AxialBox::Rotate( const Matrix & mat ) const
{
    AxialBox ret;
    ret.m_center = m_center;

    ret.m_halfExtent.x = GetRadiusInDirection( mat.GetColumnX() );
    ret.m_halfExtent.y = GetRadiusInDirection( mat.GetColumnY() );
    ret.m_halfExtent.z = GetRadiusInDirection( mat.GetColumnZ() );

    return ret;
}

And you will find that this is exactly the same as what Zeux works out for AABB rotation . But since we are doing this with only GetCenter() and GetRadiusInDirection() calls - it's obvious that we can use this to make an AxialBox around *any* volume :

template < class t_Volume >
AxialBox MakeAxialBox( const t_Volume & vol, const Matrix & mat )
{
    AxialBox ret;
    ret.m_center = vol.GetCenter();

    ret.m_halfExtent.x = vol.GetRadiusInDirection( mat.GetColumnX() );
    ret.m_halfExtent.y = vol.GetRadiusInDirection( mat.GetColumnY() );
    ret.m_halfExtent.z = vol.GetRadiusInDirection( mat.GetColumnZ() );

    return ret;
}

The nice thing about generic programming is it gives you a set of interfaces which provide a contract for geometric volumes, and then anything that implements them can be used in certain functions. You wind up doing this kind of thing where you write a routine just for AxialBox rotations, but then you see "hey I'm only using calls that are in the generic Volume spec so this is general".

Now I'm not claiming by any means that you can make C++ generic templates and they will be competitive with hand-tweaked code. For example in the case of AABB vs. Frustum you probably want to precompute absPlane , and you probably want to special case to a 5-plane frustum and unroll it (or SIMD it). Obviously when you want maximum speed, you want to look at the assembly after the C++ compiler has had its turn and make sure it's got it right, and you may still want to SIMD and whatever else you might want to do.

But as in all optimization, you want to start your assembly work from the right algorithm, and often that is the one that is most "natural". Interestingly the interfaces which are most natural for generic programming are also often ones that lead to fast code (this isn't always the case, just like the true physics equations aren't always beautiful, but it's a good starting point anyway).

BTW a few more notes on Frustum culling. Frustum culling is actually a bit subtle, in that how much work you should do on it depends on how expensive the object you are culling is to render. If the object will take 0.0001 ms to just render, you shouldn't spend much time culling it. In that case you should use a simpler approximate test - for example a Cone vs. Sphere test. Or maybe no test at all - combine it into a group of objects and test the whole group of culling. If an object is very expensive to render (like maybe it's a skinned human with very complex shaders), it takes 1 ms to render, then you want to cull it very accurately indeed - in fact you may want to test against an OBB or a convex hull of the object instead of just an AABB.

Another funny thing about frustum culling in game usage is that the planes are not all equal. That is, our spaces are not isotropic. We usually have lots of geometry in the XY plane and not much above or below you. You need to take advantage of that. For example using an initial heirarchy in XY can help. Or if you are using a non-SIMD cull with early outs, order your plane tests to maximize speed (eg. the top and bottom planes of the frustum should be the last ones checked as they are least important).

6 comments:

ryg said...

Addendum to the addendum: You want GetRadiusInDirection anyway (together with a "GetCenterInDirection", which reduces to GetCenter for most symmetric objects) since that's the primitives you need to determine the support of a convex object along an arbitrary axis - an essential ingredient to both Separating Axis Tests for overlap detection and the GJK algorithm for more accurate collision detection.

There's still some details you need to handle yourself (e.g. figure out which axes to test for separation), but other than that, you get very nice and uniform code out of this.

cbloom said...

Good point. Of course this type of PlaneSide test is just a separating axis test (the only axis for a plane is its normal).

BTW I didn't see either of your mention the old Quake-style AABB-Plane test. In some cases it can be faster.

You store your AABB as min and max in an array : m_ends[2]

For the Planes you precompute whether to check min or max for each component (from the sign of plane normal), stored as ix,iy,iz (0 or 1).

Then the test is :

d = plane.w + plane.x * m_ends[ plane.ix ].x + plane.y * m_ends[ plane.iy ].y + plane.z * m_ends[ plane.iz ].z

I believe this is equivalent as your Method 5 of precomputing the sign flip mask.

The details of whether its faster or slower will depend on your platform, whether a dependent load hurts or whether mixing int & float math hurts.

Similar technique of precomputing an index for a test is used in voxel raytracing, etc.

nothings said...

I don't really buy the claim that this arises "naturally" (i.e. "automatically") from using some kind of type-based dispatch mechanism.

It certainly works well if you happen to use a design with GetRadiusInDirection. But I'm highly doubtful you (or at least the average developer) would ever have defined GetRadiusInDirection as part of the API unless you already knew it was an important thing to have for fast implementation of separating planes or plane-sidedness testing.

Again, I'm not denying it maps cleanly, and that that is cool. Just the seeming implication that OO saves you having to figure out what the fast method is in the first. Maybe I'm misinterpreting what you wrote, though.

Tom said...

Maybe there's some more subtle point at work, but I interpreted that part of the post as simply being about how the compiler flattens all the templates out at compile time. So you can just write the natural routine, in a natural sort of way, and you'll end up with as many instantiations as you need, each one compiled as if you'd written it out by hand. So, at best, you end up with something pretty good at the end, and you've written M+N parts rather than M*N.

And at worst, you end up with something more like boost... That takes a bit of effort, though! -- for most code, any templates required are usually straightforward.

(You could do something similar with macros if you were doing this in C, I guess, but they don't interact as well with debuggers, and keeping on top of everything (\s everywhere, and the occasionally-surprising argument expansion rules) can get a bit much.)

cbloom said...

" I don't really buy the claim that this arises "naturally" (i.e. "automatically") from using some kind of type-based dispatch mechanism. "

No no, I don't mean "natural" as in the templates or the C++ junk, I mean "natural" as in the mathematical choice of variables and way of expressing the mathematics.

This is a very common theme in physics but I guess it isn't that well known in CS.

In physics there's this sort of superstition (which turns out to be true very often so I guess its more of a rule of thumb) that if you simply phrase your problem in the correct coordinates and with the right symmetries, then the simplest, most beautiful way to solve it pops right out.

A lot of problems in physics are a tedious nightmare to solve if you just go about them in a brute force way from cartesian coordinates, but they can become super simple if you find the "natural" way of writing them.

eg. if you're trying to solve a field equation, if you first transform the field into a basis where the basis functions have the same symmetry as the problem, the math becomes very easy.

The thing I observed is that the right way to make an interface for Generic Programming is to think about what the "natural" mathematical way is to talk to these different objects. And it just so turns out that the natural way is often efficient computationally.

cbloom said...

For example I've often seen this with Dave Eberly's geometry intersection stuff.

He tends to find intersections by just plugging in the equations for the primitives and finding where they are equal.

This often leads to very ugly and inefficient code.

If instead you form equations based on the symmetries and geometric tests, you wind up with code that is usually (*) not only much cleaner and easier to read but also faster.

(* = obviously this is not always true, it's just a rule of thumb, there's no gaurantee that the elegant way of writing something is actually the most efficient)

old rants