11-11-08 - REYES

Gritz' course on Renderman : To Infinity and Beyond is the best reference I've found on REYES. I'm still curious about the details of how to efficiently do the part from micropolygons to pixel colors.

The brute force way takes a micropolygon and puts a bound on it that covers its entire temporal sweep across the current frame. That bound is then binned into all the pixels it touches. In each pixel, 16 (or so) supersampling locations are checked with precomputed jittered sample locations and times. At each spot where a sample is hit, the color & Z are added to a list for later blending. Note that with fast moving objects and depth of field it can get really inefficient, because each micropolygon has to get tested in many pixels.
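Roughly, in code. This is just a sketch of the brute force loop described above, not how any real renderer does it. It assumes linear motion between shutter open and close, a convex micropolygon in screen space, no depth of field, and made-up names (`make_samples`, `sample_micropolygon`, the `hits` dictionary):

```python
import math
import random

def make_samples(n_side=4, seed=0):
    """Precompute jittered (dx, dy, t) samples for one pixel (n_side*n_side of them)."""
    rng = random.Random(seed)
    n = n_side * n_side
    times = [(k + rng.random()) / n for k in range(n)]  # stratified in time
    rng.shuffle(times)                                  # decorrelate time from position
    return [((i + rng.random()) / n_side, (j + rng.random()) / n_side,
             times[j * n_side + i])
            for j in range(n_side) for i in range(n_side)]

def inside_convex(poly, x, y):
    """True if (x, y) is inside a convex polygon, for either winding order."""
    crosses = []
    for k in range(len(poly)):
        x0, y0 = poly[k]
        x1, y1 = poly[(k + 1) % len(poly)]
        crosses.append((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)

def sample_micropolygon(verts_open, verts_close, color, z, samples, hits):
    """Test one moving micropolygon against every sample of every pixel in its bound.

    verts_open / verts_close: screen-space corners at shutter open / close.
    hits: dict mapping (px, py) -> list of (sample_index, z, color), appended in place.
    """
    swept = verts_open + verts_close  # bound covers the whole temporal sweep
    x_min = math.floor(min(v[0] for v in swept))
    x_max = math.floor(max(v[0] for v in swept))
    y_min = math.floor(min(v[1] for v in swept))
    y_max = math.floor(max(v[1] for v in swept))
    for py in range(y_min, y_max + 1):
        for px in range(x_min, x_max + 1):
            for s, (dx, dy, t) in enumerate(samples):
                # Position of the micropolygon at this sample's time.
                poly = [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
                        for a, b in zip(verts_open, verts_close)]
                if inside_convex(poly, px + dx, py + dy):
                    hits.setdefault((px, py), []).append((s, z, color))
```

The inefficiency is easy to see here: a pixel-sized micropolygon that moves 10 pixels during the frame gets tested against every sample in every pixel of its bound, and almost all of those tests miss.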

The Pixar Library is a nice collection of papers. There's a new paper I hadn't seen on point-based GI that's based on the Bunnell disk method for realtime GI but enhanced for better quality. It seems extremely simple and the results look good. It's not super accurate in terms of solving the rendering equation with minimal error, but that's not really what we want anyway. Though actually, now that I think about it, it's just a variant on the ancient "render the world from each surfel's point of view" method for GI and doesn't really add much.

Anyway, back to REYES. Ignoring temporal sampling, the spatial sampling stuff can all be done with a normal rasterizer I'm pretty sure. The micropolygon subdivision and binning is just your fragment rasterizer (a rasterizer is just a device that creates micropolygons that are at most pixel size and makes a grid that's aligned with the screen grid). When you decide a fragment goes in a pel, you don't just stuff the color in the frame buffer, instead you stick it in the list to sample against the stochastic supersampled grid test just like REYES.

But it occurs to me that if you're building up a list of fragments in each pel like this, you may as well do analytic coverage rather than stochastic sampling. When you rasterize your triangles, compute the area of each pel that's covered. This can be done efficiently and incrementally, just like a Wu antialiased line drawer, for the case of only one triangle edge intersecting a pel; if 2 or 3 edges intersect one pel it's more work.
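To make "area of each pel that's covered" concrete, here's a minimal sketch. Note it is not the incremental Wu-style method mentioned above; it's the dumb general version that clips the triangle to the pel square (Sutherland-Hodgman) and takes the area of what's left (shoelace formula). It handles the 2 and 3 edge cases for free, at the cost of doing a lot more work per pel. The function names are made up for illustration, and a pel is assumed to be the unit square [px, px+1] x [py, py+1].

```python
def clip_to_halfplane(poly, a, b, c):
    """Keep the part of a convex polygon where a*x + b*y + c >= 0."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        dp = a * p[0] + b * p[1] + c
        dq = a * q[0] + b * q[1] + c
        if dp >= 0:
            out.append(p)
        if (dp < 0) != (dq < 0):
            # Edge crosses the clip line: add the intersection point.
            t = dp / (dp - dq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def polygon_area(poly):
    """Unsigned area via the shoelace formula (0 for empty or degenerate polygons)."""
    twice_area = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def pixel_coverage(tri, px, py):
    """Fraction of pel (px, py) covered by the triangle, in [0, 1]."""
    poly = list(tri)
    poly = clip_to_halfplane(poly, 1.0, 0.0, -px)          # x >= px
    poly = clip_to_halfplane(poly, -1.0, 0.0, px + 1.0)    # x <= px + 1
    poly = clip_to_halfplane(poly, 0.0, 1.0, -py)          # y >= py
    poly = clip_to_halfplane(poly, 0.0, -1.0, py + 1.0)    # y <= py + 1
    return polygon_area(poly)
```

For the triangle (0,0), (2,0), (0,2), the pel at (0,0) is fully covered and the pel at (1,0) is cut in half by the diagonal edge, so the coverages come out as 1 and 0.5.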

Add the fragment to the pel list, consisting of {color, alpha, Z, area}. Once you have all these fragments, sort them back-to-front and accumulate into the framebuffer. Fragments that have the same Z are first gathered together before being alpha-blended onto the current value in the frame buffer. That means that two triangles that share an edge act like a continuous surface: together they correctly give you 100% area coverage and correctly do not alpha blend with each other. This gives you exact anti-aliasing with alpha blending.
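A sketch of that resolve step for one pel. Several things here are assumptions the description above doesn't pin down: larger Z is farther away, "same Z" means within a small epsilon (`z_eps`), each fragment contributes with weight area * alpha, and a same-Z group whose areas sum to more than 1 gets scaled back down. `resolve_pixel` and the dict layout are made-up names.

```python
def resolve_pixel(fragments, background=(0.0, 0.0, 0.0), z_eps=1e-6):
    """Blend a pel's fragment list into a single color.

    fragments: list of dicts with keys 'color' (r, g, b), 'alpha', 'z', 'area'.
    Assumes larger z is farther from the camera.
    """
    dst = list(background)
    frags = sorted(fragments, key=lambda f: f['z'], reverse=True)  # back-to-front
    i = 0
    while i < len(frags):
        # Gather every fragment that shares (approximately) this Z.
        j = i + 1
        while j < len(frags) and abs(frags[j]['z'] - frags[i]['z']) <= z_eps:
            j += 1
        group = frags[i:j]
        i = j

        # Same-Z fragments are treated as pieces of one surface: their
        # contributions add instead of blending over each other.
        total_area = sum(f['area'] for f in group)
        scale = 1.0 / max(1.0, total_area)  # guard against area sums above 1
        opacity = 0.0
        src = [0.0, 0.0, 0.0]
        for f in group:
            w = f['area'] * f['alpha'] * scale
            opacity += w
            for c in range(3):
                src[c] += w * f['color'][c]

        # Standard "over" blend of the gathered group onto what is behind it.
        for c in range(3):
            dst[c] = src[c] + (1.0 - opacity) * dst[c]
    return tuple(dst)
```

With two opaque red fragments of area 0.5 each at the same Z over a blue background, this gives pure red, which is the shared-edge case. Put the same two fragments at different Z and you get (0.75, 0, 0.25), i.e. the background bleeds through the seam.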


castano said...

Have you read the stochastic rasterization paper?


What you describe sounds like NVIDIA's coverage sampled antialiasing. The only difference is that NVIDIA computes the coverage numerically, while you propose to do it analytically.


The current ratio between coverage samples and color samples is 1:4, so at this point CSAA is cheaper than the analytic approach. I'm not sure at what point the analytic approach may become attractive enough, though.

Storing multiple fragments per pixel is something that is possible in DX11 already, since fragment shaders have scatter and gather operations.

cbloom said...

I'll reply more later, but -

it's really fucking annoying that when I google for "coverage sampled aa" I get 99% fanboy sites about quality compare screenshots and hardware reviews and driver settings.

You should have to write heapsort or something to get write access to the internet.

cbloom said...

Hmm. I can't really figure out how CSAA works. In a typical mode, it's got 4 color & Z samples and 16 coverage samples.

What does it do with partial coverage on samples? Say I draw a triangle that only partially covers a Z/color sample, do I replace the existing Z? do I test against it? After I draw, I should have different Z's for the different coverage points but I don't have enough Z samples, what do I do?

BTW I found this paper which is not directly related but kind of interesting -

Efficient Hardware for Antialiasing Coverage Mask Generation

won3d said...

That's a great paper. I remember reading that one a while back. A good adjunct to the hierarchical tiling paper: www.cs.princeton.edu/courses/archive/spring01/cs598b/papers/greene96.pdf

I'm trying to understand CSAA now, too. At one point, I thought I understood it, but now I'm not so sure. Clearly it is approximate. Each time a primitive is rasterized into a sample (as in MSAA), you also get a coverage bit mask. You can compare the bit masks and do the corresponding Porter-Duff blend based on the src/dst color/coverage, and whether the depth test passed/failed. It would probably work assuming you were rendering in roughly depth-sorted order.

Of course, I could be completely off.

castano said...

Each coverage sample is associated with the nearest Z/color sample. The coverage bits are set only when the Z/color sample is covered. When downsampling, the color samples are weighted according to their respective coverages.

If you google for "virtual coverage anti-aliasing" you will find more detailed information about the technique. It's unfortunate that's the only document that describes it in detail. :(

cbloom said...

The stochastic rasterization stuff does in fact seem like a good way to do motion blur and DOF, but it seems like an awful lot of expensive hardware just for blurs.

won3d said...

Thanks for clarifying.

It is interesting how CSAA uses neighboring pixels to do the multisample resolve. Neat.

Sean Barrett said...

I'd always assumed that CSAA would just be the "best 4 samples", e.g. you have 4 color/z data items, and a (theoretical) 16-bit mask for each one. When you add a color/z to the set, you clear the matching bits of anything it's in front of, and clear its own bits for anything it's behind.

If anything's bitmask goes to 0, you discard it entirely.

If you end up with 5 items in the set, you discard the one with the fewest samples (or I guess you could do something like merging a pair of them, but that's pretty messy, since you want to merge a pair with nearby z values, but also nearby locations).

I would tend to assume that once you have more than 2 color/Z samples with a system like this, it doesn't matter that much, since now you're looking at a 3-way join where the "right" answer isn't going to be as obviously distinguishable (because its 3-way-ness makes it a discontinuous point anyway). But I suppose there's some subtler case, especially with distant/thin stuff.

Anyway, I guess that's not what CSAA actually is, since what I just described would be way more of a PITA to implement (you'd need to store a little 2-bit id in each coverage slot, and do a bunch more work).
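For reference, the "best 4 samples" scheme from this comment could be sketched like so. This illustrates the idea as described, not how CSAA actually works. It assumes smaller z is nearer, treats equal z as "behind" (a less-than depth test), and on overflow just drops the entry with the fewest coverage bits rather than merging. `add_fragment` and the tuple layout are made up.

```python
def popcount(mask):
    """Number of set coverage bits."""
    return bin(mask).count("1")

def add_fragment(entries, color, z, mask, max_entries=4):
    """Insert a fragment into a pixel's small set of (color, z, mask) entries.

    entries: list of (color, z, mask) tuples; mask is a 16-bit coverage mask.
    Returns the updated list. Smaller z is assumed to be nearer.
    """
    updated = []
    for e_color, e_z, e_mask in entries:
        if z < e_z:
            e_mask &= ~mask   # new fragment is in front: it takes those samples
        else:
            mask &= ~e_mask   # new fragment is behind: it loses those samples
        if e_mask:            # entries whose mask went to 0 are discarded
            updated.append((e_color, e_z, e_mask))
    if mask:
        updated.append((color, z, mask))
    if len(updated) > max_entries:
        # Too many surfaces in this pixel: drop the one covering the fewest samples.
        updated.remove(min(updated, key=lambda e: popcount(e[2])))
    return updated
```

Because every coverage bit is owned by at most one entry at a time, it doesn't matter which order the existing entries are visited in.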
