3/18/2010

03-18-10 - Physics

Gravity is a force that acts proportionally to the product of the two masses. (let's just assume classical Newtonian gravity is in fact the way the universe works)

People outside of science often want to know "but why?" or "how exactly? what is the mechanism? what carries the force?" . At first this seems like a reasonable question, you don't just want to have these rules, you want to know where they come from, what they mean exactly. But if you think a bit more, it should be clear that these questions are absurd.

Let's say you know the fundamental physical laws. These are expressed as mathematical rules that tell you the behavior of objects. Say for example we lived in a world with only Newtonian dynamics and gravity and that is all the laws. Someone asks "but what *is* gravity exactly?". I ask : How could you ever know? What could there ever be that "is" gravity? If something were facilitating the force of gravity, there would have to be some description of that thing, some new law - some new rule to describe this thing that carried gravity. Then you would ask "well where does this rule for the carrier of gravity come from?" and you would need a new rule. Say you said "gravity is carried by the exchange of gravitons" ; then of course they could ask "why is there a graviton, what makes gravitons exactly, why do they couple in this way?" etc.

The fundamental physical laws cannot be explained by anything else.

That's almost a tautology because that's what I mean by "fundamental" - you take all the behavior of the universe, every little thing, like "I pushed on this rock with 10 pounds of force and it went 5 meters per second". You strip away every single law that can be explained with some other law. You strip and strip and finally you are left with a few laws that cannot be explained by anything else. These are the fundamental laws and there is no "why" or "how" for them. In fact the whole human question of "how" is imprecise; what we really should say is "what simpler physical law can explain this phenomenon?". And at some point there is no more answer to that.

Of course this is assuming that there *is* a fundamental physical law. Most physicists assume that to be true without questioning it, but I wrote here at cbloom.com long ago that in fact the set of physical laws might well be infinite - that is, maybe we will find some day that the electrical and gravitational forces can be explained in terms of some new law which also adds some new behaviors at very small scale (if it didn't add new behaviors it would simply be a new expression of the same law and not count), and then maybe that new law is explained in terms of another new law which also adds new behaviors, etc. ad infinitum - a Russian doll of physical laws that never ends. This is possible, and furthermore I contend that it is irrelevant.

There is a certain human need to know "why" the physical laws are as they are, or to know the "absolute" "fundamental" laws - but I don't believe there's really much merit to that at all. What if they do finally work out string theory, and it explains all known phenomena for a while, but then we find that there is a small error in the mass of the Higgs Boson on the order of ten to the minus one billion, which tells us there must be some other physical law that we don't yet know. The fact that string theory then is only a very good model of the universe and not the "absolute law" of the universe changes nothing except our own silly human emotions in response to it (and surely crackpots would rise up and say that since it's not "100% right" then there must be angels and thetans at work).

What if we found laws that explained all phenomena that we know of perfectly? We might well think those laws are the "absolute fundamental" laws of the universe. But how would we ever know? Maybe there are other phenomena that can't be explained by those laws that we simply haven't seen yet. Maybe those other phenomena could *never* be seen! (for example there may be another entire set of particles and processes which have zero coupling to our known matter). The existence of these unexplained phenomena does not reduce the merit of the laws you know, even though they are now "not complete" or "don't describe all of nature".

It's funny to think about how our intuition of "mass" was screwed up by the fact that we evolved on the earth in a high gravity environment where we inherently think of mass as "weight" - eg. something is heavy. There's this thing which I will call K. It's the coefficient of inertia, it's how hard something is to move when you apply a certain force to it. F = K A if you will. Imagine we grew up in outer space with lots of large electrical charges around. If we apply an electric field to two charges of different K, one moves fast and one moves slow, the difference is the constant K. It's a very funny thing that this K, this resistance to changes of motion, is also the coupling to the gravitational field.

3/10/2010

03-10-10 - Distortion Measure

What are the things we might put in an ideal distortion measure? This is going to be rather stream-of-consciousness rambling, so beware. Our goal is to make output that "looks like" the input, and also that just looks "good". Most of what I talk about will assume that you are running "D" on something like 4x4 or 8x8 blocks and comparing it to "D" on other blocks, but of course it could be run on a gaussian windowed patch, just some way of localizing distortion on a region.

I'm going to ignore the more "macroscopic" issues of which frame is more important than another frame, or even which object within a frame is more important - those are very important issues I'm sure, but they can be added on later, and are beyond the scope of current research anyway. I want to talk about the microscopic local distortion rating D. The key thing is that the numerical value of D assigns a way to "score" one distortion against another. This not only lets you choose the way your error looks on a given block (choosing the one with lowest score obviously), it also determines how your bits are allocated around the frame in an R/D framework (bits will go to places that D says are more important).

It should be intuitively obvious that just using D = SSD or SAD is very crude and badly broken. One pixel step of numerical error clearly has very different importance depending on where it is. How might we do better ?

1. Value Error. Obviously the plain old metric of "output value - input value" is useful even just as a sanity check and regularizer ; it's the background distortion metric that you will then add your other biasing factors to. All other things being equal you do want output pixels to exactly match input pixels. But even here there's a funny issue of what measure you use. Probably something in the L-N norms (L1 = SAD, L2 = SSD). The standard old world metric is L2, because if you optimize for D = L2, then you will minimize your MSE and maximize your PSNR, which is the goal of old literature.

The L-N norms behave differently in the way they rate one error vs another. The higher N is, the more importance it puts on the largest error. eg. L-infinity only cares about the largest error. L-2 cares more about big errors than small ones. That is, L2 makes it better to change 100->99 than 1->0. Obviously you could also do hybrid things like use L1 and then add a penalty term for the largest error if you think minimizing the maximum error is important. I believe that *where* the error occurs is more important than what its value is, as we will discuss later.
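
Just to make the norm choice concrete, here's a tiny sketch of an L-N block error (hypothetical helper, not from any real codec) ; N=1 gives SAD-like behavior, N=2 gives SSD-like behavior, and as N grows the largest single error dominates :

#include <math.h>

// L-N norm of the error between two blocks of pixels
// N = 1 ~ SAD , N = 2 ~ SSD (minimizing the L2 norm minimizes SSD) ,
// large N -> approaches the maximum single error (L-infinity)
double BlockErrorLN( const unsigned char * src, const unsigned char * out, int count, double N )
{
    double sum = 0;
    for(int i=0;i<count;i++)
    {
        double e = fabs( (double)src[i] - (double)out[i] );
        sum += pow( e, N );
    }
    return pow( sum, 1.0/N );
}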

2. DC Preservation. Changes in DC are very noticeable. Particularly in video, the eye is usually tracking mainly one or two foreground objects; what that means is that most of the frame we are only seeing with our secondary vision (I don't know a good term for this, it's not exactly peripheral vision since it's right in front of you, but it's not what your brain is focused on, so you see it at way lower detail). For all this stuff that we see with secondary vision we only see its gross properties, and one of those is the DC. Another issue is that if a bunch of blocks in the source have the same DC, and you change one of them in the output, that is sorely noticeable.

I'm not sure if it's most important to preserve the median or the mean or what exactly. Usually people preserve the mean, but there are certainly cases where that can screw you up. eg. if you have a big field of {80} with a single pixel spike on it, you want to preserve that background {80} everywhere no matter what the spike does in the output. eg. {80,80,255,80,80} -> {80,80,240,80,80} is better than making it go -> {83,83,240,83,83} even though the latter has better mean preservation.

3. Edge Preservation. Hard edges, especially long straight lines or smooth curves, are very visible to humans and any artifact in them stands out. The importance of edges varies though; it has something to do with the length of the edge (longer edges are more major visual features) and with the contrast range of the region around the edge : eg. an edge that separates two very smooth sections is super visible, but an edge that's one of many in a bunch of detail is less important (preserving the detail there is important, but the exact shape of the edge is not). eg. a patch of grass or leaves might have tons of edges, but their exact shape is not crucial. An image of hardwood floor has tons of long straight parallel edges and preserving those exactly is very important. The border between objects is typically very important.

Obviously there's the issue of keeping the edges that were in the original and also the issue of not making new edges that weren't in the original. eg. introducing edges at block boundaries or from ringing artifacts or whatever. As with edge preservation, the badness of these introduced edges depends on the neighborhood - it's much worse to make them in a smooth patch than in one that's already noisy. (in fact in a noisy patch, ringing artifacts are sort of what you want, which is why JPEG can look better than naive wavelet coders on noisy data).

4. Smooth -> Smooth (and Flat -> Flat). Changing smooth input to not smooth is very bad. Old coders failed hard on this by making block boundaries. Most new coders now handle this easily inherently either because they are wavelet or use unblocking or something. There are still some tricky cases though, such as if you have a smooth ramp with a bit of gaussian noise speckle added to it. Visually the eye still sees this as "smooth ramp" (in fact if you squint your eyes the noise speckle goes away completely). It's very important for the output to preserve this underlying smooth ramp; many good modern coders see the noise speckle as "detail" that should be preserved and wind up screwing up the smooth ramp.

5. Detail/Energy Preservation. The eye is very sensitive to whether a region is "noisy" or "detailed", much more so than exactly what that detail is. Some of the JPEG style "threshold of visibility" stuff is misleading because it makes you think the eye is not sensitive to high frequency shapes - true, but you do see that there's "something" there. The usual solution to this is to try to preserve the amount of high frequency energy in a block.

There are various sub-cases of this. There's true noise (or real life video that's very similar to true noise) in which case the exact pixel values don't matter much at all as long as the frequency spectrum and distribution of the noise is reproduced. There's detail that is pretty close to noise, like tree leaves, grass, water, where again the exact pixels are not very important as long as the character of the source is preserved. Then there's "false noise" ; things like human hair or burlap or bricks can look a lot like noise to naive analysis metrics, but are in fact patterned texture in which case messing up the pattern is very visible.

There are two issues here - obviously there's trying to match the source, but there's also the issue of matching your neighbors. If you have a bunch of neighboring source blocks with a certain amount of energy, you want to reproduce that same energy across the patch in the output - you don't want to have single blocks with very different energy, because they will stand out. Block energy is almost like DC level in this way.
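
A crude way to put this in a D term is to compare block "energies" instead of (or in addition to) the exact pixel values. A sketch, assuming 8-bit pixels (of course real detail preservation needs more than a single variance number) :

#include <math.h>

// "energy" of a block = sum of squared deviation from its own DC
double BlockACEnergy( const unsigned char * block, int count )
{
    double mean = 0;
    for(int i=0;i<count;i++) mean += block[i];
    mean /= count;
    double energy = 0;
    for(int i=0;i<count;i++)
    {
        double d = block[i] - mean;
        energy += d*d;
    }
    return energy;
}

// penalty for changing how "busy" a block is, regardless of the exact pixels
double EnergyPreservationPenalty( const unsigned char * src, const unsigned char * out, int count )
{
    return fabs( BlockACEnergy(src,count) - BlockACEnergy(out,count) );
}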

6. Dynamic range / sdev Preservation. Of course related to previous metrics, but you can definitely see when the dynamic range of a region changes. On an edge it's very easy to see if a high contrast edge becomes lower contrast. Also in noise/detail areas the main things you notice are the DC, the amount of noise, and the range of the noise. One reason it's so visible is optical fusion and its effect on DC brightness. That is, if you remove the bright specks from a background it makes the whole region look darker. Because of gamma correction, {80,120} is not the same brightness as {100,100}. Now theoretically you could do gamma-corrected DC preservation checks, but there are difficulties in trying to be gamma correct in your error metrics since the gamma remapping sort of does what you want in terms of making changes of dark values relatively more important; maybe you could do gamma-correct DC preservation and then scale it back using gamma to correct for that.
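
The {80,120} vs {100,100} thing is easy to check numerically if you assume a simple power-law display gamma (2.2 here, which is only an approximation of real sRGB) - average in linear light, then convert back :

#include <math.h>
#include <stdio.h>

double ToLinear( double pixel )  { return pow( pixel/255.0, 2.2 ); }        // gamma-encoded -> linear light
double ToPixel ( double linear ) { return 255.0 * pow( linear, 1.0/2.2 ); } // linear light -> gamma-encoded

int main()
{
    // what the eye fuses is (roughly) the average in linear light, not in pixel values
    double a = ToPixel( 0.5*( ToLinear(80) + ToLinear(120) ) );
    double b = ToPixel( 0.5*( ToLinear(100) + ToLinear(100) ) );
    printf( "{80,120} fuses to about %.1f , {100,100} fuses to %.1f\n", a, b );
    // the {80,120} pair comes out a bit above 100, so flattening it to {100,100} darkens the region slightly
    return 0;
}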

It's unclear to me whether the important thing is the absolute [low,high] range, or the statistical width [mean-sdev,mean+sdev]. Another option would be to sort the values from lowest to highest and look at the distribution; the middle is the median, then you have the low and high tails on each side; you sort of want to preserve the shape of that distribution. For example the input might have the high values in a kind of gaussian falloff tail with most values near median and fewer as it gets higher; then the output should have a similar distribution, but exactly matching the high value is not important. The same block might have all of its low values at exactly 0 ; in that case the output should also have those values at exactly 0.


Whatever all the final factors are, you are left with how to scale them and combine them. There are two issues on scaling : power and coefficient. Basically you're going to combine the sub-distortions something like this :


D = Sum_n { Cn * Dn^Pn }

Dn = distortion sub type n
Cn = coefficient n
Pn = power n

The power Pn lets you change the units that Dn are measured in; it lets you change how large values of Dn contribute vs. how small values contribute. The coefficient Cn obviously just overall scales the importance of each Dn vs. the other D's.

It's actually not that hard to come up with a good set of candidate distortion terms like I did above, the problem is once you have them (the various Dn) - what are the Cn and Pn to combine them?
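
In code the combination itself is trivial (the hard part, as I said, is picking the Cn and Pn - everything below is a placeholder) :

#include <math.h>

// D = Sum_n { Cn * Dn^Pn } ; the sub-distortion values, coefficients and powers
// are all placeholders here - finding good values is exactly the open problem
double CombineDistortions( const double * D, const double * C, const double * P, int n )
{
    double total = 0;
    for(int i=0;i<n;i++)
        total += C[i] * pow( D[i], P[i] );
    return total;
}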

3/08/2010

03-08-10 - Distortion and Bit Allocation

I now know that rate allocation is by far the most important thing in video. It's obviously important in a lot of things, but in video you just have so many bits and so much flexibility in where you put them, and there are lots of psychovisual phenomena that don't exist in images (due to motion, eye adaptation, feature tracking, etc. - the eye notices changes over time). In fact I conjecture that you could take a really shitty old coder like MPEG2 and make videos that beat anything currently in existence with just better rate allocation.

What can rate allocation do ?

1. Move bits to the source of predictions. That is, code some frame (or part of a frame) better than normal because it will be used as a mocomp source in the future. This is actually a purely mathematical win and would apply without any psychovisual consideration. A lot of people do this in semi-heuristic ways, but of course those can make lots of mistakes (for example there may be cases where increasing the bit assignment to a block might actually make it a worse source for the future, eg. the future might be a better match to the block with more distortion; also starving the future might cause it to no longer choose that block as a source, etc). Some people move bits around while holding all the block mode decisions and movecs constant, which at least lets you converge, but of course you should consider all possible bit moves and all possible mode changes.

2. Move bits from frame to frame to make some frames look better and some look worse. Move bits around the frame to make parts look better and parts look worse. In general choosing where to put your error.

There's also a related issue which is not exactly rate allocation but is very similar. In lossy coders like video coders you often have a choice of what your error looks like. That is, for the same distortion (in a numerical sense) you could make different shapes of error, through choosing different block modes, choosing different movecs, or more globally choosing quantizers or quantization matrices. This often ties into rate allocation because it involves how you make your free choices in the encoder :

3. What the distortion looks like. In particular, if you make some amount of error (in an SSD or SAD sense (aka L2 or L1 norm)) what does that error look like? what is the shape of it?

Now, in a lagrangian framework the main thing driving all these decisions is just the D metric in J = R + lambda D. If you change D, it changes where bits get put. D determines how important you think one type of error is vs. another type of error.
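
For concreteness, a minimal sketch of what that decision loop looks like (the candidate rate/distortion numbers would come from actually trying each mode ; nothing here is specific to any particular codec) :

// pick the candidate (block mode, movec, quantizer, ...) with the lowest J = R + lambda * D ;
// whatever you plug in for "distortion" is what ends up steering the bit allocation
struct Candidate { double rate; double distortion; };

int PickCandidate( const Candidate * c, int count, double lambda )
{
    int best = 0;
    double bestJ = c[0].rate + lambda * c[0].distortion;
    for(int i=1;i<count;i++)
    {
        double J = c[i].rate + lambda * c[i].distortion;
        if ( J < bestJ ) { bestJ = J; best = i; }
    }
    return best;
}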

Just as an example, say you ran face detection on your video, then you could assign face regions to all your frames, and any error in the face region could be counted as extra important - if you put this into your "D" metric, then the lagrangian coder automatically gives those areas more bits. But that kind of example is rather banal. There are obviously tons of human error-importance issues that you could try to account for, having to do with what objects are most important in the frame, where the motion is, what kind of errors are particularly appalling, etc etc.

Purely numerical error distribution can be important : say you have an error of 3 somewhere and an error of 20 somewhere else. You have bits to change each by 1. Should you change the 3 to 2 or the 20 to 19 ? Well, it depends on their neighborhoods, but I think more often than not you should do the 3->2. That will be more visually noticeable. Using L1 or L2 (or L-N for whatever other N's) causes you to make different decisions in these cases. Most simplistically you can see it as a continuum between minimizing the total abs error (L1) vs minimizing the maximum error (L-infinity). That is, the issue of whether you have clumpy error or spread out error is a pretty big one.
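
The 3 vs 20 example in actual numbers (just arithmetic, nothing codec-specific) :

#include <stdio.h>

int main()
{
    int errA = 3, errB = 20;
    // L1 : both fixes reduce the total absolute error by exactly 1
    printf( "L1 gain : fix A = %d , fix B = %d\n", errA - (errA-1), errB - (errB-1) );
    // L2 : fixing the 20 reduces the squared error by 39, fixing the 3 only by 5,
    // so an SSD-driven coder will always spend the bit on the 20 -> 19 change
    printf( "L2 gain : fix A = %d , fix B = %d\n",
            errA*errA - (errA-1)*(errA-1), errB*errB - (errB-1)*(errB-1) );
    return 0;
}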

The thing holding back development is the lack of a procedure for measuring real "quality". The problem is that changing distortion to change your bit allocation for psychovisual purposes will by definition hurt your abstract measures. (hacky changes to D might hurt RMSE but help SSIM, but in that case I would say some of the change was not "psychovisual" - the part of the change which helps SSIM is in fact an analytical change to improve a certain metric). At some point you have to be able to make a decision that you will allocate bits in such a way that your video will look worse to computers, but will look better to humans. (with our current shitty computer analysis models).

x264 and others have a bit of a solution for this - they use a kind of "crowd sourcing" (bleck web 2.0 buzz word, I feel like I just vomited in my own mouth a little). They can put beta features in their code and they have mobs of fan-boys who will download betas and try them on lots of videos and then post results on the forums. This gives you lots of real human eyes saying "this looks better" or not for attempts at psychovisual improvements. But I don't think you can really make big developments using that technique - you can only make small heuristic stabs in the dark and then find out if they were okay, because the turnaround time for results from the crowd is too long, and if you release too many dead ends for them to test they will stop doing it, so you have to be reasonably sure it is a good change before publishing it to the crowd, etc. It's not the kind of thing a researcher needs, which is a black box where I can throw videos and say "which looks better to a human".

The result is that we are mostly stabbing in the dark and occasionally getting lucky.

3/03/2010

03-03-10 - Image Compression - Color , ScieLab - Part 2

Follow up to the last post on color .

First a correction : what I said about downsampling there is mostly wrong. I made the classic amateur's blunder of testing on too small a data set and drawing conclusions from it. I'm a little embarrassed to make that mistake, but hey this is a blog not a research journal. Any expectations of rigor are unfounded. For example this is one of the test images I ran on that convinced me that downsample was bad :


aikmi
-i7 qtable ; CoCg optimized joint for min SCIELAB

downsample :

   262,144 ->    32,823 =  1.001 bpb =  7.986 to 1 (per pixel)
Q : 11.0000  Co scale = Cg Scale = 1.525
bits DC : 19636|5151|3832 , bits AC : 175319|38483|19879
bits DC = 10.9% bits AC = 89.1%
bits Y = 74.3% bits CoCg = 25.7%
rmse : 7.3420 , psnr : 30.8485
ssim : 0.9134 , perc : 73.3109%
scielab rmse : 2.200

no downsample :

   262,144 ->    32,679 =  0.997 bpb =  8.021 to 1 (per pixel)
Q : 12.0000  Co scale = Cg Scale = 0.625
bits DC : 19185|13535|9817 , bits AC : 160116|39407|19091
bits DC = 16.3% bits AC = 83.7%
bits Y = 68.7% bits CoCg = 31.3%
rmse : 6.9877 , psnr : 31.2781
ssim : 0.9111 , perc : 72.9532%
scielab rmse : 1.980

you can see that downsample is just much worse in every way, including severely worse in SCIELAB which doesn't care about chroma differences as much as luma. In this particular image, there are a lot of high-detail color bits, and the downsampled version looks significantly worse; it's easy to pick out visually.

However, in general this is not true, and in fact downsample is often a small win.

Without further ado I present lots of stats :

Each data row is : file name, then { rmse , scielab } pairs for six configurations, left to right :

 1. i0 , Cg=1 , Co=1
 2. i0 , Cg=0.6 , Co=0.575
 3. i7 , Cg=0.6 , Co=0.575
 4. i4/i7 opt per image
 5. i7 , CoCg optimized independently per image  (the optimal Co and Cg scales precede the pair)
 6. i7 , CoCg optimized jointly per image, downsampled  (the optimal Co/Cg scale precedes the pair)
kodim01 12.6809 4.8898 12.5848 4.8413 12.6567 4.3415 12.7018 4.238 0.455 0.455 12.623 4.3153 1.225 12.486 4.2525
kodim02 6.235 2.1961 6.1733 2.1793 6.2836 2.0519 6.2544 1.9542 0.58 0.58 6.2285 1.978 1.3375 6.4866 1.9841
kodim03 4.0098 1.7135 3.974 1.7173 4.0621 1.5587 3.9778 1.5883 0.705 0.83 4.0853 1.5359 1.6 4.1235 1.6102
kodim04 6.3981 2.4661 6.3658 2.4929 6.4083 2.2579 6.4083 2.2579 0.705 0.705 6.4092 2.248 1.5625 6.3698 2.1977
kodim05 14.2903 7.2293 14.0531 7.1756 14.1613 6.5253 14.2296 6.452 0.58 0.58 14.167 6.5291 1.5625 13.9658 6.4326
kodim06 8.9416 3.6338 8.836 3.5923 8.9622 3.2131 9.0316 3.1608 0.455 0.58 8.9664 3.2184 1.3 8.8455 3.1733
kodim07 5.147 2.316 5.1145 2.1919 5.2338 2.0167 5.2388 1.9815 0.58 0.58 5.202 2.0047 1.225 5.1601 1.9462
kodim08 14.6964 7.5082 14.5479 7.5237 14.5675 6.8769 14.6411 6.7521 0.58 0.83 14.5726 6.8285 1.4875 14.3053 6.692
kodim09 4.4789 1.8149 4.439 1.8574 4.5303 1.675 4.5303 1.675 0.705 0.955 4.5467 1.6359 1.4125 4.5389 1.6906
kodim10 4.9926 2.0932 4.9477 2.1196 5.0678 1.9887 5.0398 1.9514 0.58 0.955 5.0585 1.9109 1.6 5.0449 1.9556
kodim11 7.9484 3.2677 7.9006 3.2315 8.0441 2.9234 8.0441 2.9234 0.58 0.58 8.0478 2.9276 1.375 7.939 2.858
kodim12 4.6495 1.8486 4.6326 1.8529 4.7335 1.6862 4.7259 1.6663 0.58 0.705 4.7041 1.6776 1.2625 4.7001 1.6457
kodim13 18.5372 8.3568 18.3502 8.2634 18.5334 7.2841 18.6579 7.1262 0.455 0.58 18.5013 7.2697 1.1125 18.381 7.2327
kodim14 11.076 4.8628 10.972 4.7473 11.0146 4.3268 11.064 4.2636 0.58 0.58 11.0151 4.3308 1.3 10.9818 4.3614
kodim15 5.8269 2.4099 5.8082 2.4665 5.9134 2.2246 5.8383 2.2457 0.705 0.705 5.9158 2.2098 1.525 5.8699 2.1497
kodim16 5.689 2.3266 5.6289 2.3199 5.7372 2.0534 5.7372 2.0534 0.58 0.58 5.7373 2.055 1.375 5.6667 2.0276
kodim17 5.5166 2.3244 5.47 2.2994 5.6716 2.0774 5.5853 2.0874 0.455 0.705 5.6523 2.0574 1.4125 5.6014 2.037
kodim18 10.8501 4.8609 10.7131 4.7903 10.9517 4.3169 10.9639 4.2627 0.58 0.705 10.9266 4.3006 1.3375 10.8048 4.2189
kodim19 7.1545 2.8338 7.0872 2.8518 7.2311 2.4977 7.2637 2.4362 0.58 0.705 7.2158 2.4758 1.5625 7.1314 2.4396
kodim20 4.7872 1.8258 4.7183 1.8042 4.9208 1.6441 4.863 1.6524 0.455 0.83 4.9265 1.6306 1.1875 4.9427 1.656
kodim21 7.7757 3.3671 7.6338 3.3427 7.9293 3.0078 7.8541 3.0018 0.705 0.705 7.9204 2.95 1.3 7.7688 2.9302
kodim22 8.279 3.2205 8.1788 3.1253 8.3292 2.8656 8.3542 2.8114 0.455 0.58 8.3026 2.8379 1.45 8.267 2.8436
kodim23 3.917 1.5567 3.8968 1.5138 3.953 1.4315 3.961 1.4157 0.58 0.58 3.9481 1.4146 1.6 4.3382 1.573
kodim24 10.9877 5.2479 10.8105 5.0477 11.0256 4.6141 11.0435 4.5882 0.455 0.455 11.0413 4.6005 1.3375 10.9372 4.503
totals ( rmse , scielab per configuration ) : 194.86 84.17 192.84 83.35 195.92 75.46 196.01 74.54 195.71 74.94 194.65 74.41

explanation :

output bit rate 1 bpb in all cases
parameters are optimized to minimize E = ( 2 * SCIELAB + 1 * RMSE )
RMSE is on RGB
SCIELAB is perceptual color difference metric

i0 = flat quantization matrix
i7 = tweaked perceptual quantization matrix to minimize E
i4/i7 = optimized blend of flat to perceptual matrices


The table reads roughly left to right in terms of decreasing perceptual error.  

"i0 Cg=1 Co=1" : flat q-matrix, standard lossless YCoCg transform without extra scaling

"i0 Cg=0.6 Co=0.575" ; optimize CoCg scale for E ; interestingly this also helps RMSE

"i7 Cg=0.6 Co=0.575" ; non-flat constant Q-matrix ; hurts RMSE a bit, helps SCIELAB a lot

"i4/i7 opt per image" ; per-image non-flat Q-matrix ; not a big difference

"i7 CoCg optimized independently per image" : independently optimize Co and Cg for each image

"i7 CoCg optimized jointly per image downsampled" : downsample test, CoCg optimized with Co=Cg

On the full kodak set, downsampling is a slight net win. There are a few cases (kodim03,kodim23) where it hurts a lot like I saw before, but in most cases it is a slight win or close to neutral. The conclusion is that given the speed benefit, you should downsample. However there are occasional cases where it will hurt a lot.

I think most of the results are pretty intuitive and not extremely dramatic.

It's a little non-intuitive what exactly is going on with the per-image customized chroma scales. Your first thought might be "well those images have different colors in them, so the color space scale is adapting to the color content in the image". That's not so. For one thing, more or less content of a certain color doesn't mean you need a different color space - it just means that that band of the color space will get more energy, and thus more bits. e.g. an image that has lots of "Co" component colors will simply have more energy in the Co plane - that doesn't mean scaling Co either up or down will help it.

If you think about the scaling another way it's more obvious what's going on. Scaling the color planes is equivalent to using different quantizers per plane. Optimizing the scalings is equivalent to doing an R/D optimization of the quantizer of each plane. Thus we see what the scaling is doing : it's taking bits away from hard to code planes and moving them to easier to code planes (in an R/D slope sense).
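
The equivalence is easy to see if you write it down (just a sketch ; up to float rounding the two give the same bucket for any value x) :

#include <math.h>

// scaling a plane by s and then quantizing with step Q ...
int QuantizeScaled  ( double x, double s, double Q ) { return (int) floor( (x*s)/Q + 0.5 ); }

// ... is the same as quantizing the unscaled plane with step Q/s
int QuantizeUnscaled( double x, double s, double Q ) { return (int) floor( x/(Q/s) + 0.5 ); }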

In particular, when I visually inspected some of the more extreme cases (cases where the per-image optimized scales were a big win vs. a constant overall scale, such as kodim10) what I found was that the optimized scalings were taking bits *away* from the dominant colors. One very obvious case was on photos of the ocean. The ocean is mostly one color and is very hard to code (expensive in an R/D sense) because it's all choppy and random. The optimized scaling took bits away from the ocean and moved them to other colors that had more R/D payoff.

(BTW rambling a bit : I've noticed that x264 Psy VAQ tends to do the same kind of thing - it takes bits away from areas that are a really noisy mess, such as water, and moves them to areas that have smooth pattern and edges. Intuitively you can guess that if an area is a mess and just really hard to code then you should just say "fuck it" and starve it for bits even if MSE R/D tells you it wants bits. I think also that improving an area from an RMSE of 4 to 2 is better than improving from 10 to 7, even though it's less of a distortion win. Visually there's a big difference when an area goes from "looks good" to "looks noisy" , but not much of a difference when an area goes from "looks bad" to "looks really bad").

So this is in fact not really a surprising result. We know already that heavy R/D bit allocation can do wonders for lossy compressors. There are lots more areas to explore - optimization of every coefficient in the quantization matrix, optimization of the color transform, optimization of the transform basis functions, etc. etc. - and in each case you need to be clever about the way you encode the extra rate control side information.

ADDENDUM : I thought I should write up what I think are the useful takeaway conclusions :

1. It is crucial to do the right kind of scaling to Co/Cg (or chroma more generally) depending on whether you downsample or not. In particular the way most people just turn downsample on or off and don't compensate by scaling chroma is a mistake, eg. not a fair comparison, because their scaling will be tuned for one or the other.

2. Downsample vs. no-downsample is pretty close to neutral. If you downsample for speed, that's probably fine. There are rare cases where it does hurt a whole lot though.

3. Using a non-flat Q matrix does in fact help perceptual quality significantly. And it doesn't hurt RGB RMSE nearly as much as it helps SCIELAB (helps SCIELAB by 10.35 % , hurts RMSE by 1.58 % ).

4. It does appear acceptable to use global tweaked values all the time rather than custom tweaking to each image. Custom tweaks do give you another bit of benefit, but it's not huge, thus not worth the very slow optimization step. (see DCTune eg)

2/23/2010

02-23-10 - Image Compression - Color , ScieLab

The last time I wrote about anything technical it was to comment on image coding perceptual targets and chroma . Let's get into that a bit more.

There are these standard weapons available to us : 1. Colorspace transform (lossy or lossless) , 2. Relative scaling of color channels, 3. Downsampling , 4. Non-flat quantization matrices.

Many image compressors use some combination of these. For example, JPEG uses YCbCr colorspace, which has a built-in down scaling of the chroma channels, also optionally downsamples chroma, and also usually uses a very high-frequency-killing quantization matrix. The result is that chroma is attacked in many ways - the DC accuracy is destroyed by the scaling in the color conversion as well as the [0] entry of the quantization matrix, and high frequency info is killed both by downsampling and the high entries in the quantization matrix.

But is this good? Obviously it's all bad in terms of RMSE (* not completely true, but close enough), so we need something that approximates the human eye's less sensitive chroma receptors.

For a long time I put off this question because it seemed the only way to attack it was by showing a ton of images to test subjects and asking "is this better?". (Furthermore, there's the ugly problem that any perceptual metric is heavily tied to viewing conditions, and without knowing the viewing conditions you may be optimizing for the wrong thing). But maybe I found a solution.

Let me be clear briefly that I am here only trying to address the issue of how the human eye sees chroma vs luma. This is not a full "psychovisual perceptual metric" which would have to account for the brain identifying areas of noise vs. areas of smoothness, repeated patterns, linear ramps, etc. Basically the only thing I'm trying to capture here is the importance of luma bits vs. chroma bits.

Well, it turns out there's this thing from color research called SCIELAB . You may be familiar with "CIE LAB" aka the "Lab color space" which is considered to be pretty close to "perceptually uniform" , that is 1 unit of distance between two Lab colors has the same perceptual error importance no matter what the two colors are. Well SCIELAB is the extension of CIELAB to images (not just single colors). You can read the paper at that link (or see links below), but the basic thing it does is very simple :

SCIELAB takes the image and transforms it to "opponent color" (luma, red-green, and blue-yellow) , which is roughly the color space that eyes use to see light (rods see luma, cones see chroma) (note that here we are transforming "pixel values" into real light values, so we have to make an assumption about the brightness and color calibration of the viewing device). In opponent color space, each channel is filtered. The filter used for each channel represents the angular resolution that a rod or cone has. Basically this is a gaussian whose sdev is proportional to the angular resolution in that channel. This depends on the DPI of the viewing device and the viewing distance (eg. how many pixels fit into one degree at the eye). The gaussian is narrow for luma, indicating good precision, and wider for chroma. The filter also has a wide negative lobe around the center peak, which captures the fact that we see values as relative to their neighborhood - eg. 100 on a background of 10 looks brighter than 100 on a background of 50.

The gaussian filters represent the probability of a photon from a given pixel hitting and activating a rod or cone. The wider filters for chroma indicate that a half-toned image in red-green or blue-yellow will be indistinguishable from the original at a much shorter distance than a half-toned luma image.

Once you do this filtering, you transform back to CIELAB and then you can just do a normal MSE to create a "delta E". (CIE also defines a more elaborate, more uniform "delta E" metric for LAB , but for our purposes the plain L2 distance is very close and much simpler). The result is a "SCIELAB delta E" metric that is analytic and can be used in place of MSE for comparing images. Having this SCIELAB metric now lets us try various things and it tells us whether they are perceptually better or not (in terms of optical perception of color, anyway).
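
The part of the pipeline that makes SCIELAB different from plain Lab MSE is the per-channel spatial filtering, so here's a rough sketch of just that step. The real S-CIELAB (see the Matlab implementation linked below) uses specific opponent-color matrices and sums of gaussians whose widths come from the viewing conditions - the single gaussian per channel and the sigma values here are made-up placeholders, and the opponent/Lab conversions are left to the caller :

#include <vector>
#include <math.h>

typedef std::vector<double> Plane;   // one channel, width*height, already in opponent color

static void GaussBlur1D( Plane & p, int width, int height, double sigma, bool horizontal )
{
    int radius = (int) ceil( 3*sigma );
    std::vector<double> kernel( 2*radius+1 );
    double sum = 0;
    for(int i=-radius;i<=radius;i++) { kernel[i+radius] = exp( -0.5*i*i/(sigma*sigma) ); sum += kernel[i+radius]; }
    for(int i=0;i<(int)kernel.size();i++) kernel[i] /= sum;

    Plane out( p.size(), 0.0 );
    for(int y=0;y<height;y++)
    for(int x=0;x<width;x++)
    {
        double acc = 0;
        for(int k=-radius;k<=radius;k++)
        {
            int xx = horizontal ? x+k : x;
            int yy = horizontal ? y : y+k;
            if ( xx < 0 ) xx = 0; if ( xx >= width  ) xx = width-1;
            if ( yy < 0 ) yy = 0; if ( yy >= height ) yy = height-1;
            acc += kernel[k+radius] * p[yy*width+xx];
        }
        out[y*width+x] = acc;
    }
    p.swap( out );
}

// blur each opponent channel { luma, red-green, blue-yellow } with its own angular resolution ;
// afterwards the caller converts back to Lab and takes plain L2 to get the delta E
void ScielabFilter( Plane * channels, int width, int height, double pixelsPerDegree )
{
    const double sigmaDeg[3] = { 0.05, 0.15, 0.20 };  // placeholder widths in degrees of visual angle
    for(int c=0;c<3;c++)
    {
        double sigma = sigmaDeg[c] * pixelsPerDegree;
        GaussBlur1D( channels[c], width, height, sigma, true  );
        GaussBlur1D( channels[c], width, height, sigma, false );
    }
}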

So far as I know this has never been used in the mainstream image compression literature ; the only place I found it was this Stanford school project tech report : Direction-Adaptive Partitioned Block Transform for Color Image Coding . This paper is pretty interesting; they aren't actually doing anything with the DA-PBT , they're just evaluating color spaces and how to do color coding starting with a grayscale image compressor.

Let's go through the EE398 paper in detail.

First they use YCbCr because they claim it produces better scielab results than RGB. True enough, but there were a lot of other color spaces to try. Furthermore, they don't mention this, but they are using the JPEG style YCbCr, which has a built-in 0.5 scaling of the chroma channels (chroma should have a range of [-256,256] but JPEG offsets and scales to put it back into [0,256]) - they have effectively killed the chroma precision by using YCbCr.

They then look at whether sub-sampling helps or not. They find it to be roughly neutral - but when you try subsampling or not subsampling you should also try optimizing all other free options (scaling of the chroma channels, quantization matrix).

The most interesting part to me is "Rate Allocation". They try giving different fractions of the bit budget to Y or CbCr. They find that optimal delta E almost always occurs somewhere around Y bits = 66% of the total , that is the bit ratios are like [4:1:1]. In order to achieve this ratio they had to use smaller quantization step sizes for CbCr than Y, but that is an anomaly because of the fact that the YCbCr they use has killed the chroma - if you use a non-scaling YCbCr you would find that the chroma quantization values should be *larger* than luma to achieve the 66% bit allocation. (note that using different quantization values on each channel is equivalent to scaling the channels relative to each other).

They also found that using non-uniform quantization matrices (ala JPEG) hurt. I believe this was just an anomaly of their flawed testing methodology.

This paper was the most serious study of color in image compression that I've ever seen, but is still flawed in some simple ways that we can fix. The big problem is that they make the classic blunder of many people working in compression of optimizing parameters one by one. That is, say you have a compressor with options {A,B,C}. The blunderer finds the optimal value for option A and holds that fixed, then the optimal for B, then the optimal for C. They then try out some experimental new mode for step A, and their tests show it doesn't help - but they failed to retry every option for B and C in the new mode for A. eg. for example something like downsampling might hurt if you're using YCbCr, but say you use some other color space, or scale your colors in some way, or whatever, then downsampling might help and the result of doing all those steps together may be the best configuration.

Let's go back through it carefully :

First of all, the color conversion. Let me note that we use the color conversion in image compression for really two separate purposes which are mixed up. One use is for decorrelation (or energy compaction if you prefer) - this helps compression even for lossless mode. The second is for perceptual separation of chroma from luma so that we can smack the chroma around. Obviously here we need a color transform which gives us {luma/chroma} separation - that is, we cannot use something like the KLT which doesn't necessarily have a perceptual "luma" axis.

From my earlier color studies, I found that YCoCg produces good results, usually within 1% of the best color transform on each image, so we'll just use that. But we will be careful and use a float <-> float YCoCg which doesn't scale any of the channels.
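
For reference, here's one common form of the float <-> float YCoCg. The exact normalization (the 1/2 and 1/4 factors below) is a choice - you could equally use Co = R-B and Cg = G-(R+B)/2 - the point is just that any relative scaling of Co/Cg gets applied explicitly on top of the transform, not hidden inside it the way JPEG's YCbCr does :

void RGBtoYCoCg( double R, double G, double B, double * Y, double * Co, double * Cg )
{
    *Y  = 0.25 * (  R + 2*G + B );
    *Co = 0.50 * (  R       - B );
    *Cg = 0.25 * ( -R + 2*G - B );
}

void YCoCgtoRGB( double Y, double Co, double Cg, double * R, double * G, double * B )
{
    double t = Y - Cg;    // = (R+B)/2
    *G = Y + Cg;
    *R = t + Co;
    *B = t - Co;
}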

We will then scale Y relative to CoCg. This scaling is equivalent to variable quantizers and is (one of the ways) how we will control the bit allocation to Y vs. Chroma. This scaling gives you a difference in "value resolution" , it doesn't kill high frequencies.

You can then optionally downsample chroma. Note that in naive tests I have found in the past that downsampling chroma sometimes helps visual quality; and in fact in some cases it even helps MSE measured on the RGB data. I now know that that was just an anomaly due to the fact that I wasn't considering chroma scaling. That is, downsampling was just a crude way of allocating fewer bits to chroma, which does in fact sometimes help, but if you also have the ability to change the chroma bit allocation by relative scaling of the channels, the advantage of downsampling vanishes.

I optimized the scaling of CoCg relative to Y on lots of images. Obviously the true optimum value is highly image dependent (you could compute this per image and store it with the image of course), but in most cases a scale near 0.7 is optimal if you are not downsampling, and a scale near 1.1 is close to optimal when downsampling ( 1.0 is not bad when downsampling ). When not downsampling, the optimal bit allocation is usually in the area of Y ~= 66% of the bits, as seen in the EE398 paper. When downsampling, the optimal bit allocation tends to be closer to Y = 80% of the bits. Downsampling generally hurts RGB MSE and SCIELAB delta E, but I find it sometimes helps RGB SSIM.

Obviously downsampling is resulting in more bits being used on luma, which means you'll have sharper edges and better preservation of texture and a visual appearance of more "detail", at the cost of the color values being far off. By my own examination, I often will find that if I just stare at the image made from downsampled chroma it looks "better" - eg. I see more edge detail, and it has less of that obvious appearance of being compressed, eg. less ringing artifacts, halos, stair-steps, etc. However, when I switch back and forth between the original and the compressed, the version made from downsampled chroma shows obvious color errors. The version made from non-downsampled chroma obviously has much better color preservation, but appears generally blurrier, has more block artifacts, etc. The non-downsampled version wins according to "delta E" , but by my eyes I can't really clearly say one is better than the other, they're just different errors.

The last tool we have is a non-uniform quantization matrix. NUQM lets us give more bits to the low frequencies vs. the high frequencies. Generally NUQM hurts MSE, but it might help "delta E" , because SCIELAB accounts for the "fuzziness" of human vision (insensitivity to high frequency pattern). To test this, what we need to try is various different NUQM's for both luma and chroma, as well as optimizing the relative scaling value in each case. I haven't completed this yet, but early results show that NUQM's do in fact help delta E. Note that I'm not talking about doing a per-image optimal NUQM like "dctopt" does or something, just finding something like the JPEG style skewed matrix to use globally.

Some numbers for example :


On a 512x512 color image of a face , at 1.0 bits per pixel , 
optimizing quality at constant bit rate


baseline : delta E = 2.2933

not downsampled , optimal CoCg scale = 0.625 : delta E = 2.155  (bits Y = 72%)

    downsampled , optimal CoCg scale = 1.188 : delta E = 2.381  (bits Y = 80%)

best NUQM and scaling (no downsampling) : delta E = 1.899  (bits Y = 61%)

( JPEG delta E = 2.7339 )

One thing I notice that NUQM does obviously is give a lot more bits to the DC's. In this case :


not downsampled, same cases as previous
UQM = uniform quantization matrix

 UQM , bits DC = 15.4% , Q = 14.0  , delta E = 2.155 , bits Y = 72%

NUQM , bits DC = 19.4% , Q =  7.25 , delta E = 1.899 , bits Y = 61%

Here Q is the quantizer of the DC component of Y - in the UQM case all Q's are the same (though the Q for chroma is effectively scaled). In the NUQM case the higher frequency AC components get much higher Q's. We can see from the above that because of NUQM, the quantizer for the DC can be much lower at the same bit rate.

Personal visual inspection indicates that the NUQM images just have much more "JPEG-like" artifacts. That is, they generally look more speckly. They obviously preserve flat areas and simple ramps somewhat better. The tradeoff is much worse ringing artifacts and destruction of high frequency detail like fine edges. (in my case the lower Q from NUQM also means a much weaker deblocking filter is used which may be part of the reason for more speckly appearance).

In any case, NUQM clearly helps delta E due to the ability to take bits away from the high frequency chroma data - much better than just scaling and downsampling can.

This is all very interesting and promising, but we have to ask ourselves at some point - how much do we trust this "scielab delta E" ? eg. by optimizing for this metric are we actually making better results? More and more I am convinced that the biggest thing missing from data compression is a better image quality metric (and then once you have that, you need to go back to basics and re-test all your assumptions against it in the correct way).

Color links :

Working Space Comparison sRGB vs. Adobe RGB 1998
Welcome to IEEE Xplore 2.0 Using SCIELAB for image and video quality evaluation
Video compression's quantum leap - 12112003 - EDN
Useful Color Equations
Useful Color Data
Standard illuminant - Wikipedia, the free encyclopedia
SpringerLink - Book Chapter
S-CIELAB Matlab implementation
References related to S-CIELAB
Lab color space - Wikipedia, the free encyclopedia
IEEE Xplore - Login
help - sRGB versus Adobe RGB (1998)
efg's Chromaticity Diagrams Lab Report
CIECAM02 - Wikipedia, the free encyclopedia
Chromatic Adaptation
Brian A. Wandell -- Reference Page
Ask a Color Scientist!
A top down description of S-CIELAB and CIEDE2000. Garrett M. Johnson. 2003; Color Research & Application - Wiley InterScienc
A proposal for the modification of s-CIELAB

2/10/2010

02-10-10 - Some little image notes

1. Code stream structure implies a perceptual model. Often we'll say that uniform quantization is optimal for RMSE but is not optimal for perceptual quality. We think of JPEG-style quantization matrices that crush high frequencies as being better for human-visual perceptual quality. I want to note and remind myself that the coding structure alone actually targets perceptual quality even if you are using uniform quantizers. (obviously there are gross ways this is true such as if you subsample chroma but I'm not talking about that).

1.A. One way is just with coding order. In something like a DCT with zig-zag scan, we are assuming there will be more zeros in the high frequency. Then when you use something like an RLE coder or End of Block codes, or even just a context coder that will correlate zeros to zeros, the result is that you will want to crush values in the high frequencies when you do RDO or TQ (rate distortion optimization and trellis quantization). This is sort of subtle and important; RDO and TQ will pretty much always kill high frequency detail, not because you told it anything about the HVS or any weighting, but just because that is where it can get the most rate back for a given distortion gain - and this is just because of the way the code structure is organized (in concert with the statistics of the data). The same thing happens with wavelet coders and something like a zerotree - the coding structure is not only capturing correlation, it's also implying that we think high frequencies are less important and thus where you should crush things. These are perceptual coders.

1.B. Any coder that makes decisions using a distortion metric (such as any lagrange RD based coder) is making perceptual decisions according to that distortion metric. Even if the sub-modes are not overtly "perceptual" if the decision is based on some distortion other than MSE you can have a very perceptual coder.

2. Chroma. It's widely just assumed that "chroma is less important" and that "subsampling is a good way to capture this". I think that those contentions are a bit off. What is true, is that subsampling chroma is *okay* on *most* images, and it gives you a nice speedup and sometimes a memory use reduction (half as many samples to code). But if you don't care about speed or memory use, it's not at all clear that you should be subsampling chroma for human visual perceptual gain.

It is true that we see high frequencies of chroma worse than we see high frequencies of luma. But we are still pretty good at locating a hard edge, for example. What is true is that a half-tone printed image in red or blue will appear similar to the original at a closer distance than one in green.

One funny thing with JPEG for example is that the quantization matrices are already smacking the fuck out of the high frequencies, and then they do it even harder for chroma. It's also worth noting that there are two major ways you can address the importance of chroma : one is by killing high frequencies in some way (quantization matrices or subsampling) - the other is how fine the DC value of the chroma should be; eg. how should the chroma planes be scaled vs. the luma plane (this is equivalent to asking - should the quantizers be the same?).

1/22/2010

01-22-10 - Exponential

One of my all time pet-peeves is people who say things are "increased exponentially". Of course the worst use of all is just when people use it to mean "a lot" , eg. they're not even talking about a trend or a response curve, eg. "the 911 Turbo provides an exponential increase in power over the base spec". This abortion of usage is becoming more and more common. But even scientists and mathematicians will use "exponential" to describe a trend that's faster than linear (it's quite common in NYT math/economics/science articles for example).

Today I was reading this blog on software development organizations and my hackles immediately went up when I read :

"The real cost of complexity increases exponentially."

I started to write a snarky post about how he almost certainly meant "geometrically". But then I started thinking about it a bit more. (* correction : by "geometrically" I mean "polynomially").

Maybe software development time actually is exponential in number of features?

If you're trying to write N features and they are all completely independent, then time is O(N), eg. linear.

If each feature can be combined with any one other feature, and each pairwise interaction is completely custom but independent, then time is O(N^2), eg. geometric.

Already I think there's a bit of a myth we can tackle. A lot of optimistic software devs think that they can get under even this O(N^2) complexity by making the features interact through some generic well defined pathways. eg. rather than specifically code how feature (A) and feature (B) and feature (A+B) work, they try to just write (A) and (B) but make them both aware of their circumstances and handle various cases so that (A+B) works right. The problem is - you didn't really avoid the O(N^2). At the very least, you still had to test (A+B) to make sure it worked, which meant trying all N^2 ways, so your time is still O(N^2). The code might look like it's O(N) because there's only a function for each feature, but within each function is implicit O(N^2) logic !!

What I mean by implicit logic is the blank lines which testing reveals you don't have to write! eg. :


void Feature_A( )
{

    DoStuff();

    if ( SelectionMode() )
    {
        // ... custom stuff here to make (A+C) and (A+D) work
    }

    // !!! blank lines here !
    //  this is implicit knowledge that (A+B) and (A+E) don't need custom code

}

You might argue that this is slightly less than quadratic complexity for the developer, and tester time is cheaper so we don't care if that's quadratic. But in any case it's geometric and above linear.

But is it actually exponential? What if all the features could be enabled or disabled, so the state of your code is a binary string indicating what features are on or off; eg. 1100 = A on, B on, C off, D off. Then there are in fact 2^N feature states, which is in fact exponential.

Another possibility is if the features can only be enabled one by one, but they have lingering effects. You have some shared state, like a data file you're processing, and you can do A then C then B then E on it. In that case the number of sequences is something like N! which is exponential for large N (actually super-exponential)
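
Just to keep the terms straight, here's how fast these actually grow (nothing project-specific, just the arithmetic) :

#include <stdio.h>
#include <math.h>

int main()
{
    for(int N=5;N<=20;N+=5)
    {
        printf( "N=%2d  N^2=%6.0f  2^N=%8.0f  N!=%22.0f\n",
                N, (double)N*N, pow(2.0,N), tgamma(N+1.0) );  // tgamma(N+1) = N!
    }
    return 0;
}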

Let's concretely consider a real video game coding scenario.

You're trying to write N features. You are "sophisticated" so rather than writing N^2 hard-coded interactions, you make each feature interact with a shared world state via C "channels". (in the old Looking Glass speak, these C channels might be a set of standard "properties" on objects and ways to interact with those channels; in the old Oddworld Munch codebase there were C "component" types that could be on objects such as "SoundTrigger" , "Pressable" , etc.). So your initial code writing time is something like O(N*C).

But for the N features to really be meaningful, the C is ~= N (roughly proportional). (or at least C ~= log(N) , but certainly C is not constant as N increases - as you add features you always find you need more channels for them to communicate with each other). So just code writing time is something between O(NlogN) and O(N^2).

But your features also affect shared state - e.g. the "world" that the game takes place in, be that physical state, or state variables that can be flipped on other objects. If you have N objects each with K internal states, this creates K^N world states that have to be tested. Even with very small world state, if the features are order dependent, you're back to N! test cases.

If the bug rate was a constant percentage of test cases (eg. 0.1% of test cases produce a bug), then you are back to exponential number of bugs = exponential coder time. But I'm not sure that model of bugs is correct. If the bug rate was a constant percentage of lines of code, then bug rate would only be geometric.

1/17/2010

01-17-10 - Nob or Knob -

Is there a difference between Nob and Knob or are they just alternate spellings of the same thing ?

It appears that "knob" is generally considered more correct now, though in old english "nob" was more common. There are many meanings, I'll show with the spelling I prefer :

knob : dial/wheel control
knob : bump or protuberance
knob : small amount, usually of butter
knob : head of the penis
nob  : head of a man (archaic)
nob  : wealthy or upper class person (archaic)

Some people seem to think that either "nob" or "knob" is exclusively correct for the slang meaning of penis (they also disagree on whether it refers to the whole penis or just the head). It's unclear to me where this slang came from, since "nob" can mean a person's head (archaic), which would suggest the slang came to refer to the head of the penis, or "knob" can mean any protuberance.

Wiktionary seems to think "knob" is a common way to refer to a hill; perhaps this is British, I've never heard it in America.

Some weird uses :

"Nob Hill" - my guess is this common name refers to a hill where the wealthy people lived, not the fact that the hill was a knob, but I could be totally wrong about that. Some people seem to point Nob Hill at nabob but since nabob basically means the same thing as nob I don't see why you would point "Nob" at "nabob" when you could just point it at "nob".

"Hob Nob" - apparently this is completely unrelated if you believe this etymology it came from habbe nabbe

"For his nobs" in Cribbage is a funny one ; it appears to also be completely unrelated, coming from the game noddy which means simpleton, and since the knave of the same suit was important to the game it was referred to as the "knave noddy" or just "noddy" which must have become nob. (unrelated but the story of John Suckling, purported inventor of modern Cribbage, seems pretty fascinating; he was apparently a master gambler and cheater at cards who used his skill to get money beyond his station; he was involved in a plot to spring a prisoner from the tower of london, and received at least one beating at the handle of a nobleman tradgames ; ezinearticles ; wikipedia )

There are some funny uses : "Nob Hill Knob Set" , and "Nifty Nob Inc. , maker of fine Knobs" - good job on the consistency, guys.

1/14/2010

01-14-10 - A small note on Trellis quantization

See reference .

I guess this is obvious, but you do get a pretty nice win from using the true floating point Dct results rather than the quantized Dct results when you do TQ.

I believe the standard practice (what I was doing before anyway) is to do your normal fast Dct + quantization, which takes your integer pixels and makes quantized integer post-Dct output. You then apply TQ on that quantized-and-transformed output, which means instead of sending the true output { 4,3,0,0 } you also consider {4,2,0,0 } and {4,0,0,0 } and so on, measure J(R,D) of each and pick the best.

Okay, but the distortion of changing "1" to 0 is not the same as the distortion of another 1 to 0 , if those were not really the same 1 before quantization.

For example, say you're quantizing with a quantizer = 1.0 for simplicity, and no deadzone, even bucket sizes, so you have quantization buckets :


[ -0.5, 0.5 ] -> 0
[ 0.5 , 1.5 ] -> 1
etc.

In that case, when TQ decides to take a quantized "1" and send instead a 0, if the true value was 0.51 , that's not so bad. If the true original value was 1.49 , that's a lot worse.

However, interestingly, if the true original value was 1.49 , then we could send it as a "2". If the value is near a quantization boundary, then the distortion doesn't care whether you kick the value up or down, but the rate might be significantly different, in which case you should make a choice based on J and get a win.

So my Dct now also does the true float -> float transform just for use in the distortion measurement for TQ. It's also useful in this application to make sure your Dct doesn't do any scaling, so that the transform is Unitary, that is, L2 norm is preserved. That way the distortion measure in post-Dct space is the same as the distortion measure in pre-Dct space which means you can use the same lambda for lagrange J decisions.
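
A sketch of the resulting per-coefficient decision (the rates would come from your entropy coder ; the candidate list {q, q-1, 0} or whatever is up to you) :

// pick the reconstruction level for one coefficient by J = R + lambda * D ,
// with D measured against the true float Dct output, not the already-quantized integer ;
// since the Dct is unitary this D is in the same units as pixel-space distortion,
// so the same lambda works
int PickCoefficientLevel( double trueCoeff,                 // float Dct output, before quantization
                          double Q,                          // quantizer step size
                          const int * level, int numLevels,  // candidate levels to send, eg. { q, q-1, 0 }
                          const double * rate,               // bits to code each candidate
                          double lambda )
{
    int best = 0;
    double bestJ = 1e300;
    for(int i=0;i<numLevels;i++)
    {
        double recon = level[i] * Q;                              // what the decoder reconstructs
        double D = ( trueCoeff - recon ) * ( trueCoeff - recon );
        double J = rate[i] + lambda * D;
        if ( J < bestJ ) { bestJ = J; best = i; }
    }
    return level[best];
}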

1/13/2010

01-13-10 - Oodle Revisited

I got some emails from a friend recently that made me start thinking about Oodle again. Friend is an indie 360 developer who just shot me a query like (paraphrasing) :

"Hey, I'm loading stuff in my game and only getting like 20 MB/sec , what gives?"

So we started trying to dig into what he was doing exactly - are you opening/reading the files with all the right flags, are you on 4k alignment, are you on a thread so you aren't just stalling out for the seeks, etc. ? In the end I think we figured out that his problem was he was reading too many small files, to which he replied :

"Oh yeah, duh, I've always packed files into bundles and loaded big chunks in previous games, but I just hadn't gotten around to it yet in this and it was bugging me that my level loads were taking so long."

! Ah ha ! To me this is where Oodle comes in as a product. It's not super hard stuff, but it's something that people always put off until the very end, which is kind of a shame because it means their level loads take forever during dev. So for three years while you're making your game you suffer through annoying slow level loads. Instead you buy Oodle on day 1 and your level loads are automagically fast.

I believe in this as a product, in the sense that I think we can make it, and it would be extremely valuable, but I'm not convinced that the game industry is mature enough to buy it. The game industry has always suffered from the mistaken thinking of "I could write that myself, so I won't pay for it". That's silly. What you should look at is what's the full cost of writing it yourself, including debugging, and perhaps most importantly including the opportunity cost of spending that time on this instead of something else, or the opportunity cost of not having feature X until you've gotten around to writing it.

For example, say you're on a one man dev team and you lay out your coding tasks for your project in order {A,B,C,D,E} . All during dev you are suffering a penalty from not having feature E done. If it would make dev a lot easier to have feature E done up front, you should pay a *lot* of money to get it immediately. Some of the classic mistakes in this vein are things like profile HUDs, which people put off until the end but which would provide huge benefit if you had them from the beginning; another one people aren't aware of is level save/load and memory card support - you think of that as a minor detail to do at the end, but as soon as you do it level designers are walking around with scenarios saved on memory cards to show to each other and they get a huge productivity boost, so it would have been a nice win to do early (though then you have maintenance pain).

To me the big win of Oodle is :

Client just writes code to load files one by one with plain old fopen/fread/whatever.

Behind the scenes Oodle magically robustly incrementally syncs PC files to consoles, packages files into bundles, makes prefetch lists and prefetches bundles, compresses & decompresses bundles on threads. You have ship quality fast loading from the beginning.

That's a small, simple, great product I think. The problem is to make it something compelling enough to sell we start to add lots of features : high performance paging in/out for seamless worlds, hot reloading of changed content, smooth IO integration with streaming data files like audio/video, DVD layout optimization, texture compressors, etc. which really just wind up clouding the tiny little valuable product at the core.
