This may seem really bone-headed but it's very very common. You may recall in some of my past posts I looked into some basic video stuff in great detail and found that almost every major video encoder/decoder is broken on the simple stuff :
the DCT (or whatever transform)
color space conversion
translating float math into ints
These things are all relatively trivial, but it's just SO common to get them wrong, and you throw away a bottom bit (or half a bottom bit) when you do so. Any time you are writing a compressor, you need to write reference implementations of all these basic things that you know are right - and check them! And then a crucial thing is : keep the reference implementation around! Ideally you would be able to switch it on from the command line, or failing that with a build toggle, so at anytime you can go back and enable the slow mode and make sure everything works as expected.
(of course a frequent cause of trouble is that people go and grab an optimized integer implementation that they found somewhere, and it either is bad or they use it incorrectly (eg. maybe it assumes data that's biased in a certain way, or centered at 0, or scaled up by *8, or etc))
A lot of this basic stuff in video is very easy to do regression tests on (eg. take some random 8x8 data, dct, quantize, dequantize, idct, measure the error, it should be very low) so there's no excuse to get it wrong. But even very experienced programmers do get it wrong, because they get lazy. They might even start with a reference implementation they know is right, and then they start optimizing and translating stuff into ints or SIMD, and they don't maintain the slow path, and somewhere along the line a mistake slips in and they don't even know it.
I've been thinking about a more difficult problem, which is : how do you deal with bugs in compression algorithms?
I don't mean bugs like "it crashes" or "the decompressor doesn't reproduce the original data" - those are the easy kind of bugs and you just go fix them. I mean bugs that cause the compressor to not work the way you intended, and thus not compress as much as it should.
The very hard thing about these bugs is that you can have them and not even know it; I'm sure I have a hundred of them right now. Frequently they are tiny things like you have a less-than where you should have a less-or-equal.
To avoid them really requires a level of care that most programmers never use. You have to be super vigilant. Any time something surprises you or is a bit fishy, you can't just go "hmm that's weird, oh well, move on to the next task". You have to stop and think and look into it. You have to gather data obsessively.
Any time you implement some new idea and it doesn't give you the compression win you expected, you can't just say "oh well guess that didn't work", you have to treat it like a crash bug, and go set breakpoints and watch your variables and make sure it really is doing what you think; and if it is, then you have to gather stats about how often that code is hit and what the values are, and see where your expectations didn't match reality.
I've really been enjoying working on compression again. It's one of the most fun areas of programming that exists. What makes it great :
1. Clear objective measure of success. You can measure size and speed (or whatever other criteria) and see if you are doing well or not. (lossy compression is harder for this).
2. Decompressors are one extreme of fun "hacker" programming; they have to be very lean; great decompressors are like haikus, they're little pearls that you feel could not get any simpler without ruining them.
3. Compressors, on the other hand, can be big and slow, and you get to pull out all the big guns of algorithms for optimization and searching and so on.