The goal is to make this work without taking any speed hit. There are lots of little tricks to make this happen. For example, the LZ decode match copier is allowed to trash up to 8 bytes past where it thinks the end is. This lets me do a lot fewer bounds checks in the decode. To prevent actual trashing, I just make the encoder never emit a match within 8 bytes of the end of a chunk. Similarly, the Huffman decoder can be made to always output a symbol in a finite number of steps (never infinite loop or access a table out of bounds). You can do this just by doing some checks when you build your decode tables; then you don't have to do any checks in the actual decode loop.
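To make the first trick concrete, here's a minimal sketch of an over-copying match copy (names and details are mine for illustration, not the actual decoder code) : the loop always moves 8 bytes at a time with no per-iteration bounds check, which is only safe because of the pad and the encoder-side restriction :

#include <string.h>

typedef unsigned char U8;

// Sketch : copy a match of length "len" from "dist" bytes back, 8 bytes at a
// time, with no bounds check inside the loop. This may write up to 7 bytes
// past the true end of the match, so the chunk buffer must be allocated with
// at least 8 bytes of pad, and a well-formed stream must never put a match
// within 8 bytes of the end of a chunk. (Overlapping matches with dist < 8
// would need a byte-at-a-time copy; that case is omitted here.)
static void CopyMatch_Overrun8( U8 * to, int dist, int len )
{
    const U8 * from = to - dist;
    U8 * end = to + len;
    while ( to < end )
    {
        memcpy(to, from, 8);
        to += 8;
        from += 8;
    }
}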
So, how do we make sure that it actually works? To prove that it is 100% fuzz resilient, you would have to generate every possible bit stream of every possible length and try decoding them all. Obviously that is not possible, so we can only try our best to find bad cases. I have a couple of strategies for that.
Random stomps. I just stomp on the compressed data in some random way and then run the decoder and see what happens (it should fail but not crash). I have a test loop set up to do this on a bunch of different files and a bunch of different stomp methods.
Just stomping in random bytes turns out not to be a very good way to find failures - that type of corruption is actually one of the easiest to handle because it's so severe. So I have a few different stomp modes : insert random byte, insert 00, insert FF, flip one bit, and the same for shorts, dwords, and qwords. Often jamming in a big string of 00 or FF will find cases that any single-byte insert won't. I randomize the location of the stomp but prefer very early positions, since stomping in the headers is the hardest to handle. I also randomize the number of stomps.
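For concreteness, one stomp might look something like this (a sketch with made-up names; the position biasing and the short/dword/qword variants are left out) :

#include <stdlib.h>
#include <string.h>

typedef unsigned char U8;

// apply one corruption to the compressed buffer; "mode" picks the flavor
static void StompOnce( U8 * buf, int bufLen, int mode, int pos )
{
    switch ( mode )
    {
    case 0: buf[pos] = (U8) rand(); break;                 // random byte
    case 1: buf[pos] = 0x00; break;                        // 00 byte
    case 2: buf[pos] = 0xFF; break;                        // FF byte
    case 3: buf[pos] ^= (U8)(1 << (rand() & 7)); break;    // flip one bit
    case 4:                                                // run of 00 or FF
        {
            int runLen = 4 + (rand() & 15);
            if ( pos + runLen > bufLen ) runLen = bufLen - pos;
            memset(buf + pos, (rand() & 1) ? 0xFF : 0x00, runLen);
        }
        break;
    }
}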
One useful thing I do is log each stomp in the form of C code before I do it. For example I'll print something like :
compBuf[ 906 ] ^= 1<<3;
compBuf[ 61 ] ^= 1<<3;
compBuf[ 461 ] ^= 1<<4;

then if that does cause a crash, I can just copy-paste that code to have a repro case. I was writing out the stomped buffers to disk to have repro cases, but that is an unnecessary slowdown; I'm currently running 10,000+ stomp tests.
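The logging itself is trivial; something along these lines (again just a sketch, not the actual test code) :

#include <stdio.h>

typedef unsigned char U8;

// log the stomp as compilable C *before* applying it, so that if the decoder
// then crashes, the log already ends with a copy-pasteable repro
static void StompBitAndLog( U8 * compBuf, int pos, int bit )
{
    printf("compBuf[ %d ] ^= 1<<%d;\n", pos, bit);
    compBuf[pos] ^= (U8)(1 << bit);
}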
(note to self : to do this, run main_lz -t -z -r1000)
Okay, so that's all very nice, but you can still easily miss failure cases. What I really want is something that gives me code coverage to tell that I've handled corrupted data in all the places where I read data. So I stole an idea from relacy :
Each place I get a byte (or bits) from the compressed stream, I replace :
U8 byte = *ptr++;

with

U8 byte = *ptr++; FUZZ(byte);

// I wanted to do this but couldn't figure out how to make it work :
// U8 byte = FUZZ( *ptr++ );

(and similar for getting bits). Now, what the FUZZ macros do is this :
The first time they are encountered, they register their location with the FuzzManager and are given a unique Id. Each location starts out as a disabled potential fuzz location.
I then start making passes to try to fuzz at all possible locations. To do this, each fuzz location is enabled one by one, then I rerun the decompressor and see if that location was in fact hit. If a fuzz location is enabled, then the FUZZ macro munges the value and returns it (using all the munge modes above), and if it's disabled it just passes the byte through untouched.
Once I try all single-munges, I go back and try all dual munges. Again in theory you should try all possible multi-fuzz sequences, but that's intractable for anything but trivial cases, and also it would be very odd to have a problem that only shows up after many fuzzes.
As you make passes, you can encounter new code spots, and those register new locations that have to be covered.
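To make the flow concrete, here's a toy sketch of the manager side and the single-munge pass driver (the real rrFuzz_* internals are certainly different; this just shows the register / enable-one / rerun / check-hit cycle) :

#include <stdio.h>
#include <vector>

// toy fuzz manager : one entry per registered FUZZ location
struct FuzzLoc { const char * name; int id; bool enabled; bool hit; };
static std::vector<FuzzLoc> s_locs;

int rrFuzz_Register( const char * name, int id )
{
    FuzzLoc loc = { name, id, false, false };
    s_locs.push_back(loc);
    return (int) s_locs.size() - 1;
}

bool rrFuzz_IsEnabled( int index )
{
    s_locs[index].hit = true;      // record coverage whether or not we munge
    return s_locs[index].enabled;
}

unsigned int rrFuzz_Munge( unsigned int word )
{
    return word ^ 0xFF;            // stand-in for the real set of munge modes
}

// single-munge passes : enable each known location in turn, rerun the decoder
// (which should fail cleanly, never crash), and check that the location was
// actually hit; locations registered during a run get their own passes later
void RunSingleMungePasses( void (*runDecoder)() )
{
    for ( size_t i = 0; i < s_locs.size(); i++ )
    {
        for ( size_t j = 0; j < s_locs.size(); j++ )
        {
            s_locs[j].enabled = false;
            s_locs[j].hit = false;
        }
        s_locs[i].enabled = true;
        runDecoder();
        if ( ! s_locs[i].hit )
            printf("fuzz location %d (%s,%d) was not reached this pass\n",
                   (int)i, s_locs[i].name, s_locs[i].id);
    }
}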
Again, a nice thing I do is before each pass I log C code that will reproduce the action of that pass, so that if there is a problem you can directly reproduce it. In this case, it looks like :
Fuzz : 1/36 rrFuzz_StartPass_Select(".\compress\rrLZHDecompress.cpprrLZHDecoder_DecodeSome",351010,3,1,0x28502CBB);

In order to have reproducibility, I use FILE/LINE to identify the fuzz location, not an index, since the index can change from run to run based on the code path taken. Also, note that I don't actually use FILE/LINE because I have FUZZ in macros and templates - I use __FUNCDNAME__ so that two versions of a template get different tags, and I use __COUNTER__ so that macros which cause multiple fuzzes to occur at the same original code line get different location numbers. eg. this works :
#define A() do { U8 t = *ptr++; FUZZ(t); } while(0)
#define B() A(); A();

template < int i > void func() { B(); }

void main() { func< 0 >(); func< 1 >(); }

// there should be 4 separate unique FUZZ locations registered :
/*
I log :

rrFuzz_Register(".\main_lz.cpp|??$func@$0A@@@YAXXZ",1318000) = 0;
rrFuzz_Register(".\main_lz.cpp|??$func@$0A@@@YAXXZ",1318001) = 1;
rrFuzz_Register(".\main_lz.cpp|??$func@$00@@YAXXZ",1318000) = 2;
rrFuzz_Register(".\main_lz.cpp|??$func@$00@@YAXXZ",1318001) = 3;
*/

As usual I'm not sure how to get the same thing in GCC. (maybe __PRETTY_FUNCTION__ works? dunno).
The actual FUZZ macro is something like this :
#define FUZZ_ID __FILE__ "|" __FUNCDNAME__ , __LINE__*1000 + __COUNTER__

#define FUZZ( word ) do { static int s_fuzzIndex = rrFuzz_Register(FUZZ_ID); if ( rrFuzz_IsEnabled(s_fuzzIndex) ) { word = rrFuzz_Munge(word); } } while(0)
The only imperfection at the moment is that FUZZ uses a static to register a location, which means that locations that are never visited at all never get registered, and then I can't check to see if they were hit or not. It would be nice to find a solution for that. I would like it to call Register() in _cinit, not on first encounter.
Anyway, this kind of system is pretty handy for any code coverage / regression type of thing.
(note to self : to do this, define DO_FUZZ_TEST and run main_lz -t -r1000)
ADDENDUM : another practical tip that's pretty useful. For something small and complex like your headers, or your Huffman tree, or whatever, you might have a ton of consistency checks to do to make sure they're really okay. In that case, it's usually actually faster to just go ahead and run a CRC (*) check on them to make sure they aren't corrupted, then skip most of the validation checks.
On the primary byte stream we don't want to do that because it's too slow, but for headers the simplicity is worth it.
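A sketch of the shape of that (placeholder hash and names; see the footnote below for what I'd actually use as the hash) :

#include <stdint.h>
#include <stddef.h>

// placeholder 32-bit hash, just for illustration; the real choice is driven by
// what's fast on the target platforms (see the footnote about lookup3 below)
static uint32_t HeaderHash( const uint8_t * bytes, size_t len )
{
    uint32_t h = 2166136261u;          // FNV-1a style mixing, illustration only
    for ( size_t i = 0; i < len; i++ )
    {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

// decode side : one hash compare stands in for most of the field-by-field
// consistency checks; if it fails, reject the whole header as corrupt
static bool HeaderLooksIntact( const uint8_t * header, size_t headerLen, uint32_t storedHash )
{
    return HeaderHash(header, headerLen) == storedHash;
}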
(*) not actually a CRC because doing byte-by-byte table lookups is crazy slow on some game platforms. There are other robust hashes that are faster. I believe Bob Jenkins' lookup3 is probably the best and fastest, since we have platforms that can't do multiplies fast (ridiculous but true), so many of the hashes that are fast on x86, like Murmur2, are slow on consoles.
"To prevent actual trashing then, I just make the encoder never emit a match within 8 bytes of the end of a chunk."
Erm, that's no good. What if random corruption causes something within the last 8 bytes of a chunk to be decoded as a match?
I think what he means is that a valid file will never write past boundary (into the last 8 bytes), so he doesn't worry about having to fixup legitimate writes into the last 8 bytes when he does wraparound processing, because there are no such writes to fixup.
ReplyDelete"Erm, that's no good. What if random corruption causes something within the last 8 bytes of a chunk to be decoded as a match? "
Err, yeah, this is a good point, which I should clarify about :
I do not actually necessarily *detect* corruption - I just don't crash.
So, I always have 8 bytes of pad allocation at the end of a buffer, so that if I do get corruption that causes a match in there, I don't go stomp into other memory.
However, if my buffer is at the end of a sliding window or next to another portion of LZ buffer that's being decoded on another thread, then that match at the end might step into the next guy's buffer. That will make the results invalid, but won't cause a crash.
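Roughly, the allocation side of that looks like this (sketch, made-up names) :

#include <stdlib.h>

typedef unsigned char U8;

// allocate the decode chunk with 8 bytes of slop, so that a corrupted match
// near the end of the chunk stomps the pad instead of unrelated memory
#define LZ_CHUNK_PAD 8

static U8 * AllocDecodeChunk( int chunkLen )
{
    return (U8 *) malloc( chunkLen + LZ_CHUNK_PAD );
}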
I wonder how well this works:
http://klee.llvm.org/
From the abstract:
"We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions of Unix systems, and arguably are the single most heavily tested set of open-source programs in existence. KLEE-generated tests achieve high line coverage — on average over 90% per tool (median: over 94%) — and significantly beat the coverage of the developers' own hand-written test suites. When we did the same for 75 equivalent tools in the BUSYBOX embedded system suite, results were even better, including 100% coverage on 31 of them. We also used KLEE as a bug finding tool, applying it to 452 applications (over 430K total lines of code), where it found 56 serious bugs, including three in COREUTILS that had been missed for over 15 years. Finally, we used KLEE to cross-check purportedly identical BUSY-BOX and COREUTILS utilities, finding functional correctness errors and a myriad of inconsistencies."
Oh, and one kooky way to have crash-hardened decompression would be to do the wacky bijective/one-to-one compression by D. A. Scott. He seems to think that it is better for crypto for intuitively appealing but actually crackpotty reasons. I mean, it seems that if this kind of compression increases security, it would be for the hardness (no buffer overruns due to manipulated headers/payloads) rather than the minuscule extra randomness you might get.
ReplyDelete"Oh, and one cooky way to have crash-hardened decompression would be to do the wacky bijective/one-to-one compression by D, A. Scott."
Yeah, to some extent that's what I mean by "setting up the huffman decoder so that every bit stream makes valid output" - a bit prefix code like Huffman *is* bijective. (ignoring the issue of headers and missing symbols and termination, which DAS deals with but are not actually important).
" He seems to think that it is better for crypto for intuitively appealing but actually crackpotty reasons. "
Indeed. He also makes claims that algorithms need to be bijective to maximize compression; while in theory that is true (non-bijective means wasted code space), in practice we are nowhere near that razor edge of closeness to optimal.
Klee is interesting. LLVM is very cool.
But actually generating data for all possible branches would get out of hand pretty quick. In fact all you need is one loop :
int i = input();
while ( i-- )
{
    // do stuff
}
you suddenly have 2^32 test cases. Put a loop inside that loop and you have 2^64.
You have to fall back to sampling rather than exhaustive enumeration pretty fast.
Of course if you're going to mention CRC-ish codes without tables, you should weigh in on Adler32, which is used by zlib for compression verification, and is also used by ZFS which needs a fast checksum for its always-on validation.
Okie doke. Coming up...
...or Fletcher32, which is probably better than Adler32.
Oh yeah, Fletcher certainly should be faster, and supposedly quality is about the same. However this is seriously fucked up :
ReplyDelete"The Fletcher checksum cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. For example, if a 16-bit block in the data word changes from 0x0000 to 0xFFFF, the Fletcher-32 checksum remains the same. This also means a sequence of all 00 bytes has the same checksum as a sequence (of the same size) of all FF bytes."