I was remembering that modern LZ's like LZMA (BitKnit, etc.) that (can) use pos&3 as a literal context might like bitmaps in 32-bit XRGB rather than 24-bit RGB.
In XRGB, each color channel gets its own entropy coding. Also, offset bottom bits work when the offsets are whole-pixel steps (off&3 will be zero). In 24-bit RGB that structure is all mod 3, which we don't model.
(in general LZMA-class compressors fall apart a bit if the structure is not the typical 4/8/pow2)
In compressors it's generally terrible to stick extra bytes in and give the compressor more work to do. In this case we're injecting a 0 in every 4th byte, and the compressor has to figure out those are all redundant just to get back to its original size.
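A quick sketch of why the padding can still pay off. The helper names are made up for illustration, and putting the filler byte last in each pixel is an assumption (the argument is the same wherever it sits). It pads packed RGB to 4-byte pixels, then records which color channel lands in each pos&3 literal context, and what off&3 looks like for whole-pixel offsets :

```python
def rgb_to_xrgb(rgb: bytes, pad: int = 0) -> bytes:
    """Pad packed 3-byte pixels to 4 bytes by appending one filler byte per pixel."""
    if len(rgb) % 3 != 0:
        raise ValueError("expected a whole number of 3-byte pixels")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        out += rgb[i:i + 3]
        out.append(pad)  # assumption: filler goes last; its position does not matter here
    return bytes(out)


def channels_seen_per_context(data: bytes, bytes_per_pixel: int) -> list:
    """For each literal context (pos & 3), list which byte-in-pixel indices land in it."""
    seen = [set() for _ in range(4)]
    for pos in range(len(data)):
        seen[pos & 3].add(pos % bytes_per_pixel)
    return [sorted(s) for s in seen]


# Four pixels whose channels have very different values.
rgb = bytes([200, 100, 10] * 4)
xrgb = rgb_to_xrgb(rgb)

rgb_contexts = channels_seen_per_context(rgb, 3)    # channels mixed in every context
xrgb_contexts = channels_seen_per_context(xrgb, 4)  # one channel per context

# Bottom two bits of whole-pixel offsets (1..4 pixels back), in bytes.
rgb_offset_bits = [(3 * k) & 3 for k in range(1, 5)]
xrgb_offset_bits = [(4 * k) & 3 for k in range(1, 5)]
```

With 3-byte pixels every pos&3 context ends up seeing all three channels and off&3 keeps cycling; with 4-byte pixels each context sees exactly one channel and off&3 is always zero.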
Anyway, this is an old idea, but I don't think I ever actually tried it. So :
PDI_1200.bmp
LZNA :
24-bit RGB : LZNA : 2,760,054 -> 1,376,781
32-bit XRGB: LZNA : 3,676,818 -> 1,311,502
24-bit RGB with DPCM filter : LZNA : 2,760,054 -> 1,022,066
32-bit XRGB with DPCM filter : LZNA : 3,676,818 -> 1,015,379 (MML8 : 1,012,988)
webpll : 961,356
paq8o8 : 1,096,342
moses.bmp
24-bit RGB : LZNA : 6,580,854 -> 3,274,757
32-bit XRGB: LZNA : 8,769,618 -> 3,022,320
24-bit RGB with DPCM filter : LZNA : 6,580,854 -> 2,433,246
32-bit XRGB with DPCM filter : LZNA : 8,769,618 -> 2,372,921
webpll : 2,204,444
gralic111d : 1,822,108
other compressors :
32-bit XRGB with DPCM filter : LZA : 8,769,618 -> 2,365,661 (MML8 : 2,354,434)
24-bit RGB no filter : BitKnit : 6,580,854 -> 3,462,455
32-bit XRGB no filter : BitKnit : 8,769,618 -> 3,070,141
32-bit XRGB with DPCM filter : BitKnit : 8,769,618 -> 2,601,463
32-bit XRGB: LZNA : 8,769,618 -> 3,022,320
32-bit XRGB: LZA : 8,769,618 -> 3,009,417
24-bit RGB: LZMA : 6,580,854 -> 3,488,546 (LZMA lc=0,lp=2,pb=2)
32-bit XRGB: LZMA : 8,769,618 -> 3,141,455 (LZMA lc=0,lp=2,pb=2)
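To put the RGB-vs-XRGB sizes above on a common scale, here is the arithmetic as a small script. The compressed sizes are copied from the results above; the dictionary layout and function name are just for illustration :

```python
# (24-bit RGB compressed size, 32-bit XRGB compressed size), copied from the results above.
results = {
    "PDI_1200 LZNA unfiltered": (1_376_781, 1_311_502),
    "PDI_1200 LZNA DPCM":       (1_022_066, 1_015_379),
    "moses LZNA unfiltered":    (3_274_757, 3_022_320),
    "moses LZNA DPCM":          (2_433_246, 2_372_921),
    "moses BitKnit unfiltered": (3_462_455, 3_070_141),
    "moses LZMA unfiltered":    (3_488_546, 3_141_455),
}


def percent_saved(rgb_size: int, xrgb_size: int) -> float:
    """How much smaller the XRGB result is, as a percentage of the RGB result."""
    return 100.0 * (rgb_size - xrgb_size) / rgb_size


savings = {name: percent_saved(*sizes) for name, sizes in results.items()}
for name, pct in savings.items():
    print(f"{name:26s} {pct:5.2f}% smaller as XRGB")
```

On unfiltered moses that works out to roughly 7.7% (LZNA), 9.9% (LZMA) and 11.3% (BitKnit); with the DPCM filter the LZNA gain drops to about 2.5% on moses and under 1% on PDI_1200.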
repro:
bmp copy moses.bmp moses.tga 32
V:\devel\projects\oodle\radbitmap\radbitmaptest
radbitmaptest64 rrz -z0 r:\moses.tga moses.tga.rrz -f8 -l1
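If you don't have those tools, the LZMA rows can be approximated with Python's standard lzma module, which exposes the same lc/lp/pb knobs. This is only a sketch: it uses liblzma's LZMA2 encoder in raw mode at the default preset (my choice, not something stated above), so sizes should not be expected to match the numbers above exactly, and the sample data is a stand-in for a real bitmap :

```python
import lzma


def _filters(lc: int, lp: int, pb: int) -> list:
    # lp=2 makes the literal coder use the low two bits of position (pos & 3) as context.
    return [{"id": lzma.FILTER_LZMA2, "lc": lc, "lp": lp, "pb": pb}]


def lzma_raw(data: bytes, lc: int = 0, lp: int = 2, pb: int = 2) -> bytes:
    """Compress to a raw LZMA2 stream with explicit lc/lp/pb settings."""
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=_filters(lc, lp, pb))


def lzma_raw_decode(blob: bytes, lc: int = 0, lp: int = 2, pb: int = 2) -> bytes:
    """Inverse of lzma_raw; a raw stream needs the same filter chain to decode."""
    return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=_filters(lc, lp, pb))


# Stand-in data with 4-byte structure; swap in the bytes of an RGB or XRGB bitmap
# (for example open(path, "rb").read()) to compare sizes between the two layouts.
sample = bytes((x * 7 + c * 50) & 0xFF for x in range(4096) for c in range(4))
packed = lzma_raw(sample)
```

Trying lp=0 against lp=2 on the same XRGB file is a quick way to see how much of the gain comes from the pos&3 literal context.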
Key observations :
1. On "moses" unfiltered : padding to XRGB helps a solid amount (3,274,757 to 3,022,320 for LZNA), despite the source being 4/3 bigger. I think that proves the concept. (BitKnit & LZMA show an even bigger difference.)
2. On filtered data, padding to XRGB still helps, but much (much) less. Presumably this is because post-filter data is just a bunch of low values, so the 24-bit RGB data is not so multiple-of-three structured (it's a lot of 0's, +1's, and -1's, less coherent, less difference between the color channels, etc.)
3. On un-filtered data, "sub" literals might be helping BitKnit (it beats LZMA on 32-bit unfiltered, and hangs with LZNA). On filtered data, the sub-literals don't help (might even hurt) and BK falls behind. We like the way sub literals sometimes act as an automatic structure stride and delta filter, but they can't compete with a real image-specific DPCM.
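A minimal sketch of what "sub" literals do, assuming the usual formulation where a literal is coded as its difference from the byte one match-offset back instead of as a raw byte. This is the concept only, not BitKnit's actual code; the function name and the fixed last_offset are assumptions for illustration (a real coder would use whatever the most recent match offset was, and only for literals). When that offset is one pixel, 4 bytes in XRGB, you get a per-channel delta from the previous pixel for free :

```python
def sub_literals(data: bytes, last_offset: int) -> bytes:
    """Code each byte as (byte - byte last_offset back) mod 256; the first bytes stay raw."""
    out = bytearray(data[:last_offset])
    for pos in range(last_offset, len(data)):
        out.append((data[pos] - data[pos - last_offset]) & 0xFF)
    return bytes(out)


# A smooth XRGB gradient: each channel changes slowly from pixel to pixel.
pixels = bytes(v for i in range(8) for v in (10 + i, 100 + 2 * i, 200 - i, 0))
residuals = sub_literals(pixels, last_offset=4)
```

After the first pixel, every pixel in this gradient turns into the same four residual bytes, which is the "automatic stride and delta filter" effect. Run the same thing on data that has already been through a DPCM filter and there is nothing left for it to remove.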
Now, XRGB padding is an ugly way to do this. You'd much rather stick with 24-bit RGB and have an LZ that works inherently on 3-byte items.
The first step is :
LZ that works on "items"
(eg. item = a pixel)
LZ matches (offsets and lens) are in whole items
(the closer analogue to the bottom-bits style would be to allow whole items plus "remainders";
that's offset/item and offset%item, letting the entropy coder handle the case where remainder==0 always;
but it's probably best to just force remainder=0)
When you don't match (literal item)
each byte in the item gets its own entropy stats
(eg. color channels of pixels)
which maybe is useful on things other than just images.
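A toy sketch of that first step, to make it concrete. Everything here (function names, the token tuples, the greedy brute-force matcher, min_match and max_offset) is made up for illustration and is nowhere near a real encoder. Offsets and lengths are counted in whole items, so remainders are forced to zero, and literals are tallied in one histogram per byte of the item :

```python
from collections import Counter


def item_lz_parse(data: bytes, item_size: int, min_match: int = 2, max_offset: int = 64):
    """Greedy parse into ('lit', item_bytes) or ('match', offset_items, length_items)."""
    if len(data) % item_size != 0:
        raise ValueError("data must be a whole number of items")
    items = [data[i:i + item_size] for i in range(0, len(data), item_size)]
    tokens, pos = [], 0
    while pos < len(items):
        best_len, best_off = 0, 0
        for off in range(1, min(pos, max_offset) + 1):
            length = 0
            # Overlapping matches are allowed, as in ordinary LZ77.
            while pos + length < len(items) and items[pos + length - off] == items[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, off
        if best_len >= min_match:
            tokens.append(("match", best_off, best_len))
            pos += best_len
        else:
            tokens.append(("lit", items[pos]))
            pos += 1
    return tokens


def literal_stats(tokens, item_size: int):
    """One histogram per byte position inside the item (e.g. one per color channel)."""
    stats = [Counter() for _ in range(item_size)]
    for tok in tokens:
        if tok[0] == "lit":
            for channel, value in enumerate(tok[1]):
                stats[channel][value] += 1
    return stats


def item_lz_decode(tokens) -> bytes:
    """Rebuild the data by copying whole items."""
    items = []
    for tok in tokens:
        if tok[0] == "lit":
            items.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):
                items.append(items[-off])
    return b"".join(items)


row = bytes([10, 20, 30, 11, 21, 31, 12, 22, 32])  # three 24-bit pixels
image = row + row                                  # second row repeats the first
tokens = item_lz_parse(image, item_size=3)
stats = literal_stats(tokens, item_size=3)
```

Here the second row comes out as a single match of offset 3 items, length 3 items, with no byte-level offset bits to code at all, and each of the three literal histograms only ever sees one channel.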
The other step is something like :
Offset is an x,y delta instead of linear
(this replaces offset bottom bits)
could be generically useful in any kind of row/column structured data
Filtering for values with x-y neighbors
(do you do the LZ on un-filtered data, and only filter the literals?)
(or do you filter everything and do the LZ on filter residuals?)
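One way to read "offset is an x,y delta" : re-express the linear offset (in pixels) as rows up plus a signed horizontal shift, so that "directly above" is always (0,1) whatever the image width. This is only a sketch of one possible mapping; the centering convention and the function names are assumptions, and it ignores row padding :

```python
def offset_to_xy(offset: int, width: int):
    """Map a backward offset in pixels to (dx, dy): dy rows up, then dx pixels right (negative = left)."""
    dy = (offset + width // 2) // width  # pick the row that keeps |dx| small
    dx = dy * width - offset
    return dx, dy


def xy_to_offset(dx: int, dy: int, width: int) -> int:
    """Inverse mapping back to a linear offset in pixels."""
    return dy * width - dx


width = 100
left = offset_to_xy(1, width)               # previous pixel on the same row
above = offset_to_xy(width, width)          # same column, one row up
above_right = offset_to_xy(width - 1, width)
above_left = offset_to_xy(width + 1, width)
```

The common 2d neighbors then become small (dx,dy) pairs that are the same for every image width, which is the job offset bottom bits were doing in the linear case.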
and a lot of this is just webp-ll
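For the filtering step, here is a minimal sketch of a DPCM filter using x-y neighbors. The predictor (clamped W + N - NW, per channel) is an assumption chosen because it is simple; the filter behind the numbers above is not specified here. This is the "filter everything, then LZ the residuals" variant :

```python
def predict(img, x: int, y: int, c: int, width: int, bpp: int) -> int:
    """Clamped gradient predictor W + N - NW; neighbors outside the image count as 0."""
    def px(xx, yy):
        if xx < 0 or yy < 0:
            return 0
        return img[(yy * width + xx) * bpp + c]
    w, n, nw = px(x - 1, y), px(x, y - 1), px(x - 1, y - 1)
    return max(0, min(255, w + n - nw))


def dpcm_filter(img: bytes, width: int, height: int, bpp: int) -> bytes:
    """Replace each byte with (value - prediction) mod 256."""
    out = bytearray(len(img))
    for y in range(height):
        for x in range(width):
            for c in range(bpp):
                i = (y * width + x) * bpp + c
                out[i] = (img[i] - predict(img, x, y, c, width, bpp)) & 0xFF
    return bytes(out)


def dpcm_unfilter(res: bytes, width: int, height: int, bpp: int) -> bytes:
    """Invert dpcm_filter; raster order means W, N and NW are already rebuilt."""
    img = bytearray(len(res))
    for y in range(height):
        for x in range(width):
            for c in range(bpp):
                i = (y * width + x) * bpp + c
                img[i] = (res[i] + predict(img, x, y, c, width, bpp)) & 0xFF
    return bytes(img)


# A small 4x3 image with a linear ramp in every channel.
W, H, BPP = 4, 3, 3
ramp = bytes(10 + 3 * x + 2 * y + 50 * c for y in range(H) for x in range(W) for c in range(BPP))
residuals = dpcm_filter(ramp, W, H, BPP)
restored = dpcm_unfilter(residuals, W, H, BPP)
```

On a ramp like this every interior residual is zero, which is the "bunch of low values" effect from observation 2. The other variant, LZ on unfiltered pixels with only the literals filtered, would call the same predictor from inside the literal coder instead.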