tag:blogger.com,1999:blog-5246987755651065286.post6249697173882078193..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 05-20-09 - Some Huffman notescbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger3125tag:blogger.com,1999:blog-5246987755651065286.post-14953880321461212692010-06-02T21:14:19.058-07:002010-06-02T21:14:19.058-07:00"You're avoiding branches on the input da..."You're avoiding branches on the input data, but don't you get those branches back (almost as unpredictable, although fewer since they're per-symbol instead of per-bit)"<br /><br />It is inherently impossible to eliminate unpredictable branches in a decompressor. (in fact the unpredictability of the branches is related to the efficiency of the compressor! the memcpy decoder is the only decoder without branches and it also gets zero compression)<br /><br />I believe this way is the minimum # of branches in a huffman decode (or very close anyway).<br /><br />The IJG method actually has lots of branches that you aren't accounting for. Filling the bit buffer has branches, how many bits of the buffer are used to decode a symbol has branches, etc.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-72078056211676310252009-06-03T05:30:54.515-07:002009-06-03T05:30:54.515-07:00You're avoiding branches on the input data, bu...You're avoiding branches on the input data, but don't you get those branches back (almost as unpredictable, although fewer since they're per-symbol instead of per-bit) in the for loop since you can get multiple outputs from each byte?<br /><br />libjpeg uses an 8-bit lookup table but without the same kind of state, just filling the buffer to make sure there's always enough. Of course, then you get a branch on “do we need to refill the buffer?”...Unknownhttps://www.blogger.com/profile/10912694059109291802noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-57697488991765565582009-05-21T17:17:29.064-07:002009-05-21T17:17:29.064-07:00Your approach does has use even in a "normal&...Your approach does has use even in a "normal" one-symbol eating canonical huffman decoder (I take the IJG-decoder as an example). You normally have to maintain a little buffer-register holding enough left-align bits for the LUT, then you shift the buffer bit-wise to the left (as much bits as you consumed). Your approach allows to get rid of the buffer-register cycling, which means (algorithmically) a lot of conditional bit-shifting would be converted to conditional loops (namely all the shifting into the register, the shifting out of the register is still necessary). That allows to do full adress & size aligned cache-bypass reads (huffman-code streams are hardly ever write-back) from the code-streams.<br /><br />So under the condition that you have a better instruction-pipe than ALU (loops faster than bit-ops), or the ratio register to cache/mem is high (read-manipulate-write oncache takes long, caches are too full, reading from memory takes long), you have a gain here even without a multi-symbol pushing decoder.<br /><br />I couldn't say the gain is in fantastic dimensions, but it may work given the right architectural conditions.Ethatronhttps://www.blogger.com/profile/10014459901168712472noreply@blogger.com