tag:blogger.com,1999:blog-5246987755651065286.post6563706227760169088..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 03-15-14 - Bit IO of Elias Gammacbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-5246987755651065286.post-24750827442987906022014-03-16T18:24:34.006-07:002014-03-16T18:24:34.006-07:00Added Exp Golomb to the post.Added Exp Golomb to the post.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-72003204293725399142014-03-16T17:19:36.899-07:002014-03-16T17:19:36.899-07:00Yup. With 64-bit BitIO I'm always torn between...Yup. With 64-bit BitIO I'm always torn between byte-based refills (sooo many spare bits to work with, so many refill checks that can be skipped!) and 32-bit-at-a-time refills (nice predictable rare branch, always do 32-bit aligned accesses).<br /><br />With old school-ish RISCs (especially if they don't have good branch predictors), I think 32-bit aligned is superior; on x86/ARM with their fast support for unaligned reads it's a bit more complicated.<br /><br />GCN GPUs have "alignbit" and "alignbyte":<br /><br /> v_alignbit_b32 dest, s0, s1, s2<br /><br />dest = (s0 | (s1 << 32)) >> (s2 & 31)<br /><br /> v_alignbyte_b32 dest, s0, s1, s2<br /><br />dest = (s0 | (s1 << 32)) >> (8 * (s2 & 3))<br /><br />I'd sure like to have those on general-purpose CPUs...Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-23775944280257989182014-03-16T11:08:01.182-07:002014-03-16T11:08:01.182-07:00Ah yes. At least I didn't cover it before on ...Ah yes. At least I didn't cover it before on my own blog, which I've been known to do.<br /><br />One nice thing about 64-bit BITIO and the grab-8 method of refilling is that you always have >= 57 bits available, which means you don't need to even check for the overflow case for typical value ranges.<br />cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-68797082469366500292014-03-15T21:55:43.204-07:002014-03-15T21:55:43.204-07:00Eh, 31 - clz32 and 63 - clz64 respectively. You ge...Eh, 31 - clz32 and 63 - clz64 respectively. You get the idea.Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-55390600169021198202014-03-15T21:54:49.902-07:002014-03-15T21:54:49.902-07:00http://fgiesen.wordpress.com/2011/01/19/a-small-no...http://fgiesen.wordpress.com/2011/01/19/a-small-note-on-the-elias-gamma-cod/ :)<br /><br />While working on Bink 2 which uses LSB-first bottom-justified bit packing, I realized unary is easy there too: calculate "x & -x" (clears all but the lowest set bit in x) and then you can do "32 - clz32" (see also "http://fgiesen.wordpress.com/2013/10/18/bit-scanning-equivalencies/"). A bit more work but still short and branch-free. You do lose the super-nice Gamma decoder though.Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.com