tag:blogger.com,1999:blog-5246987755651065286.post7689886066940220309..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 02-01-14 - Understanding ANS - 3cbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger36125tag:blogger.com,1999:blog-5246987755651065286.post-6792359050353536632014-02-05T08:49:21.238-08:002014-02-05T08:49:21.238-08:00> "huf" and "fse" are trans...> "huf" and "fse" are transmitting their code tables. "arith" and "rans" are not. <br />They should add about 256 bytes of header to be fair.<br /><br />Quick comment, just in case it would matter : FSE exposes its internal functions, so it's possible to retrieve the size of the compressed stream "without the code tables" :<br />int FSE_compress_usingCTable (void* dest, const unsigned char* source, int sourceSize, const void* CTable);<br /><br />In this case, only a small 4-bytes header remains.Cyanhttps://www.blogger.com/profile/02905407922640810117noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-4613570005576555142014-02-05T05:43:15.458-08:002014-02-05T05:43:15.458-08:00difference between tANS and fast rANS - additional...difference between tANS and fast rANS - additional and+shift+mul+add<br /><br />but rANS gets input by words, while tANS gets input by bits<br /><br />so proximately mul+add (about 4 cpu clocks from rANS 15 clocks) is all difference, while rANS gives perfect compressionAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-24150186332002583782014-02-05T04:52:58.294-08:002014-02-05T04:52:58.294-08:00i implemented fast version of rANS
64 bit register...i implemented fast version of rANS<br />64 bit registers<br />sum(counters)=2^12<br />lookup table<br /><br />decode take about 15 CPU clocks per byte<br />perfect compression<br /><br />main decode loop: <br />http://pastebin.com/LM8Au01pAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-26499291810322152772014-02-04T06:16:39.446-08:002014-02-04T06:16:39.446-08:00i write carry flag to stream and all work good, ex...i write carry flag to stream and all work good, except loss of 0.01% compresson.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-16362407633084648542014-02-04T02:21:02.388-08:002014-02-04T02:21:02.388-08:002DudaJarek
>which allows to immediately deduce...2DudaJarek<br /><br />>which allows to immediately deduce the number of bits to transfer while decoding.<br /><br />This was my first rescale way. But it needs bit buffers and bit counters variables. So it will be slow.<br /><br />My currently used rescaling is faster in decoding. Only this stupid corner case with carry. Only way i see - use ugly ifs and reencoding tryes for (x==2^32-1)...Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-63258215001222513172014-02-03T16:59:15.171-08:002014-02-03T16:59:15.171-08:00enotuss,
I also thought about such reversed renorm...enotuss,<br />I also thought about such reversed renormalization: transfer digits after encoding step (not decoding). Unfortunately, in this case we don't know how many digits we should transfer before decoding step ...<br />I couldn't think of any other practical renormalization, but the current one seems ok(?) - especially that for uABS, rABS, rANS you can use large b=2^8, 2^16 to take one or two bytes at once.<br />Alternatively, most of processors should support finding the number of the most significant "0"s ( http://en.wikipedia.org/wiki/Find_first_set ), which allows to immediately deduce the number of bits to transfer while decoding.<br /><br />Regarding division, in Matt's fpaqc it was changed into multiplication thanks to storing 1/f in a table.Jarek Dudahttps://www.blogger.com/profile/11358050996148333936noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-88637890870607524272014-02-03T16:11:56.377-08:002014-02-03T16:11:56.377-08:00bigger file works too
org file size 1261520
Compr...bigger file works too<br /><br />org file size 1261520<br />CompressorRC<br />cmp 45 bytes 0.029616 sec 0.003567% 40.622053 mb/s<br />dcm 1261520 bytes 0.037891 sec 100.000000% 31.751321 mb/s<br />Good<br /><br />CompressorRAns<br />cmp 44 bytes 0.034650 sec 0.003488% 34.720928 mb/s<br />dcm 1261520 bytes 0.039045 sec 100.000000% 30.812467 mb/s<br />Good<br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-79410758829329023342014-02-03T16:09:09.725-08:002014-02-03T16:09:09.725-08:00File length 78890 bytes, first 'a', rest &...File length 78890 bytes, first 'a', rest 'b'. Worked well with not fixed rANS.<br /><br />org file size 78890<br />CompressorRC<br />cmp 12 bytes 0.001585 sec 0.015211% 47.463245 mb/s<br />dcm 78890 bytes 0.002393 sec 100.000000% 31.435408 mb/s<br />Good<br /><br />CompressorRAns<br />cmp 12 bytes 0.002172 sec 0.015211% 34.642339 mb/s<br />dcm 78890 bytes 0.002423 sec 100.000000% 31.046799 mb/s<br />Good<br /><br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-35052332887227285882014-02-03T15:56:05.751-08:002014-02-03T15:56:05.751-08:00Try encoding a file consisting of one 'a' ...Try encoding a file consisting of one 'a' followed by 999999 'b's.Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-30916023818619098892014-02-03T15:09:27.249-08:002014-02-03T15:09:27.249-08:002Fabian 'ryg' Giesen
>None of this wor...2Fabian 'ryg' Giesen<br /><br />>None of this works.<br /><br />It works!! 1M wiki8 compress and decompress. Equal data. Only corner case needs fixing. I was afraid that corner case will show up on 200 byte data, but its working on 1M file.<br /><br />One or two days, i will fix corner case, optimize code ...Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-72694398948712267362014-02-03T15:01:12.559-08:002014-02-03T15:01:12.559-08:00I checked. My rescaling code works. Except for cor...I checked. My rescaling code works. Except for corner case. Big files (>1m) corrupt. Little better compression than carryless RC.<br /><br />Can anybody help me fixing RC code? It work, but slow.<br />http://pastebin.com/gd8EcUM9Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-16438712631535008202014-02-03T14:39:32.985-08:002014-02-03T14:39:32.985-08:00enotuss: None of this works.
1. Supporting only o...enotuss: None of this works.<br /><br />1. Supporting only one "shift" of b is not enough (e.g. take b=2, m=1024 and a symbol with frequency 512 - you need to shift 9 times) to make that work.<br />2. You can get into cases where none of the options you're considering are viable (i.e. none lead to a valid x). My example above shows that for the renormalization as in the paper, your moving it around doesn't get rid of the problem, it just makes it happen elsewhere.<br /><br />Discussing this further here is pointless. It might well be possible that there's a different way to renormalize that works with less restrictive constraints, but unless you can actually present a working implementation (not pseudocode) I'm not interested in wasting more time on this.Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-86000649075170232372014-02-03T05:13:38.387-08:002014-02-03T05:13:38.387-08:00Some addition to above code.
m=sum(counters)<2...Some addition to above code.<br /><br />m=sum(counters)<2^32-1<br />b=32<br />limit=32<br />x precision=64<br /><br />Covering corner case:<br /><br />Decoding:<br />if(x<2^limit){x=((x-1) shiftLeft b)|input_b_bits()}<br />(s,x)=decode_symbol(x)<br /><br />Encoding:<br />xNext=encode_symbol(s,x shiftRight b +1)<br />if(xNext>=2^limit){output_b_lower_bits(x};x=xNext}<br />else{x=encode_symbol(s,x)}<br />Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-78209686982014234742014-02-03T03:05:06.852-08:002014-02-03T03:05:06.852-08:00x - state
b - shift bits (2^32 for example)
limit ...x - state<br />b - shift bits (2^32 for example)<br />limit - limit (2^16 for example)<br /><br />Decoding:<br />if(x<2^limit){x=(x shl b)|input_b_bits()}<br />(s,x)=decode_symbol(x)<br /><br />Encoding:<br />xNext=encode_symbol(s,x shr b)<br />if(xNext>=2^limit){output_b_lower_bits(x};x=xNext}<br />else{x=encode_symbol(s,x)}Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-25283756263111770032014-02-03T03:01:44.366-08:002014-02-03T03:01:44.366-08:00one more try
Decoding:
if(x<2^limit){x=(x<&g...one more try<br />Decoding:<br />if(x<2^limit){x=(x<>b)<br />if(xNext>=2^limit){output_b_lower_bits(x};x=xNext}<br />else{x=encode_symbol(s,x)}Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-10659466205111406822014-02-03T03:01:01.889-08:002014-02-03T03:01:01.889-08:00comments system sucks
Decoding:
if(x<2^limit){...comments system sucks<br /><br />Decoding:<br />if(x<2^limit){x=(x<>b)<br />if(xNext>=2^limit){output_b_lower_bits(x};x=xNext}<br />else{x=encode_symbol(s,x)}Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-43509288786414258792014-02-03T02:58:19.307-08:002014-02-03T02:58:19.307-08:002Fabian 'ryg' Giesen
hah It appears ensur...2Fabian 'ryg' Giesen<br /><br />hah It appears ensuring same rescaling in encoder and decoder is big problem. <br /><br />Following solution is not good but will work:<br /><br />x - state<br />b - shift bits (2^32 for example)<br />limit - limit (2^16 for example)<br /><br />Decoding:<br />if(x<2^limit){x=(x<>b)<br />if(xNext>=2^limit){output_b_lower_bits(x};x=xNext}<br />else{x=encode_symbol(s,x)}Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-69846376262381787442014-02-02T20:11:31.966-08:002014-02-02T20:11:31.966-08:00Yeah, ryg's the "b-unique" issue is ...Yeah, ryg's the "b-unique" issue is the crucial thing with ANS. I'll talk about it in the context of tANS in the next post or two.<br /><br />@enotuss in general - rANS is super fast for order-0 coding of arrays. If you're doing order-0 arrays you should of course scale the frequencies to a power of 2 total. That's what the report here is comparing.<br /><br />rANS for arbitrary frequency sums and for adaptive coding is a more questionable area (eg. could you use it for PPM?). Unclear if it's a win vs arith. The need to buffer and reverse the encoding side is definitely a pain. But it might still be a win when you mainly care about decoder speed.<br /><br />Most modern adaptive compressors are binary arith only. Comparison of that to rABS is something we have yet to do.<br />cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-63191116467892989602014-02-02T17:35:13.944-08:002014-02-02T17:35:13.944-08:00enotuss: "You dont need this. You dont need t...enotuss: <i>"You dont need this. You dont need to follow paper."</i><br /><br />As long as you don't come up with a different renormalization algorithm that works with finite precision (or a laxer sufficient condition for b-uniqueness), yes I do.<br /><br />Example: l=5, b=2, m=3 (note m does not divide l). Two symbols, l_0=2, l_1=1 and therefore b_0=0, b_1=2.<br /><br /> I = { 5,6,7,8,9 }<br /><br />if you work out the I_s, you get:<br /><br /> I_0 = { 4,5,6 }<br /> I_1 = { 1,2 }<br /><br />I_0 isn't b-unique. You're toast. Suppose you're at x=7 and want to encode a 0. x=7 is not in I_0, so you emit a bit and divide x by 2. Now you have x=3 - still not in I_0, and there's no way to get back into it now, you're already below its lower bound.Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-28384748189147364702014-02-02T17:17:41.079-08:002014-02-02T17:17:41.079-08:00Speed of tANS is well covered by cbloom. Not much ...Speed of tANS is well covered by cbloom. Not much slower than Huffman.<br /><br />Question is: how fast rANS is.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-23111854541685496522014-02-02T17:09:54.935-08:002014-02-02T17:09:54.935-08:00Hi Fabian,
The paper was supposed to be theoretica...Hi Fabian,<br />The paper was supposed to be theoretical, I wrote "combining Huffman speed" in the title, and in abstract I have clearly referred to Yann's implementation, which back then seemed to be a fast implantation (I haven't made tests myself) - I will definitely be more careful in the next version.<br /><br />Regarding tABS, Pascal Massimino (skal) has written that after some "obvious optimizations" the speed of his version of tANS has significantly improved - unfortunately he hasn't posted improved results for the binary version: https://groups.google.com/a/webmproject.org/d/msg/codec-devel/idezdUoV1yY/dPqqSeaQhhYJ<br /><br />The fact that C(s,x) increases with x seems obvious from construction?<br />The renormalization condition for uABS and range versions is discussed in Section 3.3.<br />Regarding cryptographic applications, I have made some larger discussion in the previous paper ( 0902.0271 ), but decided to leave only some basic remarks in this version - it just requires further work of someone with cryptographic background.<br />Thanks,<br />JarekJarek Dudahttps://www.blogger.com/profile/11358050996148333936noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-27782084021980316252014-02-02T17:02:33.634-08:002014-02-02T17:02:33.634-08:002Fabian 'ryg' Giesen
>The renormalizat...2Fabian 'ryg' Giesen<br /><br />>The renormalization algorithm Jarek describes requires l=k*m for some positive integer k.<br /><br />You dont need this. You dont need to follow paper.<br /><br /><br />>Furthermore, once you support arbitrary m, the decoder now also needs to do a div/mod operation.<br /><br />rANS with arbitrary m uses one div+mod for encoder and decoder:<br />C(s,x) = m[x/ls] + bs + mod(x,ls)<br />D(x) = (s, ls[x/m] + mod(x,m) - bs)<br /><br />RC uses one division too.<br /><br />So for arbitrary m rANS must be faster than RC and, i think, rANS must give little better compression (due to limited RC precision).Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-61119557029575109822014-02-02T16:42:30.237-08:002014-02-02T16:42:30.237-08:00enotuss: I know how standard models look, and I kn...enotuss: I know how standard models look, and I know that arbitrary m would be desirable. What I'm trying to tell you is that the "streaming" (finite-precision) versions of rANS just can't do it, not as given in the paper anyway. The renormalization algorithm Jarek describes requires l=k*m for some positive integer k. The infinite-precision version does not, but that's little use if you want to write a fast encoder/decoder to say the least.<br /><br />Furthermore, once you support arbitrary m, the decoder now also needs to do a div/mod operation.<br /><br /><i>"Rescale needs division. If probabilities are dynamic rescale will take all time.<br /><br />m=2^k is special case for static probabilities.<br /><br />Direct comparison RC and rANS must use any m."</i><br />Exactly! The rANS encoder/decoder I posted can only support static probability distributions (efficiently), and the same goes for the table-based (tANS) variants. You can have several baked tables and switch between them quickly, but adaptive updates are off the table for the non-binary variants.<br /><br />What Charles is measuring is rANS against an arithmetic (range) coder that also assumes m is a power of two, so it's a fair comparison. However, the general arithmetic coder can support arbitrary m easily while rANS cannot. So yes, comparing rANS vs. AC directly and claiming ANS is faster is disingenuous. That's precisely what I'm complaining about when I say Jarek's over-selling it (on that axis anyway).Fabian 'ryg' Giesenhttps://www.blogger.com/profile/13685994980026854143noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-89551493404513828052014-02-02T16:32:41.932-08:002014-02-02T16:32:41.932-08:002Fabian 'ryg' Giesen
>the fact that an...2Fabian 'ryg' Giesen<br /><br />>the fact that an interval with the required three properties has the form {l, l+1, ..., b*l-1} requires proof<br /><br />You dont need this. You can rescale as you wish, if you ensure same rescaling in encoder and decoder. But if x<lMax (max range or max counter) you will lose compression.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-8684737230506721522014-02-02T16:23:53.910-08:002014-02-02T16:23:53.910-08:002Fabian 'ryg' Giesen
>Powers-of-2 for ...2Fabian 'ryg' Giesen<br /><br />>Powers-of-2 for both m and l are the most practical compromise<br /><br />In most cases, data model probabilities are array of counters. <br />m=sum(counters)<br />If m=2^k then you need rescale counters. Rescale needs division. If probabilities are dynamic rescale will take all time.<br /><br />m=2^k is special case for static probabilities.<br /><br />Direct comparison RC and rANS must use any m.<br /><br /><br />>You might miss that in the paper <br /><br />I dont get most from paper. Only two rANS formulas: encode and decode. Idea of rANS is simple and clear.Anonymousnoreply@blogger.com