Comments on cbloom rants: Some learnings from ZStd
(feed updated 2022-01-05T09:06:18.603-08:00)

rep_movsd, 2018-03-15T12:20:48.734-07:00

Hi Charles

Sorry for the off-topic comment, but whatever happened to your old content on cbloom.com?

It has always been a resource I pointed data compression enthusiasts at, but the website seems to have been defunct for quite a while.

Please put it back somewhere online if you still have that stuff!

Jarek Duda, 2017-09-29T11:01:12.578-07:00

Indeed, in practice some compromise is needed, for example distinguishing a singleton in the center of the range from the very low probability symbols.
A more sophisticated option is storing the probability distribution quantized in an optimized way (minimizing cost of header + KL); the decoder then also performs the proper quantization and symbol spread ... https://encode.ru/threads/1883-Reducing-the-header-optimal-quantization-compression-of-probability-distribution
Another option is storing the symbol spread in the header ... anyway, there are many possibilities to optimize among.

cbloom, 2017-09-29T08:33:29.037-07:00

Jarek, but that would require transmitting the true probability (p), not the normalized probability (q).
That may or may not be worth it, as the true probability may take more bits, and it would require the decoder to spend the time to normalize (to recreate the q's, since it was sent the p's), which is non-trivial.

Jarek Duda, 2017-09-29T01:19:53.720-07:00

Hi Charles,
Thanks for the comments. I don't have experience with LZ, but regarding the last part, the best (still heuristic) method for tANS symbol spread I am aware of is "tuned spread", which uses both the quantization and the actual probabilities to correspondingly shift the symbol appearances left or right.
If the quantization is p[s] ~ q[s]/L, this symbol has q[s] appearances i \in {q[s],...,2q[s]-1}, and
the preferred position for the i-th appearance is x ~ 1/(p[s] ln(1 + 1/i)).

For a singleton i=1, x ~ 1/(p[s] ln(2)) \in [L,2L-1], so it can well represent probabilities between p[s] ~ 1/(2L ln(2)) ~ 0.72/L for the most-right position (x=2L), to p[s] ~ 1/(L ln(2)) ~ 1.44/L for the most-left position (x=L).
https://encode.ru/threads/2013-Asymmetric-numeral-system-toolkit-and-fast-tuned-symbol-spread
https://github.com/JarekDuda/AsymmetricNumeralSystemsToolkit
Best,
Jarek
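
As a rough illustration of the preferred-position rule in Jarek's comment, here is a minimal Python sketch. It computes x ~ 1/(p[s] ln(1 + 1/i)) for every appearance of every symbol and then simply sorts the appearances by x to get a spread. The function names (`preferred_positions`, `tuned_spread_sketch`) and the sort-based assignment are assumptions made for this example; the linked toolkit may assign positions differently (for instance with faster bucketing), so treat this as a sketch of the idea rather than the toolkit's implementation.

```python
import math


def preferred_positions(p_s, q_s):
    """Preferred states x ~ 1/(p_s * ln(1 + 1/i)) for appearances
    i = q_s, ..., 2*q_s - 1 of a symbol with true probability p_s
    and quantized count q_s."""
    return [1.0 / (p_s * math.log(1.0 + 1.0 / i)) for i in range(q_s, 2 * q_s)]


def tuned_spread_sketch(p, q):
    """Return a symbol spread of length L = sum(q).

    Entry k of the result is the symbol assigned to state L + k.
    Assumption for this sketch: appearances are placed in order of
    their preferred position (ties broken by symbol index).
    """
    candidates = []
    for s, (p_s, q_s) in enumerate(zip(p, q)):
        for x in preferred_positions(p_s, q_s):
            candidates.append((x, s))
    candidates.sort()
    return [s for _, s in candidates]


if __name__ == "__main__":
    # Two symbols, L = 4: p = (0.75, 0.25) quantized to q = (3, 1).
    print(tuned_spread_sketch([0.75, 0.25], [3, 1]))

    # Singleton check from the comment: p ~ 1/(L ln 2) prefers x = L,
    # and p ~ 1/(2L ln 2) prefers x = 2L.
    L = 16
    print(preferred_positions(1.0 / (L * math.log(2)), 1))
    print(preferred_positions(1.0 / (2 * L * math.log(2)), 1))
```

Because the true probability p[s] enters the formula, a symbol whose p[s] is a bit larger than q[s]/L gets its appearances shifted left (toward lower states), and one whose p[s] is smaller gets shifted right, which is the "tuning" the comment describes.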