tag:blogger.com,1999:blog-5246987755651065286.post5804336968290477141..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 02-10-14 - Understanding ANS - 9cbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger6125tag:blogger.com,1999:blog-5246987755651065286.post-89442026319394474872014-11-17T05:37:17.579-08:002014-11-17T05:37:17.579-08:00and, for the record, here is a C-version of the it...and, for the record, here is a C-version of the iteration, showing the emerging 1/x stationary distribution:<br /><br />http://pastebin.com/NPqmsR8LPascalhttps://www.blogger.com/profile/02506306462425421142noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-57304460223158686592014-11-17T04:35:39.115-08:002014-11-17T04:35:39.115-08:00note that you need to make sure that 'b' i...note that you need to make sure that 'b' is not a power of 2. Otherwise, 'x' will be constant.<br /><br /><br />/skalPascalhttps://www.blogger.com/profile/02506306462425421142noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-35507078422851203272014-03-21T07:56:29.193-07:002014-03-21T07:56:29.193-07:00Yes, you only get the perfect 1/x when the statist...Yes, you only get the perfect 1/x when the statistics of the data match the normalized counts exactly (eg. with synthetic data), and when the sort is "Duda's precise".<br /><br />I still don't entirely understand the 1/x.<br /><br />I'd like to see a proof that the optimal sort order must lead to a 1/x distribution, or vice-versa that a 1/x distribution must provide minimum code length.<br />cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-26287682789917410952014-03-20T16:43:27.541-07:002014-03-20T16:43:27.541-07:00I tried to reproduce your result, but so far got o...I tried to reproduce your result, but so far got only a hint that it was the "general direction", not something as clean as your distribution picture.<br /><br />Some differences :<br />- I'm using real-world input, not synthetic<br />- FSE distribution is not the same as yours, and is likely introducing some noise & distortion.<br /><br />Anyway, it shows that the probability of a state is not necessarily a clean 1/X distribution graph, and will vary a lot depending on symbol distribution over the state table, something you hinted into your second graph.Cyanhttps://www.blogger.com/profile/02905407922640810117noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-43862529759463733382014-02-12T10:45:18.162-08:002014-02-12T10:45:18.162-08:00I find the 1/x distribution picture very interesti...I find the 1/x distribution picture very interesting. Maybe there could be a way to use that in calculating more precisely distribution weight.Cyanhttps://www.blogger.com/profile/02905407922640810117noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-81234351623223687752014-02-10T17:16:36.214-08:002014-02-10T17:16:36.214-08:00Hi Charles,
I think your "bias 1 works better...Hi Charles,<br />I think your "bias 1 works better for the real data" corresponds to the fact that in real data there happen extremely low probable symbols - bias 1/2 approximates them as 1/L, while bias 1 a bit lower.<br />If we encode p_s sequence with encoder optimal for q_s distribution, the least probable symbols are the most significant:<br />deltaH ~ sum_s (p_s-q_s)^2/p_s<br /><br />So for a good initializer, I would first take the extremely low probable symbols (let say < 0.5/L) as a singletons to the end of the table, and then use precise initialization for the remaining ones.<br />Additionally, as f[s]/L ~ p[s] quantization is not exact, for example the "bias" could be individually chosen: as a bit larger if p[s] < f[s]/L, or smaller otherwise.<br />... but the details need further work.<br />Best,<br />JarekJarek Dudahttps://www.blogger.com/profile/11358050996148333936noreply@blogger.com