Comments on cbloom rants: 09-12-10 - Context Weighting vs Escaping

cbloom, 2010-09-13T17:58:00.561-07:00:

Yeah, I thought someone might point that out, but I didn't mention it because you can apply arbitrary warps to the probabilities in the escaping case as well. While it might be important in practice to CM doing well, it's not a fundamental difference that can't be applied back to PPM-like methods. In fact, distorting probabilities before escaping is a classic thing to do, for example exaggerating deterministic probabilities.

What you never get in the escaping case is any contribution to the probabilities of the characters in the overlapping set, which seems to me to be the biggest difference.

The other huge difference is one of convention:

In the mixing case you are effectively making a mixing weight for each context that depends on all the contexts (after you normalize the weights to sum to one). That is, the final normalized weight for c0 depends on all the other contexts.

In the escaping case, the normal operation is to compute w0 only from context 0, ignoring all other contexts. This is obviously a big difference.

Also, in escaping you have to have a clear order of "I prefer this context, then this one, then this one", while in mixing you can have various contexts which are equally important.

Matt Mahoney, 2010-09-13T17:50:41.235-07:00:

Actually the context mixing in PAQ8 is done in the logarithmic domain.

P = squash(W0 * stretch(P0) + W1 * stretch(P1))

where

stretch(P) = log(P/(1-P))

squash(P) = 1/(1 + exp(-P)) = stretch^-1(P)

This has the effect of favoring high-confidence predictions near 0 or 1. So instead of combining 0.9 with 0.999 to get 0.95, you get something like 0.99.
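
The stretch/squash formula above can be tried out with a small sketch. This is only an illustration of the arithmetic, not PAQ8's actual code: the function names `mix_logistic` and `mix_linear` are made up here, and the equal weights of 0.5 are an assumption chosen to reproduce the 0.9 / 0.999 example, since the comment does not say how the weights are set.

```python
import math


def stretch(p):
    """Logit: maps a probability in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def squash(x):
    """Logistic function, the inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


def mix_logistic(probs, weights):
    """Weighted sum in the stretched domain, then squash back (hypothetical helper)."""
    return squash(sum(w * stretch(p) for p, w in zip(probs, weights)))


def mix_linear(probs, weights):
    """Plain weighted average of the probabilities, for comparison (hypothetical helper)."""
    return sum(w * p for p, w in zip(probs, weights))


probs = [0.9, 0.999]
weights = [0.5, 0.5]  # assumption: equal weights, not taken from PAQ8

linear = mix_linear(probs, weights)
logistic = mix_logistic(probs, weights)

print(f"linear mix:   {linear:.4f}")
print(f"logistic mix: {logistic:.4f}")
```

With these assumed weights the linear mix comes out near 0.95 and the logistic mix near 0.99, matching the numbers in the comment: the more confident prediction pulls the result further toward 1 than a plain average would.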