tag:blogger.com,1999:blog-5246987755651065286.post522703953925949523..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 07-02-10 - Length-Limitted Huffman Codescbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger9125tag:blogger.com,1999:blog-5246987755651065286.post-43381351036629484312010-07-04T13:21:54.448-07:002010-07-04T13:21:54.448-07:00Yeah there's a better algorithm, it's call...Yeah there's a better algorithm, it's called package-merge ;)cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-82505901664005470872010-07-04T12:46:59.071-07:002010-07-04T12:46:59.071-07:00I wonder if there's a better heuristic algorit...I wonder if there's a better heuristic algorithm where you figure out the highest leaf you'll need to split/move-down to have enough room, you move that down, repeat, so that you don't move things down too far.<br /><br />It's just it seems like, even if you throw out the actual frequency data, the tree itself gives you information about the approximate frequencies that you ought to be able to leverage straightforwardly. (Maybe nobody's bothered looking at it because the gain is minimal in practice?)Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-63128020511191284282010-07-04T08:19:28.285-07:002010-07-04T08:19:28.285-07:00In this case there is a real problem with that len...<i>In this case there is a real problem with that length 1 code, but before it gets there it first bumps the length 4 code all the way up to 7.</i><br /><br />This is what I missed, thanks.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-23421400142948155252010-07-04T00:26:18.314-07:002010-07-04T00:26:18.314-07:00" I'm confused why a heuristic algorithm ..." I'm confused why a heuristic algorithm wouldn't have preserved a shorter code for the second symbol. "<br /><br />Because it's heuristic so it's not ideal.<br /><br />It bumps all the long code lens up to 8. Then it has to fix the codes by making others longer. It starts with the lowest count codes. In this case there is a real problem with that length 1 code, but before it gets there it first bumps the length 4 code all the way up to 7. But that isn't enough so it winds up making the length 1 code into length 2.<br /><br />Then it does the backwards pass to shorten some codes. You'd like it to shorten that second most likely one back to 5, but it can't do that unless it also lengthens some of the 7's up to 8, which it isn't considering any more.<br /><br />The way to solve this correctly of course is just to do package merge, which is addressed in the follow-up to this post.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-2253116777321629142010-07-04T00:01:36.401-07:002010-07-04T00:01:36.401-07:00What does the unconstrained version of that look l...What does the unconstrained version of that look like? I'm confused why a heuristic algorithm wouldn't have preserved a shorter code for the second symbol.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-50695761209847858992010-07-03T12:46:50.344-07:002010-07-03T12:46:50.344-07:00If you artificially put a lot of pressure on the s...If you artificially put a lot of pressure on the system you can get big differences. For example with a code length limit of 8 :<br /><br />heuristic :<br /><br />2 : 1 , 0.628931%<br />7 : 34 , 21.383648%<br />8 : 124 , 77.987419%<br />entropy : 2.655<br /><br />optimal :<br /><br />2 : 1 , 0.628931%<br />5 : 1 , 0.628931%<br />6 : 5 , 3.144654%<br />7 : 12 , 7.547170%<br />8 : 140 , 88.050316%<br />entropy : 2.607cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-67320577700393502362010-07-03T12:44:40.173-07:002010-07-03T12:44:40.173-07:00For example, on a real file ("pic") with...For example, on a real file ("pic") with limit = 16, these are the number of codes of each length generated by the heuristic or optimal builder :<br /><br />Heuristic :<br /><br />1 : 1 , 0.628931%<br />4 : 1 , 0.628931%<br />5 : 2 , 1.257862%<br />6 : 14 , 8.805032%<br />7 : 11 , 6.918239%<br />8 : 9 , 5.660378%<br />9 : 9 , 5.660378%<br />10 : 7 , 4.402516%<br />11 : 10 , 6.289308%<br />12 : 9 , 5.660378%<br />13 : 13 , 8.176101%<br />14 : 20 , 12.578616%<br />15 : 3 , 1.886792%<br />16 : 50 , 31.446541%<br /><br />Optimal :<br /><br />1 : 1 , 0.628931%<br />4 : 1 , 0.628931%<br />5 : 2 , 1.257862%<br />6 : 14 , 8.805032%<br />7 : 11 , 6.918239%<br />8 : 9 , 5.660378%<br />9 : 9 , 5.660378%<br />10 : 7 , 4.402516%<br />11 : 10 , 6.289308%<br />12 : 9 , 5.660378%<br />13 : 11 , 6.918239%<br />14 : 21 , 13.207547%<br />15 : 14 , 8.805032%<br />16 : 40 , 25.157232%<br /><br />you can see they are in fact quite different, but in both cases the actual coded entropy is exactly 1.661cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-74762019533978888952010-07-03T12:42:45.626-07:002010-07-03T12:42:45.626-07:00"--are the codes significantly different from..."--are the codes significantly different from what you'd get by building the regular huffman tree and then shuffling the leaves upwards?"<br /><br />Well of course "it depends".<br /><br />The short answer is that yes, they are very different, but it also doesn't matter.<br /><br />The question is how much pressure you are putting on the code length limit. eg. what % of the original symbols were over length limit. If the length limit is reasonably long, then that % will be very small because the least likely symbols are the long ones.<br /><br />So because of that, even though the heuristic way may be far from optimal, it just doesn't matter.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-69391266309699071942010-07-03T07:41:49.724-07:002010-07-03T07:41:49.724-07:00I've never actually looked at this either--are...I've never actually looked at this either--are the codes significantly different from what you'd get by building the regular huffman tree and then shuffling the leaves upwards?Anonymousnoreply@blogger.com