tag:blogger.com,1999:blog-5246987755651065286.post2618143199109877541..comments2024-02-22T16:15:42.388-08:00Comments on cbloom rants: 10-27-11 - Tiny LZ Decodercbloomhttp://www.blogger.com/profile/10714564834899413045noreply@blogger.comBlogger8125tag:blogger.com,1999:blog-5246987755651065286.post-253865748409164742011-11-25T07:12:17.360-08:002011-11-25T07:12:17.360-08:00What's the purpose of this code?What's the purpose of this code?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-56207770725084199722011-11-02T11:02:22.518-07:002011-11-02T11:02:22.518-07:00@cbloom: Thanks, coming from you that means a lot ...@cbloom: Thanks, coming from you that means a lot to me. I have to give credit to Terje Mathesen though, since he inspired it. You're right, operating in words definitely hurts compression, but it was built for speed, not compression efficiency.<br /><br />@jeorg: Yep, ints can be on during decompression. They can NOT be on while you're setting up the new stack segment obviously :-)Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-23217764263455746402011-10-31T23:22:04.935-07:002011-10-31T23:22:04.935-07:00Forget about the last comment, any int happening w...Forget about the last comment, any int happening will only trash what has already been read. Nice :)Joerghttps://www.blogger.com/profile/13111217588104575785noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-90348638958841294632011-10-31T12:43:11.697-07:002011-10-31T12:43:11.697-07:00Have you been running this with all IRQs turned of...Have you been running this with all IRQs turned off? Just wondering what happens when a timer ticks in ...Joerghttps://www.blogger.com/profile/13111217588104575785noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-76348385906727213182011-10-31T00:02:26.412-07:002011-10-31T00:02:26.412-07:00Clever trick using the stack to pop the compressed...Clever trick using the stack to pop the compressed stream!<br /><br />But the two byte alignment has got to hurt compression quite a bit.<br /><br />BTW the one that someone posted earlier also had a pretty clever trick about jumping past the rep. I didn't really notice it at first, it's something like :<br /><br />rep<br />literal:<br />movsb<br /><br />that never occurred to me before and can save a byte for some decoders.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-83204316072859343172011-10-30T23:10:45.540-07:002011-10-30T23:10:45.540-07:00I've had a "fastest ever on 808x" LZ...I've had a "fastest ever on 808x" LZ decompression routine in my back pocket for a decade, which is extremely small. Since all symbols have a fixed bit length, it is possible to write an optimal parse compressor for it, but optimal parsing is difficult (for me) so I never got around to it.<br /><br />Because size = speed for 808x, the decompressor leverages single-byte opcodes where possible, and also keeps everything in registers so that the only memory accesses are symbol fetches and stores/copies. I had to use the stack segment for this, so I could leverage POP to get symbol info.<br /><br />Hm... looking at this for the first time in a decade, I think I might be able to rewrite it so that it supports overlapping decompression, which would also eliminate the need for the stack segment. Compression would be better (not limited to 16-bit symbols) but speed would be worse. Would be interesting to try it someday. Anyway, here's the block for the curious (and I'm sorry for the formatting, but your blogging platform doesn't allow the PRE tag or other means of source formatting in comments)<br /><br /><b><br />(this assumes you have set SS:SP to the start of your compressed data,<br />ES:DI to the start of the output buffer, and DS:SI equal to ES:DI)<br /><br /> POP DX ;total size of output, so we know when to stop!<br />@cbitloop:<br /> POP BX ;get 16 bits of literal/match data<br />;-------begin of code block that repeats a total of 16 times------<br /> SHL BX,1 ;get leftmost control bit into carry bit<br /> JC @process_literal ;if carry set, next token is a literal<br /> POP CX ;otherwise, it's a code: grab the length,<br /> SHR CX,1 ;SHR to both grab carry bit and adjust range<br /> JC @process_match ;carry bit set? It's a match<br /> POP AX ;not set? It's a run. Grab date to make run<br /> REP STOSW ;output the word AX, CX times<br /> JMP @token_finished ;Finished with this token<br />@process_match:<br /> POP SI ;next token is the offset to copy FROM<br /> REP MOVSW ;Copy CX words<br /> JMP @token_finished ;Finished with this token<br />@process_literal:<br /> POP AX ;grab the literal...<br /> STOSW ;...and store to output buffer<br />@token_finished:<br />;---------end of code block that repeats a total of 16 times-------<br /></b>Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-70916455579386066372011-10-29T19:02:37.260-07:002011-10-29T19:02:37.260-07:00It just seemed to me that when you have an instruc...It just seemed to me that when you have an instruction set that does less the instructions should be smaller.<br /><br />Like x86 has a ton of weird complicated microcoded stuff (like base-10 ascii instructions). If you just trimmed all that fat you should have an instruction set that you can encode a lot smaller.<br /><br />But yeah, that's not really what RISC is.<br /><br />Most RISCs do things like always put a register index and immediate on the instruction even if they aren't needed; there's the weird ARM thing where instructions always contain a shift, etc.<br /><br />In x86 you get lots of special cases like ops that imply a register and thus don't have to specify one at all, and cases like SHL or SHR by 1 that don't have to send the shift amount as an immediate, etc.cbloomhttps://www.blogger.com/profile/10714564834899413045noreply@blogger.comtag:blogger.com,1999:blog-5246987755651065286.post-45778902311977419292011-10-29T18:53:37.454-07:002011-10-29T18:53:37.454-07:00x86 of course is a special case that's particu...<i>x86 of course is a special case that's particularly amenable to small LZ decoders; it's not really fair, all other instruction sets will be much larger. (which is sort of ironic because RISC is supposed to make instruction encoding smaller)</i><br />Where do you get the idea that RISC is supposed to have small instruction encodings?<br /><br />The original RISCs wanted to save area on instruction *decoding* logic, mostly by using more explicit encodings, fewer ops and getting rid of Microcode - with the idea that the area was better spent on more registers and cache. But their fixed 32 bits/instruction were much more than in contemporary CISCs (e.g. VAX had 1 or 2 bytes opcode, plus 1 byte per operand, plus immediates).ryghttps://www.blogger.com/profile/03031635656201499907noreply@blogger.com