Comments on cbloom rants: 03-15-15 - LZ Literal Correlation Images (6 comments, newest first; feed updated 2024-02-22)

Paul W., 2015-05-03 10:37:

What kind of LZ match are you doing? Is it a fixed-length (3-byte?) match, or greedy, or what?

--

This is all interesting to me because I'm looking into cheap feature detectors and classifiers to figure out how to compress whatever input you get, without having to run a bunch of models and adapt between them the way a context mixer does.

lzt99 looks like it has at least some 7-bit ASCII text in it: there is obvious activity in the lowercase ASCII box as well as in the digits box. It's hard to tell whether it uses any extended or multibyte characters (like enwik) because that part of the picture is so faint.

From the order 1 raw picture alone, Fez looks like multibyte integer data whose values cover only a fraction of the range, so that the top few bits of the high bytes are all 0's for positive numbers and all 1's for negative numbers, with no evident imbalance between the two and a skew toward small absolute values, as in a variable-amplitude waveform.
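Paul W.'s reading of the Fez picture can be illustrated on synthetic data. This is a minimal sketch, assuming 16-bit little-endian two's-complement samples with magnitudes below 4096; the sample values and the layout are hypothetical and nothing here is taken from the actual Fez file. In two's complement, a small positive value has all-0 top bits in its high byte and a small negative value has all-1 top bits, so high bytes pile up near 0x00-0x0F and 0xF0-0xFF, which shows up as bands in an order-1 picture.

```python
import struct

# Hypothetical waveform-like samples: signed 16-bit, magnitudes below 4096.
# The little-endian int16 layout is an assumption for illustration only.
samples = [300, -2, 1700, -4000, 5, -1, 2048, -1200]
data = b"".join(struct.pack("<h", s) for s in samples)

# With little-endian int16, every second byte (offsets 1, 3, 5, ...) is a high byte.
high_bytes = data[1::2]

# Top 4 bits of each high byte: 0b0000 for small positive values,
# 0b1111 for small negative values (two's-complement sign extension).
top_nibbles = sorted({b >> 4 for b in high_bytes})
print(top_nibbles)  # [0, 15]

# The order-1 pairs (low byte, high byte) plotted in an order-1 picture
# therefore all land where the current byte is <= 0x0F or >= 0xF0.
pairs = [(data[i - 1], data[i]) for i in range(1, len(data), 2)]
print(all(cur <= 0x0F or cur >= 0xF0 for _, cur in pairs))  # True
```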
cbloom, 2015-05-03 10:09:

I can't redist files from games.

You can get "Fez_Essentials.pak" by downloading or buying the game "Fez".

It's a simple tar-like pak file; it has a small text header followed by the binary data.

I can redist lzt24 since it is RAD-owned data. You can email me, or maybe I'll just post it.

Paul W., 2015-05-03 10:03:

Can you redistribute your test files? I'd be interested in plotting them with (my modded version of) Matt Mahoney's fv program, which shows different regularities and usually makes discontinuities within a file clear.

The order 1 raw picture of Fez shows a faint feature at roughly the same place on the diagonal as the bright box in enwik7 for ASCII lowercase letters following each other. (But it sort of looks like 4 faint blobs arranged in a square, for some other reason.) Is there any ASCII at all in that file, maybe a text header or something?

cbloom, 2015-05-03 09:42:

Fez, lzt24 & lzt99 are some of my test files from my collection of videogame test data. I picked them because they seem to be pretty good representatives of some data types (lzt99 is an aggregate of several files).

(testing on enwik is considered harmful)

The point is that we were investigating LZ literal compression and I thought it might be helpful to visualize the models and see if anything stands out.

You can certainly see how different the LO correlation is from O1.

You can see that Fez is in fact perfect sub data.
You can see that lzt24 has some perfect sub data, but also some strong order-0 peaks that are screwed up by sub and xor.

Paul W., 2015-05-03 07:29:

BTW, what is the purpose of this post? Is it to pose a puzzle for your readers: what are the sources that look like this under these transforms?

Are Fez, lzt24, and lzt99 standard test files like enwik?

Paul W., 2015-05-02 08:44:

I think that the square circled in green in the first picture spans 48-57, not 48-58.

That's the range of the ASCII digit characters 0-9. Presumably what you're seeing is numeric data represented as text, with any digit about equally likely to follow any other digit once the LZ matches have stripped away the (typically more redundant) leading digit strings.
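Paul W.'s range is easy to confirm: ord("0") is 48 and ord("9") is 57, so the digit square covers ten values, 48-57 inclusive. Below is a minimal, hypothetical "box" feature of the cheap-classifier kind Paul mentions: it builds an order-1 table of (previous byte, current byte) counts and reports what fraction of pairs fall inside a square such as the digit box or the lowercase box (97-122). It works on raw bytes rather than on post-LZ literals, so it does not model the match-stripping effect described above; the function names and the sample text are illustrative assumptions, not anything from the post.

```python
# Hypothetical "box" feature on an order-1 byte table (raw bytes, not LZ literals).

def order1_counts(data: bytes) -> list[list[int]]:
    """Return a 256x256 table where counts[prev][cur] counts adjacent byte pairs."""
    counts = [[0] * 256 for _ in range(256)]
    for prev, cur in zip(data, data[1:]):
        counts[prev][cur] += 1
    return counts

def box_fraction(counts: list[list[int]], lo: int, hi: int) -> float:
    """Fraction of all pairs whose previous AND current byte both lie in [lo, hi]."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    inside = sum(counts[p][c] for p in range(lo, hi + 1) for c in range(lo, hi + 1))
    return inside / total

DIGITS = (ord("0"), ord("9"))   # (48, 57): ten values, 48-57 inclusive
LOWER = (ord("a"), ord("z"))    # (97, 122)

text = b"x=1024, y=65536, z=271828"
counts = order1_counts(text)
print(DIGITS)                          # (48, 57)
print(box_fraction(counts, *DIGITS))   # 0.5 (12 digit-digit pairs out of 24)
print(box_fraction(counts, *LOWER))    # 0.0
```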