1/23/2015

01-23-15 - LZA New Optimal Parse

Oodle LZA has a new optimal parse.

It's described at encode.ru and I don't feel like repeating myself today. So go read about it there. Thanks to Bulat Ziganshin for explaining the LZMA-style parse in a way that I could finally understand it.

First see Bulat's earlier post [LZ] Optimal parsing which is a description of Tornado's optimal parse and probably the clearest place to get started.

This style of parse is also described in the paper "Bit-Optimal Lempel-Ziv compression".

Some previous rants on the topic :

cbloom rants 10-10-08 - 7
cbloom rants 01-09-12 - LZ Optimal Parse with A Star Part 5
cbloom rants 09-04-12 - LZ4 Optimal Parse
cbloom rants 09-11-12 - LZ MinMatchLen and Parse Strategies
cbloom rants 09-24-12 - LZ String Matcher Decision Tree
cbloom rants 06-12-14 - Some LZMA Notes
cbloom rants 06-16-14 - Rep0 Exclusion in LZMA-like coders
cbloom rants 06-21-14 - Suffix Trie Note

I should note that the advantage of this optimal parse over my previous LZA optimal parse (backward-forward-chain-N) is that it can achieve roughly the same compression in much less time. The chain-N forward parse can get more compression if N is increased, but the time taken is exponential in N, so it becomes very slow for N over 2, and N over 4 is unusable in practice.

New LZA Optimal level 1 (-z5) uses the straightforward version of this parse. I flush the parse to update statistics any time there are no matches that cross a position (either because no match is possible, or a long match is possible in which case I force it to be chosen, ala LZMA). This looks like :

New LZA Optimal level 2 (-z6) stores the 4 cheapest arrivals to each position. This allows you to arrive from a previous parse which was not the cheapest on its prior subgraph, but results in a cheaper total parse to your current subgraph. You get this sort of braided 4-line thing.

When you flush the parse to output codes and update statistics, you choose one specific path through the parse. You trace it backwards from the end point, because you have arrivals, then you reverse it to output the codes.

Here's a drawing showing the trace backwards to flush the parse after it's been filled out :

At the flush pos, you take arrival slot 0 (the cheapest) and discard the other arrivals. When you resume from that spot to continue the parse you must resume from only slot 0. It sort of reminds me of the earlier QM post - the parse is uncertain, not set, with these 4 different realities that are possible at each position, until you make a Copenhagen measurement and actually flush the parse, at which point the alternative histories are discarded and you snap to just one. When you resume the parse, the number of arrivals very rapidly grows from 1 up to the maximum of 4, and then stays at 4 through the parse until you snap back to 0.

Unlike the level 1 parse, I do not sync at unambiguous points because you want to keep the possibilities alive for a while. There's a tradeoff between up-to-date statistics and longer flexible parse intervals. (the reason this tradeoff exists is that I don't bring the statistics forward in the parse, though it is possible to do as described by Bulat in the encode.ru thread, my tests indicate it's not worth the speed & memory use hit).

I'm going to show some prints of an actual parse. The columns are the 4 arrivals, and positions increase going down.

The first number is how many positions steps back to your previous point, and the next number is which thread (arrival context) you came from at that position.

So a literal is step back 1 :


  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3

This is a chunk of parse where only literals are possible, and the costs are about the same in each context so the states just carry forward.

  1|0   1|1   1|2   1|3
  1|0   1|2   1|1   1|3
  1|0   1|1   1|2   1|3

Literals don't always just carry forward, because the cost to code a literal depends on the last-offset context which can be different in each arrival. Here threads 1 & 2 swapped because of literal cost differences.

Through the vast majority of the parse, slot 0 (cheapest) arrives from slot 0. Through all that range, the alternate parses are playing no role. But once in a while they swap threads. Here's an example with the trace-back highlighted :


 [3|0]  3|1   3|2   1|0
 [1|0]  1|1   1|2   1|3
 [1|0]  1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  5|0  [6|0]  5|1   6|1
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|2   1|1   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  8|0  [8|1]  8|2   8|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  4|0  [4|1]  4|2   4|3
  1|0  [1|1]  1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  3|0  [3|1]  3|2   3|3
  1|0  [1|1]  1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
  1|0   1|1   1|2   1|3
 [7|1]  7|3   7|0   7|2

If the last pos there was a flush point, that would be the parse we trace back to output. Remember that the length is not the length coded at that pos, but the length to *arrive* at that pos (the length coded at the previous pos).

A more detailed printout showing how this type of thing happens. Here I'm also printing match vs. last-offset match, and the offset used in the match :


 [4L+708|0] 3L+708|0  4L+708|1  4L+708|2
 [1 +  0|0] 1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
 [3L+708|0] 3L+708|1  3L+708|2  3L+708|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
 [4M+228|0] 4M+228|1  4M+228|2  3M+228|0
  1 +  0|0  1 +  0|1  1 +  0|3  1 +  0|2
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
 [4M+732|0] 4M+732|1  1 +  0|0  3M+732|0
 [1 +  0|0] 1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|3  1 +  0|2
  1 +  0|0  1 +  0|1  1 +  0|3  1 +  0|2
  4M+ 12|0 [3L+732|0] 5L+ 12|0  4L+ 12|2
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  4M+252|0 [4M+252|1] 4M+252|2  4M+252|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
 [4L+708|1] 4L+708|2  4L+708|3  4M+708|0

In the last position here we want to code a match of length 4 at offset 708. Arrivals 1-3 can code it as a last-offset, but arrival 0 (the cheapest) does not have 708 in the last-offset set (because it previously coded 252,12,732,228 and knocked the 708 out of its history).

You can see way back in the past 708 was used twice, and threads 1-3 only used three other offsets (my last-offset set keeps four offsets) so they still have 708.

This is really what the multiple arrivals is all about. You get an extremely non-local effect, that by keeping that 708 in the last offset set, you get a cheaper parse way off in the future when it's used again.

Another one because I find this amusing :


  1 +  0|0  1 +  0|1  1 +  0|2  1 +  0|3
 15M+360|0 12M+360|0[16M+720|0]12M+360|1
  1 +  0|0 [1 +  0|2] 1 +  0|1  1 +  0|3
 [1 +  0|1] 1 +  0|0  1 +  0|2  1 +  0|3
 [1 +  0|0] 1 +  0|1  1 +  0|2  1 +  0|3

What's happened here is the literal coding is cheaper with offset 720 as the last-offset. It's not until two literals later that it becomes the winner. The first literal after the match uses LAM exclusion, and then the next literal after that uses LAM as coding context. ( as described here )


I've been highlighting interesting bits of parse. It should be noted that the vast majority of every parse is just staying on the channel-0 (cheapest) thread. Like this :


     1688: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1689: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1690: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1691: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1692: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1693: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1694: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1695:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1696:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1697: [  3M+167|0]   3M+167|1    3M+167|2    1 +  0|0 
     1698:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1699:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1700: [  3M+ 45|0]   4M+ 45|0    3M+ 45|1    3M+ 45|2 
     1701:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1702:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1703:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1704:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1705:    3M+ 55|0    3M+ 55|1    3M+ 55|2    3M+ 55|3 
     1706:    6M+ 58|0    6M+ 58|1    6M+ 58|2    6M+ 58|3 
     1707:    7M+251|0    7M+251|1    1 +  0|0    1 +  0|1 
     1708: [  8M+330|0]   8M+330|1    8M+330|2    1 +  0|0 
     1709:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1710:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1711:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 

Anyway, here's one I spotted which exhibits unusual jumping around :

     3642: [  4L+612|0]
     3643: [  1 +  0|0]
     3644: [  1 +  0|0]
     3645: [  1 +  0|0]
     3646: [  1 +  0|0]
     3647: [  1 +  0|0]
     3648:    1 +  0|0 
     3649:    1 +  0|0 
     3650: [  3M+192|0]   1 +  0|0 
     3651:    1 +  0|0    1 +  0|1 
     3652:    1 +  0|0    1 +  0|1 
     3653:    1 +  0|0    1 +  0|1 
     3654:    5L+ 12|0 [  4M+ 12|0]   4L+ 12|1    3M+ 12|0 
     3655:    1 +  0|0 [  1 +  0|1]   1 +  0|2    1 +  0|3 
     3656:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3657:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3658:    3L+612|0 [  3L+612|1]   3L+612|2    3L+612|3 
     3659:    1 +  0|0 [  1 +  0|1]   1 +  0|2    1 +  0|3 
     3660:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3661:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3662:    3L+228|0 [  3L+192|1]   3L+228|2    3L+192|3 
     3663:    1 +  0|0 [  1 +  0|1]   1 +  0|2    1 +  0|3 
     3664:    1 +  0|0    1 +  0|2    1 +  0|1    1 +  0|3 
     3665:    1 +  0|0    1 +  0|2    1 +  0|1    1 +  0|3 
     3666:    3L+228|0    4M+624|0    4M+624|1 [  3L+612|1]
     3667:    1 +  0|0    1 +  0|1 [  1 +  0|3]   1 +  0|2 
     3668:    1 +  0|0    1 +  0|1    1 +  0|3    1 +  0|2 
     3669:    1 +  0|0    1 +  0|1    1 +  0|3    1 +  0|2 
     3670:    3M+576|0    3M+576|1 [  3M+576|2]   3M+576|3 
     3671:    1 +  0|0    1 +  0|1 [  1 +  0|2]   1 +  0|3 
     3672:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3673:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3674: [  3L+192|2]   3L+192|3    3M+192|0    3M+192|1 
     3675:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3676:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3677:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     3678: [  4L+ 12|0]   4M+ 12|1    3L+ 12|0    3L+624|1 

All the previous examples have been from highly structured binary files. Here's an example on text :


     1527: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1528:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1529:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1530:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1531: [  4M+100|0]   4M+100|1    4M+100|2    4M+100|3 
     1532:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1533:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1534:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1535:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1536:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1537:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1538:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1539:   10M+ 32|0   10M+ 32|1 [  8M+ 32|0]   8M+ 32|1 
     1540:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1541:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1542:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1543:    3M+ 47|0    3M+ 47|1    3M+ 47|2    3M+ 47|3 
     1544:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1545:    5M+ 49|0    5M+ 49|1    5M+ 49|2    5M+ 49|3 
     1546:    5M+  1|0    5M+  1|1    5M+  1|2    5M+  1|3 
     1547:    8M+ 50|0 [  8M+ 50|2]   8M+ 50|1    8M+ 50|3 
     1548:    1 +  0|0 [  1 +  0|1]   1 +  0|2    1 +  0|3 
     1549:    1 +  0|0    1 +  0|1    1 +  0|2    1 +  0|3 
     1550: [  2L+999|1]   2L+999|3    2L+999|0    2L+999|2 
     1551: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 
     1552: [  1 +  0|0]   1 +  0|1    1 +  0|2    1 +  0|3 

The benefit of the N-arrival parse is far far lower on text than on binary.


text :

enwik8 -z5 : 25384698
enwik8 -z6 : 25358366

binary :

fez_essentials -z5 : 9780036
fez_essentials -z6 : 9512780

lzt02 -z5 : 146098
lzt02 -z6 : 141141

lzt25 -z5 : 63376
lzt25 -z6 : 56982


Some random notes :

In LZA I do not have the N^2 problem like Tornado. For each match, I only consider the maximum possible match length. The reason I can do this is because I enforce strict LAM exclusion. Shorter lengths are not allowed because they would not exclude the LAM.

This is not an "optimal" parse in the sense of being exactly right. To be a true optimal parse, you would have to carry the statistics forward through the tree, and you would have to store *all* arrivals to a given position, not just the best 4.

The A-Star parse I previously wrote about is pretty similar to this actually. Rather than only store 4 arrivals at each position, it can store an unbounded # of arrivals, and it tries to discard paths that cannot ever beat the best path (with a fuzzy tolerance to make that viable).

Storing the 4 cheapest arrivals is maybe not right. You want the 1 cheapest, but then what you want for the other 3 is for them to be interesting alternatives. For example you want their last-offset set to be different than the cheapest arrival, you don't just want a more expensive way to wind up with the same thing. I've done some limited experiments with this but haven't found anything definitive yet.

There's also the issue of deciding when to reset the statistical state.

Because statistics aren't carried forward, there's the issue of parse-statistics feedback. In particular, some hueristic guidance does help (like don't ever take a normal match when a longer repeat-match is possible). Also if the beginning of the file is unusual, it can lead you into a bad local minimum.

1/22/2015

01-22-15 - Nexus 5 Usability

The physical design of these phones is so rotten.

First of all, here's a fundamental principle of good device design :

It should not look symmetric unless it actually *is* symmetric.

I frequently pick it up and try to tap the on button, only to discover that I have it upside down. I shouldn't have to spend those seconds staring at the thing to figure out which end is up.

Since it's in fact behaviorally *not* symmetric, they shouldn't have tried so hard to disguise the buttons and make it look symmetric. It should have a strong different-color strip on one end to make it obvious which end is up. This whole "make it look like a single color brick" that came from Apple is fucking retarded design.

Also the side-buttons should not be black and blend in to the device. They should be a contrast color.

My grilling process would be like this : "why are the buttons the same color as the body?" "uhh, because it looks cool?" "you're fired". "which end is up?" "well, it looks cooler all the same color" "you're fired".

A related symmetry complaint is that it doesn't rotate the screen when you turn it upside down. Again if it looks fucking symmetric, then act symmetric. Don't leave me there going "why the fuck are you not rotating the screen? Oh it's fucking upside down". If I turn it upside then fucking rotate the screen the way I'm holding it.

And of course the fucking volume-up-down should be a physical slider so that I can tell the level at a glance and change it even when the thing is off. And tapping it by accident doesn't mute it or send the volume through the roof or any shit like that. Like all fucking knobs and sliders should *always* be physical not fucking digital shuttles OMG. Fucking AC controls and volume knobs in cars that are just digital shuttles make me want to murder everyone.

Maybe the neatest symmetry solution would be to make it slightly tapered. Make the bottom slightly narrower than the top, which would fit the hand better anyway. More curved at the bottom and more square at the top.

There's a huge hardware problem in that if you have a finger on the screen it won't register other taps. I don't understand how this acceptable. If I ham-fist it and accidentally touch the screen with my gripping hand, it just doesn't work. WTF. Perhaps worse, if my baby has her finger on it, then I can no longer operate it to change to a video for her or whatever.

Many of the standard Google Apps have major GUI flaws.

For example in the Camera it's way too hard to change the basic function, such as switching between pictures and video or switching the camera direction, you have to go through some fucking settings menu; they should just be buttons.

In the phone, it's way too easy to call someone when you're just trying to get info on your call history. (I had to add a "call confirm" app because I kept dialing numbers when I scrolled around)

In all the apps, if you have them set to only sync on wifi or only sync on charger, there's no override button to just "fucking sync now anyway". You have to dig into the menus and change the setting then change it back after syncing, very annoying.

In gmail it's nearly fucking impossible to change between threaded & non-threaded view, which would be handy to do frequently. (the threaded view also makes it really hard to find the newest mail)

But there's a worse problem in general with apps.

It's just completely hidden what you can actually do in them. What areas can I click on? What are buttons and what do they do? And then some spots are different actions if I hold a tap there. And some spots are a different action if I drag there. WTF WTF. This is such rotten usabibility GUI design.

One option to fix it would be to obviously color all GUI elements based on how they can be used. Red lines are draggable, blue elements are tappable, green have a tap-and-hold extra action.

Even aside from making it more obvious, there needs to be a way to highlight all GUI actions. Somewhere I can tap that will make them all pop up tool-tips, or animate "this is holdable, this is draggable" etc.

Why do I have to lecture people on fucking 1950's GUI principles. Make fucking buttons looks like buttons! Don't hide them as graphical elements! Function over form please!

The copy-paste interface is horrific. The main way I run into it is if I get something like an address in an email and it fails to auto-link to the maps app, and I want to manually get that address over there. For fucks sake it's impossible. Or like if I find a URL and it doesn't auto-link to the browser, and I have to try to select it and copy it, I want to kill myself. Which I guess is just a general phone problem - as long as the function you want is hard-coded in, it's okay, but god help you actually make apps interact in a nonstandard way. (like take a photo that's on your phone and upload it as an attachment in a web forum post)

Some other random complaints while I'm at it :

There's no fucking missed calls & texts notification on the home page or lock page. Uh, hello. Make it work as a basic fucking phone please. It's like the most basic fucking thing you need in a phone, it should tell you what calls and texts you got. I have to fucking browse around various apps just to find out if I got any calls or texts? WTF. (yes I know I can buy an app to add these features, but seriously WTF).

(addendum : seriously WTF. Recent missed calls and texts need to be on the home page. This is one complaint I actually need to fix because clicking around every time I turn the phone on is ridiculous.)

There seems to be no way to cache my entire local region in maps. For a real world where we have spotty cell coverage in the country, this is pretty awful. In general offline use (as you might want in areas of terrible reception, eg. everywhere) is extremely poor.

There seems to be no way to clear all private data on every boot. I guess nobody in the world but me even tries to have any privacy these days. I'd like to have no saved cookies, no web passwords, and zero app-sign-ins on every boot.

I should be able to get on the internet over the USB cable.

I should be able to remote control the phone from a PC app over a USB cable. I'd like to do all my setup and management from a PC with a fucking mouse and keyboard please, not by poking with my ham-fist. I'd like to have just a big tree-view of all the options so I can clearly see them and not have to poke around to find them.

I fucking hate the way maps is always trying to twist on me. Fucking up is north. Seems to not be a setting to lock north, I have to constantly tap the fucking icon. Very annoying.

There needs to be a global option to turn off all starring of passwords globally. Becaused starred out passwords and ham-fingers don't mix.

1/14/2015

01-14-15 - Spooky Action at a Distance

I loved "Only Lovers Left Alive". There's not much to the plot, but as just a bit of mood it's lovely. The design, the music, everything is perfect. It reminds me of my youth, actually. An aethestic and lifestyle that's no longer fashionable.

Adam's song - Spooky Action At A Distance - is fantastic.

BUT - there is no fucking spooky action at a distance!

Ooo, you say, Einstein was confused about it. It must be all mysterious and nobody understands it. NO!

Quantum Mechanics is not some crazy mysterious metaphysical thing that science can't explain and therefore allows all sorts of quackery.

First of all, the average person shouldn't read thoughts about Quantum Mechanics that were written in the 20's and 30's when it was still new and not very well understood yet. Hey, obviously they were a little confused back then. Read something modern!

Second of all, I think the way QM is taught in schools is still largely terrible. I don't know how it's done now, but when I went through it it was still being taught in a "historical" way. That is, going through the progression in the same way it was discovered, roughly. First you learn quantization of photons, then you get a wave function, you get copenhagen interpretation and measurement collapse, and then you finally start to get bra-ket and hilbert spaces and only much later (grad school) do you get decoherence theory.

The right way to teach QM is to jump straight to the modern understanding without confusing people with all that other gunk :

1. The universe is a linear vector space of *amplitudes* (this is the source of all the weirdness and is the fundamental postulate) (the universe seems to work in the square roots of things, and double-covers, and this leads to a lot of the non-classical weirdness)

2. The entire universe is QM, there is no classical "observer" that lies outside the QM system.

3. The universe takes all possible outcomes all the time. However, an observer in any given state is not in all of those outcomes.

4. Measurement is the process of entangling the QM state of the measurement device with the QM state of the observed process.

In a very brief nutshell measurement decoherence (entanglement of the measurement device / outside world with a quantum system) goes like this :


You have a measurement device which is initially in an undetermined state |M>

You have something to measure.  Let's take a quantum state which is either 0 or 1 
(eg. a single particle that's either spin up or down) 

|0> + |1>

Initially they are not entangled :

|M> ( |0> + |1> )

And measurement changes them to be entangled :

( |M0>|0> + |M1>|1> )

M0 means the measurement device has observed the state 0

Note that there is no "collapse". Both possibilities still exist, but if the measurement device observes a 0 then the particle must be in state 0.

Now, in the book teaching QM we'd have to go into more detail about what "measurement decoherence" actually is. Because the entire universe is governed by QM, we'd have to show that that process happens by QM as well. In fact that has been done and the details worked out in the past 40 years, so it's relatively new.

Measurement decoherence doesn't happen at the single-particle level. Rather it's a large-scale phenomenon that happens due to entropy, much like thermodynamics. This is why "measurement collapse" could be seen as a connection to an outside classical system for large objects, because it only happens at large system scales. It only happens one way - you can entangle the measurement device with the quantum particles, but you can never de-entangle them (this is why in the old copenhagen interpretation you could get away with talking about "collapse", because if the measurement device is classical-scale then it can never go backwards). The basic laws of QM are time-reversible, but decoherence is not. It's exactly like if you took a bucket of red water and a bucket of blue water and mixed them in one larger bucket - it is now effectively impossible to ever separate out all the particles and restore the two different buckets.

So. The so-called "EPR Paradox" that is not a paradox at all. (well it is if you assume a deterministic physics with hidden variables, which is just wrong; it should be called the EPR Proof that Einstein was Wrong Sometimes). The EPR thought experiment goes like this :

You have two particles that are either in state 0 or 1, and are entangled so that they are either both 0 or both 1 :


|0>|0> + |1>|1>

You separate them to opposite ends of the universe, where lie Alice and Bob :

|A>|B>( |0>|0> + |1>|1> )

Bob then measures his particle and becomes entangled with it, in state B0 or B1 :

|A>( |B0>|0>|0> + |B1>|1>|1> )

If Alice now measures her particle, she will get either a 0 or 1, and see the same thing as Bob. However, nothing has moved faster than light. Her entanglement happens entirely on her local particle :

|A>|0> -> |A0>|0>

it has nothing to do with the state of Bob far away. The initial entangled state of the particle simply gaurantees they will get the same result. But the physical interaction is still local and slower than light, and the information about the result must travel slower than light.

There is never any "action at a distance" or any "paradox" or anything "spooky".

Sheesh. Come on vampires, you've been alive long enough to get your physics right.

1/05/2015

01-05-15 - Daala PVQ Summary

A much belated attempt at summarizing the Daala PVQ coder. This post is in the wrong temporal order, it's sort of an introduction before the other rambling.


First consider Intra, that is the case of no prediction.

We have a block of some variable size (4x4 to 16x16) that has been transformed. The DC is sent by some other mechanism, so we are left with some AC's.

The AC's are grouped into subbands. These are just collections of likely-similar AC coefficients. Ideally they should be statistically similar. The current subbands in Daala are :

The AC coefficients in each subband are consider a vector. We first send the "gain" of the vector, which is the L2 norm (the "length" of the vector). (*1)

The gain can be sent with a non-uniform scalar quantizer. This allows you to build variance-adaptive quantization into the codec without side data. (H264 by sending per-block quantizer changes based on activity, which costs extra bits in signalling). Essentially you want to use larger quantization bins for larger gains, because specifying the exact amount of a large energy is not as important. Daala seems to take gain to the 2/3 power and uniformly quantize that. (*2) (*3)

Once we've sent the gain, we need to send how that energy is distributed on the coefficients. I won't go into the details of how it's done. Of course you have already sent the total energy so you don't want to just send the coefficient values, that would be highly redundant. You may think of it as dividing the vector by the gain, we are left with a unit vector. The Fischer "Pyramid Vector Quantizer" is one good starting point.

Note that Fischer PVQ is explicitly not quite applicable here, because it is only optimal if the coefficients in the vector are independent and have the same distribution, which is not the case in images. Because of that, just sending the index of the PVQ codebook without entropy coding is probably wrong, and a way of coding a PVQ codebook selection that uses the properties of the AC coefficients is preferrable. (eg. code one by one in a z-scan order so you know likelihood is geometrically decreasing, etc. Daala has some proposals along these lines).

A key issue is determining the PVQ codebook resolution. With PVQ this is K, the L1 norm of the unnormalized (integer coefficients) codebook vectors. Daala computes K without sending it. K is computed such that error due to the PVQ quantization is equal to error due to gain quantization. (*4). This makes K a function of the overall quantization level, and also of the gain of that block - more gain = more K = more resolution for where that gain goes.

A non-uniform quantization matrix is a bit murky in this scheme (ala the JPEG CSF factors) because it changes the scaling of each axis in the vector, as well as the average magnitude, which violates the assumptions of Pyramid VQ. Applying a different quantizer per subband is an easy compromise, but pretty coarse. (*5)

And that's it for intra.

Advantages vs. traditional residual coding :

1. You send the gain up front, which is a good summary of the character of the block. This allows for good modeling of that gain (eg. from neighbors, previous gains). It also allows that gain to be used as a context or coding model selector for the rest of the block.

2. Because you send the gain explicitly, it is better preserved than with traditional coding (especially with Trellis Quant which tends to kill gain). (deadzone quantizers also tend to kill gain; here a deadzone quantizer might be appropriate for the total gain, but that's better than doing it for each coefficient independently). Preserving gain is good perceptually. It also allows for separate loss factor for gain and distribution of energy which may or may not be useful.

3. Because you send the gain explicitly, you can use a non-linear quantizer on it.

4. You have a simple way of sending large subbands = zero without coding an extra redundant flag.

5. No patents! (unbelievable that this is even an issue, all you software patenters out there plz DIAGF)

asides :

*1 = In my studies I found preserving L1 norm of AC activity to be more visually important than preserving L2 norm. That doesn't mean it's better to code L1 norm though.

*2 = One advantage of this scheme is having VAQ built in. A disadvantage is that it's hard-coded into the codec so that the encoder isn't free to do adaptive quantization based on better human visual studies. Of course the encoder can send corrections to the built-in quantizer but you better be careful about tweak the VAQ that's in the standard!

*3 = I always found variable quantization of log(gain) to be appealing.

*4 = My perceptual studies indicate that there should probably be more error due to the distribution quantization (K) than from gain quantization.

*5 = Of course, applying a CSF-like matrix to *residuals* as is done in most video coders is also very murky.


Okay, but now we have prediction, from motion compensation or intra prediction or whatever. You have a current block to encode and a predicted block that you think is likely similar.

The problem is we can't just subtract off the predicted block and make a residual the way normal video codecs do. If you did that, you would no longer have a "gain" which was the correct energy level of the block - it would be an energy level of the *residual*, which is not a useful thing either for perceptual quality control or for non-uniform quantization to mimic VAQ.

Sending the gain is easy. You have the gain for each predicted subband, so you can just predict the gain for the current subband to be similar to that (send the delta, context code, etc.). You want to send the delta in quantized space (after the power mapping). The previous block might have more information than you can preserve at the current quantization level (it may have a finely specified gain which your quantizer cannot represent). With a normal residual method, we could just send zero residuals and keep whatever detail was in the previous frame. Daala makes it possible to carry this forward by centering a quantization bucket exactly on the previous gain.

Now we need the distribution of coefficients. The issue is you can't just send the delta using Pyramid VQ. We already have the gain, which is the length of the coefficient N-vector , it's not the length of the delta vector.

Geometrically, we are on an N-sphere (since we know the length of the current vector) and we have a point on that sphere where the predicted block was. So we need to send our current location on the sphere relative to that known previous position. Rather than mess around with spherical coordinate systems, we can take the two vectors, and then essentially just send the parallel part (the dot product, or angle between them), and then the perpendicular part.

JM Valin's solution for Daala using the Householder reflection is just a way of getting the "perpendicular part" in a convenient coordinate system, where you can isolate the N-2 degrees of freedom. You have the length of the perpendicular part (it's gain*sin(theta)), so it's a unit vector, and we can use Pyramid VQ to send it.

So, to transmit our coefficients we send the gain (as delta from previous gain), we send the extent to which the vectors are parallel (eg. by sending theta (*6)), we then know the length of the perpendicular part and just need to send the remaining N-2 directional DOF using Pyramid VQ.

As in the intra case, the K (quantizer resolution) of the Pyramid VQ is computed from other factors. Here obviously rather than being proportional to the total gain, it should be proportional to the length of the perpendicular part, eg. gain*sin(theta). In particular if you send a theta near zero, K goes toward zero.

One funny thing caused by the Householder reflection (coordinate system change to get the perpendicular part) is that you've scrambled up the axes in a way you can't really work with. So custom trained knowledge of the axes, like expected magnitudes and Z-scan order and things like that are gone. eg. with a normal coefficient delta, you know that it's very likely the majority of the delta is in the first few AC's, but after the rotation that's lost. (you can still build that in at the subband level, just not within the subbands).

Another funny thing is the degeneracy of polar conversion around the origin. When two vectors are very close (by Euclidean distance) they have a small angle between them, *if* they are long enough. Near the origin, the polar conversion has a pole (ha ha punny). This occurs for subbands near zero, eg. nearly flat, low energy blocks. Since the gain has previously been sent, it's possible that could be used to change to a different coder for gains near zero (eg. just use the Intra coder). (in Daala you would just send a "noref" flag). To be clear, the issue is that the vectors might be very close in Euclidean distance, and thus seem like good matches based on SAD motion search, but could easily be vectors pointing in completely opposite directions, hence be very bad to code using this theta scheme.

And I think that's about it.

The big goal of this funny business is to be able to send the gain (length) of the current subband vector, rather than sending the length of the delta. This gives you the advantages as discussed previously in the simpler Intra case.

asides :

*6 = send theta, or sin(theta) ? They have slightly different quantization bucket scaling. Most of this assumes that we have a reasonably good prediction so theta is small and theta ~= sin(theta).


Geometrically with white board drawing :

Traditional video coders just form the Euclidean "Delta" vector and send its components.

Daala Intra (no prediction) takes the "Current" vector, sends its length, and then its unit vector (direction) using Pyramid VQ.

Daala Predictive VQ sends the length of "Current", the angle (theta) from Predicted to Current, and the unit vector (direction) of the "Perpendicular" vector.

(of course Daala doesn't send the "Perpendicular" vector in the coordinate space draw above; it's reflected into the space where "Predicted" is aligned with an axis, that way Perpendicular is known to have a zero in that direction and is simply a vector in N-1 dimensions (and you've already sent the length so it has N-2 DOF))

old rants