5/19/2010

05-19-10 - Some quick notes on VP8

The VP8 release is exciting for what it might be in two years.

If it in fact becomes a clean open-source video standard with no major patent encumbrances, it might get well integrated into Firefox, Windows Media, etc. etc. - eg. we might actually have a video format that just WORKS! I don't even care if the quality/size is really competitive. How sweet would it be if there was a format that I knew I could download and it would just play back correctly and not give me any headaches. Right now that does not exist at all. (It's a sad fact that animated GIF is probably the most portable video format of the moment).

Now, you might well ask - why VP8 ? To that I have no good answer. VP8 seems like a messy cock-assed standard which has nothing in particular going for it. The entropy encoder in particular (much like H264's) seems badly designed and inefficient. The basics are completely vanilla, in that it is block based, with block modes, movecs, transforms, residual coding. In that sense it is just like MPEG1 or H261. That is a perfectly fine thing to do, and in fact it's what I've wound up doing, but you could pull a video standard like that out of your ass in about five minutes; there's no need to license code for that. If in fact VP8 does dodge all the existing patents, then that alone would give it real value.

The VP8 code stream is probably pretty weak (I really don't know enough of the details to say for sure). However, what I have learned of late is that there is massive room for the encoder to make good output video even through a weak code stream. In fact I think a very good encoder could make good output from an MPEG2 level of code stream. Monty at Xiph has a nice page about work on Theora. There's nothing really cutting edge in there but it's nicely written and it's a good demonstration of the improvement you can get on a fixed standard code stream just with encoder improvements (and really their encoder is only up to "good but still basic" and not really into the realm of wicked-aggressive).

The only question we need to ask about the VP8 code stream is : is it flexible enough that it's possible to write a good encoder for it over the next few years? And it seems the answer is yes. (contrast this to VP3/Theora which has a fundamentally broken code stream which has made it very hard to write a good encoder).

ADDENDUM : this post by Greg Maxwell is pretty right on.

ADDENDUM 2 : Something major that's been missing from the web discussions and from the literature about video for a long time is the separation of code stream from encoder. The code stream basically gives the encoder a language and framework to work in. The things that Jason / Dark Shikari thinks are so great about x264 are almost entirely encoder-side things that could apply to almost any code stream (eg. "psy rdo" , "AQ", "mbtree", etc.). The literature doesn't discuss this much because they are trapped in the pit of PSNR comparisons, in which encoder side work is not that interesting. Encoder work for PSNR is not interesting because we generally know directly how to optimize for MSE/SSD/L2 error - very simple ways like flat quantizers and DCT-space trellis quant, etc. What's more interesting is perceptual quality optimization in the encoder. In order to achieve good perceptual optimization, what you need is a good way to measure perceptual error (which we don't have), the ability to try things in the code stream and see if they improve perceptual error (hard due to non-local effects), and a code stream that is flexible enough for the encoder to make choices that create different kinds of errors in the output. For example, adding more block modes to your video coder with different types of coding is often bad in a PSNR sense because all they do is create redundancy and take away code space from the normal modes, but it can be very good in a perceptual sense because it gives the encoder more choice.
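To make the code-stream-vs-encoder split concrete: the baseline that all the fancy encoder-side work ("psy rdo", AQ, etc.) hangs off of is just the greedy per-block J = D + lambda*R decision. A minimal sketch (mode names and all the R/D numbers here are invented; real D would be SSD or a perceptual metric, real R the actual entropy-coded bits):

```python
# A minimal sketch of the greedy per-block rate-distortion decision that
# all the encoder-side work rides on top of.  Mode names and (R, D)
# numbers are made up; real D would be SSD (or a perceptual metric) and
# real R the actual entropy-coded bits.

def choose_mode(modes, lam):
    """Pick the mode minimizing J = D + lambda * R for this one block."""
    return min(modes, key=lambda m: m["D"] + lam * m["R"])

modes = [
    {"name": "skip",  "D": 900.0, "R": 1.0},
    {"name": "inter", "D": 120.0, "R": 40.0},
    {"name": "intra", "D": 60.0,  "R": 110.0},
]

print(choose_mode(modes, 0.1)["name"])    # bits cheap: the quality-heavy mode wins
print(choose_mode(modes, 50.0)["name"])   # bits expensive: skip wins
```

The encoder-side tricks are all about what D to plug in here and how to bias the choice; the code stream only dictates which modes exist at all.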

ADDENDUM 3 : Case in point , I finally have noticed some x264 encoded videos showing up on the torrent sites. Well, about 90% of them don't play back on my media PC right. There's some glitching problem, or the audio & video get out of sync, or the framerate is off a tiny bit, or some shit and it's fucking annoying.

ADDENDUM 4 : I should be more clear - the most exciting thing about VP8 is that it (hopefully) provides an open patent-free standard that can then be played with and discussed openly by the development community. Hopefully encoders and decoders will also be open source and we will be able to talk about the techniques that go into them, and a whole new open ecosystem of video development can grow up around that.

5/13/2010

05-13-10 - P4 with NiftyPerforce and no P4SCC

I'm trying out P4 in MSDev with NiftyPerforce and no P4SCC.

What this means is VC thinks you have no SCC connection at all, your files are just on your disk. You need to change the default NiftyPerforce settings so that it checks out files for you when you edit/save etc.

Advantages of NiftyPerforce without P4SCC :

1. Much faster startup / project load, because it doesn't go and check the status of everything in the project with P4.

2. No clusterfuck when you start unconnected. This is one of the worst problems with P4SCC, for example if you want to work on some work projects but can't VPN for some reason, P4SCC will have a total shit fit about working disconnected. With the NiftyPerforce setup you just attrib your files and go on with your business.

3. No difficulties with changing binding/etc. This is another major disaster with P4SCC. It's rare, but if you change the P4 location of a project or change your mappings or if you already have some files added to P4 but not the project, all these things give MSDev a complete shit-fit. That all goes away.

Disadvantages of NiftyPerforce without P4SCC :

1. The first few keystrokes are lost. When you try to edit a checked-in file, you can just start typing and Nifty will go check it out, but until the checkout is done your keystrokes go to never-never land. Mild suckitude. Alternatively you could let MSDev pop up the dialog for "do you want to edit this read only file" which would make you more aware of what's going on but doesn't actually fix the issue.

2. No check marks and locks in project browser to let you know what's checked in / checked out. This is not a huge big deal, but it is a nice sanity check to make sure things are working the way they should be. Instead you have to keep an eye on your P4Win window which is a mild productivity hit.

One note about making the changeover : for existing projects that have P4SCC bindings, if you load them up in VC and tell VC to remove the binding, it also will be "helpful" and go attrib all your files to make them writeable (it also will be unhelpful and not check out your projects to make the change to not have them bound). Then NiftyPerforce won't work because your files are already writeable. The easiest way to do this right is to just open your vcproj's and sln's in a text editor and rip out all the binding bits manually.
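If you want to script the rip-out instead of doing it by hand, something like this works. A rough sketch: the `Scc*` attribute names and the `SourceCodeControl` sln section are what VS-era project files normally contain, but check what's actually in your files before trusting it:

```python
import re

def strip_vcproj_bindings(text):
    # Remove the Scc* binding attributes VS writes into the project tag.
    return re.sub(r'\s*Scc(ProjectName|AuxPath|LocalPath|Provider)="[^"]*"',
                  "", text)

def strip_sln_bindings(text):
    # Remove the whole SourceCodeControl global section from a .sln.
    return re.sub(r'\tGlobalSection\(SourceCodeControl\).*?EndGlobalSection\n',
                  "", text, flags=re.DOTALL)
```

Run those over your vcproj/sln text and write the result back; unlike letting VC "help", this touches nothing else, so your files stay read-only and NiftyPerforce keeps working.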

I'm not sure yet whether the pros/cons are worth it. P4SCC actually is pretty nice once it's set up, though the ass-pain it gives when trying to make it do something it doesn't want to do (like source control something that's out of the binding root) is pretty severe.

ADDENDUM :

I found the real pro & con of each way.

Pro P4SCC : You can just start editing files in VC and not worry about it. It auto-checks out files from P4 and you don't lose key presses. The most important case here is that it correctly handles files that you have not got the latest revision of - it will pop up "edit current or sync first" in that case. The best way to use Nifty seems to be Jim's suggestion - put checkout on Save, do not checkout on Edit, and make files read-only editable in memory. That works great if you are a single dev but is not super awesome in an actual shared environment with heavy contention.

Pro NiftyP4 : When you're working from home over an unreliable VPN, P4SCC is just unworkable. If you lose connection it basically hangs MSDev. This is so bad that it pretty much completely dooms P4SCC. ARG actually I take that back a bit, NiftyP4 also hangs MSDev when you lose connection, though it's not nearly as bad.

5/12/2010

05-12-10 - P4 By Dir

(ADDENDUM : see comments, I am dumb).

I mentioned this before :

(Currently that's not a great option for me because I talk to both my home P4 server and my work P4 server, and P4 stupidly does not have a way to set the server by local directory. That is, if I'm working on stuff in c:\home I want to use one env spec and if I'm in c:\work, use another env spec. This fucks up things like NiftyPerforce and p4.exe because they just use a global environment setting for server, so if I have some work code and some home code open at the same time they shit their pants. I think that I'll make my own replacement p4.exe that does this the right way at some point; I guess the right way is probably to do something like CVS/SVN does and have a config file in dirs, and walk up the dir tree and take the first config you find).

But I'm having second thoughts, because putting little config shitlets in my source dirs is one of the things I hate about CVS. Granted it would be much better in this case - I would only need a handful of them in my top level dirs, but another disadvantage is my p4bydir app would need to scan up the dir tree all the time to find config files.

And there's a better way. The thing is, the P4 Client specs already have the information of what dirs on my local machine go with what depot mappings. The problem is the client spec is not actually associated with a server. What you need is a "port client user" setting. These are stored as favorites in P4Win, but there is no authoritative list of the valid/good "port client user" setups on a machine.

So, my new idea is that I store my own config file somewhere that lists the valid "port client user" sets that I want to consider in p4bydir. I load that and then grab all the client specs. I use the client specs to identify what dirs to map to where, and the "port client user" settings to tell what p4 environment to set for that dir.

I then replace the global p4.exe with my own p4bydir so that all apps (like NiftyPerforce) will automatically talk to the right connection whenever they do a p4 on a file.
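In pseudocode-ish Python, the dir-to-connection mapping is about this simple. Server names and roots here are hypothetical; the real thing would pull each client's root out of `p4 client -o`:

```python
# Rough sketch of the p4bydir idea: a config lists the valid
# (port, client, user) triples, each client spec supplies its local root,
# and we pick the connection whose root is the longest prefix of the file
# being touched.  Names and roots are hypothetical.

CONNECTIONS = [
    # (P4PORT, P4CLIENT, P4USER, client root)
    ("work-server:1666", "me_work", "me", r"c:\work"),
    ("home-server:1666", "me_home", "me", r"c:\home"),
]

def connection_for(path):
    """Longest-prefix match of path against the client roots."""
    p = path.lower()
    best = None
    for port, client, user, root in CONNECTIONS:
        r = root.lower()
        # (real code should also check the match ends on a path separator)
        if p.startswith(r) and (best is None or len(r) > len(best[3])):
            best = (port, client, user, r)
    return best

# A real wrapper would then set P4PORT/P4CLIENT/P4USER in the environment
# and exec the real p4.exe with the original command line.
```

No config shitlets in the source dirs, no walking up the tree; just one list of connections plus the client specs P4 already has.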

05-12-10 - Cleartype

Since I ranted about Cleartype I thought I'd go into a bit more detail. This article on Cleartype in Win7 is interesting, though also willfully dense.

Another research question we've asked ourselves is why do some people prefer bi-level rendering over ClearType? Is it due to hardware issues or is there some other attribute that we don't understand about visual systems that is playing a role. This is an issue that has piqued our curiosity for some time. Our first attempt at looking further into this involved doing an informal and small-scale preference study in a community center near Microsoft.

Wait, this is a research question ? Gee, why would I prefer perfect black and white raster fonts to smudged and color-fringed cleartype. I just can't imagine it! Better do some community user testing...

1. 35 participants. 2. Comments for bi-level rendering: Washed out; jiggly; sketchy; if this were a printer, I'd say it needed a new cartridge; fading out - esp. the numbers; I have to squint to read this, is it my glasses or is it me?; I can't focus on this; broken up; have to strain to read; jointed. 3. Comments for ClearType: More defined, Looks bold (several times), looks darker, clearer (4 times), looks like it's a better computer screen (user suggested he'd pay $500 more for the better screen on a $2000 laptop), sort of more blue, solid, much easier to read (3 times), clean, crisp, I like it, shows up better, and my favorite: from an elderly woman who was rather put out that the question wasn't harder: this seems so obvious (said with a sneer.)

Oh my god, LOL, holy crap. They are obviously comparing Cleartyped anti-aliased fonts to black-and-white rendered TrueType fonts, NOT to raster fonts. They're probably doing big fonts on a high DPI screen too. Try it again on a 24" LCD with an 8 point font please, and compare something that has an unhinted TrueType and an actual hand-crafted raster font. Jesus. Oh, but I must be wrong because the community survey says 94% prefer cleartype!

Anyway, as usual the annoying thing is that in pushing their fuck-tard agenda, they refuse to acknowledge the actual pros and cons of each method and give you the controls you really want. What I would like is a setting to make Windows always prefer bitmap fonts when they exist, but use ClearType if it is actually drawing anti-aliased fonts. Even then I still might not use it because I fucking hate those color fringes, but it would be way more reasonable. Beyond that obviously you could want even more control like switching preference for cleartype vs. bitmap per font, or turning on and off hinting per font or per app, etc. but just some more reasonable global default would get you 90% of the way there. I would want something like "always prefer raster font for sizes <= 14 point" or something like that.

Text editors are a simple case because you just let the user set the font and get what they want, and it doesn't matter what size the text is because it's not laid out. PDF's and such I guess you go ahead and use TT all the time. The web is a weird hybrid which is semi-formatted. The problem with the web is that it doesn't tell you when formatting is important or not important. I'd like to override the basic firefox font to be a nice bitmap font of my own choosing *when formatting is not important* (eg. in blocks of text like I make). But if you do that globally it hoses the layout of some pages. And then other pages will manually request fonts which are blurry bollocks.

CodeProject has a nice font survey with Cleartype/no-Cleartype screen caps.

GDI++ is an interesting hack to GDI32.dll to replace the font rendering.

Entropy overload has some decent hinted TTF fonts for programmers you can use in VS 2010.

Electronic Dissonance has the real awesome solution : sneak raster fonts into asian fonts so that VS 2010 / WPF will use them. This is money if you use VS 2010.

5/11/2010

05-11-10 - Some New Cblib Apps

Coded up some new goodies for myself today and released them in a new cblib and chuksh .

RunOrActivate : useful with a hot key program, or from the CLI. Use RunOrActivate [program name]. If a running process of that program exists, it will be activated and made foreground. If not, a new instance is started. Similar to the Windows built-in "shortcut key" functionality but not horribly broken like that is.

(BTW for those that don't know, Windows "shortcut keys" have had huge bugs ever since Win 95 ; they sometimes work great, basically doing RunOrActivate, but they use some weird mechanism which causes them to not work right with some apps (maybe they use DDE?), they also have bizarre latency semi-randomly, usually they launch the app instantly but occasionally they just decide to wait for 10 seconds or so).

RunOrActivate also has a bonus feature : if multiple instances of that process are running it will cycle you between them. So for example my Win-E now starts an explorer, goes to the existing one if there was one, and if there were a few it cycles between explorers. Very nice. Also works with TCC windows and Firefox windows. This actually solves a long-time usability problem I've had with shortcut keys that I never thought about fixing before, so huzzah.
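The cycling logic itself is trivial once you have the window list. A platform-neutral sketch (the real version would enumerate windows via Win32 EnumWindows / GetForegroundWindow / SetForegroundWindow; here window ids are just opaque values):

```python
# Platform-neutral sketch of the RunOrActivate cycling logic.  The real
# version would use Win32 (EnumWindows / GetForegroundWindow /
# SetForegroundWindow); window ids here are just opaque values.

def next_window(windows, foreground):
    """Which window to activate: the one after the current foreground
    window (wrapping around), or the first one if none of ours is
    foreground.  None means no instance is running: launch one."""
    if not windows:
        return None
    if foreground in windows:
        i = windows.index(foreground)
        return windows[(i + 1) % len(windows)]
    return windows[0]
```

The fiddly part is building `windows` in the first place, which is the alt-tab-list problem discussed in the addendum below.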

WinMove : I've been using this forever, lets you move and resize the active window in various ways, either by manual coordinate or with some shorthands for "left half" etc. Anyway the new bit is I just added an option for "all windows" so that I can reproduce the Win-M minimize all behavior and Win-Shift-M restore all.

I think that gives me all Win-Key functions I actually want.

ADDENDUM : One slightly fiddly bit is the question of *which* window of a process to activate in RunOrActivate. Windows refuses to give you any concept of the "primary" window of a process, simply sticking to the assertion that processes can have many windows. However we all know this is bullshit because Alt-Tab picks out an isolated set of "primary" windows to switch between. So how do you get the list of alt-tab windows? You don't. It's "undefined", so you have to make it up somehow. Raymond Chen describes the algorithm used in one version of Windows.

5/09/2010

05-09-10 - Some Win7 Shite

Perforce Server was being a pain in my ass to start up because the fucking P4S service doesn't get my P4ROOT environment variable. Rather than try to figure out the fucking Win 7 per-user environment variable shite, the easy solution is just to move your P4S.exe into your P4ROOT directory, that way when it sees no P4ROOT setting it will just use current directory.

New P4 Installs don't include P4Win , but you can just copy it from your old install and keep using it.

This is not a Win7 problem so much as a "newer MS systems" problem, but non-antialiased / non-cleartype text rendering is getting nerfed. Old stuff that uses GDI will still render good old bitmap fonts fine, but newer stuff that uses WPF has NO BITMAP FONT SUPPORT. That is, they are always using antialiasing, which is totally inappropriate for small text (especially without cleartype). (For example MSVC 2010 has no bitmap font support (* yes I know there are some workarounds for this)).

This is a huge fucking LOSE for serious developers. MS used to actually have better small text than Apple, Apple always did way better at smooth large blurry WYSIWYG text shit. Now MS is just worse all around because they have intentionally nerfed the thing they were winning at. I'm very disappointed because I always run no-cleartype, no-antialias because small bitmap fonts are so much better. A human font craftsman carefully choosing which pixels should be on or off is so much better than some fucking algorithm trying to approximate a smooth curve in 3 pixels and instead giving me fucking blue and red fringes.

Obviously anti-aliased text is the *future* of text rendering, but that future is still pretty far away. My 24" 1920x1200 that I like to work on is 94 dpi (a 30" 2560x1600 is 100 dpi, almost the same). My 17" lappy at 1920x1200 has some of the highest pixel density that you can get for a reasonable price, it's pretty awesome for photos, but it's still only 133 dpi which is shit for text (*). To actually do good looking antialiased text you need at least 200 dpi, and 300 would be better. This is 5-10 years away for consumer price points. (In fact the lappy screen is the unfortunate uncanny valley; the 24" at 1920x1200 is the perfect res where non-antialiased stuff is the right size on screen and has the right amount of detail. If you just go to slightly higher dpi, like 133, then everything is too small. If you then scale it up in software to make it the right size for the eye, you don't actually have enough pixels to do that scale up. The problem is that until you get above 200 dpi where you can do arbitrary scaling of GUI elements, the physical size of the pixel is important, and the 100 dpi pixel is just about perfect). (* = shit for anti-aliased text, obviously great for raster fonts at 14 pels or so).
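For reference, those dpi figures are just diagonal-resolution arithmetic:

```python
import math

def dpi(h_pixels, v_pixels, diagonal_inches):
    """Pixel density from resolution and diagonal screen size."""
    return math.hypot(h_pixels, v_pixels) / diagonal_inches

print(round(dpi(1920, 1200, 24)))   # the 24" monitor: ~94 dpi
print(round(dpi(2560, 1600, 30)))   # a 30" 2560x1600: ~101 dpi
print(round(dpi(1920, 1200, 17)))   # the 17" lappy: ~133 dpi
```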

( ADDENDUM : Urg I keep trying to turn on Cleartype and be okay with it. No no no it's not okay. They should call it "Clear Chromatic Aberration" or "Clearly the Developers who think this is okay are colorblind". Do they think our eyes only see luma !? WTF !? Introducing colors into my black and white text is just such a huge visual artifact that no amount of improvement to the curve shapes can make up for that. )

It's actually pretty sweet right now living in a world where our CPU's are nice and multi-core, but most apps are still single core. It means I can control the load on my machine myself, which is damn nice. For example I can run 4 apps and know that they will all be pretty nice and snappy. These days I am frequently keeping 3 copies of my video test app running various tests all the time, and since it's single core I know I have one free core to still fuck around on the computer and it's full speed. The sad thing is that once apps actually all go multi-core this is going to go away, because when you actually have to share cores, Windows goes to shit.

Christ why is the registry still so fucking broken? 1. If you are a developer, please please make your apps not use the registry. Put config files in the same dir as your .exe. 2. The Registry is just a bunch of text strings, why is it not fucking version controlled? I want a log of the changes and I want to know what app made the change when. WTF.

The only decent way to get environment variables set is with TCC "set /S" or "set /U".

"C:\Program Files (x86)" is a huge fucking annoyance. Not only does it break my muscle memory and break a ton of batch files I had that looked for program files, but now I have a fucking quandary every time I'm trying to hunt down a program.. err is it in x86 or not? I really don't like that decision. I understand it's needed for if you actually have an x86 and x64 version of the same app installed, but that is very rare, and you should have only bifurcated paths on apps that actually do have a dual install. (also because lots of apps hard code to c:\program files , they have a horrible hack where they let 32 bit apps think they are actually in c:\program files when they are in "C:\Program Files (x86)"). Blurg.


[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters]
"EnableSuperfetch"=dword:00000000
"EnablePrefetcher"=dword:00000000

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer]
"NoWinKeys"=dword:00000001

[HKEY_CURRENT_USER\Control Panel\Desktop]
"MenuShowDelay"="10"

Some links :

Types - Vista/Win7 has borked the "File Associations" setup. You need a 3rd party app like Types now to configure your file types (eg. to change default icons).

Shark007.net - Windows 7 Codecs - WMP12 Codecs - seem to work.

Pismo Technic Inc. - Pismo File Mount - nicest ISO mounter I've found (Daemon tools feels like it's made out of spit and straw).

Hot Key Plus by Brian Apps - ancient app that still works and I like because it's super simple.

Change Windows 7 Default Folder Icon - Windows 7 Forums ; presumably you have the Preview stuff for Folders turned off, so now make the icon not so ugly.

How to move your perforce depot - Annoyingly I used a different machine name for the new lappy and thus a different clientview, so MSVC P4SCC fails to make the connection and wants to rebind every project. The easiest way to fix this is just to not use P4SCC: kill all your bindings and just use NiftyPerforce without P4SCC.

(Currently that's not a great option for me because I talk to both my home P4 server and my work P4 server, and P4 stupidly does not have a way to set the server by local directory. That is, if I'm working on stuff in c:\home I want to use one env spec and if I'm in c:\work, use another env spec. This fucks up things like NiftyPerforce and p4.exe because they just use a global environment setting for server, so if I have some work code and some home code open at the same time they shit their pants. I think that I'll make my own replacement p4.exe that does this the right way at some point; I guess the right way is probably to do something like CVS/SVN does and have a config file in dirs, and walk up the dir tree and take the first config you find).

allSnap make all windows snap - AllSnap for x64/Win7 seems to be broken, but the old 32 bit one seems to work just fine still. (ADDENDUM : nope, old allsnap randomly crashes in Win 7, do not use)

KeyTweak Homepage - I used KeyTweak to remap my Caps Lock to Alt.

Firefox addons :


DownThemAll
Flashblock
PDF Download
Stop Autoplay
Adblock Plus
DownloadHelper

ADDENDUM : I found the last few guys who are ticking my disk :

One that you obviously want to disable is the Windows Media Player media sharing service : Fix wmpnetwk.exe In Windows 7 . It just constantly scans your media dirs for shit to serve. Fuck you.

The next big culprit is the Windows Reliability stuff. Go ahead and disable the RAC scheduled task, but that's not the real problem. The nasty one is the "last alive stamp" which windows writes once a minute by default. This is to help diagnose crashes. You could change TimeStampInterval to 30 or so to make it once every 30 minutes, but I set it to zero to disable it. See :
Shutdown Event Tracker Tools and Settings System Reliability
How to disable hard disk thrashing with Vista - Page 7
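In .reg form the change would look something like this (the key path and value name here are as described in those linked articles, not something I've re-verified, and the Reliability key location can vary between Windows versions - check your own registry before importing):

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Reliability]
"TimeStampInterval"=dword:00000000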

And this is a decent summary/repeat of what I've said : How to greatly reduce harddisk grinding noises in Vista .

4/07/2010

04-07-10 - Video

Blurg, the complexity wheel turns. In the end, all the issues with video come down to two huge fundamental problems :

1. Lack of the true distortion metric. That is, we make decisions to optimize for some D, but that D is not really what humans perceive as quality. So we try to bias the coder to make the right kind of error in a black art hacky way.

2. Inability to do full non-greedy optimization. eg. on each coding decision we try to do a local greedy decision and hope that is close to globally optimal, but in fact our decisions do have major effects on the future in numerous and complex ways. So we try to account for how current decisions might affect the future using ugly heuristics.

These two major issues underlie all the difficulties and hacks in video coding, and they are unfortunately nigh intractable. Because of these issues, you get really annoying spurious results in coding. Some of the annoying shit I've seen :

A. I greatly improve my R/D optimizer. Global J goes up !! (lower J is better, it should have gone down). WTF happened !? On any one block, my R/D optimizer now has much more ability to make decisions and reach a J minimum on that block. The problem is that the local greedy optimization is taking my code stream to weird places that then hurt later blocks in ways I am not accounting for.

B. I introduce a new block type. I observe that the R/D chooser picks it reasonably often and global J goes down. All signs indicate this is good for coding. No, visual quality goes down! Urg. This can come from any number of problems, maybe the new block type has artifacts that are visually annoying. One that I have run into that's a bother is just that certain block types will have their J minimum on the R/D curve at very different places - the result of that is a lot of quality variation across the frame, which is visually annoying. eg. the block type might be good in a strict numerical sense, but its optimum point is at much higher or much lower quality than your other block types, which makes it stand out.

C. I completely screw up a block type, quality goes UP ! eg. I introduce a bug or some major coding inefficiency so a certain block type really sucks. But global quality is better, WTF. Well this can happen if that block type was actually bad. For one thing, block types can actually be bad for global J even if they are good for greedy local J, because they produce output that is not good as a future mocomp source, or even simply because they are redundant with other block types and are a waste of code space. A more complex problem which I ran into is that a broken block type can change the amount of bits allocated to various parts of the video, and that can randomly give you better bit allocation, which can make quality go up even though you broke your coder a bit. Most specifically, I broke my Intra ("I") block (no mocomp) coder, which caused more bits to go to I-like frames, which actually improved quality.

D. I improve my movec finder, so I'm more able to find truly optimal movecs in an R/D sense (eg. find the movec that actually optimizes J on the current block). Global J goes up. The problem here is that optimizing the current movec can make that movec very weird - eg. make the movec far from the "true motion". That then hurts future coding greatly.

In most cases these problems can be patched with hacks and heuristics. The goal of hacks and heuristics is basically to try to address the first two issues. Going back to the numbering of the two issues, what the hacks do is :

1. Try to force distortion to be "good distortion". Forbid too much quality variation between neighboring blocks. Forbid block mode decisions that you somehow decide is "ugly distortion" even if it optimized J. Try to tweak your D metric to make visual quality better. Note that the D tweakage here is a pretty nasty black art - you are NOT actually trying to make a D that approximates a true human visual D, you are trying to make a D under which your codec will make decisions that produce good global output.

2. To account for the greedy/non-greedy problem, you try to bias the greedy decisions towards things that you guess will be good for the future. This guess might be based on actually future data from a previous run. Basically you decide not to make the choice that is locally optimal if you have reason to believe it will hurt too much in the future. This is largely based on intuition and familiarity with the codec.

Now I'll mention a few random particular issues, but really these themes occur again and again.

I. Very simple block modes. Most coders have something like a "direct block copy" mode, or even a "solid single color", eg. DIRECT or "skip" or whatever. These type of blocks are generally quite high distortion and very low rate. The problem occurs when your lambda is sort of near the threshold for whether to prefer these blocks or not. Oddly the alternate choice mode might have much higher rate and much higher distortion. The result is that a bunch of very similar blocks near each other in an image might semi-randomly select between the high quality and low quality modes (which happen to have very similar J's at the current lambda). This is obviously ugly. Furthermore, there's a non-greedy optimization issue with these type of block modes. If we compare two choices that have similar J, one is a skip type block with high distortion, another is some detailed block mode - the skip type is bad for information conveyance to the future. That is, it doesn't add anything useful for future blocks to refer to. It just copies existing pixels (or even wipes some information out in some cases).
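Here's a toy of that mode-flapping: two modes with very different (R, D) whose J's cross right at the working lambda, so a few percent of per-block distortion noise flips the decision back and forth across near-identical blocks (all numbers invented):

```python
# Toy demo of mode-flapping at the lambda threshold.  Two block modes with
# very different (R, D) have nearly equal J at the working lambda, so a few
# percent of per-block distortion noise flips the decision back and forth.
# All numbers are invented.

def pick(D_skip, lam):
    J_skip = D_skip + lam * 1.0    # skip-type mode: ~1 bit, high distortion
    J_full = 100.0 + lam * 17.0    # detailed mode: held fixed for the demo
    return "skip" if J_skip <= J_full else "full"

lam = 50.0   # right around the crossover for these numbers
# eight visually near-identical blocks, distortion varying a few percent:
blocks = [895, 905, 898, 902, 896, 904, 899, 901]
print([pick(d, lam) for d in blocks])   # flips between modes block to block
```

Both choices are "right" by J; the eye just hates seeing high- and low-quality blocks speckled next to each other.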

II. Gradual changes need to be sent gradually. That is, if there is some part of the video which is slowly steadily changing, such as a slow cross fade, or very slow scale/rotate type motion, or whatever - you really need to send it as such. If you make a greedy best J decision, at low bit rate you will sometimes decide to send zero delta, zero delta, for a while because the difference is so small, and then it becomes too big where you need to correct it and you send a big delta. You've turned the gradual shift into a stutter and pop. Of course the decision to make a big correction won't happen all across the frame at the same time, so you'll see blocks speckle and move in waves. Very ugly.
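A tiny simulation of the stutter-and-pop (the threshold and fade rate are invented):

```python
# Simulation of the "gradual change sent as stutter and pop" problem.
# The source drifts slowly; a greedy coder refuses to pay for a tiny
# delta (zero delta has the better local J) until the error finally
# gets big enough, then sends one big correction.

THRESH = 3.0   # error level at which paying for a delta finally wins on J

def encode(frames):
    decoded = 0.0
    out = []
    for truth in frames:
        if abs(truth - decoded) > THRESH:
            decoded = truth        # one big catch-up correction: a visible "pop"
        # else: send zero delta, decoded stays stale
        out.append(decoded)
    return out

truth = [0.4 * i for i in range(12)]   # a slow steady fade
print(encode(truth))                   # flat, flat, flat ... pop
```

The decoded output sits dead flat while the source drifts, then jumps all at once: exactly the stutter and pop.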

III. Rigid translations need to be preserved. The eye is very sensitive to rigid translations. If you just let the movec chooser optimize for J or D it can screw this up. One reason is that very small motions or movements of monotonous objects might slip to movec = 0 for code size purposes. That is, rather than send the correct small movec, it might decide that J is better by incorrectly sending a zero delta movec with a higher distortion. Another reason is that the actual best pixel match might not correspond to the motion, you can get anomalies, especially on sliding patterned or semi-patterned objects like grass. In these cases, it actually looks better to use the true motion movec even if it has larger numerical distortion D to do so. Furthermore there is another greedy/non-greedy issue. Sometimes some non-true-motion movec might well give you the best J on the current block by reaching out and grabbing some random pixels that match really well. But that screws up your motion field for the future. That movec will be used to condition predictions of future movecs. So say you have some big slowly translating field - if everyone picks nice true motion movecs they will also be coherent, but if people just get to pick the best match for themselves greedily, they will be a big mess and not predict each other. That movec might also be used by the next frame, the previous B frame, etc.

IV. Full pel vs. half/quarter/sub-pel is a tricky issue. Sub-pel movecs often win in a strict SSD sense; this is partly because when the match is imperfect, sub-pel movecs act to sort of average two guesses together; they produce a blurred prediction, which is optimal under the L2 norm. There are some problems with this though; sub-pel movecs act to blur the image, they can stand out visually as blurrier bits; they also act to "destroy information" in the same way that simple block modes do. Full pel movecs have the advantage of giving you straight pixel copies, so there is no blur or destruction of information. But full pel movecs can have their own problem if the true motion is subpel - they can produce wiggling. eg. if an area should really have movecs around 0.5, you might make some blocks where the movec is +0 and some where it is +1. The result is a visible dilation and contraction that wiggles along, rather than a perfect rigid motion.
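The blurring is easy to see in one dimension: a half-pel prediction is (in the simplest filter) the average of two full-pel samples, which smears a hard edge.

```python
# Toy illustration: a half-pel "prediction" formed by averaging adjacent
# full-pel samples (the simplest interpolation filter) smears a hard edge.
signal = [0, 0, 100, 100, 0, 0]             # a hard edge
half_pel = [(a + b) / 2 for a, b in zip(signal, signal[1:])]
print(half_pel)   # [0.0, 50.0, 100.0, 50.0, 0.0] - the edge is blurred
```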

V. A good example of all this evil is the movec search in x264. They observed that allowing very large movec search ranges actually decreases quality (vs. more local incremental searches). In theory, if your movec chooser is using the right criterion, this should not happen - more choices should never hurt; it should simply not choose them if they are worse. Their problem is twofold : 1. their movec chooser is obviously not perfect in that it doesn't account for current cost completely correctly, 2. of course it doesn't account for all the effects on the future. The result is that using some heuristic seed spots for the search - which you believe are good coherent movecs for various reasons - and then doing small local searches actually gives you better global quality. This is a case where using "broken" code gives better results.

In fact it is a general pattern that using very accurate local decisions often hurts global quality, and often using some tweaked heuristic is better. eg. instead of using true code cost R in your J decision, you make some functional fit to the norms of the residuals; you then tweak that fit to optimize global quality - not to fit R. The result is that the fit can wind up compensating for the greedy/non-greedy and other weird factors, and the approximation can actually be better than the more accurate local criterion.
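As a concrete sketch of the idea (my own example; the constants and functional form are invented, not from any real encoder): instead of measuring true code cost R, fit a cheap proxy from the residual, and then tune the proxy's constants for global quality rather than for accuracy against R.

```python
# Hedged sketch: approximate the rate R from residual norms instead of
# actually entropy coding. The constants a, b would be tweaked against
# global quality, not fit to the true R.
def approx_rate(residual, a=0.15, b=2.0):
    sad = sum(abs(x) for x in residual)           # L1 norm of the residual
    nonzero = sum(1 for x in residual if x != 0)  # rough coefficient count
    return a * sad + b * nonzero

def j_cost(distortion, residual, lam=1.0):
    # J = D + lambda * R_approx, the usual Lagrangian decision metric
    return distortion + lam * approx_rate(residual)

print(j_cost(100.0, [0, 3, -2, 0, 1]))   # 106.9
```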

Did I say blurg?

3/18/2010

03-18-10 - Physics

Gravity is a force that acts proportionally to the two masses. (let's just assume classical Newtonian gravity is in fact the way the universe works)

People outside of science often want to know "but why?" or "how exactly? what is the mechanism? what carries the force?" . At first this seems like a reasonable question, you don't just want to have these rules, you want to know where they come from, what they mean exactly. But if you think a bit more, it should be clear that these questions are absurd.

Let's say you know the fundamental physical laws. These are expressed as mathematical rules that tell you the behavior of objects. Say for example we lived in a world with only Newtonian dynamics and gravity and that is all the laws. Someone asks "but what *is* gravity exactly?". I ask : How could you ever know? What could there ever be that "is" gravity? If something was facilitating the force of gravity, there would have to be some description of that thing - some new rule to describe the carrier of gravity. Then you would ask "well where does this rule for the carrier of gravity come from?" and you would need a new rule. Say you said "gravity is carried by the exchange of gravitons" ; then of course they could ask "why is there a graviton, what makes gravitons exactly, why do they couple in this way?" etc.

The fundamental physical laws cannot be explained by anything else.

That's almost a tautology because that's what I mean by "fundamental" - you take all the behavior of the universe, every little thing, like "I pushed on this rock with 10 pounds of force and it went 5 meters per second". You strip away every single law that can be explained with some other law. You strip and strip and finally you are left with a few laws that cannot be explained by anything else. These are the fundamental laws and there is no "why" or "how" for them. In fact the whole human question of "how" is imprecise; what we really should say is "what simpler physical law can explain this phenomenon?". And at some point there is no more answer to that.

Of course this is assuming that there *is* a fundamental physical law. Most physicists assume that to be true without questioning it, but I wrote here at cbloom.com long ago that in fact the set of physical laws might well be infinite - that is, maybe we will find some day that the electrical and gravitational force can be explained in terms of some new law which also adds some new behaviors at very small scale (if it didn't add new behaviors it would simply be a new expression of the same law and not count), and then maybe that new law is explained in terms of another new law which also adds new behaviors, etc. ad infinitum - a Russian doll of physical laws that never ends. This is possible, and furthermore I contend that it is irrelevant.

There is a certain human need to know "why" the physical laws are as they are, or to know the "absolute" "fundamental" laws - but I don't believe there's really much merit to that at all. What if they do finally work out string theory, and it explains all known phenomena for a while, but then we find that there is a small error in the mass of the Higgs Boson on the order of ten to the minus one billion, which tells us there must be some other physical law that we don't yet know. The fact that string theory then is only a very good model of the universe and not the "absolute law" of the universe changes nothing except our own silly human emotions in response to it (and surely crackpots would rise up and say that since it's not "100% right" then there must be angels and thetans at work).

What if we found laws that explained all phenomena that we know of perfectly. We might well think those laws are the "absolute fundamental" laws of the universe. But how would we ever know? Maybe there are other phenomena that can't be explained by those laws that we simply haven't seen yet. Maybe those other phenomena could *never* be seen! (for example there may be another entire set of particles and processes which have zero coupling to our known matter). The existence of these unexplained phenomena does not reduce the merit of the laws you know, even though they are now "not complete" or "don't describe all of nature".

It's funny to think about how our intuition of "mass" was screwed up by the fact that we evolved on the earth in a high gravity environment where we inherently think of mass as "weight" - eg. something is heavy. There's this thing which I will call K. It's the coefficient of inertia, it's how hard something is to move when you apply a certain force to it. F = K A if you will. Imagine we grew up in outer space with lots of large electrical charges around. If we apply an electric field to two charges of different K, one moves fast and one moves slow, the difference is the constant K. It's a very funny thing that this K, this resistance to changes of motion, is also the coupling to the gravitational field.

3/10/2010

03-10-10 - Distortion Measure

What are the things we might put in an ideal distortion measure? This is going to be rather stream of conscious rambling, so beware. Our goal is to make output that "looks like" the input, and also that just looks "good". Most of what I talk about will assume that you are running "D" on something like 4x4 or 8x8 blocks and comparing it to "D" on other blocks, but of course it could be run on a gaussian windowed patch, just some way of localizing distortion on a region.

I'm going to ignore the more "macroscopic" issues of which frame is more important than another frame, or even which object within a frame is more important - those are very important issues I'm sure, but they can be added on later, and are beyond the scope of current research anyway. I want to talk about the microscopic local distortion rating D. The key thing is that the numerical value of D assigns a way to "score" one distortion against another. This not only lets you choose the way your error looks on a given block (choosing the one with lowest score obviously), it also determines how your bits are allocated around the frame in an R/D framework (bits will go to places that D says are more important).

It should be intuitively obvious that just using D = SSD or SAD is very crude and badly broken. One pixel step of numerical error clearly has very different importance depending on where it is. How might we do better ?

1. Value Error. Obviously the plain old metric of "output value - input value" is useful even just as a sanity check and regularizer; it's the background distortion metric that you will then add your other biasing factors to. All other things being equal you do want output pixels to exactly match input pixels. But even here there's a funny issue of what measure you use. Probably something in the L-N norms (L1 = SAD, L2 = SSD). The standard old-world metric is L2, because if you optimize for D = L2, then you will minimize your MSE and maximize your PSNR, which is the goal of the old literature.

The L-N norms behave differently in the way they rate one error vs another. The higher N is, the more importance it puts on the largest error. eg. L-infinity only cares about the largest error. L-2 cares more about big errors than small ones. That is, L2 makes it better to change 100->99 than 1->0. Obviously you could also do hybrid things like use L1 and then add a penalty term for the largest error if you think minimizing the maximum error is important. I believe that *where* the error occurs is more important than what its value is, as we will discuss later.
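The different rankings are easy to demonstrate. In this made-up example, sixteen errors of 1 and one error of 4 are ordered differently depending on the norm:

```python
# How the L-N norms rank the same two error patterns differently.
def lp(err, p):
    return sum(abs(e) ** p for e in err) ** (1.0 / p)

spread = [1] * 16            # many small errors
spike  = [0] * 15 + [4]      # one large error

print(lp(spread, 1), lp(spike, 1))   # L1:    16 vs 4 -> spike looks better
print(lp(spread, 2), lp(spike, 2))   # L2:     4 vs 4 -> a tie
print(max(spread), max(spike))       # L-inf:  1 vs 4 -> spread looks better
```

So which pattern "wins" depends entirely on N - which is exactly why the choice of norm is a perceptual decision, not a mathematical one.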

2. DC Preservation. Changes in DC are very noticeable. Particularly in video, the eye is usually tracking mainly one or two foreground objects; what that means is that most of the frame we are only seeing with our secondary vision (I don't know a good term for this, it's not exactly peripheral vision since it's right in front of you, but it's not what your brain is focused on, so you see it at way lower detail). All this stuff that we see with secondary vision we are only seeing the gross properties of it, and one of those is the DC. Another issue is that if a bunch of blocks in the source have the same DC, and you change one of them in the output, that is sorely noticeable.

I'm not sure if it's most important to preserve the median or the mean or what exactly. Usually people preserve the mean, but there are certainly cases where that can screw you up. eg. if you have a big field of {80} with a single pixel spike on it, you want to preserve that background {80} everywhere no matter what the spike does in the output. eg. {80,80,255,80,80} -> {80,80,240,80,80} is better than making it go -> {83,83,240,83,83} even though the latter has better mean preservation.
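Running the text's own spike example through mean and median makes the point concrete: the second output matches the mean better, but the median (the {80} background) is what it visibly breaks.

```python
# The spike example from the text: preserving the {80} background (the
# median) matters more than matching the mean exactly.
import statistics

src  = [80, 80, 255, 80, 80]
out1 = [80, 80, 240, 80, 80]   # background preserved, mean off by 3
out2 = [83, 83, 240, 83, 83]   # mean closer, but background visibly shifted

print(statistics.mean(src), statistics.mean(out1), statistics.mean(out2))
# 115  112  114.4   -> out2 "wins" on mean preservation
print(statistics.median(src), statistics.median(out1), statistics.median(out2))
# 80   80   83      -> but out1 preserves the background
```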

3. Edge Preservation. Hard edges, especially long straight lines or smooth curves, are very visible to humans and any artifact in them stands out. The importance of edges varies though; it has something to do with the length of the edge (longer edges are more major visual features) and with the contrast range of the region around the edge : eg. an edge that separates two very smooth sections is super visible, but an edge that's one of many in a bunch of detail is less important (preserving the detail there is important, but the exact shape of the edge is not). eg. a patch of grass or leaves might have tons of edges, but their exact shape is not crucial. An image of hardwood floor has tons of long straight parallel edges and preserving those exactly is very important. The border between objects is typically very important.

Obviously there's the issue of keeping the edges that were in the original and also the issue of not making new edges that weren't in the original. eg. introducing edges at block boundaries or from ringing artifacts or whatever. As with edge preservation, the badness of these introduced edges depends on the neighborhood - it's much worse to make them in a smooth patch than in one that's already noisy. (in fact in a noisy patch, ringing artifacts are sort of what you want, which is why JPEG can look better than naive wavelet coders on noisy data).
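One minimal way to make a term that charges for both killed edges and introduced edges (my own sketch, not a claim about how any real metric does it) is to compare gradient magnitudes rather than raw pixels:

```python
# Minimal edge-preservation term: compare gradient magnitudes so that both
# a lost edge and an introduced edge cost the same. 1-D rows for simplicity.
def grad_mag(row):
    return [abs(b - a) for a, b in zip(row, row[1:])]

def edge_error(src_row, out_row):
    return sum(abs(g - h)
               for g, h in zip(grad_mag(src_row), grad_mag(out_row)))

smooth = [10, 10, 10, 10, 10]
edge   = [10, 10, 90, 90, 90]
print(edge_error(edge, smooth))   # 80: a lost edge is penalized
print(edge_error(smooth, edge))   # 80: an introduced edge is penalized too
```

A real version would also weight by edge length and by the contrast of the surrounding region, per the discussion above.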

4. Smooth -> Smooth (and Flat -> Flat). Changing smooth input to not smooth is very bad. Old coders failed hard on this by making block boundaries. Most new coders now handle this easily inherently either because they are wavelet or use unblocking or something. There are still some tricky cases though, such as if you have a smooth ramp with a bit of gaussian noise speckle added to it. Visually the eye still sees this as "smooth ramp" (in fact if you squint your eyes the noise speckle goes away completely). It's very important for the output to preserve this underlying smooth ramp; many good modern coders see the noise speckle as "detail" that should be preserved and wind up screwing up the smooth ramp.
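One way to check that the underlying ramp survives the speckle (again a sketch of mine, not an established metric) is to compare low-passed versions of input and output instead of raw pixels:

```python
# Sketch: compare box-blurred versions of input and output so that speckle
# is ignored and only the underlying smooth ramp is measured.
def box_blur(row):
    padded = [row[0]] + row + [row[-1]]     # clamp-pad the ends
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(row) + 1)]

ramp_with_speckle = [10, 21, 30, 39, 50]    # a ramp plus +/-1 noise
ramp_clean        = [10, 20, 30, 40, 50]
diff = sum(abs(a - b) for a, b in
           zip(box_blur(ramp_with_speckle), box_blur(ramp_clean)))
print(diff)   # small: after smoothing, the two rows nearly agree
```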

5. Detail/Energy Preservation. The eye is very sensitive to whether a region is "noisy" or "detailed", much more so than exactly what that detail is. Some of the JPEG style "threshold of visibility" stuff is misleading because it makes you think the eye is not sensitive to high frequency shapes - true, but you do see that there's "something" there. The usual solution to this is to try to preserve the amount of high frequency energy in a block.

There are various sub-cases of this. There's true noise (or real life video that's very similar to true noise) in which case the exact pixel values don't matter much at all as long as the frequency spectrum and distribution of the noise is reproduced. There's detail that is pretty close to noise, like tree leaves, grass, water, where again the exact pixels are not very important as long as the character of the source is preserved. Then there's "false noise" ; things like human hair or burlap or bricks can look a lot like noise to naive analysis metrics, but are in fact patterned texture in which case messing up the pattern is very visible.

There are two issues here - obviously there's trying to match the source, but there's also the issue of matching your neighbors. If you have a bunch of neighboring source blocks with a certain amount of energy, you want to reproduce that same patch in the output - you don't want to have single blocks with very different energy, because they will stand out. Block energy is almost like DC level in this way.
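A simple version of an energy term (my sketch) measures "detail" as variance about the block mean; a block with the same pixels shuffled scores zero energy error, while a flattened block is heavily penalized:

```python
# Sketch of an energy-preservation term: "detail" as variance about the
# block mean. Exact pixel placement doesn't matter; killing the energy does.
def block_energy(block):
    mean = sum(block) / len(block)
    return sum((x - mean) ** 2 for x in block) / len(block)

noisy_src = [100, 120, 90, 110, 105, 95, 115, 85]
noisy_out = [95, 115, 85, 105, 110, 90, 120, 100]    # same values, shuffled
flat_out  = [102, 103, 102, 103, 102, 103, 102, 103]  # detail destroyed

print(abs(block_energy(noisy_src) - block_energy(noisy_out)))  # 0.0
print(abs(block_energy(noisy_src) - block_energy(flat_out)))   # 131.0
```

A real metric would compare energy per frequency band, not just total variance, to catch the "false noise" cases like hair or bricks.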

6. Dynamic range / sdev Preservation. Of course related to previous metrics, but you can definitely see when the dynamic range of a region changes. On an edge it's very easy to see if a high contrast edge becomes lower contrast. Also in noise/detail areas the main things you notice are the DC, the amount of noise, and the range of the noise. One reason it's so visible is because of optical fusion and its effect on DC brightness. That is, if you remove the bright specks from a background it makes the whole region look darker. Because of gamma correction, {80,120} is not the same brightness as {100,100}. Now theoretically you could do gamma-corrected DC preservation checks, but there are difficulties in trying to be gamma correct in your error metrics since the gamma remapping sort of does what you want in terms of making changes of dark values relatively more important; maybe you could do gamma-correct DC preservation and then scale it back using gamma to correct for that.

It's unclear to me whether the important thing is the absolute [low,high] range, or the statistical width [mean-sdev,mean+sdev]. Another option would be to sort the values from lowest to highest and look at the distribution; the middle is the median, then you have the low and high tails on each side; you sort of want to preserve the shape of that distribution. For example the input might have the high values in a kind of gaussian falloff tail with most values near median and fewer as it gets higher; then the output should have a similar distribution, but exactly matching the high value is not important. The same block might have all of its low values at exactly 0 ; in that case the output should also have those values at exactly 0.
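The sorted-distribution idea above can be sketched as an L1 distance between order statistics (an invented example of mine, essentially a discrete earth-mover style comparison): matching the shape of the distribution matters, not the pixel positions.

```python
# Sketch: compare the sorted value distributions of two blocks, so that the
# exact zeros must match exactly while the high tail need only be roughly
# the right shape.
def dist_shape_error(a, b):
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

src = [0, 0, 10, 50, 90, 200]
out = [0, 0, 12, 48, 95, 190]   # zeros exact, tail roughly right
print(dist_shape_error(src, out))   # 19
```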


Whatever all the final factors are, you are left with how to scale them and combine them. There are two issues on scaling : power and coefficient. Basically you're going to combine the sub-distortions something like this :


D = Sum_n { Cn * Dn^Pn }

Dn = distortion sub type n
Cn = coefficient n
Pn = power n

The power Pn lets you change the units that Dn are measured in; it lets you change how large values of Dn contribute vs. how small values contribute. The coefficient Cn obviously just overall scales the importance of each Dn vs. the other D's.

It's actually not that hard to come up with a good set of candidate distortion terms like I did above, the problem is once you have them (the various Dn) - what are the Cn and Pn to combine them?
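For completeness, the combination formula itself is trivial to write down; the numbers below are entirely made up, which is exactly the problem - finding the real Cn and Pn is the hard part.

```python
# The combination D = Sum_n Cn * Dn^Pn, with invented tuning constants.
def combine(sub_distortions, coeffs, powers):
    return sum(c * d ** p
               for d, c, p in zip(sub_distortions, coeffs, powers))

# e.g. [value error, DC error, edge error, energy error] for one block;
# all four values and all the Cn/Pn below are placeholders
Dn = [12.0, 3.0, 8.0, 131.25]
Cn = [1.0, 4.0, 2.0, 0.05]
Pn = [1.0, 2.0, 1.0, 0.5]
print(combine(Dn, Cn, Pn))
```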

old rants