5/26/2010

05-26-10 - Windows Page Cache

The correct way to cache things is through Windows' page cache. The advantages of doing this over using your own custom cache code are :

1. Automatically resizes based on amount of memory needed by other apps. eg. other apps can steal memory from your cache to run.

2. Automatically gives pages away to other apps or to file IO or whatever if they are touching their cache pages more often.

3. Automatically keeps the cache in memory between runs of your app (if nothing else clears it out). This is pretty immense.

Because of #3, your custom caching solution might slightly beat using the Windows cache on the first run, but on the second run it will stomp all over you.

To do this nicely, generally the cool thing to do is make a unique file name that is the key to the data you want to cache. Write the data to a file, then memory map it as read only to fetch it from the cache. It will now be managed by the Windows page cache and the memory map will just hand you a page that's already in memory if it's still in cache.

The only thing that's not completely awesome about this is the reliance on the file system. It would be nice if you could do this without ever going to the file system. eg. if the page is not in cache, I'd like Windows to call my function to fill that page rather than getting it from disk, but so far as I know this is not possible in any easy way.

For example : say you have a bunch of compressed images as JPEG or whatever. You want to keep uncompressed caches of them in memory. The right way is through the Windows page cache.
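Here's a minimal sketch of the "unique file name as cache key, then memory map read-only" idea. It's Python rather than raw Win32, and the names (CACHE_DIR, cache_put, cache_get) and the choice of a SHA-1 of the key as the file name are my own assumptions for illustration, not anything Windows requires. On Windows, Python's mmap module sits on top of the usual file mapping calls, so the pages you get back are the ones the OS page cache is managing.

```python
import hashlib
import mmap
import os
import tempfile

# Hypothetical cache location; any directory on a local disk would do.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "pagecache_demo")

def cache_path(key: str) -> str:
    """Turn the cache key into a unique file name (assumption: SHA-1 hex digest)."""
    name = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, name + ".bin")

def cache_put(key: str, data: bytes) -> None:
    """Write the data to its cache file; write to a temp name, then rename into place."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = cache_path(key)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)

def cache_get(key: str):
    """Return a read-only memory map of the cached data, or None on a miss."""
    try:
        f = open(cache_path(key), "rb")
    except FileNotFoundError:
        return None
    with f:
        if os.fstat(f.fileno()).st_size == 0:
            return None  # an empty file cannot be mapped
        # Length 0 maps the whole file; the map stays valid after f is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

For the JPEG case you'd call cache_get with the image name first, and only decompress and cache_put on a miss. Note that replacing a file while another process still has it mapped can fail on Windows, so a real version needs to handle that.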

05-26-10 - Windows 7 Snap

My beloved "AllSnap" doesn't work on Windows 7 / x64. I can't find a replacement because fucking Windows has a feature called "Snap" now, so you can't fucking google for it. (also searching for "Windows 7" stuff in general is a real pain because solutions and apps for the different variants of windows don't always use the full name of the OS they are for in their page, so it's hard to search for; fucking operating systems really need unique code names that people can use to make it possible to search for them; "Windows" is awful awful in this regard).

I contacted the developer of AllSnap to see if he would give me the code so I could fix it, but he is ignoring me. I can tell from debugging apps when AllSnap is installed that it seems to work by injecting a DLL. This is similar to how I hacked the poker sites for GoldBullion, so I think I could probably reproduce that. But I dunno if Win7/x64 has changed anything about function injection and the whole DLL function pointer remap method.

BTW/FYI the standard Windows function injection method goes like this : Make a DLL that has some event handler. Run a little app that causes that event to trip inside the app you want to hijack. Your DLL is now invoked in that app's process to handle that event. Now you are running in that process so you can do anything you want - in particular you can find the function table to any of their DLL's, such as user32.dll, and stuff your own function pointer into that memory. Now when the app makes normal function calls, they go through your DLL.

5/25/2010

05-25-10 - Thread Insurance

I just multi-threaded my video test app recently, and it was reasonably easy, but I had a few nagging bugs because of hidden ways they were touching shared memory without protection deep inside functions. Okay, so I found them and fixed them, but I'm left with a problem - any time I touch one of those deep functions, I could screw up the threading without realizing it. And I might not get any indication of what I did for weeks if it's a rare race.

What I would like is a way to make this more robust. I have very strong threading primitives, I want a way to make sure that I use them! In particular, I want to be able to mark certain structs as only touchable when a critsec is locked or whatever.

I think that a lot of this could be done with Win32 memory page protections. The catch is that, so far as I know, there's no way to associate protections per-thread (eg. to make a page read/write for thread A but no-access for thread B). If I could do that it would be super sweet.

One idea is to make the page no access and then install my own exception handler that checks what thread it is, but that might be too much overhead (and not sure if that would fail for other reasons).

The main usage is not for protected crit-sec'ed structs, that is really the easiest case to maintain because it's very obvious right there in the code that you need to take the critsec to touch the variables. The hard case to maintain is the ad hoc "I know this is safe to touch without protection". In particular I have a lot of code that runs like this :


Phase 1 : I know no threads are touching shared data item A
main thread does lots of writing in A

Phase 2 : fire up threads.  They only read from A and do so without protection.  They each write to unique areas B,C,D.

Phase 3 : spin down threads.  Now main thread can write A and read B,C,D.

So what I would really like to do is :

Phase 1 : I know no threads are touching shared data item A
main thread does lots of writing in A

-- set A memory to be read-only !
-- set B,C,D memory to be read/write only for their own thread

Phase 2 : fire up threads.  They only read from A and do so without protection.  They each write to unique areas B,C,D.

-- make A,B,C,D read/write only for main thread !

Phase 3 : spin down threads.  Now main thread can write A and read B,C,D.

The thing that this saves me from is when I'm tinkering in DoComplicatedStuff() which is some function called deep inside Phase 2 somewhere and I change it to no longer follow the memory access rule that it is supposed to be following. This is just my hate for having rules for code correctness that are not enforced by the compiler or at least by run-time asserts.
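Short of real per-thread page protection, the "run-time asserts" fallback can be sketched in software. This is not the page-protection scheme, it's a plain ownership check on writes, and everything in it (the PhaseGuarded class, claim, freeze, run_phases) is a hypothetical helper made up for illustration. It only checks writes, not reads, and claim() itself is on the honor system.

```python
import threading

class PhaseGuarded:
    """A dict whose writes are only legal from one designated thread.

    Hypothetical helper: a software stand-in for per-thread page protection.
    """

    def __init__(self, data=None):
        self._data = dict(data or {})
        self._writer = threading.get_ident()  # the creating thread may write

    def claim(self):
        """Make the calling thread the only legal writer."""
        self._writer = threading.get_ident()

    def freeze(self):
        """Make the data read-only for every thread."""
        self._writer = None

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if self._writer != threading.get_ident():
            raise AssertionError("write to guarded data from the wrong thread or phase")
        self._data[key] = value

def run_phases(num_threads=3):
    # Phase 1 : no threads yet, main thread writes A freely.
    A = PhaseGuarded()
    A["scale"] = 10
    A.freeze()  # -- set A to be read-only

    outputs = [PhaseGuarded() for _ in range(num_threads)]

    def worker(i):
        outputs[i].claim()  # -- B,C,D writable only by their own thread
        outputs[i]["value"] = A["scale"] * i

    # Phase 2 : threads read A and write only their own output.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Phase 3 : threads are gone, main thread takes everything back.
    A.claim()
    for out in outputs:
        out.claim()
    A["scale"] = 0
    return [out["value"] for out in outputs]
```

If DoComplicatedStuff() deep inside Phase 2 starts writing to A, it trips the check on the first offending write instead of showing up weeks later as a rare race.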

5/21/2010

05-21-10 - Video coding beyond H265

In the end movec-residual coding is inherently limited and inefficient. Let's review the big advantage of it and the big problem.

The advantage is that the encoder can reasonably easily consider {movec,residual} coding choices jointly. This is a huge advantage over just picking what's my best movec, okay now code the residual. Because movec affects the residual, you cannot make a good R/D decision if you do it separately. By using block movecs, it reduces the number of options that need to be considered to a small enough set that encoders can practically consider a few important choices and make a smart R/D decision. This is what is behind all current good video encoders.

The disadvantage of movec-residual coding is that they are redundant and connected in a complex and difficult to handle way. We send them independently, but really they have cross-information about each other, and that is impossible to use in the standard framework.

There are obviously edges and shapes in the image which occur in both the movecs and the residuals. eg. a moving object will have a boundary, and really this edge should be used for both the movec and residual. In the current schemes we send a movec for the block, and then the residuals per pixel, so we now have finer grain information in the residual that should have been used to give us finer movecs per pixel, but it's too late now.

Let's back up to fundamentals. Assume for the moment that we are still working on an 8x8 block. We want to send that block in the current frame. We have previous frames and previous blocks within the current frame to help us. There are (256^3)^64 = 2^1536 possible values for this block (64 pixels of 24-bit color). If we are doing lossy coding, then not all possible values for the block can be sent. I won't get into details of lossiness, so just say there are a large number of possible values for the pixels of the block; we want to code an index to one of those values.

Each index should be sent with a different bit length based on its probability. Already we see a flaw with {movec-residual} coding - there are tons of {movec,residual} pairs that specify the same index. Of course in a flat area lots of movecs might point to the same pixels, but even if that is eliminated, you could go movec +1, residual +3, or movec +3, residual +1, and both ways get to +4. Redundant encoding = bit waste.

Now, this bit waste might not be critically bad with current simple {movec,residual} schemes - but it is a major encumbrance if we start looking at more sophisticated mocomp options. Say you want to be able to send movecs for shapes, eg. send edges and then send a movec on each side. There are lots of possibilities here - you might just send a movec per pixel (this seems absurdly expensive, but the motion fields are very smooth so should code well from neighbors), or you might send a polygon mesh to specify shapes. This should give you much better motion fields, and then the information in the motion fields can be used to predict the residuals as well. But the problem is there's too much redundancy. You have greatly expanded the number of ways to code the same output pixels.

We could consider more modest steps as well, such as sticking with block mocomp + residual, but expanding what we can do for "mocomp". For example, you could use two motion vectors + arbitrary linear combination of the source blocks. Or you could do trapezoidal texture-mapping style mocomp. Or mocomp with a vector and scale + rotation. None of these is very valuable, there are numerous problems : 1. too many ways to encode for the encoder to do thorough R/D analysis of all of them, 2. too much redundancy, 3. still not using the joint information across residual & motion.

In the end the problem is that you are using a 6-d value {velocity,pixel} to specify a 3-d color. What you really want is a 3-d coordinate which is not in pixel space, but rather is a sort of "screw" in motion/pixel space. That is, you want the adjacent coordinates in motion/pixel space to be the ones that are closest together in the 6-d space. So for example RGB {100,0,0} and {0,200,50} might be neighbors in motion/pixel space if they can be reached by small motion adjustments.

Okay this is turning into rambling, but another way of seeing it is like this : for each block, construct a custom basis transform. Don't send a separate movec or anything - the axes of the basis transform select pixels by stepping in movec and also residual.

ADDENDUM : let me try to be more clear by doing a simple example. Say you are trying to code a block of pixels which only has 10 possible values. You want to code with a standard motion then residual method. Say there are only 2 choices for motion. It is foolish to code all 10 possible values for both motion vectors! That is, currently all video coders do something like :


Code motion = [0 or 1]
Code residual = [0,1,2,3,4,5,6,7,8,9]

Or in tree form :

   0 - [0,1,2,3,4,5,6,7,8,9]
*<
   1 - [0,1,2,3,4,5,6,7,8,9]

Clearly this is foolish. For each movec, you only need to be able to code the output blocks that are cheapest to reach under that movec. So you only need each output value to occur in one spot on the tree, eg.

   0 - [0,1,2,3,4]
*<
   1 - [5,6,7,8,9]

or something. That is, it's foolish to have two ways to encode the residual to reach a certain target when there were already cheaper ways to reach that target in the movec coding portion. To minimize this deficiency, most current coders like H264 will code blocks by either putting almost all the bits in the movec and very few in the residual, or the other way (almost none in the movec and most in the residual). The loss occurs most when you have many bits in the motion and many in the residual, something like :


   0 - [0,1,2]
   1 - [3,4,5,6]
   2 - [7,8]
   3 - [9]
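To put a number on the toy trees above, here's a small sketch that counts leaves and the bits needed to pick one. It assumes every {movec, output} leaf is equally likely, which real coders certainly don't do, so treat the bit counts as illustrative only. The function names (leaf_count, bits_uniform, is_minimal) are made up for this example.

```python
from math import log2

def leaf_count(tree):
    """Number of distinct {movec, output} codes in the tree."""
    return sum(len(outputs) for outputs in tree.values())

def bits_uniform(tree):
    """Bits to select one leaf, assuming all leaves are equally likely."""
    return log2(leaf_count(tree))

def is_minimal(tree):
    """True if every output value appears under exactly one movec."""
    seen = [v for outputs in tree.values() for v in outputs]
    return len(seen) == len(set(seen))

# The trees from the text: movec -> output values reachable under it.
redundant = {0: list(range(10)), 1: list(range(10))}
pruned = {0: [0, 1, 2, 3, 4], 1: [5, 6, 7, 8, 9]}
four_way = {0: [0, 1, 2], 1: [3, 4, 5, 6], 2: [7, 8], 3: [9]}

# Cost of the redundancy under the uniform assumption.
wasted_bits = bits_uniform(redundant) - bits_uniform(pruned)
```

With 2 movecs and 10 outputs the redundant tree has 20 leaves for 10 distinct results, so under the uniform assumption it throws away one full bit per block. The four-movec tree is still minimal: ten leaves, each output reachable exactly one way.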

The other huge fundamental deficiency is that the probability modeling of movecs and residuals is done in a very primitive way based only on "they are usually small" assumptions. In particular, probability modeling of movecs needs to be done not just based on the vector, but on the content of what is pointed at. I mentioned long ago there is a lot of redundancy there when you have lots of movecs pointing at the same thing. Also, the residual coding should be aware of what was pointed to by the movec. For example if the movec pointed at a hard edge, then the residual will likely also have a similar hard edge because it's likely we missed by a little bit, so you could use a custom transform that handles that better. etc.

ADDENDUM 2 : there's something else very subtle going on that I haven't seen discussed much. The normal way of sending {movec,residual} is actually over-complete. Mostly that's bad, too much over-completeness means you are just wasting bits, but actually some amount of over-completeness here is a good thing. In particular for each frame we are sending a little bit of extra side information that is useful for *later* frames. That is, we are sending enough information to decode the current frame to some quality level, plus some extra that is not really worth it for just the current frame, but is worth it because it helps later frames.

The problem is that the amount of extra information we are sending is not well understood. That is, in the current {movec,residual} schemes we are just sending extra information without being in control and making a specific decision. We should be choosing how much extra information to send by evaluating whether it is actually helpful on future frames. Obviously the last frames of the video (or a sequence before a cut) you shouldn't send any extra information.

In the examples above I'm showing how to reduce the overcomplete information down to a minimal set, but sometimes you might not want to do that. As a very coarse example say the true motion at a given pixel is +3, movec=3 to get to final pixel=7 , but you can code the same result smaller by using movec=1 - deciding whether to send the true motion or not should be done based on whether it actually helps in the future, but more importantly the code stream could collapse {3,7} and {1,7} so there is no redundant way to code if the difference is not helpful.

This becomes more important of course if you have a more complex motion scheme, like per-pixel motion or trapezoidal motion or whatever.

5/20/2010

05-20-10 - Some quick notes on H265

Since we're talking about VP8 I'd like to take this chance to briefly talk about some of the stuff coming in the future. H265 is being developed now, though it's still a long ways away. Basically at this point people are throwing lots of shit at the wall to see what sticks (and hope they get a patent in). It is interesting to see what kind of stuff we may have in the future. Almost none of it is really a big improvement like "duh we need to have that in our current stuff", it's mostly "do the same thing but use more CPU".

The best source I know of at the moment is H265.net, but you can also find lots of stuff just by searching for video on citeseer. (addendum : FTP to Dresden April meeting downloads).

H265 is just another movec + residual coder, with block modes and quadtree-like partitions. I'll write another post about some ideas that are outside of this kind of scheme. Some quick notes on the kind of things we may see :

Super-resolution mocomp. There are some semi-realtime super-resolution filters being developed these days. Super-resolution lets you take a series of frames and create an output that's higher fidelity than any one source. In particular given a few assumptions about the underlying source material, it can reconstruct a good guess of the higher resolution original signal before sampling to the pixel grid. This lets you do finer subpel mocomp. Imagine for example that you have some black and white text that is slowly translating. On any one given frame there will be lots of gray edges due to the antialiased pixel sampling. Even if you perfectly know the subpixel location of that text on the target frame, you have no single reference frame to mocomp from. Instead you create a super-resolution reference frame of the original signal and subpel mocomp from that.

Partitioned block transforms. One of the minor improvements in image coding lately, which is natural to move to video coding, is PBT with more flexible sizes. This means 8x16, 4x8, 4x32, whatever, lots of partition sizes, and having block transforms for that size of partition. This lets the block transform match the data better. Which also leads us to -

Directional transforms and trained transforms. Another big step is not always using an X & Y oriented orthogonal DCT. You can get a big win by doing directional transforms. In particular, you find the directions of edges and construct a transform that has its bases aligned along those edges. This greatly reduces ringing and improves energy compaction. The problem is how do you signal the direction or the transform data? One option is to code the direction as extra side information, but that is probably prohibitive overhead. A better option is to look at the local pixels (you already have decoded neighbors) and run edge detection on them and find the local edge directions and use that to make your transform bases. Even more extreme would be to do a fully custom transform construction from local pixels (and the same neighborhood in the last frame), either using competition (select from a set of transforms based on which one would have done best on those areas) or training (build the KLT for those areas). Custom trained bases are especially useful for "weird" images like Barb. These techniques can also be used for ...

Intra prediction. Like residual transforms, you want directional intra prediction that runs along the edges of your block, and ideally you don't want to send bits to flag that direction, rather figure it out from neighbors & previous frame (at least to condition your probabilities). Aside from finding direction, neighbors could be used to vote for or train fully custom intra predictors. One of the H265 proposals is basically GLICBAWLS applied to intra prediction - that is, train a local linear predictor by doing weighted LSQR on the neighborhood. There are some other equally insane intra prediction proposals - basically any texture synthesis or prediction paper over the last 10 years is fair game for insane H265 intra prediction proposals, so for example you have suggestions like Markov 2x2 block matching intra prediction which builds a context from the local pixel neighborhood and then predicts pixels that have been seen in similar contexts in the image so far.

Unblocking filters ("loop filtering" huh?) are an obvious area for improvement. The biggest area for improvement is deciding when a block edge has been created by the codec and when it is in the source data. This can actually usually be figured out if the unblocking filter has access to not just the pixels, but how they were coded and what they were mocomped from. In particular, it can see whether the code stream was *trying* to send a smooth curve and just couldn't because of quantization, or whether the code stream intentionally didn't send a smooth curve (eg. it could have but chose not to).

Subpel filters. There are a lot of proposals on improved sub-pixel filters. Obviously you can use more taps to get better (sharper) frequency response, and you can add 1/8 pel or finer. The more dramatic proposals are to go to non-separable filters, non-axis aligned filters (eg. oriented filters), and trained/adaptive filters, either with the filter coefficients transmitted per frame or again deduced from the previous frame. The issue is that what you have is just a pixel sampled aliased previous frame; in order to do sub-pel filtering you need to make some assumptions about the underlying image signal; eg. what is the energy in frequencies higher than the sampling limit? Different sub-pel filters correspond to different assumptions about the beyond-nyquist frequency content. As usual orienting filters along edges helps.
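As a quick illustration of "more taps gives sharper frequency response", here's a sketch comparing a 2-tap bilinear half-pel filter against the 6-tap (1,-5,20,20,-5,1)/32 half-pel filter that H264 uses for luma. The test signal (a cosine at a quarter of the sampling rate) and the helper names are my own choices for the example, not from any proposal.

```python
from math import cos, pi

BILINEAR = (0.5, 0.5)
SIX_TAP = tuple(t / 32 for t in (1, -5, 20, 20, -5, 1))  # H264 luma half-pel taps

def half_pel(samples, i, taps):
    """Interpolate the value midway between samples[i] and samples[i+1]."""
    start = i - (len(taps) // 2 - 1)  # center the taps on the half-pel position
    return sum(t * samples[start + k] for k, t in enumerate(taps))

def interp_error(taps, w, i=10, length=24):
    """Absolute half-pel error on cos(w*n), whose true in-between value is known."""
    samples = [cos(w * n) for n in range(length)]
    truth = cos(w * (i + 0.5))
    return abs(half_pel(samples, i, taps) - truth)

# Assumed test frequency: a quarter of the sampling rate (w = pi/2 radians/sample).
err_bilinear = interp_error(BILINEAR, pi / 2)
err_six_tap = interp_error(SIX_TAP, pi / 2)
```

At that frequency the bilinear filter knocks the signal down to about 71% of its true amplitude, while the 6-tap filter lands within a few percent (it slightly overshoots). Both filters are built on the assumption that there's nothing above nyquist, which is exactly the assumption the trained/adaptive proposals are trying to relax.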

Improved entropy coding. So far as I can tell there's nothing too interesting here. Current video coders (H264) use entropy coders from the 1980's (very similar to the Q-coder stuff in JPEG-ari), and the proposals are to bring the entropy coding into the 1990's, on the level of ECECOW or EZDCT.

5/19/2010

05-19-10 - Some quick notes on VP8

The VP8 release is exciting for what it might be in two years.

If it in fact becomes a clean open-source video standard with no major patent encumbrances, it might be well integrated in Firefox, Windows Media, etc. etc. - eg. we might actually have a video format that actually just WORKS! I don't even care if the quality/size is really competitive. How sweet would it be if there was a format that I knew I could download and it would just play back correctly and not give me any headaches. Right now that does not exist at all. (it's a sad fact that animated GIF is probably the most portable video format of the moment).

Now, you might well ask - why VP8 ? To that I have no good answer. VP8 seems like a messy cock-assed standard which has nothing in particular going for it. The entropy encoder in particular (much like H264) seems badly designed and inefficient. The basics are completely vanilla, in that it is block based, block modes, movecs, transforms, residual coding. In that sense it is just like MPEG1 or H265. That is a perfectly fine thing to do, and in fact it's what I've wound up doing, but you could pull a video standard like that out of your ass in about five minutes, there's no need to license code for that. If in fact VP8 does dodge all the existing patents then that would be a reason that it has value.

The VP8 code stream is probably pretty weak (I really don't know enough of the details to say for sure). However, what I have learned of late is that there is massive room for the encoder to make good output video even through a weak code stream. In fact I think a very good encoder could make good output from an MPEG2 level of code stream. Monty at Xiph has a nice page about work on Theora. There's nothing really cutting edge in there but it's nicely written and it's a good demonstration of the improvement you can get on a fixed standard code stream just with encoder improvements (and really their encoder is only up to "good but still basic" and not really into the realm of wicked-aggressive).

The only question we need to ask about the VP8 code stream is : is it flexible enough that it's possible to write a good encoder for it over the next few years? And it seems the answer is yes. (contrast this to VP3/Theora which has a fundamentally broken code stream which has made it very hard to write a good encoder).

ADDENDUM : this post by Greg Maxwell is pretty right on.

ADDENDUM 2 : Something major that's been missing from the web discussions and from the literature about video for a long time is the separation of code stream from encoder. The code stream basically gives the encoder a language and framework to work in. The things that Jason / Dark Shikari thinks are so great about x264 are almost entirely encoder-side things that could apply to almost any code stream (eg. "psy rdo" , "AQ", "mbtree", etc.). The literature doesn't discuss this much because they are trapped in the pit of PSNR comparisons, in which encoder side work is not that interesting. Encoder work for PSNR is not interesting because we generally know directly how to optimize for MSE/SSD/L2 error - very simple ways like flat quantizers and DCT-space trellis quant, etc. What's more interesting is perceptual quality optimization in the encoder. In order to achieve good perceptual optimization, what you need is a good way to measure perceptual error (which we don't have), and the ability to try things in the code stream and see if they improve perceptual error (hard due to non-local effects), and a code stream that is flexible enough for the encoder to make choices that create different kinds of errors in the output. For example adding more block modes to your video coder with different types of coding is usually/often bad in a PSNR sense because all they do is create redundancy and take away code space from the normal modes, but it can be very good in a perceptual sense because it gives the encoder more choice.

ADDENDUM 3 : Case in point , I finally have noticed some x264 encoded videos showing up on the torrent sites. Well, about 90% of them don't play back on my media PC right. There's some glitching problem, or the audio & video get out of sync, or the framerate is off a tiny bit, or some shit and it's fucking annoying.

ADDENDUM 4 : I should be more clear - the most exciting thing about VP8 is that it (hopefully) provides an open patent-free standard that can then be played with and discussed openly by the development community. Hopefully encoders and decoders will also be open source and we will be able to talk about the techniques that go into them, and a whole new

5/13/2010

05-13-10 - P4 with NiftyPerforce and no P4SCC

I'm trying using P4 in MSDev with NiftyPerforce and no P4SCC.

What this means is VC thinks you have no SCC connection at all, your files are just on your disk. You need to change the default NiftyPerforce settings so that it checks out files for you when you edit/save etc.

Advantages of NiftyPerforce without P4SCC :

1. Much faster startup / project load, because it doesn't go and check the status of everything in the project with P4.

2. No clusterfuck when you start unconnected. This is one of the worst problems with P4SCC, for example if you want to work on some work projects but can't VPN for some reason, P4SCC will have a total shit fit about working disconnected. With the NiftyPerforce setup you just attrib your files and go on with your business.

3. No difficulties with changing binding/etc. This is another major disaster with P4SCC. It's rare, but if you change the P4 location of a project or change your mappings or if you already have some files added to P4 but not the project, all these things give MSDev a complete shit-fit. That all goes away.

Disadvantages of NiftyPerforce without P4SCC :

1. The first few keystrokes are lost. When you try to edit a checked-in file, you can just start typing and Nifty will go check it out, but until the checkout is done your keystrokes go to never-never land. Mild suckitude. Alternatively you could let MSDev pop up the dialog for "do you want to edit this read only file" which would make you more aware of what's going on but doesn't actually fix the issue.

2. No check marks and locks in project browser to let you know what's checked in / checked out. This is not a huge big deal, but it is a nice sanity check to make sure things are working the way they should be. Instead you have to keep an eye on your P4Win window which is a mild productivity hit.

One note about making the changeover : for existing projects that have P4SCC bindings, if you load them up in VC and tell VC to remove the binding, it also will be "helpful" and go attrib all your files to make them writeable (it also will be unhelpful and not check out your projects to make the change to not have them bound). Then NiftyPerforce won't work because your files are already writeable. The easiest way to do this right is to just open your vcproj's and sln's in a text editor and rip out all the binding bits manually.

I'm not sure yet whether the pros/cons are worth it. P4SCC actually is pretty nice once it's set up, though the ass-pain it gives when trying to make it do something it doesn't want to do (like source control something that's out of the binding root) is pretty severe.

ADDENDUM :

I found the real pro & con of each way.

Pro P4SCC : You can just start editing files in VC and not worry about it. It auto-checks out files from P4 and you don't lose key presses. The most important case here is that it correctly handles files that you have not got the latest revision of - it will pop up "edit current or sync first" in that case. The best way to use Nifty seems to be Jim's suggestion - put checkout on Save, do not checkout on Edit, and make files read-only editable in memory. That works great if you are a single dev but is not super awesome in an actual shared environment with heavy contention.

Pro NiftyP4 : When you're working from home over an unreliable VPN, P4SCC is just unworkable. If you lose connection it basically hangs MSDev. This is so bad that it pretty much completely dooms P4SCC. ARG actually I take that back a bit, NiftyP4 also hangs MSDev when you lose connection, though it's not nearly as bad.

5/12/2010

05-12-10 - P4 By Dir

(ADDENDUM : see comments, I am dumb).

I mentioned this before :

(Currently that's not a great option for me because I talk to both my home P4 server and my work P4 server, and P4 stupidly does not have a way to set the server by local directory. That is, if I'm working on stuff in c:\home I want to use one env spec and if I'm in c:\work, use another env spec. This fucks up things like NiftyPerforce and p4.exe because they just use a global environment setting for server, so if I have some work code and some home code open at the same time they shit their pants. I think that I'll make my own replacement p4.exe that does this the right way at some point; I guess the right way is probably to do something like CVS/SVN does and have a config file in dirs, and walk up the dir tree and take the first config you find).

But I'm having second thoughts, because putting little config shitlets in my source dirs is one of the things I hate about CVS. Granted it would be much better in this case - I would only need a handful of them in my top level dirs, but another disadvantage is my p4bydir app would need to scan up the dir tree all the time to find config files.

And there's a better way. The thing is, the P4 Client specs already have the information of what dirs on my local machine go with what depot mappings. The problem is the client spec is not actually associated with a server. What you need is a "port client user" setting. These are stored as favorites in P4Win, but there is no authoritative list of the valid/good "port client user" setups on a machine.

So, my new idea is that I store my own config file somewhere that lists the valid "port client user" sets that I want to consider in p4bydir. I load that and then grab all the client specs. I use the client specs to identify what dirs to map to where, and the "port client user" settings to tell what p4 environment to set for that dir.

I then replace the global p4.exe with my own p4bydir so that all apps (like NiftyPerforce) will automatically talk to the right connection whenever they do a p4 on a file.
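The lookup at the heart of that idea can be sketched like this. It assumes the "port client user" list has already been loaded from my config file and that each client spec's local root dir has already been fetched; P4Env, env_for_path and to_environ are names invented for the sketch, and the server names in the usage note are placeholders. P4PORT, P4CLIENT and P4USER are the standard Perforce environment variables.

```python
from pathlib import PureWindowsPath
from typing import NamedTuple, Optional, Sequence

class P4Env(NamedTuple):
    """One "port client user" set plus the client spec's local root (hypothetical record)."""
    port: str
    client: str
    user: str
    root: str

def env_for_path(path: str, envs: Sequence[P4Env]) -> Optional[P4Env]:
    """Pick the env whose client root contains path; the deepest root wins."""
    p = PureWindowsPath(path)
    best = None
    best_depth = -1
    for env in envs:
        root = PureWindowsPath(env.root)
        # PureWindowsPath compares case-insensitively, like the file system does.
        if p == root or root in p.parents:
            depth = len(root.parts)
            if depth > best_depth:
                best, best_depth = env, depth
    return best

def to_environ(env: P4Env) -> dict:
    """Environment variables to set before handing off to the real p4.exe."""
    return {"P4PORT": env.port, "P4CLIENT": env.client, "P4USER": env.user}
```

So with one entry rooted at c:\home and another at c:\work, a p4 edit on c:\work\proj\main.cpp gets the work server settings and anything under c:\home gets the home ones. A path under neither root returns None, and the replacement p4.exe would just fall through to the global settings.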

05-12-10 - Cleartype

Since I ranted about Cleartype I thought I'd go into a bit more detail. This article on Cleartype in Win7 is interesting, though also willfully dense.

Another research question we've asked ourselves is why do some people prefer bi-level rendering over ClearType? Is it due to hardware issues or is there some other attribute that we don't understand about visual systems that is playing a role. This is an issue that has piqued our curiosity for some time. Our first attempt at looking further into this involved doing an informal and small-scale preference study in a community center near Microsoft.

Wait, this is a research question ? Gee, why would I prefer perfect black and white raster fonts to smudged and color-fringed cleartype. I just can't imagine it! Better do some community user testing...

1. 35 participants.

2. Comments for bi-level rendering: Washed out; jiggly; sketchy; if this were a printer, I'd say it needed a new cartridge; fading out - esp. the numbers, I have to squint to read this, is it my glasses or it is me?; I can't focus on this; broken up; have to strain to read; jointed.

3. Comments for ClearType: More defined, Looks bold (several times), looks darker, clearer (4 times), looks like it's a better computer screen (user suggested he'd pay $500 more for the better screen on a $2000 laptop), sort of more blue, solid, much easier to read (3 times), clean, crisp, I like it, shows up better, and my favorite: from an elderly woman who was rather put out that the question wasn't harder: this seems so obvious (said with a sneer.)

Oh my god, LOL, holy crap. They are obviously comparing Cleartyped anti-aliased fonts to black-and-white rendered TrueType fonts, NOT to raster fonts. They're probably doing big fonts on a high DPI screen too. Try it again on a 24" LCD with an 8 point font please, and compare something that has an unhinted TrueType and an actual hand-crafted raster font. Jesus. Oh, but I must be wrong because the community survey says 94% prefer cleartype!

Anyway, as usual the annoying thing is that in pushing their fuck-tard agenda, they refuse to acknowledge the actual pros and cons of each method and give you the controls you really want. What I would like is a setting to make Windows always prefer bitmap fonts when they exist, but use ClearType if it is actually drawing anti-aliased fonts. Even then I still might not use it because I fucking hate those color fringes, but it would be way more reasonable. Beyond that obviously you could want even more control like switching preference for cleartype vs. bitmap per font, or turning on and off hinting per font or per app, etc. but just some more reasonable global default would get you 90% of the way there. I would want something like "always prefer raster font for sizes <= 14 point" or something like that.

Text editors are a simple case because you just let the user set the font and get what they want, and it doesn't matter what size the text is because it's not laid out. For PDFs and such I guess you go ahead and use TT all the time. The web is a weird hybrid which is semi-formatted. The problem with the web is that it doesn't tell you when formatting is important or not important. I'd like to override the basic firefox font to be my own choice nice bitmap font *when formatting is not important* (eg. in blocks of text like I make). But if you do that globally it hoses the layout of some pages. And then other pages will manually request fonts which are blurry bollocks.

CodeProject has a nice font survey with Cleartype/no-Cleartype screen caps.

GDI++ is an interesting hack to GDI32.dll to replace the font rendering.

Entropy overload has some decent hinted TTF fonts for programmers you can use in VS 2010.

Electronic Dissonance has the real awesome solution : sneak raster fonts into asian fonts so that VS 2010 / WPF will use them. This is money if you use VS 2010.

5/11/2010

05-11-10 - Some New Cblib Apps

Coded up some new goodies for myself today and released them in a new cblib and chuksh.

RunOrActivate : useful with a hot key program, or from the CLI. Use RunOrActivate [program name]. If a running process of that program exists, it will be activated and made foreground. If not, a new instance is started. Similar to the Windows built-in "shortcut key" functionality but not horribly broken like that is.

(BTW for those that don't know, Windows "shortcut keys" have had huge bugs ever since Win 95 ; they sometimes work great, basically doing RunOrActivate, but they use some weird mechanism which causes them to not work right with some apps (maybe they use DDE?), they also have bizarre latency semi-randomly, usually they launch the app instantly but occasionally they just decide to wait for 10 seconds or so).

RunOrActivate also has a bonus feature : if multiple instances of that process are running it will cycle you between them. So for example my Win-E now starts an explorer, goes to the existing one if there was one, and if there were a few it cycles between explorers. Very nice. Also works with TCC windows and Firefox Windows. This actually solves a long-time usability problem I've had with shortcut keys that I never thought about fixing before, so huzzah.
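The decision logic described above (launch if nothing is running, otherwise activate, and cycle when there are several instances) can be sketched roughly like this. This is only an illustration of the behavior, not the cblib source: `choose_action` is a hypothetical name, window handles are modeled as plain integers, and the actual enumeration and activation of windows through the Win32 API is left out.

```python
def choose_action(process_windows, foreground):
    """Decide what a RunOrActivate-style tool should do.

    process_windows: handles of windows belonging to the target program,
                     in some stable order (hypothetical input).
    foreground:      handle of the current foreground window, or None.
    Returns ("launch", None) or ("activate", handle).
    """
    if not process_windows:
        # No running instance: start a new one.
        return ("launch", None)
    if foreground in process_windows:
        # Already looking at one instance: cycle to the next, wrapping around.
        i = process_windows.index(foreground)
        return ("activate", process_windows[(i + 1) % len(process_windows)])
    # Some other app is in front: bring the first instance forward.
    return ("activate", process_windows[0])
```

Cycling relies on the window list coming back in the same order on each invocation; if the enumeration order changed with activation, you would need to sort the handles first.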

WinMove : I've been using this forever; it lets you move and resize the active window in various ways, either by manual coordinate or with some shorthands for "left half" etc. Anyway the new bit is I just added an option for "all windows" so that I can reproduce the Win-M minimize all behavior and Win-Shift-M restore all.
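For a feel of what a shorthand like "left half" boils down to, here is a small sketch that turns a shorthand name into a target rectangle. Only "left half" is named above; the other shorthand names, the function name `shorthand_rect`, and the (left, top, right, bottom) rectangle convention are assumptions for illustration, not WinMove's actual options.

```python
def shorthand_rect(name, work_area):
    """Map a placement shorthand to a target rectangle.

    work_area: (left, top, right, bottom) in pixels, with right/bottom
               exclusive, i.e. the usable desktop area of the monitor.
    Returns a (left, top, right, bottom) tuple for the window.
    """
    left, top, right, bottom = work_area
    mid_x = left + (right - left) // 2
    mid_y = top + (bottom - top) // 2
    table = {
        "left half": (left, top, mid_x, bottom),
        "right half": (mid_x, top, right, bottom),
        "top half": (left, top, right, mid_y),
        "bottom half": (left, mid_y, right, bottom),
        "full": (left, top, right, bottom),
    }
    return table[name]
```

The two halves share the midpoint edge, so they tile the work area exactly even when the width or height is odd.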

I think that gives me all Win-Key functions I actually want.

ADDENDUM : One slightly fiddly bit is the question of *which* window of a process to activate in RunOrActivate. Windows refuses to give you any concept of the "primary" window of a process, simply sticking to the assertion that processes can have many windows. However we all know this is bullshit because Alt-Tab picks out an isolated set of "primary" windows to switch between. So how do you get the list of alt-tab windows? You don't. It's "undefined", so you have to make it up somehow. Raymond Chen describes the algorithm used in one version of Windows.
