05-29-10 - Some more x64

Okay , MASM/MSDev support for x64 is a bit fucked. MSDev has built-in support for .ASM starting in VC 2005 which does everything for you, sets up custom build rule, etc. The problem is, it hard-codes to ML.EXE - not ML64. Apparently they have fixed this for VC 2010 but it is basically impossible to back-fix. (in VC 2008 the custom build rule for .ASM is in an XML file, so you can fix it yourself thusly )

The workaround goes like this :

Go to "c:\program files (x86)\microsoft visual studio 8\vc\bin". Find the occurrences of ML64.exe ; copy them to ML.exe . Now you can add .ASM files to your project. Go to the Win32 platform config and exclude them from build in Win32.

You now have .ASM files for ML64. For x86/32 - just use inline assembly. For x64, you extern from your ASM file.

Calling to x64 ASM is actually very easy, even easier than x86, and there are more volatile registers and the convention is that caller has to do all the saving. All of this means that you as a little writer of ASM helper routines can get away with doing very little. Usually your args are right there in {rcx,rdx,r8,r9} , and then you can use {rax,r10,r11} as temp space, so you don't even have to bother with saving space on the stack or any of that. See list of volatile registers

BTW the best docs are just the full AMD64 manuals .

For example here's a full working .ASM file :

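public my_cmpxchg64

.CODE

align 8
my_cmpxchg64 PROC 

  ; rcx = val , rdx = oldV , r8 = newV
  mov rax, [rdx]            ; rax = expected old value
  lock cmpxchg [rcx], r8    ; if [rcx] == rax then [rcx] = r8 , else rax = [rcx]
  jne my_cmpxchg64_fail
  mov rax, 1                ; success
  ret

align 8
my_cmpxchg64_fail:
  mov [rdx], rax            ; report the value actually seen
  mov rax, 0
  ret

align 8
my_cmpxchg64 ENDP

END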


And how to get to it from C :

extern "C" int my_cmpxchg64( uint64 * val, uint64 * oldV, const uint64 newV );

BTW one of the great things about posting things on the net is just that it makes me check myself. That cmpxchg64 has a stupid branch, I think this version is better :

align 8
my_cmpxchg64 PROC
  mov rax, [rdx]
  lock cmpxchg [rcx], r8
  sete cl
  mov [rdx], rax
  movzx rax,cl
  ret
my_cmpxchg64 ENDP

and you can probably do better. (for example it's probably better to just define your function as returning unsigned char and then you can avoid the movzx and let the caller worry about that)
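To make the contract of my_cmpxchg64 concrete, here is a rough portable sketch of the same semantics using std::atomic. That is C++11, so it is not something the compilers discussed here give you; it is only meant to show what the return value and oldV are supposed to do. The names cmpxchg64_reference and atomic_add_via_cas are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference version of my_cmpxchg64, for illustration only.
// Returns 1 and stores newV if *val == *oldV ;
// otherwise returns 0 and writes the value actually seen into *oldV.
inline int cmpxchg64_reference( std::atomic<uint64_t> * val, uint64_t * oldV, uint64_t newV )
{
    // compare_exchange_strong refreshes *oldV on failure, like the "mov [rdx], rax" path
    return val->compare_exchange_strong( *oldV, newV ) ? 1 : 0;
}

// Typical usage : a retry loop that atomically adds to a 64 bit value.
inline uint64_t atomic_add_via_cas( std::atomic<uint64_t> * val, uint64_t add )
{
    uint64_t old = val->load();
    while ( ! cmpxchg64_reference( val, &old, old + add ) )
    {
        // old now holds the current value ; just try again
    }
    return old + add;
}
```

The retry loop is the usual way a helper like this gets used: on failure the caller already has the fresh value in hand and does not need to reload it.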

ADDENDUM : I just found a new evil secret way I'm fucked. Unions with size mismatches do not appear to produce a warning of any kind. So for example you can silently have this in your code :

union Fucked
{
    struct
    {
        void * p1;
        int t;
    } s;
    uint64  i;
};

build in 64 bit and it's just hose city. BTW I think using unions as a datatype in general is probably bad practice. If you need to be doing that for some fucked reason, you should just store the member as raw bits, and then same_size_bit_cast() to convert it to the various types. In other words, the dual identity of that memory should be a part of the imperative code, not a part of the data declaration.
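A minimal sketch of the "store raw bits, convert in the code" idea, assuming a C++11 compiler for static_assert (on older compilers a COMPILER_ASSERT style macro would do the same job). The names bits_as, Node, set_double and get_double are invented for this example. The point is that a size mismatch becomes a compile error instead of silent garbage.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper : copy the bits of one type into another of the same size.
template < typename t_to, typename t_fm >
t_to bits_as( const t_fm & from )
{
    static_assert( sizeof(t_to) == sizeof(t_fm), "bit cast between different sizes" );
    t_to to;
    std::memcpy( &to, &from, sizeof(to) );
    return to;
}

// The data declaration is just raw bits ...
struct Node
{
    uint64_t bits;
};

// ... and the dual identity lives in the code that touches it.
inline void   set_double( Node * n, double d ) { n->bits = bits_as<uint64_t>(d); }
inline double get_double( const Node * n )     { return bits_as<double>(n->bits); }
```

With this layout, trying to bits_as a 32 bit type into the 64 bit field fails to compile, which is exactly the check the union never gives you.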

05-29-10 - Lock Free in x64

I mentioned long ago in the low level threading articles that some of the algorithms are a bit problematic with 64 bit pointers because we don't have large enough atomic operations.

The basic problem is that for many of the lock-free algorithms we need to be able to do a DCAS , that is a CAS of two pointer-sized values, or a pointer and a counter. When our pointer was 32 bits, we could use a 64 bit CAS to implement DCAS. If our pointer is 64 bits then we need a 128 bit CAS to implement DCAS the same way. There are various solutions to this :

1. Use 128 bit CAS. x64 has cmpxchg16b now which is exactly what you need. This is obviously simple and nice. There are a few minor problems :

1.A. There are no other 128 bit atomics, eg. Exchange and Add and such are missing. These can be implemented in terms of loops of CAS, so that is a very minor suckitude.

1.B. Early AMD64 chips do not have cmpxchg16b. You have to check for its presence with a CPUID call. If it doesn't exist you are seriously fucked. Fortunately these chips are pretty rare, so you can just use a really evil fallback to keep working on them : either disable threading completely on them, or simply run the 32 bit version of your app. The easiest way to do that is to have your installer check the CPUID flag and install the 32 bit x86 version of your app instead of the 64 bit version.

1.C. All your lock-free nodes become 16 bytes instead of 8 bytes. This does things like make your minimum alloc size 16 bytes instead of 8 bytes. This is part of the general bloating of 64 bit structs and mildly sucks. (BTW you can see this in winnt.h as MEMORY_ALLOCATION_ALIGNMENT is 16 on Win64 and 8 on Win32).

1.D. _InterlockedCompareExchange128 only exists on newer versions of MSVC so you have to write it yourself in ASM for older versions. Urg.

So #1 is an okay solution, but what are the alternatives ?

2. Pack {Pointer,Count} into 64 bits. This is of course what Windows does for SLIST, so doing this is actually very safe. Currently pointers on Windows are only 44 bits because of this. They will move to 48 and then 52. You can easily store a 52 bit pointer + a 16 bit count in 64 bits (the 52 bit pointer has the bottom four bits zero so you actually have 16 bits to work with). Then you can just keep using 64 bit CAS. This has no disadvantage that I know of other than the fact that twenty years from now you'll have to touch your code again.

3. You can implement arbitrary-sized CAS in terms of pointer CAS. The powerful standard paradigm for this type of thing is to use pointers to data instead of data by value, so you are just swapping pointers instead of swapping values. It's very simple, when you want to change a value, you malloc a copy of it and change the copy, and then swap in the pointer to the new version. You CAS on the pointer swap. The "malloc" can just be taking data from a recycled buffer which uses hazard pointers to keep threads from using the same temp item at the same time. This is a somewhat more complex way to do things conceptually, but it is very powerful and general, and for anyone doing really serious lockfree work, a hazard pointer system is a good thing to have. See for example "Practical Lock-Free and Wait-Free LL/SC/VL Implementations Using 64-Bit CAS".

You could also of course use a hybrid of 2 & 3. You could use a packed 64 bit {pointer,count} until your pointer becomes more than 52 bits, and then switch to a pointer to extra data.
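Option 2 can be sketched roughly like this. It assumes exactly what the text above assumes: pointers fit in 52 bits and are 16-byte aligned, so the pointer shifted down by 4 fits in 48 bits and leaves 16 bits for the count. Pointers are passed around as uint64_t here to keep the example simple (real code would cast through uintptr_t), it uses C++11 std::atomic for the CAS, and all the names are made up.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int      kCountBits = 16;
constexpr uint64_t kCountMask = (1ull << kCountBits) - 1;

// Pack a 52 bit, 16-byte aligned pointer and a 16 bit count into one 64 bit word.
inline uint64_t pack_head( uint64_t ptr, uint64_t count )
{
    assert( (ptr & 0xF) == 0 );   // bottom four bits must be zero
    assert( (ptr >> 52) == 0 );   // must fit in 52 bits
    return ((ptr >> 4) << kCountBits) | (count & kCountMask);
}

inline uint64_t head_ptr( uint64_t packed )   { return (packed >> kCountBits) << 4; }
inline uint64_t head_count( uint64_t packed ) { return packed & kCountMask; }

// Swing the head to newPtr and bump the count, using one plain 64 bit CAS.
// The count wraps at 16 bits ; it is only there to make ABA unlikely.
inline bool try_set_head( std::atomic<uint64_t> * head, uint64_t expected, uint64_t newPtr )
{
    uint64_t desired = pack_head( newPtr, head_count(expected) + 1 );
    return head->compare_exchange_strong( expected, desired );
}
```

The asserts in pack_head are the place where the "touch your code again in twenty years" moment would show up, when a pointer finally comes back with bits set above 52.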

05-29-10 - x64 so far

x64 linkage that's been useful so far :

__asm__ cmpxchg8bcmpxchg16b - comp.programming.threads Google Groups
_InterlockedCompareExchange Intrinsic Functions
x86-64 Tour of Intel Manuals
x64 Starting Out in 64-Bit Windows Systems with Visual C++
Writing 64-bit programs
Windows Data Alignment on IPF, x86, and x86-64
Use of __m128i as two 64 bits integers
Tricks for Porting Applications to 64-Bit Windows on AMD64
The history of calling conventions, part 5 amd64 - The Old New Thing - Site Home - MSDN Blogs
Snippets lifo.h
Predefined Macros (CC++)
Physical Address Extension - PAE Memory and Windows
nolowmem (Windows Driver Kit)
New Intrinsic Support in Visual Studio 2008 - Visual C++ Team Blog - Site Home - MSDN Blogs
Moving to Windows Vista x64
Moving to Windows Vista x64 - CodeProject
Mark Williams Blog jmp'ing around Win64 with ml64.exe and Assembly Language
Kernel Exports Added for Version 6.0
Is there a portable equivalent to DebugBreak()__debugbreak - Stack Overflow
How to Log Stack Frames with Windows x64 - Stack Overflow
BCDEdit Command-Line Options
Available Switch Options for Windows NT Boot.ini File
AMD64 Subpage
AMD64 (EM64T) architecture - CodeProject
20 issues of porting C++ code on the 64-bit platform

One unexpected annoyance has been that a lot of the Win32 function signatures have changed. For example LRESULT is now pointer-sized (a LONG_PTR), not a LONG. This is a particular problem because Win32 has always made heavy use of cramming the wrong type into various places, eg. for GetWindowLong and stuffing pointers in LPARAM's and all that kind of shit. So you wind up having tons of C-style casts when you write Windows code. I have made good use of these guys :

// same_size_value_cast does a value cast (not a bit cast),
//  but checks that the two types are the same size
template < typename t_to, typename t_fm >
t_to same_size_value_cast( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    // just value cast ; returned by value, a reference to the cast result would dangle :
    return (t_to) from;
}

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
template < typename t_to, typename t_fm >
t_to & same_size_bit_cast_p( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    // cast through char * to make aliasing work ?
    char * ptr = (char *) &from;
    return *( (t_to *) ptr );
}

// same_size_bit_cast casts the bits in memory
//  eg. it's not a value cast
// cast with union is better for gcc / Xenon :
template < typename t_to, typename t_fm >
t_to same_size_bit_cast_u( t_fm & from )
{
    COMPILER_ASSERT( sizeof(t_to) == sizeof(t_fm) );
    union _bit_cast_union
    {
        t_fm fm;
        t_to to;        
    };
    _bit_cast_union converter = { from };
    // returned by value ; converter is a local so a reference to it would dangle :
    return converter.to;
}

// check_value_cast just does a static_cast and makes sure you didn't wreck the value
template < typename t_to, typename t_fm >
t_to check_value_cast( const t_fm & from )
{
    t_to to = static_cast<t_to>(from);
    ASSERT( static_cast<t_fm>(to) == from );
    return to;
}

inline int ptr_diff_32( ptrdiff_t diff )
{
    return check_value_cast<int>(diff);
}

BTW this all has made me realize that the recent x86-32 monotony on PC's has been a delightful stable period for development. I had almost forgotten that it used to be always like this. Now to do simple shit in my code, I have to detect if it's x86 or x64 , if it is x64, do I have an MSC version that has the intrinsics I need? if not I have to write a got damn MASM file. Oh and I often have to check for Vista vs. XP to tell if I have various kernel calls. For example :

#if _MSC_VER > 1400

// have intrinsic

#elif _X86_NOT_X64_

// I can use inline asm
__asm { cmpxchg8b ... }

#elif _VISTA_NOT_XP_ // (placeholder name : whatever check says the Vista kernel calls are okay)

// kernel library call available

#else

// X64 , not Vista (or want to be XP compatible) , older compiler without intrinsic,
//  FUCK !

#error just use a newer MSVC version for X64 because I don't want to fucking write MASM routines

#endif
Even ignoring the pain of the last FUCK branch which requires making a .ASM file, the fact that I had to do a bunch of version/target checks to get the right code for the other paths is a new and evil pain.

Oh, while I'm ranting, fucking MSDN is now showing all the VS 2010 documentation by default, and they don't fucking tell you what version things became available in.

This actually reminds me of the bad old days when I got started, when processors and instruction sets were changing rapidly. You actually had to make different executables for 386/486 and then Pentium, and then PPro/P3/etc (not to mention the AMD chips that had their own special shiznit). Once we got to the PPro it really settled down and we had a wonderful monotony of well developed x86 on out-of-order machines that continued up to the new Core/Nehalem chips (only broken by the anomalous blip of Itanium that we all ignored as it went down in flames like the Hindenburg). Obviously we've had consoles and Mac and other platforms to deal with, but that was for real products that want portability to deal with, I could write my own Wintel code for home and not think about any of that. Well Wintel is monoflavor no more.

The period of CISC and chips with fancy register renaming and so-on was pretty fucking awesome for software developers, because you see the same interface for all those chips, and then behind the scenes they do magic mumbo jumbo to turn your instructions into fucking gene sequences that multiply and create bacteria that actually execute the instructions, but it doesn't matter because the architecture interface still just looks the same to the software developer.


05-28-10 - Foolishness

Many of the food blog snobistas descend into this foolishness of making everything at home when some things are just not a wise use of time. I mean I'm sure these homemade chicharrones are delicious and all, but fuck that's a lot of work when you could just walk over to the local Mexicatessen and buy a big bag that was also freshly fried in their house-made lard. And while you're there buy some carnitas and tortillas too and have a much better meal for cheaper and less work.

It's real foolishness when people say things like climbing is not a hard workout . I just have to roll my eyes pretty much anytime anybody talks about exercise because you just all don't get it. *Anything* is a workout if you make it a workout. There's no inherent difficulty level of any activity, it depends how hard you do it. You hear retards all the time saying "yoga's not a hard workout" , well maybe if you do it like a moron it's not, use some more intensity, make it more difficult for yourself if you need more work. I've heard plenty of people tell me "biking's not a hard enough workout". Oh really? Go faster, dumbass, or maybe try going up some hills.

My anger at the drivers around here grows and grows until it hits a boiling point where I just become depressed about how fucking stupid and selfish you all are. It really is amazing to me that people here constantly blow through yellows, roll through stop signs, and yet take forever to get moving when a light turns green and are busy-bodies about my speed (my speed which is almost always less than that of the SUV that's screeching around the corner at 90% of its limit so it would be unable to make a quick correction if anything surprising happened). What a boring topic, I apologize. At least in places like LA people are more consistently aggressive assholes; it's less hypocritical.

I've been working at home recently and it's been a great boost of productivity for me. It's so good to not have to worry about when I'm going to try to make the commute to avoid traffic, and it's awesome to be able to go directly from morning coffee to coding, which is the most productive instant of the day for me. There are two problems : 1. I got a nice standing desk all set up at work and I miss having it at home, I'm hurting my body spending too much time in a chair. 2. it's a little hard because N is also home many days, and there's a bit of difficult tension when I have to say "leave me alone I'm working".

I really don't like the way the financial meltdown narrative has been crafted by the media. One of the false narratives is the "black swan" story - that everything was being done with mathematical models and that it was a very unlikely but high impact event that was not accounted for in the models that caused the crisis. The other false narrative is that it was evil investment bankers at Goldman or similar that somehow caused it all. The reality is it was caused by ignorance and greed and corruption at almost every level of society. From presidents and congress who stripped regulation from the finance and mortgage industry, to the Fed keeping rates way too low and not monitoring banks well, to Fannie Mae et.al. underwriting too many loans, to Countrywide et.al. intentionally issuing loans they knew were bad to make more profit, to Goldman et.al. for packaging loans they knew were bad and selling them as safer than they really were, to the ratings agencies asleep at the wheel, to individual real estate investors getting in way over their heads trying to make an easy buck, etc. etc.


05-27-10 - Weird Compiler Error

Blurg just fought one of the weirder problems I've ever seen.

Here's the simple test case I cooked up :

void fuck()
{
#ifdef RR_ASSERT
#pragma RR_PRAGMA_MESSAGE("yes")
#else
#pragma RR_PRAGMA_MESSAGE("no")
#endif

    RR_ASSERT(true);
}

And here is the compiler error :

1>.\rrSurfaceSTBI.cpp(43) : message: yes
1>.\rrSurfaceSTBI.cpp(48) : error C3861: 'RR_ASSERT': identifier not found

Eh !? Serious WTF !? I know RR_ASSERT is defined, and then it says it's not found !? WTF !?

Well a few lines above that is the key. There was this :

#undef  assert
#define assert  RR_ASSERT

which seems like it couldn't possibly cause this, right? It's just aliasing the standard C assert() to mine. Not possibly related, right? But when I commented out that bit the problem went away. So of course my first thought is clean-rebuild all, did I have precompiled headers on by mistake? etc. I assume the compiler has gone mad.

Well, it turns out that somewhere way back in RR_ASSERT I was in a branch that caused me to have this definition for RR_ASSERT :

#define RR_ASSERT(exp)  assert(exp)

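This creates a strange state for the preprocessor. RR_ASSERT is now a recursive macro : RR_ASSERT expands to assert, which expands back to RR_ASSERT. The preprocessor will not expand a macro again inside its own expansion, so when you actually use it in code the final RR_ASSERT is left as a plain identifier, and the compiler then looks for a function called RR_ASSERT and doesn't find one. But, the name of the preprocessor symbol is still defined, so my ifdef check still saw RR_ASSERT existing. Evil.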
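Here is a small sketch of the same preprocessor behavior, arranged so that it compiles. The trick is to declare a real function with the macro's name before the two mutually recursive macros exist, so the identifier that is left unexpanded has something to bind to. All the names (my_assert_fn, my_alias, demo) are invented for the example.

```cpp
#include <cassert>

static int g_real_calls = 0;

// A real function, declared before the macros below exist.
static int my_assert_fn( int x ) { ++g_real_calls; return x; }

// Two macros that refer to each other, like RR_ASSERT <-> assert in the story above.
#define my_assert_fn(x)  my_alias(x)
#define my_alias(x)      my_assert_fn(x)

static int demo()
{
    // Expands to my_alias(1), then to my_assert_fn(1), and stops there :
    // my_assert_fn is not expanded a second time, so this calls the real function.
    return my_assert_fn(1);
}

#ifdef my_assert_fn
static const bool g_macro_is_defined = true;   // the name still counts as defined
#else
static const bool g_macro_is_defined = false;
#endif
```

Take away the real function and you get the "identifier not found" error, while the ifdef keeps cheerfully reporting that the macro exists.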

BTW the thing that kicked this off is that fucking VC x64 doesn't support inline assembly. ARGH YOU COCK ASS. Because of that we had long ago written something like

#ifdef _X86
#define RR_ASSERT_BREAK()  __asm int 3
#else
#define RR_ASSERT_BREAK()  assert(0)
#endif

which is what caused the difficulty.

05-27-10 - Loop Branch Inversion

A major optimization paradigm I'm really missing from C++ is something I will call "loop branch inversion". The problem is for code sharing and cleanliness you often wind up with cases where you have a lot of logic in some outer loops that find all the things you should work on, and then in the inner loop you have to do a conditional to pick what operation to do. eg :

LoopAndDoWork(query,workType)
{
    Make bounding area
    Do Kd-tree descent .. 
    loop ( tree nodes )
    {
        bounding intersection, etc.
        found an object
        DoPerObjectWork(object,workType)
    }
}

The problem is that DoPerObjectWork then is some conditional, maybe something like :

DoPerObjectWork(object,workType)
{
    switch(workType)
    {
    case 0 : ...
    case 1 : ...
    }
}

or even worse - it's a function pointer that you call back.

Instead you would like the switch on workType to be on the outside. WorkType is a constant all the way through the code, so I can just propagate that branch up through the loops, but there's no way to express it neatly in C.

The only real option is with templates. You make DoPerObjectWork a functor and you make LoopAndDoWork a template. The other option is to make an outer loop dispatcher to constants. That is, make workType a template parameter instead of an integer :

template < int workType >
void t_LoopAndDoWork(query)
{
    ...
}

and then provide a dispatcher which does the branch outside :

void LoopAndDoWork(query,workType)
{
    switch(workType)
    {
    case 0 : t_LoopAndDoWork<0>(query); break;
    case 1 : t_LoopAndDoWork<1>(query); break;
    }
}

this is an okay solution, but it means you have to reproduce the branch on workType in the outer loop and inner loop. This is not a speed penalty because the inner loop is a branch on constant which goes away, it's just ugly for code maintenance purposes because they have to be kept in sync and can be far apart in the code.

This is a general pattern - use templates to turn a variable parameter into a constant and then use an outer dispatcher to turn a variable into the right template call. But it's ugly.
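A compilable toy version of the pattern, as a sketch: the "objects" are just ints and the two work types (sum them, count them) are stand-ins invented for this example. Only the shape matters: a templated loop with a constant work type, and one run-time switch outside it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum { kWorkSum = 0, kWorkCount = 1 };

// Inner loop : workType is a compile-time constant, so the branch can be folded away.
template < int workType >
int t_LoopAndDoWork( const std::vector<int> & objects )
{
    int result = 0;
    for ( std::size_t i = 0; i < objects.size(); i++ )
    {
        if ( workType == kWorkSum ) result += objects[i];
        else                        result += 1;
    }
    return result;
}

// Outer dispatcher : the only run-time branch on workType.
inline int LoopAndDoWork( const std::vector<int> & objects, int workType )
{
    switch ( workType )
    {
    case kWorkSum   : return t_LoopAndDoWork<kWorkSum>(objects);
    case kWorkCount : return t_LoopAndDoWork<kWorkCount>(objects);
    default         : return 0; // unknown work type
    }
}
```

The maintenance ugliness is visible even at this size: the list of work types appears once in the switch and again inside the loop body, and nothing forces the two to agree.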

BTW when doing this kind of thing you often wind up with loops on constants. The compiler often can't figure out that a loop on a constant can be unrolled. It's better to rearrange the loop on constant into branches. For example I'm often doing all this on pixels where the pixel can have between 1 and 4 channels. Instead of this :

for(int c=0;c<channels;c++)
    DoStuff(c);

where channels is a constant (template parameter), it's better to do :

DoStuff(0);
if ( channels > 1 ) DoStuff(1);
if ( channels > 2 ) DoStuff(2);
if ( channels > 3 ) DoStuff(3);

because those ifs reliably go away.


05-26-10 - Windows Page Cache

The correct way to cache things is through Windows' page cache. The advantages of doing this over using your own custom cache code are :

1. Automatically resizes based on amount of memory needed by other apps. eg. other apps can steal memory from your cache to run.

2. Automatically gives pages away to other apps or to file IO or whatever if they are touching their cache pages more often.

3. Automatically keeps the cache in memory between runs of your app (if nothing else clears it out). This is pretty immense.

Because of #3, your custom caching solution might slightly beat using the Windows cache on the first run, but on the second run it will stomp all over you.

To do this nicely, generally the cool thing to do is make a unique file name that is the key to the data you want to cache. Write the data to a file, then memory map it as read only to fetch it from the cache. It will now be managed by the Windows page cache and the memory map will just hand you a page that's already in memory if it's still in cache.

The only thing that's not completely awesome about this is the reliance on the file system. It would be nice if you could do this without ever going to the file system. eg. if the page is not in cache, I'd like Windows to call my function to fill that page rather than getting it from disk, but so far as I know this is not possible in any easy way.

For example : say you have a bunch of compressed images as JPEG or whatever. You want to keep uncompressed caches of them in memory. The right way is through the Windows page cache.
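A rough sketch of the fetch side, assuming the cache entry was already written to disk earlier. The file naming (an FNV-1a hash of the key, which is only unique up to hash collisions) and all the names here (CacheFileNameForKey, CacheView, OpenCacheView, CloseCacheView) are invented for the example. The Win32 part uses the ordinary file mapping calls and only compiles on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical : turn a cache key into a file name (FNV-1a 64 bit hash, as hex).
inline std::string CacheFileNameForKey( const std::string & key )
{
    uint64_t h = 14695981039346656037ull;
    for ( std::size_t i = 0; i < key.size(); i++ )
    {
        h ^= (unsigned char) key[i];
        h *= 1099511628211ull;
    }
    char name[32];
    std::snprintf( name, sizeof(name), "cache_%016llx.bin", (unsigned long long) h );
    return name;
}

#ifdef _WIN32
#include <windows.h>

struct CacheView { HANDLE file; HANDLE mapping; const void * data; };

// Map an existing cache file read-only ; returns false if it is not there (cache miss).
inline bool OpenCacheView( const std::string & fileName, CacheView * out )
{
    out->file = CreateFileA( fileName.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
    if ( out->file == INVALID_HANDLE_VALUE ) return false;
    out->mapping = CreateFileMappingA( out->file, NULL, PAGE_READONLY, 0, 0, NULL );
    if ( out->mapping == NULL ) { CloseHandle(out->file); return false; }
    out->data = MapViewOfFile( out->mapping, FILE_MAP_READ, 0, 0, 0 );
    if ( out->data == NULL ) { CloseHandle(out->mapping); CloseHandle(out->file); return false; }
    return true;
}

inline void CloseCacheView( CacheView * v )
{
    UnmapViewOfFile( v->data );
    CloseHandle( v->mapping );
    CloseHandle( v->file );
}
#endif
```

On a miss you would decompress the image, write it to that file name, and then map it. After that the pages are Windows' problem, which is the whole point.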

05-26-10 - Windows 7 Snap

My beloved "AllSnap" doesn't work on Windows 7 / x64. I can't find a replacement because fucking Windows has a feature called "Snap" now, so you can't fucking google for it. (also searching for "Windows 7" stuff in general is a real pain because solutions and apps for the different variants of windows don't always use the full name of the OS they are for in their page, so it's hard to search for; fucking operating systems really need unique code names that people can use to make it possible to search for them; "Windows" is awful awful in this regard).

I contacted the developer of AllSnap to see if he would give me the code so I could fix it, but he is ignoring me. I can tell from debugging apps when AllSnap is installed that it seems to work by injecting a DLL. This is similar to how I hacked the poker sites for GoldBullion, so I think I could probably reproduce that. But I dunno if Win7/x64 has changed anything about function injection and the whole DLL function pointer remap method.

BTW/FYI the standard Windows function injection method goes like this : Make a DLL that has some event handler. Run a little app that causes that event to trip inside the app you want to hijack. Your DLL is now invoked in that app's process to handle that event. Now you are running in that process so you can do anything you want - in particular you can find the function table to any of their DLL's, such as user32.dll, and stuff your own function pointer into that memory. Now when the app makes normal function calls, they go through your DLL.


05-25-10 - Thread Insurance

I just multi-threaded my video test app recently, and it was reasonably easy, but I had a few nagging bugs because of hidden ways the threads were touching shared memory without protection deep inside functions. Okay, so I found them and fixed them, but I'm left with a problem - any time I touch one of those deep functions, I could screw up the threading without realizing it. And I might not get any indication of what I did for weeks if it's a rare race.

What I would like is a way to make this more robust. I have very strong threading primitives, I want a way to make sure that I use them! In particular, I want to be able to mark certain structs as only touchable when a critsec is locked or whatever.

I think that a lot of this could be done with Win32 memory page protections. So far as I know there's no way to associate protections per-thread, (eg. to make a page read/write for thread A but no-access for thread B). If I could do that it would be super sweet.

One idea is to make the page no access and then install my own exception handler that checks what thread it is, but that might be too much overhead (and not sure if that would fail for other reasons).

The main usage is not for protected crit-sec'ed structs, that is really the easiest case to maintain because it's very obvious right there in the code that you need to take the critsec to touch the variables. The hard case to maintain is the ad hoc "I know this is safe to touch without protection". In particular I have a lot of code that runs like this :

Phase 1 : I know no threads are touching shared data item A
main thread does lots of writing in A

Phase 2 : fire up threads.  They only read from A and do so without protection.  They each write to unique areas B,C,D.

Phase 3 : spin down threads.  Now main thread can write A and read B,C,D.

So what I would really like to do is :

Phase 1 : I know no threads are touching shared data item A
main thread does lots of writing in A

-- set A memory to be read-only !
-- set B,C,D memory to be read/write only for their own thread

Phase 2 : fire up threads.  They only read from A and do so without protection.  They each write to unique areas B,C,D.

-- make A,B,C,D read/write only for main thread !

Phase 3 : spin down threads.  Now main thread can write A and read B,C,D.

The thing that this saves me from is when I'm tinkering in DoComplicatedStuff() which is some function called deep inside Phase 2 somewhere and I change it to no longer follow the memory access rule that it is supposed to be following. This is just my hate for having rules for code correctness that are not enforced by the compiler or at least by run-time asserts.
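The part of this that can be done today is the "set A memory to be read-only" step, because page protection is per-process and not per-thread. A minimal sketch, assuming A lives in its own page-aligned allocation: VirtualAlloc / VirtualProtect on Windows, with the mmap / mprotect equivalents so the same code builds elsewhere. The function names are made up. It does nothing for the B,C,D per-thread case.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
inline void * AllocPages( std::size_t bytes )
{
    return VirtualAlloc( NULL, bytes, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE );
}
// Flip the pages between read/write and read-only, for every thread in the process.
inline bool SetPagesWritable( void * p, std::size_t bytes, bool writable )
{
    DWORD old;
    return VirtualProtect( p, bytes, writable ? PAGE_READWRITE : PAGE_READONLY, &old ) != 0;
}
inline void FreePages( void * p, std::size_t ) { VirtualFree( p, 0, MEM_RELEASE ); }
#else
#include <sys/mman.h>
inline void * AllocPages( std::size_t bytes )
{
    void * p = mmap( NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    return ( p == MAP_FAILED ) ? NULL : p;
}
// Flip the pages between read/write and read-only, for every thread in the process.
inline bool SetPagesWritable( void * p, std::size_t bytes, bool writable )
{
    return mprotect( p, bytes, writable ? (PROT_READ | PROT_WRITE) : PROT_READ ) == 0;
}
inline void FreePages( void * p, std::size_t bytes ) { munmap( p, bytes ); }
#endif
```

Usage would be SetPagesWritable(A, size, false) right before Phase 2 and SetPagesWritable(A, size, true) right after. Any write to A from deep inside DoComplicatedStuff() then faults on the spot instead of turning into a rare race weeks later.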

05-25-10 - State and the Web

There's a major way that the whole iPple device thing is taking us backwards. Plain old HTML (eg. not apps) is awesome in that it actually gets something really right :

Minimal state. Recordable state at every transition point. This lets you bookmark your point anywhere in your work, go backwards and forwards, save your spot and come back to it, etc.

This all goes back to the entire state being a little token that you can just grab and store off. Granted, lots of web pages fuck this up because they use some server-side shit and they don't show you all the public state or whatever fucking dick-ass thing they do. But good old fashioned Web gets this awesomely right.

It's actually a paradigm that I think more developers should espouse in their Win32 apps, both publicly and internally.

By "publicly" I mean you should expose it to the user - let the user drag off the current spot to a link, and let them restore. This should be in like every app. The full state of the app should be in an edit box somewhere that I can copy/paste or drag to the desktop. I should be able to double-click it to jump back into the app at that same point.

"Internally" I mean it's nice to make sure your state is some very simple plain C structures, so that you can just push & pop or save old versions of the state, like :

State save(curState);

// do stuff that changes curState

curState = save;

this is actually one of the new things I'm doing in my Video Test framework and it has been awesomely useful.

Yeah yeah the C++ way is to give every member a stream-in/stream-out, but it's too hard to maintain that robustly all the time.

This is actually related to another very important programming paradigm in general : minimize state, and avoid redundancy. Don't store variables that are computed from other variables. Don't copy values from one place to another. Always go get them at the original source. This is a massive bug reducer. Every time I see something like "this variable must be kept in sync with this variable" I think "why not just get rid of one of them?".
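A small sketch of the push & pop version, assuming the state really is plain data. The State fields and the StateStack helper are invented for the example; this is not the actual Video Test framework code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical app state : plain data only, so copying it is a full snapshot.
struct State
{
    int   frame;
    float zoom;
    bool  paused;
};

struct StateStack
{
    std::vector<State> saved;

    void push( const State & cur ) { saved.push_back(cur); }

    // Restore the most recently pushed state ; returns false if nothing was saved.
    bool pop( State * cur )
    {
        if ( saved.empty() ) return false;
        *cur = saved.back();
        saved.pop_back();
        return true;
    }
};
```

Because State holds no pointers and no derived values, there is nothing to stream in or out and nothing to keep in sync; the struct copy is the whole save.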


05-23-10 - Two Windows Woes

Slow net. My god WTF is wrong with Windows networking. (I don't mean the TCP/IP stack, I mean shared computer browsing). What the fuck is wrong with networking in general? Why are there such massive stalls? I mean for browsing my local network, how in the world can it take so fucking long to discover the machines on my fucking LAN !? And if a machine is not there, can't you just fail in like a millisecond !? I mean a fucking millisecond is FOREVER to send a packet of light out on some wires and get a reply back.

I do have one major practical problem with Windows slow networking : my file copy and dir listing routines are ungodly slow across the net. I know this can be done faster. TeraCopy for example is pretty fast, I would love to know what they are doing. The super brute force solution would be to just run my own file system client/server and send packets to my own port. For example if I want to get a dir listing, I just send one packet saying "list this dir" and the listener on the other side does it locally and then sends me back one big packet with the full dir listing. I could run that on TCP/IP and it would be like instant. So how do I get speed like that over proper Windows networking? Or maybe that is the way to go and I could just remote-run my listener app on any machine I want to talk to?

Kill stuck apps. WTF I know you are capable of killing stuck apps, because if I use my own "killproc" app I can kill them cleanly (or another nice way is to attach the debugger to the app and then kill it from there). But sometimes even fucking Task Manager refuses to kill it, and why can't I just kill it from the fucking X box. Okay maybe not the X box because that's just a GUI widget on the app, but let me fucking right-click in the non responding Window and say "yes really fucking kill the fuck out of this piece of shit app".

I might have to write my own app that uses IsHungAppWindow and then hard-kills whatever is not responding. I could put it on a hot key and it would save my bacon when the fucking Task Manager won't run (my god why is Task Manager still just a fucking app like everything else, which means that it can't always get enough CPU or screen display rights; there should be a machine monitor in the ctrl-alt-del screen that is always accessible).

05-23-10 - Misc

I went driving at Pacific Raceways with the Porsche DE ("Driver's Education" , which is the euphemism they use to make it sound safe, it's actually run your fucking souped-up old 911 around a race track at insane speeds and occasionally spin out and toss it into the bushes). Maybe I will write up some more details on it, but it was a fucking blast, the car was *amazing*, I was exhausted, I used about a year's worth of tires and brakes. I highly recommend it.

It is kind of funny to me how people take something that is pure hooliganism and laughs and adrenaline (driving cars fast) and have to turn it into something they can be anal about and practice and study and be "right" about and have "the way to do things". It always happens, I mean wine and food and such are the same way, the people who really love something become way too obsessive about it and make it way too analytical and lose focus on the simple joy of it.

Coding for RAD is kind of fucking me up. I have lots of bits of good code that I know I've written but I don't know where they are anymore. For example I know I wrote a bunch of careful stuff to try to sleep to framerate well but I can't find it anymore. Is it in my cblib stuff, or is it in my RAD/Oodle stuff, or is it in my RAD/shared stuff? Urg.

Dell lappy is pretty great. I dropped it and slightly dented the case. Metal cases feel awesome and give that impression of "quality" but in fact plastic is a pretty fucking amazing material to make things out of. Plastic does not get hot, it's super lightweight, it's very tough, and it has this amazing property that it can take an impact, deform, and then return to its original shape. (same goes for car interiors of course).

Part of the problem with plastic car exteriors was that the paints weren't good enough. That's no longer true, there are new amazing paints that can make plastic cars look like metal.

Top Chef Masters is pretty good, way better than the first season of TCM, though fucking Kelly Choi is a real drag (even worse than Padma; at least Padma actually hot, and it's amusing when she's stoned off her ass and says everything is delicious, whereas Kelly is freakish looking with her stick body and giant head, and makes that weird forced-smile face all the time; they both share the inability to just read a freaking cue card smoothly). (it's a real pet peeve of mine when people think that someone is hot just for being thin; thinness is correlated to hotness, but it is not causal!). (and of course anybody who's on TV at all will have a million weirdos who insist she's super hot).


05-21-10 - Video coding beyond H265

In the end movec-residual coding is inherently limited and inefficient. Let's review the big advantage of it and the big problem.

The advantage is that the encoder can reasonably easily consider {movec,residual} coding choices jointly. This is a huge advantage over just picking what's my best movec, okay now code the residual. Because movec affects the residual, you cannot make a good R/D decision if you do it separately. By using block movecs, it reduces the number of options that need to be considered to a small enough set that encoders can practically consider a few important choices and make a smart R/D decision. This is what is behind all current good video encoders.
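To make the joint-vs-separate point concrete, here's a toy sketch in Python. Everything in it is invented for illustration (the 1-D "frame", the bit cost model, the lambda, the quantizer step); it is not how any real encoder does it, it just compares "pick the movec with the best SAD, then code the residual" against "pick the movec that minimizes J = D + lambda*R".

```python
# Toy joint vs. sequential {movec, residual} decision on a 1-D "frame".
# The signal, LAMBDA, Q and the bit-cost model are all made-up assumptions.

LAMBDA = 4.0
Q = 8
ref = [10, 10, 10, 12, 50, 90, 92, 91, 90, 60, 20, 12, 10, 10, 10, 10]
cur = [88, 93, 90, 91]      # the block we want to code
POS = 6                     # where the block sits in the reference
MOVECS = range(-3, 4)

def bits_for(value):
    """Crude cost model: small magnitudes are cheap, large ones expensive."""
    return 1 + 2 * abs(value).bit_length()

def code_block(movec):
    """Predict from ref shifted by movec, quantize the residual with step Q.
    Returns (distortion, rate) for the whole block."""
    rate = bits_for(movec)
    dist = 0
    for i, target in enumerate(cur):
        pred = ref[POS + i + movec]
        level = round((target - pred) / Q)
        recon = pred + level * Q
        dist += (target - recon) ** 2
        rate += bits_for(level)
    return dist, rate

def cost_j(movec):
    dist, rate = code_block(movec)
    return dist + LAMBDA * rate

def sad(movec):
    return sum(abs(cur[i] - ref[POS + i + movec]) for i in range(len(cur)))

# Sequential: best-matching movec first, residual cost is whatever it is.
sequential_movec = min(MOVECS, key=sad)
# Joint: evaluate the full rate + distortion of each candidate.
joint_movec = min(MOVECS, key=cost_j)
```

The joint choice can never have a worse J than the sequential one, because the sequential movec is just one of the candidates the joint search looks at. With only a handful of block movecs the exhaustive loop is cheap, which is the whole appeal.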

The disadvantage of movec-residual coding is that they are redundant and connected in a complex and difficult to handle way. We send them independently, but really they have cross-information about each other, and that is impossible to use in the standard framework.

There are obviously edges and shapes in the image which occur in both the movecs and the residuals. eg. a moving object will have a boundary, and really this edge should be used for both the movec and residual. In the current schemes we send a movec for the block, and then the residuals per pixel, so we now have finer grain information in the residual that should have been used to give us finer movecs per pixel, but it's too late now.

Let's back up to fundamentals. Assume for the moment that we are still working on an 8x8 block. We want to send that block in the current frame. We have previous frames and previous blocks within the current frame to help us. There are (256^3)^64 possible values for this block. If we are doing lossy coding, then not all possible values for the block can be sent. I won't get into details of lossiness, so just say there are a large number of possible values for the pixels of the block; we want to code an index to one of those values.

Each index should be sent with a different bit length based on its probability. Already we see a flaw with {movec-residual} coding - there are tons of {movec,residual} pairs that specify the same index. Of course in a flat area lots of movecs might point to the same pixels, but even if that is eliminated, you could go movec +1, residual +3, or movec +3, residual +1, and both ways get to +4. Redundant encoding = bit waste.
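A quick way to see the redundancy is to just enumerate it. This is a one-pixel toy in Python with an invented reference row and an invented residual range; it counts how many {movec,residual} pairs land on each output value, and how many bits a flat code over pairs wastes compared to a flat code over distinct outputs.

```python
from collections import defaultdict
from math import log2

# Invented 1-D reference and search ranges, purely for illustration.
ref = [3, 4, 4, 6, 7]
pos = 2
movecs = [-2, -1, 0, 1, 2]
residuals = range(-4, 5)

# Map each reachable output value to every {movec, residual} pair that hits it.
ways = defaultdict(list)
for mv in movecs:
    for res in residuals:
        ways[ref[pos + mv] + res].append((mv, res))

num_codes = len(movecs) * len(residuals)   # distinct code words we can send
num_outputs = len(ways)                    # distinct pixel values they decode to
wasted_bits = log2(num_codes / num_outputs)
```

Real coders don't use flat codes of course, so the actual waste depends on the probability model, but the pairs that decode to the same value are still stealing code space from each other.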

Now, this bit waste might not be critically bad with current simple {movec,residual} schemes - but it is a major encumbrance if we start looking at more sophisticated mocomp options. Say you want to be able to send movecs for shapes, eg. send edges and then send a movec on each side. There are lots of possibilities here - you might just send a movec per pixel (this seems absurdly expensive, but the motion fields are very smooth so should code well from neighbors), or you might send a polygon mesh to specify shapes. This should give you much better motion fields, and then the information in the motion fields can be used to predict the residuals as well. But the problem is there's too much redundancy. You have greatly expanded the number of ways to code the same output pixels.

We could consider more modest steps as well, such as sticking with block mocomp + residual, but expanding what we can do for "mocomp". For example, you could use two motion vectors + arbitrary linear combination of the source blocks. Or you could do trapezoidal texture-mapping style mocomp. Or mocomp with a vector and scale + rotation. None of these is very valuable, there are numerous problems : 1. too many ways to encode for the encoder to do thorough R/D analysis of all of them, 2. too much redundancy, 3. still not using the joint information across residual & motion.

In the end the problem is that you are using a 6-d value {velocity,pixel} to specify a 3-d color. What you really want is a 3-d coordinate which is not in pixel space, but rather is a sort of "screw" in motion/pixel space. That is, you want the adjacent coordinates in motion/pixel space to be the ones that are closest together in the 6-d space. So for example RGB {100,0,0} and {0,200,50} might be neighbors in motion/pixel space if they can be reached by small motion adjustments.

Okay this is turning into rambling, but another way of seeing it is like this : for each block, construct a custom basis transform. Don't send a separate movec or anything - the axes of the basis transform select pixels by stepping in movec and also residual.

ADDENDUM : let me try to be more clear by doing a simple example. Say you are trying to code a block of pixels which only has 10 possible values. You want to code with a standard motion then residual method. Say there are only 2 choices for motion. It is foolish to code all 10 possible values for both motion vectors! That is, currently all video coders do something like :

Code motion = [0 or 1]
Code residual = [0,1,2,3,4,5,6,7,8,9]

Or in tree form :

   0 - [0,1,2,3,4,5,6,7,8,9]
   1 - [0,1,2,3,4,5,6,7,8,9]

Clearly this is foolish. For each movec, you only need to code the residual which encodes that resulting pixel block the smallest under that movec. So you only need each output value to occur in one spot on the tree, eg.

   0 - [0,1,2,3,4]
   1 - [5,6,7,8,9]

or something. That is, it's foolish to have two ways to encode the residual to reach a certain target when there were already cheaper ways to reach that target in the movec coding portion. To minimize this deficiency, most current coders like H264 will code blocks by either putting almost all the bits in the movec and very few in the residual, or the other way (almost none in the movec and most in the residual). The loss occurs most when you have many bits in the motion and many in the residual, something like :

   0 - [0,1,2]
   1 - [3,4,5,6]
   2 - [7,8]
   3 - [9]
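The pruned trees above can be built mechanically. Here's a small Python sketch, assuming (hypothetically) that each movec gives a single predicted value and that residual cost is just the absolute difference; each target value is kept only under the movec that reaches it most cheaply.

```python
from math import log2

def prune_tree(predictions, targets, residual_cost):
    """Keep each target under exactly one movec: the one with the cheapest
    residual. predictions maps movec -> predicted value (an assumption of
    this toy, real predictions are whole blocks)."""
    tree = {mv: [] for mv in predictions}
    for t in targets:
        best = min(predictions, key=lambda mv: residual_cost(t - predictions[mv]))
        tree[best].append(t)
    return tree

targets = range(10)
predictions = {0: 2, 1: 7}   # hypothetical: movec 0 predicts 2, movec 1 predicts 7
tree = prune_tree(predictions, targets, residual_cost=abs)

# Flat-code comparison: every target under every movec vs. each target once.
full_leaves = len(predictions) * len(targets)
pruned_leaves = sum(len(leaves) for leaves in tree.values())
saved_bits = log2(full_leaves / pruned_leaves)
```

With these made-up predictions the result is the 0 - [0,1,2,3,4] / 1 - [5,6,7,8,9] split from the example, and under flat codes the saving is exactly the one bit that the redundant movec choice cost.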

The other huge fundamental deficiency is that the probability modeling of movecs and residuals is done in a very primitive way based only on "they are usually small" assumptions. In particular, probability modeling of movecs needs to be done not just based on the vector, but on the content of what is pointed at. I mentioned long ago there is a lot of redundancy there when you have lots of movecs pointing at the same thing. Also, the residual coding should be aware of what was pointed to by the movec. For example if the movec pointed at a hard edge, then the residual will likely also have a similar hard edge because it's likely we missed by a little bit, so you could use a custom transform that handles that better. etc.

ADDENDUM 2 : there's something else very subtle going on that I haven't seen discussed much. The normal way of sending {movec,residual} is actually over-complete. Mostly that's bad, too much over-completeness means you are just wasting bits, but actually some amount of over-completeness here is a good thing. In particular for each frame we are sending a little bit of extra side information that is useful for *later* frames. That is, we are sending enough information to decode the current frame to some quality level, plus some extra that is not really worth it for just the current frame, but is worth it because it helps later frames.

The problem is that the amount of extra information we are sending is not well understood. That is, in the current {movec,residual} schemes we are just sending extra information without being in control and making a specific decision. We should be choosing how much extra information to send by evaluating whether it is actually helpful on future frames. Obviously the last frames of the video (or a sequence before a cut) you shouldn't send any extra information.

In the examples above I'm showing how to reduce the overcomplete information down to a minimal set, but sometimes you might not want to do that. As a very coarse example say the true motion at a given pixel is +3, movec=3 to get to final pixel=7 , but you can code the same result smaller by using movec=1 - deciding whether to send the true motion or not should be done based on whether it actually helps in the future, but more importantly the code stream could collapse {3,7} and {1,7} so there is no redundant way to code if the difference is not helpful.

This becomes more important of course if you have a more complex motion scheme, like per-pixel motion or trapezoidal motion or whatever.


05-20-10 - Some quick notes on H265

Since we're talking about VP8 I'd like to take this chance to briefly talk about some of the stuff coming in the future. H265 is being developed now, though it's still a long ways away. Basically at this point people are throwing lots of shit at the wall to see what sticks (and hope they get a patent in). It is interesting to see what kind of stuff we may have in the future. Almost none of it is really a big improvement like "duh we need to have that in our current stuff", it's mostly "do the same thing but use more CPU".

The best source I know of at the moment is H265.net , but you can also find lots of stuff just by searching for video on citeseer. (addendum : FTP to Dresden April meeting downloads ).

H265 is just another movec + residual coder, with block modes and quadtree-like partitions. I'll write another post about some ideas that are outside of this kind of scheme. Some quick notes on the kind of things we may see :

Super-resolution mocomp. There are some semi-realtime super-resolution filters being developed these days. Super-resolution lets you take a series of frames and create an output that's higher fidelity than any one source. In particular given a few assumptions about the underlying source material, it can reconstruct a good guess of the higher resolution original signal before sampling to the pixel grid. This lets you do finer subpel mocomp. Imagine for example that you have some black and white text that is slowly translating. On any one given frame there will be lots of gray edges due to the antialiased pixel sampling. Even if you perfectly know the subpixel location of that text on the target frame, you have no single reference frame to mocomp from. Instead you create a super-resolution reference frame of the original signal and subpel mocomp from that.

Partitioned block transforms. One of the minor improvements in image coding lately, which is natural to move to video coding, is PBT with more flexible sizes. This means 8x16, 4x8, 4x32, whatever, lots of partition sizes, and having block transforms for that size of partition. This lets the block transform match the data better. Which also leads us to -

Directional transforms and trained transforms. Another big step is not always using an X & Y oriented orthogonal DCT. You can get a big win by doing directional transforms. In particular, you find the directions of edges and construct a transform that has its bases aligned along those edges. This greatly reduces ringing and improves energy compaction. The problem is how do you signal the direction or the transform data? One option is to code the direction as extra side information, but that is probably prohibitive overhead. A better option is to look at the local pixels (you already have decoded neighbors) and run edge detection on them and find the local edge directions and use that to make your transform bases. Even more extreme would be to do a fully custom transform construction from local pixels (and the same neighborhood in the last frame), either using competition (select from a set of transforms based on which one would have done best on those areas) or training (build the KLT for those areas). Custom trained bases are especially useful for "weird" images like Barb. These techniques can also be used for ...
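For the "figure out the edge direction from decoded neighbors" idea, here's a minimal Python sketch. It is one possible edge detector (a structure tensor over central-difference gradients), not something taken from any particular proposal; the patch is assumed to be already-decoded pixels that both encoder and decoder can see, so no side information is sent.

```python
from math import atan2, degrees

def edge_angle_degrees(patch):
    """Dominant edge direction of a 2-D patch, in degrees in [0, 180),
    measured with x to the right and y down the rows.
    Returns None for a flat patch (no preferred direction)."""
    h, w = len(patch), len(patch[0])
    jxx = jyy = jxy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    if jxx + jyy == 0.0:
        return None
    # Orientation of the strongest gradient; the edge runs perpendicular to it.
    gradient_angle = 0.5 * atan2(2.0 * jxy, jxx - jyy)
    return (degrees(gradient_angle) + 90.0) % 180.0

# A vertical edge (values change along x) should report roughly 90 degrees.
vertical_edge = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
angle = edge_angle_degrees(vertical_edge)
```

The angle would then pick (or condition the probability of) a transform whose bases run along that direction. Because the decoder runs the same function on the same pixels, it lands on the same choice for free.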

Intra prediction. Like residual transforms, you want directional intra prediction that runs along the edges of your block, and ideally you don't want to send bits to flag that direction, rather figure it out from neighbors & previous frame (at least to condition your probabilities). Aside from finding direction, neighbors could be used to vote for or train fully custom intra predictors. One of the H265 proposals is basically GLICBAWLS applied to intra prediction - that is, train a local linear predictor by doing weighted LSQR on the neighborhood. There are some other equally insane intra prediction proposals - basically any texture synthesis or prediction paper over the last 10 years is fair game for insane H265 intra prediction proposals, so for example you have suggestions like Markov 2x2 block matching intra prediction which builds a context from the local pixel neighborhood and then predicts pixels that have been seen in similar contexts in the image so far.
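To give a feel for "train a local linear predictor by weighted least squares on the neighborhood", here's a stripped-down Python sketch. It is only in the spirit of that idea: it uses just two neighbors (west and north), an inverse-squared-distance weight, and a window radius that I made up, so treat all of those as assumptions rather than what GLICBAWLS or any H265 proposal actually does.

```python
def predict_pixel(img, y, x, radius=3):
    """Predict img[y][x] as a*W + b*N, where (a, b) are fit by weighted
    least squares over already-decoded pixels near (y, x) in raster order.
    Requires y >= 1 and x >= 1."""
    s_ww = s_wn = s_nn = s_wt = s_nt = 0.0
    width = len(img[0])
    for ty in range(max(1, y - radius), y + 1):
        for tx in range(max(1, x - radius), min(width, x + radius + 1)):
            if ty == y and tx >= x:
                break  # rest of the current row is not decoded yet
            weight = 1.0 / ((ty - y) ** 2 + (tx - x) ** 2)
            w, n, t = img[ty][tx - 1], img[ty - 1][tx], img[ty][tx]
            s_ww += weight * w * w
            s_wn += weight * w * n
            s_nn += weight * n * n
            s_wt += weight * w * t
            s_nt += weight * n * t
    w, n = img[y][x - 1], img[y - 1][x]
    det = s_ww * s_nn - s_wn * s_wn
    if abs(det) < 1e-9:
        return (w + n) / 2.0  # degenerate neighborhood: plain average
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (s_wt * s_nn - s_nt * s_wn) / det
    b = (s_nt * s_ww - s_wt * s_wn) / det
    return a * w + b * n

# On a smooth ramp the trained predictor should continue the ramp.
ramp = [[2 * x + 3 * y for x in range(8)] for y in range(8)]
predicted = predict_pixel(ramp, 5, 5)
```

The point is that the coefficients adapt to whatever structure is nearby without sending any bits, at the cost of solving a little least squares problem per pixel, which is exactly the "same thing but more CPU" flavor of most of these proposals.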

Unblocking filters ("loop filtering" WTF retarded name is that) are an obvious area for improvement. The biggest area for improvement is deciding when a block edge has been created by the codec and when it is in the source data. This can actually usually be figured out if the unblocking filter has access to not just the pixels, but how they were coded and what they were mocomped from. In particular, it can see whether the code stream was *trying* to send a smooth curve and just couldn't because of quantization, or whether the code stream intentionally didn't send a smooth curve (eg. it could have but chose not to).

Subpel filters. There are a lot of proposals on improved sub-pixel filters. Obviously you can use more taps to get better (sharper) frequency response, and you can add 1/8 pel or finer. The more dramatic proposals are to go to non-separable filters, non-axis aligned filters (eg. oriented filters), and trained/adaptive filters, either with the filter coefficients transmitted per frame or again deduced from the previous frame. The issue is that what you have is just a pixel sampled aliased previous frame; in order to do sub-pel filtering you need to make some assumptions about the underlying image signal; eg. what is the energy in frequencies higher than the sampling limit? Different sub-pel filters correspond to different assumptions about the beyond-nyquist frequency content. As usual orienting filters along edges helps.
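As a small illustration of "more taps = sharper response", here's a Python sketch comparing a 2-tap bilinear half-pel sample against the 6-tap (1,-5,20,20,-5,1)/32 filter that H264 uses for half-sample luma positions. The 1-D rows are made up, and this only does the horizontal half-pel case.

```python
H264_HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)   # sums to 32

def half_pel_bilinear(row, i):
    """Sample halfway between row[i] and row[i + 1] with a 2-tap average."""
    return (row[i] + row[i + 1] + 1) >> 1

def half_pel_6tap(row, i):
    """Same position, using six neighbors row[i - 2] .. row[i + 3].
    Rounds, divides by 32 and clamps to the 8-bit range."""
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(H264_HALF_PEL_TAPS))
    return min(255, max(0, (acc + 16) >> 5))

# A single bright pixel: the longer filter keeps more of the peak.
impulse = [0, 0, 0, 100, 0, 0, 0, 0]
soft = half_pel_bilinear(impulse, 3)
sharp = half_pel_6tap(impulse, 3)
```

The negative taps are what buy the sharper response, and they are also why the result needs a clamp: next to a hard edge the 6-tap sum can overshoot the valid pixel range, which is the filter's built-in assumption about the missing high frequencies showing through.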

Improved entropy coding. So far as I can tell there's nothing too interesting here. Current video coders (H264) use entropy coders from the 1980's (very similar to the Q-coder stuff in JPEG-ari), and the proposals are to bring the entropy coding into the 1990's, on the level of ECECOW or EZDCT.


05-19-10 - Some quick notes on VP8

The VP8 release is exciting for what it might be in two years.

If it in fact becomes a clean open-source video standard with no major patent encumbrances, it might be well integrated in Firefox, Windows Media, etc. etc. - eg. we might actually have a video format that actually just WORKS! I don't even care if the quality/size is really competitive. How sweet would it be if there was a format that I knew I could download and it would just play back correctly and not give me any headaches. Right now that does not exist at all. (it's a sad fact that animated GIF is probably the most portable video format of the moment).

Now, you might well ask - why VP8 ? To that I have no good answer. VP8 seems like a messy cock-assed standard which has nothing in particular going for it. The entropy encoder in particular (much like H264) seems badly designed and inefficient. The basics are completely vanilla, in that it is block based, block modes, movecs, transforms, residual coding. In that sense it is just like MPEG1 or H265. That is a perfectly fine thing to do, and in fact it's what I've wound up doing, but you could pull a video standard like that out of your ass in about five minutes, there's no need to license code for that. If in fact VP8 does dodge all the existing patents then that would be a reason that it has value.

The VP8 code stream is probably pretty weak (I really don't know enough of the details to say for sure). However, what I have learned of late is that there is massive room for the encoder to make good output video even through a weak code stream. In fact I think a very good encoder could make good output from an MPEG2 level of code stream. Monty at Xiph has a nice page about work on Theora. There's nothing really cutting edge in there but it's nicely written and it's a good demonstration of the improvement you can get on a fixed standard code stream just with encoder improvements (and really their encoder is only up to "good but still basic" and not really into the realm of wicked-aggressive).

The only question we need to ask about the VP8 code stream is : is it flexible enough that it's possible to write a good encoder for it over the next few years? And it seems the answer is yes. (contrast this to VP3/Theora which has a fundamentally broken code stream which has made it very hard to write a good encoder).

ADDENDUM : this post by Greg Maxwell is pretty right on.

ADDENDUM 2 : Something major that's been missing from the web discussions and from the literature about video for a long time is the separation of code stream from encoder. The code stream basically gives the encoder a language and framework to work in. The things that Jason / Dark Shikari thinks are so great about x264 are almost entirely encoder-side things that could apply to almost any code stream (eg. "psy rdo" , "AQ", "mbtree", etc.). The literature doesn't discuss this much because they are trapped in the pit of PSNR comparisons, in which encoder side work is not that interesting. Encoder work for PSNR is not interesting because we generally know directly how to optimize for MSE/SSD/L2 error - very simple ways like flat quantizers and DCT-space trellis quant, etc. What's more interesting is perceptual quality optimization in the encoder. In order to achieve good perceptual optimization, what you need is a good way to measure perceptual error (which we don't have), and the ability to try things in the code stream and see if they improve perceptual error (hard due to non-local effects), and a code stream that is flexible enough for the encoder to make choices that create different kinds of errors in the output. For example adding more block modes to your video coder with different types of coding is usually/often bad in a PSNR sense because all they do is create redundancy and take away code space from the normal modes, but it can be very good in a perceptual sense because it gives the encoder more choice.

ADDENDUM 3 : Case in point , I finally have noticed some x264 encoded videos showing up on the torrent sites. Well, about 90% of them don't play back on my media PC right. There's some glitching problem, or the audio & video get out of sync, or the framerate is off a tiny bit, or some shit and it's fucking annoying.

ADDENDUM 4 : I should be more clear - the most exciting thing about VP8 is that it (hopefully) provides an open patent-free standard that can then be played with and discussed openly by the development community. Hopefully encoders and decoders will also be open source and we will be able to talk about the techniques that go into them, and a whole new


05-13-10 - P4 with NiftyPerforce and no P4SCC

I'm trying using P4 in MSDev with NiftyPerforce and no P4SCC.

What this means is VC thinks you have no SCC connection at all, your files are just on your disk. You need to change the default NiftyPerforce settings so that it checks out files for you when you edit/save etc.

Advantages of NiftyPerforce without P4SCC :

1. Much faster startup / project load, because it doesn't go and check the status of everything in the project with P4.

2. No clusterfuck when you start unconnected. This is one of the worst problems with P4SCC, for example if you want to work on some work projects but can't VPN for some reason, P4SCC will have a total shit fit about working disconnected. With the NiftyPerforce setup you just attrib your files and go on with your business.

3. No difficulties with changing binding/etc. This is another major disaster with P4SCC. It's rare, but if you change the P4 location of a project or change your mappings or if you already have some files added to P4 but not the project, all these things give MSDev a complete shit-fit. That all goes away.

Disadvantages of NiftyPerforce without P4SCC :

1. The first few keystrokes are lost. When you try to edit a checked-in file, you can just start typing and Nifty will go check it out, but until the checkout is done your keystrokes go to never-never land. Mild suckitude. Alternatively you could let MSDev pop up the dialog for "do you want to edit this read only file" which would make you more aware of what's going on but doesn't actually fix the issue.

2. No check marks and locks in project browser to let you know what's checked in / checked out. This is not a huge big deal, but it is a nice sanity check to make sure things are working the way they should be. Instead you have to keep an eye on your P4Win window which is a mild productivity hit.

One note about making the changeover : for existing projects that have P4SCC bindings, if you load them up in VC and tell VC to remove the binding, it also will be "helpful" and go attrib all your files to make them writeable (it also will be unhelpful and not check out your projects to make the change to not have them bound). Then NiftyPerforce won't work because your files are already writeable. The easiest way to do this right is to just open your vcproj's and sln's in a text editor and rip out all the binding bits manually.

I'm not sure yet whether the pros/cons are worth it. P4SCC actually is pretty nice once it's set up, though the ass-pain it gives when trying to make it do something it doesn't want to do (like source control something that's out of the binding root) is pretty severe.


I found the real pro & con of each way.

Pro P4SCC : You can just start editing files in VC and not worry about it. It auto-checks out files from P4 and you don't lose key presses. The most important case here is that it correctly handles files that you have not got the latest revision of - it will pop up "edit current or sync first" in that case. The best way to use Nifty seems to be Jim's suggestion - put checkout on Save, do not checkout on Edit, and make files read-only editable in memory. That works great if you are a single dev but is not super awesome in an actual shared environment with heavy contention.

Pro NiftyP4 : When you're working from home over an unreliable VPN, P4SCC is just unworkable. If you lose connection it basically hangs MSDev. This is so bad that it pretty much completely dooms P4SCC. ARG actually I take that back a bit, NiftyP4 also hangs MSDev when you lose connection, though it's not nearly as bad.


05-12-10 - P4 By Dir

(ADDENDUM : see comments, I am dumb).

I mentioned this before :

(Currently that's not a great option for me because I talk to both my home P4 server and my work P4 server, and P4 stupidly does not have a way to set the server by local directory. That is, if I'm working on stuff in c:\home I want to use one env spec and if I'm in c:\work, use another env spec. This fucks up things like NiftyPerforce and p4.exe because they just use a global environment setting for server, so if I have some work code and some home code open at the same time they shit their pants. I think that I'll make my own replacement p4.exe that does this the right way at some point; I guess the right way is probably to do something like CVS/SVN does and have a config file in dirs, and walk up the dir tree and take the first config you find).

But I'm having second thoughts, because putting little config shitlets in my source dirs is one of the things I hate about CVS. Granted it would be much better in this case - I would only need a handful of them in my top level dirs, but another disadvantage is my p4bydir app would need to scan up the dir tree all the time to find config files.

And there's a better way. The thing is, the P4 Client specs already have the information of what dirs on my local machine go with what depot mappings. The problem is the client spec is not actually associated with a server. What you need is a "port client user" setting. These are stored as favorites in P4Win, but there is no authoritative list of the valid/good "port client user" setups on a machine.

So, my new idea is that I store my own config file somewhere that lists the valid "port client user" sets that I want to consider in p4bydir. I load that and then grab all the client specs. I use the client specs to identify what dirs to map to where, and the "port client user" settings to tell what p4 environment to set for that dir.

I then replace the global p4.exe with my own p4bydir so that all apps (like NiftyPerforce) will automatically talk to the right connection whenever they do a p4 on a file.
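The dir-to-connection lookup is simple enough to sketch. This is a hypothetical Python version, not the actual p4bydir: the connection list, its field names, and the server/client/user values are all invented, and in the real thing the roots would come from the client specs rather than being typed in. The only real names are the P4PORT / P4CLIENT / P4USER environment variables.

```python
import ntpath

# Hypothetical "port client user" sets plus the local root each client maps to.
CONNECTIONS = [
    {"port": "home-server:1666", "client": "home-client", "user": "me",
     "root": r"c:\home"},
    {"port": "work-server:1666", "client": "work-client", "user": "me_at_work",
     "root": r"c:\work"},
]

def _norm(path):
    """Case-insensitive, backslash-normalized Windows path."""
    return ntpath.normcase(ntpath.normpath(path))

def pick_connection(path, connections):
    """Return the connection whose client root is the deepest ancestor of
    path, or None if no root contains it."""
    target = _norm(path)
    best, best_len = None, -1
    for conn in connections:
        root = _norm(conn["root"]).rstrip("\\")
        if target == root or target.startswith(root + "\\"):
            if len(root) > best_len:
                best, best_len = conn, len(root)
    return best

def p4_environment(path, connections):
    """Environment overrides to apply before running the real p4 on path."""
    conn = pick_connection(path, connections)
    if conn is None:
        return {}
    return {"P4PORT": conn["port"], "P4CLIENT": conn["client"],
            "P4USER": conn["user"]}
```

A wrapper p4.exe would call something like p4_environment() on the file argument (or the current directory), set those variables, and then exec the real p4. Picking the deepest matching root means nested client roots still resolve to the most specific one.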

05-12-10 - Cleartype

Since I ranted about Cleartype I thought I'd go into a bit more detail. this article on Cleartype in Win7 is interesting, though also willfully retarded.

Another research question we've asked ourselves is why do some people prefer bi-level rendering over ClearType? Is it due to hardware issues or is there some other attribute that we don't understand about visual systems that is playing a role. This is an issue that has piqued our curiosity for some time. Our first attempt at looking further into this involved doing an informal and small-scale preference study in a community center near Microsoft.

Wait, this is a research question ? Gee, why would I prefer perfect black and white raster fonts to smudged and color-fringed cleartype. I just can't imagine it! Better do some community user testing...

1. 35 participants. 2. Comments for bi-level rendering: Washed out; jiggly; sketchy; if this were a printer, I'd say it needed a new cartridge; fading out - esp. the numbers, I have to squint to read this, is it my glasses or it is me?; I can't focus on this; broken up; have to strain to read; jointed. 3. Comments for ClearType: More defined, Looks bold (several times), looks darker, clearer (4 times), looks like it's a better computer screen (user suggested he'd pay $500 more for the better screen on a $2000 laptop), sort of more blue, solid, much easier to read (3 times), clean, crisp, I like it, shows up better, and my favorite: from an elderly woman who was rather put out that the question wasn't harder: this seems so obvious (said with a sneer.)

Oh my god, LOL, holy crap. They are obviously comparing Cleartyped anti-aliased fonts to black-and-white rendered TrueType fonts, NOT to raster fonts. They're probably doing big fonts on a high DPI screen too. Try it again on a 24" LCD with an 8 point font please, and compare something that has an unhinted TrueType and an actual hand-crafted raster font. Jesus. Oh, but I must be wrong because the community survey says 94% prefer cleartype!

Anyway, as usual the annoying thing is that in pushing their fuck-tard agenda, they refuse to acknowledge the actual pros and cons of each method and give you the controls you really want. What I would like is a setting to make Windows always prefer bitmap fonts when they exist, but use ClearType if it is actually drawing anti-aliased fonts. Even then I still might not use it because I fucking hate those color fringes, but it would be way more reasonable. Beyond that obviously you could want even more control like switching preference for cleartype vs. bitmap per font, or turning on and off hinting per font or per app, etc. but just some more reasonable global default would get you 90% of the way there. I would want something like "always prefer raster font for sizes <= 14 point" or something like that.

Text editors are a simple case because you just let the user set the font and get what they want, and it doesn't matter what size the text is because it's not laid out. PDF's and such I guess you go ahead and use TT all the time. The web is a weird hybrid which is semi-formatted. The problem with the web is that it doesn't tell you when formatting is important or not important. I'd like to override the basic firefox font to be my own choice nice bitmap font *when formatting is not important* (eg. in blocks of text like I make). But if you do that globally it hoses the layout of some pages. And then other pages will manually request fonts which are blurry bollocks.

CodeProject has a nice font survey with Cleartype/no-Cleartype screen caps.

GDI++ is an interesting hack to GDI32.dll to replace the font rendering.

Entropy overload has some decent hinted TTF fonts for programmers you can use in VS 2010.

Electronic Dissonance has the real awesome solution : sneak raster fonts into asian fonts so that VS 2010 / WPF will use them. This is money if you use VS 2010.


05-11-10 - Note from the Mail Man

For reference, this is the ferocious beast that is terrorizing the poor mailman :

LOL. It would actually be pretty damn sweet if I could stop getting mail. Don't think the duplex neighbor would like that though.

05-11-10 - Some New Cblib Apps

Coded up some new goodies for myself today and released them in a new cblib and chuksh .

RunOrActivate : useful with a hot key program, or from the CLI. Use RunOrActivate [program name]. If a running process of that program exists, it will be activated and made foreground. If not, a new instance is started. Similar to the Windows built-in "shortcut key" functionality but not horribly broken like that is.

(BTW for those that don't know, Windows "shortcut keys" have had huge bugs ever since Win 95 ; they sometimes work great, basically doing RunOrActivate, but they use some weird mechanism which causes them to not work right with some apps (maybe they use DDE?), they also have bizarre latency semi-randomly, usually they launch the app instantly but occasionally they just decide to wait for 10 seconds or so).

RunOrActivate also has a bonus feature : if multiple instances of that process are running it will cycle you between them. So for example my Win-E now starts an explorer, goes to existing one if there was one, and if there were a few it cycles between explorers. Very nice. Also works with TCC windows and Firefox Windows. This actually solves a long-time usability problem I've had with shortcut keys that I never thought about fixing before, so huzzah.
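The decision logic is tiny once you have the list of windows. Here's a Python sketch of just that part; the window handles are plain integers standing in for HWNDs, and actually enumerating a process's windows and bringing one to the foreground (presumably via Win32 calls like EnumWindows and SetForegroundWindow) is left out, so this is an illustration of the behavior described above and not the cblib code.

```python
def run_or_activate(windows, foreground):
    """Decide what RunOrActivate should do.

    windows    : handles of the target program's top-level windows, in some
                 stable order (hypothetical ints standing in for HWNDs)
    foreground : handle of the current foreground window, or None

    Returns ("run", None) to start a new instance, or ("activate", handle).
    """
    if not windows:
        return ("run", None)
    if foreground in windows:
        # Already looking at one instance: cycle to the next, wrapping around.
        nxt = windows[(windows.index(foreground) + 1) % len(windows)]
        return ("activate", nxt)
    return ("activate", windows[0])
```

The cycling only works if the window order stays stable between presses of the hot key, so sorting by handle or by process start time is probably a better idea than relying on enumeration order.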

WinMove : I've been using this forever, lets you move and resize the active window in various ways, either by manual coordinate or with some shorthands for "left half" etc. Anyway the new bit is I just added an option for "all windows" so that I can reproduce the Win-M minimize all behavior and Win-Shift-M restore all.

I think that gives me all Win-Key functions I actually want.

ADDENDUM : One slightly fiddly bit is the question of *which* window of a process to activate in RunOrActivate. Windows refuses to give you any concept of the "primary" window of a process, simply sticking to the assertion that processes can have many windows. However we all know this is bullshit because Alt-Tab picks out an isolated set of "primary" windows to switch between. So how do you get the list of alt-tab windows? You don't. It's "undefined", so you have to make it up somehow. Raymond Chen describes the algorithm used in one version of Windows.


05-09-10 - Some Win7 Shite

Perforce Server was being a pain in my ass to start up because the fucking P4S service doesn't get my P4ROOT environment variable. Rather than try to figure out the fucking Win 7 per-user environment variable shite, the easy solution is just to move your P4S.exe into your P4ROOT directory, that way when it sees no P4ROOT setting it will just use current directory.

New P4 Installs don't include P4Win , but you can just copy it from your old install and keep using it.

This is not a Win7 problem so much as a "newer MS systems" problem, but non-antialiased / non-cleartype text rendering is getting nerfed. Old stuff that uses GDI will still render good old bitmap fonts fine, but newer stuff that uses WPF has NO BITMAP FONT SUPPORT. That is, they are always using antialiasing, which is totally inappropriate for small text (especially without cleartype). (For example MSVC 2010 has no bitmap font support (* yes I know there are some workarounds for this)).

This is a huge fucking LOSE for serious developers. MS used to actually have better small text than Apple, Apple always did way better at smooth large blurry WYSIWYG text shit. Now MS is just worse all around because they have intentionally nerfed the thing they were winning at. I'm very disappointed because I always run no-cleartype, no-antialias because small bitmap fonts are so much better. A human font craftsman carefully choosing which pixels should be on or off is so much better than some fucking algorithm trying to approximate a smooth curve in 3 pixels and instead giving me fucking blue and red fringes.

Obviously anti-aliased text is the *future* of text rendering, but that future is still pretty far away. My 24" 1920x1200 that I like to work on is 94 dpi (a 30" 2560x1600 is 100 dpi, almost the same). My 17" lappy at 1920x1200 has some of the highest pixel density that you can get for a reasonable price, it's pretty awesome for photos, but it's still only 133 dpi which is shit for text (*). To actually do good looking antialiased text you need at least 200 dpi, and 300 would be better. This is 5-10 years away for consumer price points. (In fact the lappy screen is the unfortunate uncanny valley; the 24" at 1920x1200 is the perfect res where non-antialiased stuff is the right size on screen and has the right amount of detail. If you just go to slightly higher dpi, like 133, then everything is too small. If you then scale it up in software to make it the right size for the eye, you don't actually have enough pixels to do that scale up. The problem is that until you get above 200 dpi where you can do arbitrary scaling of GUI elements, the physical size of the pixel is important, and the 100 dpi pixel is just about perfect). (* = shit for anti-aliased text, obviously great for raster fonts at 14 pels or so).
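
For anyone who wants to check the dpi figures, the arithmetic is just the pixel diagonal divided by the physical diagonal. A small sketch, assuming the quoted sizes are diagonals in inches and the pixels are square (the function name `dpi` is just for illustration):

```python
import math

def dpi(width_px, height_px, diagonal_inches):
    """Pixels per inch, measured along the diagonal of the panel."""
    return math.hypot(width_px, height_px) / diagonal_inches

# The three panels mentioned above.
for name, w, h, d in [
    ('24" 1920x1200', 1920, 1200, 24),
    ('30" 2560x1600', 2560, 1600, 30),
    ('17" 1920x1200', 1920, 1200, 17),
]:
    print(f"{name}: {dpi(w, h, d):.1f} dpi")
```

The results should land within a point or so of the rounded numbers quoted above.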

( ADDENDUM : Urg I keep trying to turn on Cleartype and be okay with it. No no no it's not okay. They should call it "Clear Chromatic Aberration" or "Clearly the Developers who think this is okay are colorblind". Do they think our eyes only see luma !? WTF !? Introducing colors into my black and white text is just such a huge visual artifact that no amount of improvement to the curve shapes can make up for that. )

It's actually pretty sweet right now living in a world where our CPU's are nice and multi-core, but most apps are still single core. It means I can control the load on my machine myself, which is damn nice. For example I can run 4 apps and know that they will all be pretty nice and snappy. These days I am frequently keeping 3 copies of my video test app running various tests all the time, and since it's single core I know I have one free core to still fuck around on the computer and it's full speed. The sad thing is that once apps actually all go multi-core this is going to go away, because when you actually have to share cores, Windows goes to shit.

Christ why is the registry still so fucking broken? 1. If you are a developer, please please make your apps not use the registry. Put config files in the same dir as your .exe. 2. The Registry is just a bunch of text strings, why is it not fucking version controlled? I want a log of the changes and I want to know what app made the change when. WTF.

The only decent way to get environment variables set is with TCC "set /S" or "set /U".

"C:\Program Files (x86)" is a huge fucking annoyance. Not only does it break my muscle memory and break a ton of batch files I had that looked for program files, but now I have a fucking quandary every time I'm trying to hunt down a program.. err is it in x86 or not? I really don't like that decision. I understand it's needed for if you actually have an x86 and x64 version of the same app installed, but that is very rare, and you should have only bifurcated paths on apps that actually do have a dual install. (also because lots of apps hard code to c:\program files , they have a horrible hack where they let 32 bit apps think they are actually in c:\program files when they are in "C:\Program Files (x86)"). Blurg.
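
One way to stop guessing is a tiny helper that checks both directories. This is a sketch under a few assumptions: the names `program_files_roots` and `find_program` are made up here, and the roots come from the `ProgramW6432`, `ProgramFiles` and `ProgramFiles(x86)` environment variables that 64-bit Windows sets (a 32-bit process sees the x86 directory in `ProgramFiles`, hence checking all three).

```python
import os
from pathlib import Path

def program_files_roots(env=os.environ):
    """Both 'Program Files' directories, 64-bit first, without duplicates."""
    roots = []
    for var in ("ProgramW6432", "ProgramFiles", "ProgramFiles(x86)"):
        value = env.get(var)
        if value and Path(value) not in roots:
            roots.append(Path(value))
    return roots

def find_program(relative_path, roots=None):
    """Return the first existing <root>/<relative_path>, or None if not found."""
    for root in (program_files_roots() if roots is None else roots):
        candidate = Path(root) / relative_path
        if candidate.exists():
            return candidate
    return None
```

Something like `find_program("Perforce/p4win.exe")` then answers the "is it in x86 or not?" question; the same two-directory lookup is what the broken batch files would need.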

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters]


[HKEY_CURRENT_USER\Control Panel\Desktop]

Some links :

Types - Vista/Win7 has borked the "File Associations" setup. You need a 3rd party app like Types now to configure your file types (eg. to change default icons).

Shark007.net - Windows 7 Codecs - WMP12 Codecs - seem to work.

Pismo Technic Inc. - Pismo File Mount - nicest ISO mounter I've found (Daemon tools feels like it's made out of spit and straw).

Hot Key Plus by Brian Apps - ancient app that still works and I like because it's super simple.

Change Windows 7 Default Folder Icon - Windows 7 Forums ; presumably you have the Preview stuff for Folders turned off, so now make the icon not so ugly.

- how to move your perforce depot Annoyingly I used a different machine name for new lappy and thus a different clientview, so MSVC P4SCC fails to make the connection and wants to rebind every project. The easiest way to fix this is just to not use P4SCC and kill all your bindings and just use NiftyPerforce without P4SCC.

(Currently that's not a great option for me because I talk to both my home P4 server and my work P4 server, and P4 stupidly does not have a way to set the server by local directory. That is, if I'm working on stuff in c:\home I want to use one env spec and if I'm in c:\work, use another env spec. This fucks up things like NiftyPerforce and p4.exe because they just use a global environment setting for server, so if I have some work code and some home code open at the same time they shit their pants. I think that I'll make my own replacement p4.exe that does this the right way at some point; I guess the right way is probably to do something like CVS/SVN does and have a config file in dirs, and walk up the dir tree and take the first config you find).
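
The "walk up the dir tree and take the first config you find" lookup could look something like the sketch below. The file name `.p4settings`, the `KEY=value` format and both function names are hypothetical, chosen only for illustration. (Perforce's own `P4CONFIG` setting is documented as doing a similar search through the current directory and its parents, so it may be worth trying before writing a replacement p4.exe.)

```python
from pathlib import Path

def find_config(start_dir, config_name=".p4settings"):
    """Walk from start_dir up to the filesystem root; return the first config found."""
    here = Path(start_dir).resolve()
    for folder in [here, *here.parents]:
        candidate = folder / config_name
        if candidate.is_file():
            return candidate  # nearest config wins
    return None

def load_settings(config_path):
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in Path(config_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

With one config file at the top of c:\home and another at the top of c:\work, any tool launched from inside either tree would pick up the right server without touching the global environment.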

allSnap make all windows snap - AllSnap for x64/Win7 seems to be broken, but the old 32 bit one seems to work just fine still. (ADDENDUM : nope, old allsnap randomly crashes in Win 7, do not use)

KeyTweak Homepage - I used KeyTweak to remap my Caps Lock to Alt.

Firefox addons :

PDF Download
Stop Autoplay
Adblock Plus

ADDENDUM : I found the last few guys who are ticking my disk :

One that you obviously want to disable is Windows Media Player Media sharing service : Fix wmpnetwk.exe In Windows 7 . It just constantly scans your media dirs for shit to serve. Fuck you.

The next big culprit is the Windows Reliability stuff. Go ahead and disable the RAC scheduled task, but that's not the real problem. The nasty one is the "last alive stamp" which windows writes once a minute by default. This is to help diagnose crashes. You could change TimeStampInterval to 30 or so to make it once every 30 minutes, but I set it to zero to disable it. See :
Shutdown Event Tracker Tools and Settings System Reliability
How to disable hard disk thrashing with Vista - Page 7

And this is a decent summary/repeat of what I've said : How to greatly reduce harddisk grinding noises in Vista .


05-07-10 - New Lappy

I got my new lappy, a Dell Precision M6500 big 17" behemoth. I'm setting up Win 7 today which is mostly going smoothly.

First impressions of the M6500 : build quality is very nice, lightweight all metal like the Latitude series. The screen is pretty great, 1920x1200 matte, pretty bright, decent contrast, the only complaint is that it's not super viewing-angle-independent. It is a very nice bonus that the lappy LCD res is the same res that I run my 24" external, so I can switch between lappy LCD and external LCD and it doesn't hose my layouts (however, on the minus side of that equation, because the lappy LCD is so pixel dense, I have to run in large fonts on it, and switch back to small fonts for the external LCD, boo). The internal peripherals and the case are well designed for popping things in and out. It takes two 2.5" thin lappy disks and has 4 RAM slots. The disks are easy to get in and out, and 2 of the RAMs are easy, but the other two are under the keyboard and take a bit more work. The thing is pretty amazingly quiet, the only auditory annoyance is that the fan oscillates up and down too much, I wish I could program the fan to just be on all the time in low speed instead of jumping up and down. The keyboard action is very nice, like the Latitude series, but it is the standard fucking retarded 17" lappy thing of just using a normal 15" lappy keyboard and sticking it in the larger case. Jebus can't you people actually make a custom keyboard for the 17" form factor that takes advantage of the extra space to give me better layout!? Come on! Anyway, since I don't really use lappy as a lappy, this is mostly academic.

Win 7 reinstall went very smoothly - it autodetects all the hardware well enough that you can at least boot up. You then need to install a few things - the GPU drivers, the USB3 driver, the Touchpad driver. That's about it. Smoothest Windows install I've ever had, by far.

I had one minor annoyance during install - turns out the lappy came with 1066 MHz RAM and I bought 1333 MHz RAM to add. Well, when you do that the lappy boots right up and doesn't complain at all, and runs through memory check (and even the Windows heavy duty memory check) and doesn't find any errors. But it will in fact give you random RAM failures and blue-screen you. So I had to reformat the disk and pull the 1066 RAM and reinstall windows. Then I discovered that the lappy doesn't work with memory only in the C/D slots and not the A/B slots. So I pulled it apart again, and then finally got going. Sigh.

Make sure you switch the BIOS to AHCI, not Intel's fucking raid thing (*before* the Windows reinstall). Currently I am running the MS AHCI driver which supports TRIM. Unclear whether I will ever install the Intel driver.

I put in an Intel X25-M SSD. Holy shit this thing is so good. If you are a serious computer user and you are not on an SSD - GET ONE RIGHT NOW.

So far I have disabled : Indexing, ReadyBoot, Prefetch, Superfetch, Scheduled Defrag, Defender, Updater, System Restore, Page File, Hibernate, Wifi polling. That has cleaned things up nicely, but some fucking Win7 service is still pinging my disk every 10 seconds or so and I haven't tracked down what it is yet. (fuckers). When I am doing nothing on my computer, it should go to 0% CPU and never ever touch my disk.

Win7 is mostly good so far. As usual there's the annoyance that MS loves to randomly rename things and move them around. Perhaps the worst thing so far is that fucking Backspace is no longer "go up a dir" in Explorer. Yes, I know Alt-Up does that, but it should be fucking backspace god dammit! This was widely complained about during Beta, but like fuckers they refused to provide a config switch to let me have my damn old backspace behavior. I found an AHK solution to map Backspace to Alt-Up , but AHK is a fucking bloated beast of flakey crapware so I'm hoping to find another solution to this (probably just write my own).

The Win-# hot keys are almost a good thing, except that using a number which is their position in the task bar is fucking awful. You should let me assign my own hotkeys, that way I can use Win-V for VisualStudio and Win-F for Firefox or whatever, so that I can actually memorize the keypresses and be fast instead of having to count each time.

Finally, it annoys me to all fuck that MS refuses to give me the one feature that would make UAC usable : just a button to always allow promotion of a given app. When it says "do you want to allow?" it should be "Yes/No/Always" , not just "Yes/No". As is, the fucking Shareware "ProcessGuard" is much much better than UAC because with ProcessGuard you can actually say "always allow this app" or "always forbid this app" blah blah. It's so fucking obvious and such a major usability fuckup. The result is that 99% of power users just turn off UAC, whereas if you had a "always allow for this app" I would totally leave UAC on. I dunno, it just boggles my mind, it would be so easy to make UAC useful and functional. You just have to understand how computing works. I want to have a host of programs installed on my computer which I have marked trusted and let them do whatever they want. Then I want to be able to download random junk from the internet and run them in safe mode where they are forbidden from doing various things like installing stuff in startup or fucking with my windows dir. WTF, why do I not have this !?

Oh, I guess I'm going to try going to MSVC 2010. Since this is supposed to be my machine for the next 10 years, I'd rather just eat the pain of getting on new stuff all at once and then hopefully not have to do it ever again. We'll see about that...

ADDENDUM : Urg. I take it all back, Win 7 is a FUCKING ABORTION (later addendum : that might have been a slight exaggeration). I am in a constant hell of fucking file ownerships and user privileges and shit. Here's just a minor sample :

You map a network drive. Everything seems fine and dandy. Now you open a command prompt in "run as administrator" mode. Your mapped drive is not there !? WTF !? Oh, brilliant fuckers that they are, the drive mapping is *PER USER* so the fucking administrator doesn't see it, so you have to remap it for the administrator account.

I install Perforce. Of course Perforce Server installs as "owned by" the Administrator account. So when I am logged in as my account if I try to do anything to those files it says "fuck you I'm going to pop up annoying boxes".

I'm trying to copy files from my old lappy's disk to my new one. Of course now Win 7 pays attention to the NTFS security tags and it sees those files are owned by some other user, so I get a bunch of random "Access Denied" messages with no explanation. Of course I can just go and do a fucking recursive "take ownership" of all those files, but that's really just a hack fix and if I plug that disk in somewhere else I'll have to do it again.

Jesus christ. Somebody give me an operating system where my files are just fucking files without some security or owner bullshit and I can run whatever I want and it just fucking works. I want Windows 95 please.

It's easy enough to disable UAC popups, but that's only the fucking tip of the iceberg. UAC is fucking you up back and forth all the time. Win 7 UAC also does some super nasty shit that most people don't know about, in that it remaps a bunch of virtual directory names *per user* (and also for 32 bit vs 64 bit and other compatibility modes), so depending on how your program is run it can see very different things on the same box. DO NOT WANT.
