12/22/2012

12-22-12 - Data Considered Harmful

I believe that the modern trend of doing some very superficial data analysis to prove a point or support an argument is extremely harmful. It gives arguments the appearance of a scientific basis that is in fact spurious.

I've been thinking about this for a while, but this Washington Post blog post about the correlation of video games and gun violence recently popped into my feed, so I'll use it as an example.

The Washington Post blog leads you to believe that the data shows an unequivocal lack of correlation between video games and gun violence. That's nonsense. It only takes one glance at the chart to see that the data is completely dominated by other factors, probably most strongly the gun ownership rate. You can't possibly find the effect of a minor contributing factor without normalizing for the other factors, which most of these "analyses" fail to do, and that makes them totally bogus. Furthermore, as usual, you would need a much larger sample size to have any confidence in the data, and you'd have to question how the data was selected. Also, the entire thing being charted is wrong: it shouldn't be video game spending per capita, it should be video games played per capita (especially with China on there), and it shouldn't be gun-related murders, it should be all murders (because the fraction of murders that is gun-related varies strongly with gun control laws, while the all-murders rate varies more directly with a country's level of economic and social development).

(Using data and charts and graphs has been a very popular way to respond to the recent shootings. Every single one that I've seen is complete nonsense. People just want to make a point that they've previously decided on, so they trot out some data to "prove it" or make it "non-partisan", as if their bogus charts somehow made it "factual". It's pathetic. Here's a good example of using tons of data to show absolutely nothing. If you want to make an editorial point, just write your opinion; don't trot out bogus charts to "back it up".)

It's extremely popular these days to "prove" that some intuition is wrong by finding some data that shows a reverse correlation (blame Freakonomics, among other things). You get lots of this in the smarmy TED talks - "you may expect that stabbing yourself in the eye with a pencil is harmful, but in fact these studies show that stabbing yourself in the eye is correlated with longer life expectancy!" (and then everyone claps). The problem with all this cute semi-intellectualism is that it's very often just wrong.

Aside from just poor data analysis, one of the major flaws with this kind of reasoning is the assumption that you are measuring all the inputs and all the outputs.

An obvious case is education, where you get all kinds of bogus studies that show such-and-such program "improves learning". Well, how did you actually measure learning? Obviously something like cutting music programs out of schools "improves learning" if you measure "learning" in a myopic way that doesn't include the benefits of music. And of course you must also ask what else differed between the measured kids and the control group (selection bias, novelty effect, etc.; essentially all the studies on charter schools are total nonsense, since any selection of students plus a new environment will produce a short-term improvement).

I believe that choosing the wrong inputs and outputs is even worse than the poor data analysis, because it can be so hidden. Quite often there are huge (bogus) logical leaps where the article measures some narrow output and then proceeds to talk about it as if it were just "better". Even if your data analysis was correct, you did not show it was better; you showed that the one specific narrow output you chose to measure improved, and you have to be very careful not to start using more general words.

(one of the great classic "wrong output" mistakes is measuring GDP to decide if a government financial policy was successful; this is one of those cases where economists have in fact done very sophisticated data analysis, but with a misleadingly narrow output)

Being repetitive : it's okay if you are actually very specific and careful not to extrapolate. eg. if you say "lowering interest rates increased GDP" and you are careful not to ever imply that "increased GDP" necessarily means "was good for the economy" (or that "was good for the economy" meant "was good for the population"); the problem is that people are sloppy, in their analysis and their implications and their reading, so it becomes "lowering interest rates improved the well-being of the population" and that becomes accepted wisdom.

Of course you can transparently see the vapidity of most of these analyses because they don't propagate error bars. If they actually took the errors of the measurement, corrected for the sample size, propagated them through the correlation calculation, and gave a confidence at the end, you would see things like "we measured a 5% improvement (+- 50%)", which is no data at all.

I saw Brian Cox on QI recently, and there was some point about the US government testing whether heavy doses of LSD helped schizophrenics or not. Everyone was aghast, but Brian popped up with "actually I support data-based medicine; if it had been shown to help then I would be for that therapy". Now obviously this was a jokey context so I'll cut Cox some slack, but it does in fact reflect a very commonly held belief these days (that we should trust the data over our common sense telling us it's a terrible idea). And it's just obviously wrong on the face of it. If the study had shown it to help, then obviously something was wrong with the study. Medical studies are almost always so flawed that it's hard to believe any of them. Selection bias is huge, novelty and placebo effects are huge; but even if you really have controlled for all that, the other big failure is that they are too short-term, and the "output" is much too narrow. You may have improved the thing you were measuring for, but done lots of other harm that you didn't see. Perhaps they did measure a decrease in certain schizophrenia symptoms (but psychotic lapses and suicides were way up; oops, that wasn't part of the output we measured).

Exercise/dieting and child-rearing are two major topics where you are just bombarded with nonsense pseudo-science "correlations" all the time.

Of course political/economic charts are useless and misleading. A classic falsehood that gets trotted out regularly is the chart showing "the economy does better under Democrats"; for one thing the sample size is just so small that it could be totally random; for another, the economy is more affected by the previous president than the current one; and in almost every case huge external factors are massively more important (what's the Fed rate, did Al Gore recently invent the internet, are we in a war or an oil crisis, etc.). People love to show that chart but it is *pure garbage*; it contains zero information. Similarly the charts about how the economy does right after a tax raise or cut: again there are so many confounding factors and the sample size is so tiny, but more importantly tax raises tend to happen when government receipts are low (eg. economic growth is already slowing), while tax cuts tend to happen in flush times, so saying "tax cuts lead to growth" is really saying "growth leads to growth".

What I'm trying to get at in this post is not the ridiculous lack of science in all these studies and "facts", but the way that the popular press (and the semi-intellectual world of blogs and talks and magazines) use charts and graphs to present "data" to legitimize the bogus point.

I believe that any time you see a chart or graph in the popular press you should look away.

I know they are seductive and fun, and they give you a vapid conversation piece ("did you know that Christmas lights are correlated with impotence?"), but they in fact poison the brain with falsehoods.

Finally, back to the issue of video games and violence. I believe it is obvious on the face of it that video games contribute to violence. Of course they do. Especially at a young age, if a kid grows up shooting virtual men in the face it has to have some effect (especially on people who are already mentally unstable). Is it a big factor? Probably not; by far the biggest factor in violence is poverty, then government instability and human rights, then the gun ownership rate, the ease of gun purchasing, etc. I suspect that the general gun glorification in America is a much bigger effect, as is growing up in a home where your parents had guns, going to the shooting range as a child, rappers glorifying violence, movies and TV. Somewhere after all that, I'm sure video games contribute. The only thing we can actually say scientifically is that the effect is very small and almost impossible to measure due to the presence of much larger and highly uncertain factors.

(of course we should also recognize that these kinds of crazy school shooting events are completely different from ordinary violence, and statistically are a drop in the bucket. I suspect the rare mass-murder psycho-killer events are more related to a country's mental health system than anything else. Pulling out the total murder numbers as a response to these rare psychotic events is another example of using the wrong data and then glossing over the illogical jump.)

I think in almost all cases if you don't play pretend with data and just go and sit quietly and think about the problem and tap into your own brain, you will come to better conclusions.

12/21/2012

12-21-12 - File Name Namespaces on Windows

A little bit fast and loose but trying to summarize some insanity from a practical point of view.

Windows has various "namespaces" or classes of file names :

1. DOS Names :

"c:\blah" and such.

Max path of 260 including drive and trailing null. Different cases refer to the same file, *however* different unicode encodings of the same character do *NOT* refer to the same file (eg. things like "accented e" and "e + accent previous char" are different files). See previous posts about code pages and general unicode disaster on Windows.
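For example, a minimal sketch of the normalization gotcha (this is my illustration, not from the original post; the escapes are U+00E9 precomposed vs. "e" plus combining U+0301, and it assumes you run it somewhere writable) :

#include <windows.h>

int main()
{
    // precomposed : "caf" + U+00E9 ("e with acute")
    HANDLE h1 = CreateFileW(L"caf\x00E9.txt", GENERIC_WRITE, 0, 0,
                            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
    // decomposed : "cafe" + U+0301 (combining acute accent)
    HANDLE h2 = CreateFileW(L"cafe\x0301.txt", GENERIC_WRITE, 0, 0,
                            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
    // both succeed : you now have two distinct files whose names render
    // identically, because Windows does no unicode normalization
    if ( h1 != INVALID_HANDLE_VALUE ) CloseHandle(h1);
    if ( h2 != INVALID_HANDLE_VALUE ) CloseHandle(h2);
    return 0;
}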

I'm going to ignore the 8.3 legacy junk, though it still has some funny lingering effects on even "long" DOS names. (for example, the longest directory path allowed is 248 characters rather than 260, because room is reserved for an 8.3 file name after the longest path).

2. Win32 Names :

This includes all DOS names plus all network paths like "\\server\blah".

The Win32 APIs can also take the "\\?\" names, which are sort of a way of peeking into the lower-level NT names.

Many people incorrectly think the big difference with the "\\?\" names is that the length can be much longer (about 32767 instead of 260), but IMO the bigger difference is that the name that follows is treated as raw characters. That is, you can have "/" or "." or ".." or whatever in the name - they do not get any processing. Very scary. I've seen lots of code that blindly assumes it can add or remove "\\?\" with impunity - that is not true!
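For example, a minimal sketch of the raw-name behavior (my illustration; it assumes "c:\temp\test.txt" exists, and the exact error for the prefixed case may vary) :

#include <windows.h>
#include <stdio.h>

int main()
{
    // without the prefix, ".." is normalized away before the name
    // reaches the file system, so this opens c:\temp\test.txt :
    HANDLE h1 = CreateFileW(L"c:\\temp\\sub\\..\\test.txt",
        GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);

    // with "\\?\" the characters are passed through raw ; ".." is a
    // literal name component, so the same string fails :
    HANDLE h2 = CreateFileW(L"\\\\?\\c:\\temp\\sub\\..\\test.txt",
        GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);

    printf("plain : %s , prefixed : %s\n",
        h1 == INVALID_HANDLE_VALUE ? "failed" : "opened",
        h2 == INVALID_HANDLE_VALUE ? "failed" : "opened");

    if ( h1 != INVALID_HANDLE_VALUE ) CloseHandle(h1);
    if ( h2 != INVALID_HANDLE_VALUE ) CloseHandle(h2);
    return 0;
}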


"\\?\c:\" is a local path

"\\?\UNC\server\blah" is a network name like "\\server\blah"

Assuming you have your drives shared, you can get to yourself as "\\localhost\c$\"

I think the "\\?\" namespace is totally insane and using it is a Very Bad Idea. The vast majority of apps will do the wrong thing when given it, and many will crash.

3. NT names :

Win32 is built on "ntdll" which internally uses another style of name. They start with "\" and then refer to the drivers used to access them, like :

"\Device\HarddiskVolume1\devel\projects\oodle"

In the NT namespace network shares are named :


Pre-Vista :

\Device\LanmanRedirector\<some per-user stuff>\server\share

Vista+ : the Lanman way still works, and also :

\Device\Mup\Server\share

And the NT namespace has a symbolic link to the entire Win32 namespace under "\Global??\" , so


"\Global??\c:\whatever"

is also a valid NT name (and "\??\" is sometimes valid as a short version of "\Global??\").

What fun.

12-21-12 - File Handle to File Name on Windows

There are a lot of posts about this floating around, most not quite right. Trying to sort it out once and for all. Note that in all cases I want to resolve back to a "final" name (that is, remove symlinks, substs, net uses, etc.). I do not believe that the methods I present here guarantee a "canonical" name, eg. a name that's always the same if it refers to the same file - that would be a nice extra step to have.

This post will be code-heavy and the code will be ugly. This code is all sloppy about buffer sizes and string over-runs and such, so DO NOT copy-paste it into production unless you want security holes. (a particular nasty point to be wary of is that many of the APIs differ in whether they take a buffer size in bytes or chars, which with unicode is different)

We're gonna use these helpers to call into Windows DLLs :


template <typename t_func_type>
t_func_type GetWindowsImport( t_func_type * pFunc , const char * funcName, const char * libName , bool dothrow)
{
    if ( *pFunc == 0 )
    {
        HMODULE m = GetModuleHandle(libName);
        if ( m == 0 ) m = LoadLibrary(libName); // adds extension for you
        ASSERT_RELEASE( m != 0 );
        t_func_type f = (t_func_type) GetProcAddress( m, funcName );
        if ( f == 0 && dothrow )
        {
            throw funcName;
        }
        *pFunc = f;
    }
    return (*pFunc); 
}

// GET_IMPORT can return NULL
#define GET_IMPORT(lib,name) (GetWindowsImport(&STRING_JOIN(fp_,name),STRINGIZE(name),lib,false))

// CALL_IMPORT throws if not found
#define CALL_IMPORT(lib,name) (*GetWindowsImport(&STRING_JOIN(fp_,name),STRINGIZE(name),lib,true))
#define CALL_KERNEL32(name) CALL_IMPORT("kernel32",name)
#define CALL_NT(name) CALL_IMPORT("ntdll",name)

I also make use of the cblib strlen, strcpy, etc. on wchars. Their implementation is obvious.

Also, for reference, to open a file handle just to read its attributes (to map its name) you use :


    HANDLE f = CreateFile(from,
        FILE_READ_ATTRIBUTES |
        STANDARD_RIGHTS_READ
        ,FILE_SHARE_READ,0,OPEN_EXISTING,FILE_FLAG_BACKUP_SEMANTICS,0);

(also works on directories).

Okay now : How to get a final path name from a file handle :

1. On Vista+ , just use GetFinalPathNameByHandle.

GetFinalPathNameByHandle gives you back a "\\?\" prefixed path, or "\\?\UNC\" for network shares.
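In code that's trivial (a minimal sketch; since this post also targets pre-Vista, real code would fetch the function pointer with GET_IMPORT rather than linking it directly) :

BOOL GetFileNameFromHandleW_Final(HANDLE hFile, wchar_t * pszFilename, int pszFilenameSize)
{
    // returns 0 on failure, or the required buffer size if the buffer is too small :
    DWORD len = GetFinalPathNameByHandleW(hFile, pszFilename, pszFilenameSize,
                        FILE_NAME_NORMALIZED | VOLUME_NAME_DOS);
    return ( len > 0 && (int)len < pszFilenameSize );
}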

2. Pre-Vista, lots of people recommend mem-mapping the file and then using GetMappedFileName.

This is a bad suggestion. It doesn't work on directories. It requires that you actually have the file open for read, which is of course impossible in some scenarios. It's just generally a non-robust way to get a file name from a handle.

For the record, here is the code from MSDN to get a file name from handle using GetMappedFileName. Note that GetMappedFileName gives you back an NT-namespace name, and I have factored out the bit to convert that to Win32 into MapNtDriveName, which we'll come back to later.



BOOL GetFileNameFromHandleW_Map(HANDLE hFile,wchar_t * pszFilename,int pszFilenameSize)
{
    BOOL bSuccess = FALSE;
    HANDLE hFileMap;

    pszFilename[0] = 0;

    // Get the file size.
    DWORD dwFileSizeHi = 0;
    DWORD dwFileSizeLo = GetFileSize(hFile, &dwFileSizeHi); 

    if( dwFileSizeLo == 0 && dwFileSizeHi == 0 )
    {
        lprintf(("Cannot map a file with a length of zero.\n"));
        return FALSE;
    }

    // Create a file mapping object.
    hFileMap = CreateFileMapping(hFile, 
                    NULL, 
                    PAGE_READONLY,
                    0, 
                    1,
                    NULL);

    if (hFileMap) 
    {
        // Create a file mapping to get the file name.
        void* pMem = MapViewOfFile(hFileMap, FILE_MAP_READ, 0, 0, 1);

        if (pMem) 
        {
            if (GetMappedFileNameW(GetCurrentProcess(), 
                                 pMem, 
                                 pszFilename,
                                 MAX_PATH)) 
            {
                //pszFilename is an NT-space name :
                //pszFilename = "\Device\HarddiskVolume4\devel\projects\oodle\z.bat"

                wchar_t temp[2048];
                strcpy(temp,pszFilename);
                MapNtDriveName(temp,pszFilename);

                // success only if GetMappedFileNameW worked
                // (the MSDN original set this unconditionally - a bug) :
                bSuccess = TRUE;
            }
            UnmapViewOfFile(pMem);
        } 

        CloseHandle(hFileMap);
    }
    else
    {
        return FALSE;
    }

    return(bSuccess);
}

3. There's a more direct way to get the name from file handle : NtQueryObject.

NtQueryObject gives you the name of any handle. If it's a file handle, you get the file name. This name is an NT namespace name, so you have to map it down of course.

The core code is :


typedef enum _OBJECT_INFORMATION_CLASS {
    ObjectBasicInformation,
    ObjectNameInformation,
    ObjectTypeInformation,
    ObjectAllInformation,
    ObjectDataInformation
} OBJECT_INFORMATION_CLASS, *POBJECT_INFORMATION_CLASS;

typedef struct _UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

typedef struct _OBJECT_NAME_INFORMATION {
    UNICODE_STRING Name;
    WCHAR NameBuffer[1];
} OBJECT_NAME_INFORMATION, *POBJECT_NAME_INFORMATION;

NTSTATUS (NTAPI * fp_NtQueryObject)(
    IN HANDLE ObjectHandle,
    IN OBJECT_INFORMATION_CLASS ObjectInformationClass,
    OUT PVOID ObjectInformation,
    IN ULONG Length,
    OUT PULONG ResultLength ) = 0;


void LogObjectName(HANDLE f) // (wrapper added so the fragment compiles)
{
    char infobuf[4096];
    ULONG ResultLength = 0;

    CALL_NT(NtQueryObject)(f,
        ObjectNameInformation,
        infobuf,
        sizeof(infobuf),
        &ResultLength);

    OBJECT_NAME_INFORMATION * pinfo = (OBJECT_NAME_INFORMATION *) infobuf;

    wchar_t * ps = pinfo->NameBuffer;
    // pinfo->Name.Length is in BYTES , not wchars
    ps[ pinfo->Name.Length / sizeof(wchar_t) ] = 0;

    lprintf("OBJECT_NAME_INFORMATION: (%S)\n",ps);
}

which will give you a name like :

    OBJECT_NAME_INFORMATION: (\Device\HarddiskVolume1\devel\projects\oodle\examples\oodle_future.h)

and then you just have to pull off the drive part and call MapNtDriveName (mentioned previously but not yet detailed).

Note that there's another call that looks appealing :


    IO_STATUS_BLOCK block = { };
    CALL_NT(NtQueryInformationFile)(f,
        &block,
        infobuf,
        sizeof(infobuf),
        FileNameInformation);

but NtQueryInformationFile seems to always give you just the file name without the drive. In fact it seems possible to use NtQueryInformationFile and NtQueryObject to separate the drive part and path part.

That is, you get something like :


t: is substed to c:\trans

LogDosDrives prints :

T: : \??\C:\trans

we ask about :

fmName : t:\prefs.js

we get :

NtQueryInformationFile: "\trans\prefs.js"
NtQueryObject: "\Device\HarddiskVolume4\trans\prefs.js"

If there was a way to get the drive letter, then you could just use NtQueryInformationFile , but so far as I know there is no simple way, so we have to go through all this mess.
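(For what it's worth, here's a sketch of that separation; it just subtracts the NtQueryInformationFile suffix off the end of the NtQueryObject name, using the cblib wchar helpers mentioned above. The name SplitNtDrivePart is mine :)

// eg. "\Device\HarddiskVolume4\trans\prefs.js" minus "\trans\prefs.js"
//  -> "\Device\HarddiskVolume4"
void SplitNtDrivePart(const wchar_t * objectName, // from NtQueryObject
                      const wchar_t * fileName,   // from NtQueryInformationFile
                      wchar_t * driveName)        // out : NT drive prefix
{
    size_t objectLen = strlen(objectName);
    size_t fileLen   = strlen(fileName);
    ASSERT_RELEASE( objectLen >= fileLen );
    size_t driveLen = objectLen - fileLen;
    memcpy(driveName, objectName, driveLen * sizeof(wchar_t));
    driveName[driveLen] = 0;
}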

On network shares, it's similar but a little different :


y: is net used to \\charlesbpc\C$

LogDosDrives prints :

Y: : \Device\LanmanRedirector\;Y:0000000000034569\charlesbpc\C$

we ask about :

fmName : y:\xfer\path.txt

we get :

NtQueryInformationFile: "\charlesbpc\C$\xfer\path.txt"
NtQueryObject: "\Device\Mup\charlesbpc\C$\xfer\path.txt"

so in that case you could just prepend a "\" to NtQueryInformationFile , but again I'm not sure how to know that what you got was a network share and not just a directory, so we'll go through all the mess here to figure it out.

4. MapNtDriveName is needed to map an NT-namespace drive name to a Win32/DOS-namespace name.

I've found two different ways of doing this, and they seem to produce the same results in all the tests I've run, so it's unclear if one is better than the other.

4.A. MapNtDriveName by QueryDosDevice

QueryDosDevice gives you the NT name of a dos drive. This is the opposite of what we want, so we have to reverse the mapping. The way is to use GetLogicalDriveStrings which gives you all the dos drive letters, then you can look them up to get all the NT names, and thus create the reverse mapping.

Here's LogDosDrives :


void LogDosDrives()
{
    #define BUFSIZE 2048
    // Translate path with device name to drive letters.
    wchar_t szTemp[BUFSIZE];
    szTemp[0] = '\0';

    // GetLogicalDriveStrings
    //  gives you the DOS drives on the system
    //  including substs and network drives
    if (GetLogicalDriveStringsW(BUFSIZE-1, szTemp)) 
    {
      wchar_t szName[MAX_PATH];
      wchar_t szDrive[3] = (L" :");

      wchar_t * p = szTemp;

      do 
      {
        // Copy the drive letter to the template string
        *szDrive = *p;

        // Look up each device name
        if (QueryDosDeviceW(szDrive, szName, MAX_PATH))
        {
            lprintf("%S : %S\n",szDrive,szName);
        }

        // Go to the next NULL character.
        while (*p++);
        
      } while ( *p); // double-null is end of drives list
    }

    return;
}

/**

LogDosDrives prints stuff like :

A: : \Device\Floppy0
C: : \Device\HarddiskVolume1
D: : \Device\HarddiskVolume2
E: : \Device\CdRom0
H: : \Device\CdRom1
I: : \Device\CdRom2
M: : \??\D:\misc
R: : \??\D:\ramdisk
S: : \??\D:\ramdisk
T: : \??\D:\trans
V: : \??\C:
W: : \Device\LanmanRedirector\;W:0000000000024326\radnet\raddevel
Y: : \Device\LanmanRedirector\;Y:0000000000024326\radnet\radmedia
Z: : \Device\LanmanRedirector\;Z:0000000000024326\charlesb-pc\c

**/

Recall from the last post that "\??\" is the NT-namespace way of mapping back to the win32 namespace. Those are substed drives. The "net use" drives get the "Lanman" prefix.

MapNtDriveName using QueryDosDevice is :


bool MapNtDriveName_QueryDosDevice(const wchar_t * from,wchar_t * to)
{
    #define BUFSIZE 2048
    // Translate path with device name to drive letters.
    wchar_t allDosDrives[BUFSIZE];
    allDosDrives[0] = '\0';

    // GetLogicalDriveStrings
    //  gives you the DOS drives on the system
    //  including substs and network drives
    if (GetLogicalDriveStringsW(BUFSIZE-1, allDosDrives)) 
    {
        wchar_t * pDosDrives = allDosDrives;

        do 
        {
            // Copy the drive letter to the template string
            wchar_t dosDrive[3] = (L" :");
            *dosDrive = *pDosDrives;

            // Look up each device name
            wchar_t ntDriveName[BUFSIZE];
            if ( QueryDosDeviceW(dosDrive, ntDriveName, ARRAY_SIZE(ntDriveName)) )
            {
                size_t ntDriveNameLen = strlen(ntDriveName);

                if ( strnicmp(from, ntDriveName, ntDriveNameLen) == 0
                         && ( from[ntDriveNameLen] == '\\' || from[ntDriveNameLen] == 0 ) )
                {
                    strcpy(to,dosDrive);
                    strcat(to,from+ntDriveNameLen);
                            
                    return true;
                }
            }

            // Go to the next NULL character.
            while (*pDosDrives++);

        } while ( *pDosDrives); // double-null is end of drives list
    }

    return false;
}

4.B. MapNtDriveName by IOControl :

There's a more direct way using DeviceIoControl. You just send a message to the "MountPointManager", which is the guy who controls these mappings. (this is from "Mehrdad" on Stack Overflow) :


struct MOUNTMGR_TARGET_NAME { USHORT DeviceNameLength; WCHAR DeviceName[1]; };
struct MOUNTMGR_VOLUME_PATHS { ULONG MultiSzLength; WCHAR MultiSz[1]; };

#define MOUNTMGRCONTROLTYPE ((ULONG) 'm')
#define IOCTL_MOUNTMGR_QUERY_DOS_VOLUME_PATH \
    CTL_CODE(MOUNTMGRCONTROLTYPE, 12, METHOD_BUFFERED, FILE_ANY_ACCESS)

union ANY_BUFFER {
    MOUNTMGR_TARGET_NAME TargetName;
    MOUNTMGR_VOLUME_PATHS TargetPaths;
    char Buffer[4096];
};

bool MapNtDriveName_IoControl(const wchar_t * from,wchar_t * to)
{
    ANY_BUFFER nameMnt;
    
    int fromLen = strlen(from);
    // DeviceNameLength is in *bytes*
    nameMnt.TargetName.DeviceNameLength = (USHORT) ( 2 * fromLen );
    strcpy(nameMnt.TargetName.DeviceName, from );
    
    HANDLE hMountPointMgr = CreateFile( ("\\\\.\\MountPointManager"),
        0, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        NULL, OPEN_EXISTING, 0, NULL);
        
    // CreateFile returns INVALID_HANDLE_VALUE on failure, not 0 :
    ASSERT_RELEASE( hMountPointMgr != INVALID_HANDLE_VALUE );
        
    DWORD bytesReturned;
    BOOL success = DeviceIoControl(hMountPointMgr,
        IOCTL_MOUNTMGR_QUERY_DOS_VOLUME_PATH, &nameMnt,
        sizeof(nameMnt), &nameMnt, sizeof(nameMnt),
        &bytesReturned, NULL);

    CloseHandle(hMountPointMgr);
    
    if ( success && nameMnt.TargetPaths.MultiSzLength > 0 )
    {    
        strcpy(to,nameMnt.TargetPaths.MultiSz);

        return true;    
    }
    else
    {    
        return false;
    }
}

5. Fix MapNtDriveName for network names.

I said that MapNtDriveName_IoControl and MapNtDriveName_QueryDosDevice produced the same results and both worked. Well, that's only true for local drives. For network drives they both fail, but in different ways. MapNtDriveName_QueryDosDevice just won't find network drives, while MapNtDriveName_IoControl will hang for a long time and eventually time out with a failure.

We can fix it easily though because the NT path for a network share contains the valid win32 path as a suffix, so all we have to do is grab that suffix.


bool MapNtDriveName(const wchar_t * from,wchar_t * to)
{
    // hard-code network drives :
    //  the NT name contains the valid win32 path as a suffix, so
    //  replace the device prefix with a single backslash :
    const wchar_t * netPrefixes[2] = { L"\\Device\\Mup", L"\\Device\\LanmanRedirector" };
    for(int i=0;i<2;i++)
    {
        size_t prefixLen = strlen(netPrefixes[i]);
        if ( strnicmp(from,netPrefixes[i],prefixLen) == 0 )
        {
            strcpy(to,L"\\");
            strcat(to,from+prefixLen);
            return true;
        }
    }

    // either one :
    //return MapNtDriveName_IoControl(from,to);
    return MapNtDriveName_QueryDosDevice(from,to);
}

This just takes the NT-namespace network paths, like :

"\Device\Mup\charlesbpc\C$\xfer\path.txt"

->

"\\charlesbpc\C$\xfer\path.txt"

And we're done.

12-21-12 - Coroutine-centric Architecture

I've been talking about this for a while but maybe haven't written it all clearly in one place. So here goes. My proposal for a coroutine-centric architecture (for games).

1. Run one thread locked to each core.

(NOTE : this is only appropriate on something like a game console where you are in control of all the threads! Do not do this on an OS like Windows where other apps may also be locking to cores, and you have the thread affinity scheduler problems, and so on).

The one-thread-per-core set of threads is your thread pool. All code runs as "tasks" (or jobs or whatever) on the thread pool.

The threads never actually do ANY OS Waits. They never switch. They're not really threads, you're not using any of the OS threading any more. (I suppose you still are using the OS to handle signals and such, and there are probably some OS threads that are running which will grab some of your time, and you want that; but you are not using the OS threading in your code).
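For concreteness, a minimal sketch of the spin-up in Win32-style code (illustrative only, per the caveat above; the WorkerMain body is stubbed, since the task-pool loop it would contain is the subject of the rest of this post) :

#include <windows.h>

// worker : pops coroutine tasks and runs them ; never does an OS Wait
DWORD WINAPI WorkerMain(void * context)
{
    int core = (int)(DWORD_PTR)context; (void)core;
    for(;;) { /* pop task, run task, run another task between */ }
}

void StartThreadPool(int numCores)
{
    for(int i=0;i<numCores;i++)
    {
        HANDLE h = CreateThread(0,0,WorkerMain,(void *)(DWORD_PTR)i,CREATE_SUSPENDED,0);
        SetThreadAffinityMask(h, ((DWORD_PTR)1)<<i); // lock thread i to core i
        ResumeThread(h);
    }
}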

2. All functions are coroutines. A function with no yields in it is just a very simple coroutine. There's no special syntax to be a coroutine or call a coroutine.

All functions can take futures or return futures. (a future is just a value that's not yet ready). Whether you want this to be totally implicit or not is up to your taste about how much of the operations behind the scenes are visible in the code.

For example if you have a function like :


int func(int x);

and you call it with a future<int> :

future<int> y;
func(y);

it is promoted automatically to :

future<int> func( future<int> x )
{
    yield x;
    return func( x.value );
}

When you call a function, it is not a "branch", it's just a normal function call. If that function yields, it yields the whole current coroutine. That is, it's just like threads and waits, but with coroutines and yields instead.

To branch I would use a new keyword, like "start" :


future<int> some_async_func(int x);

int current_func(int y)
{

    // execution will step directly into this function;
    // when it yields, current_func will yield

    future<int> f1 = some_async_func(y);

    // with "start" a new coroutine is made and enqueued to the thread pool
    // my coroutine immediately continues to the f1.wait
    
    future<int> f2 = start some_async_func(y);
    
    return f1.wait();
}

"start" should really be an abbreviation for a two-phase launch, which allows a lot more flexibility. That is, "start" should be a shorthand for something like :


start some_async_func(y);

is

coro * c = new coro( some_async_func(y) );
c->start();

because that allows batch-starting, and things like setting dependencies after creating the coro, which I have found to be very useful in practice. eg :

coro * c[32];

for(i in 32)
{
    c[i] = new coro( );
    if ( i > 0 )
        c[i-1]->depends( c[i] );
}

start_all( c, 32 );

Batch starting is one of those things that people often leave out. Starting tasks one by one is just like waiting for them one by one (instead of using a wait_all); it causes bad thread-thrashing (waking up and going back to sleep over and over, or switching back and forth).

3. Full stack-saving is crucial.

For this to be efficient you need a very small minimum stack size (4k is probably good) and you need stack-extension on demand.

You may have lots of pending coroutines sitting around and you don't want them gobbling all your memory with 64k stacks.

Full stack saving means you can do full variable capture for free, even in a language like C where tracking references is hard.

4. You stop using the OS mutex, semaphore, event, etc. and instead use coroutine variants.

Instead of a thread owning a lock, a coroutine owns a lock. When you block on a lock, it's a yield of the coroutine instead of a full OS wait.

Getting access to a mutex or semaphore is an event that can trigger coroutines being run or resumed. eg. it's a future just like the return from an async procedural call. So you can do things like :


future<int> y = some_async_func();

yield( y , my_mutex.when_lock() );

which yields your coroutine until the joint condition is met that the async func is done AND you can get the lock on "my_mutex".

Joint yields are very important because they prevent unnecessary coroutine wakeup. Coroutine thrashing is not nearly as bad as thread thrashing (that is one of the big advantages of a coroutine-centric architecture, in fact perhaps the biggest), but it's still worth avoiding.

You must have coroutine versions of all the ops that have delays (file IO, networking, GPU, etc) so that you can yield on them instead of doing thread-waits.

5. You must have some kind of GC.

Because coroutines will constantly be capturing values, you must ensure their lifetime is >= the life of the coroutine. GC is the only reasonable way to do this.

I would also go ahead and put an RW-lock in every object, since that will be necessary.

6. Dependencies and side effects should be expressed through args and return values.

You really need to get away from funcs like


void DoSomeStuff(void);

that have various un-knowable inputs and outputs. All inputs & outputs need to be values so that they can be used to create dependency chains.

When that's not directly possible, you must use a convention to express it. eg. for file manipulation I recommend using a string containing the file name to express the side effects that go through the file system (eg. for Rename, Delete, Copy, etc.).
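eg. a sketch in the pseudo-syntax from section 2 (filename_t and these signatures are invented for illustration) :

// the file name value acts as the proxy for the file-system side effect :
future<filename_t> Delete( future<filename_t> name );
future<filename_t> Copy( filename_t from, future<filename_t> to );

future<filename_t> b1 = start Delete( "b" );
future<filename_t> b2 = start Copy( "a", b1 ); // cannot run until the delete is done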

7. Note that coroutines do not fundamentally alter the difficulties of threading.

You still have races, deadlocks, etc. Basic async ops are much easier to write with coroutines, but they are no panacea and do not try to be anything other than a nicer way of writing threading. (eg. they are not transactional memory or any other auto-magic).

to be continued (perhaps) ....

Add 3/15/13 : 8. No static size anything. No resources you can run out of. This is another "best practice" that goes with modern thread design that I forgot to list.

Don't use fixed-size queues for thread communication; they seem like an optimization or simplification at first, but if you can ever hit the limit (and you will) they cause big problems. Don't assume a fixed number of workers or a maximum number of async ops in flight, this can cause deadlocks and be a big problem.

The thing is that a "coroutine centric" program is no longer so much like a normal imperative C program. It's moving towards a functional program where the path of execution is all nonlinear. You're setting a big graph to evaluate, and then you just need to be able to hit "go" and wait for the graph to close. If you run into some limit at some point during the graph evaluation, it's a big mess figuring out how to deal with that.

Of course the OS can impose limits on you (eg. running out of memory) and that is a hassle you have to deal with.

12-21-12 - Coroutines From Lambdas

Being pedantic while I'm on the topic. We've covered this before.

Any language with lambdas (that can be fired when an async completes) can simulate coroutines.

Assume we have some async function call :


future<int> AsyncFunc( int x );

which sends the integer off over the net (or whatever) and eventually gets a result back. Assume that future<> has an "AndThen" which schedules a function to run when it's done.

Then you can write a sequence of operations like :


future<int> MySequenceOfOps( int x1 )
{
    x1++;

    future<int> f1 = AsyncFunc(x1);

    return f1.AndThen( [](int x2){

    x2 *= 2;

    future<int> f2 = AsyncFunc(x2);

    return f2.AndThen( [](int x3){

    x3 --;

    return x3;

    } );
    } );

}

with a little munging we can make it look more like a standard coroutine :

#define YIELD(future,args)  return future.AndThen( [](args){

future<int> MySequenceOfOps( int x1 )
{
    x1++;

    future<int> f1 = AsyncFunc(x1);

    YIELD(f1,int x2)

    x2 *= 2;

    future<int> f2 = AsyncFunc(x2);

    YIELD(f2,int x3)

    x3 --;

    return x3;

    } );
    } );

}

The only really ugly bit is that you have to put a bunch of scope-closers at the end to match the number of yields.

This is really what any coroutine is doing under the hood. When you hit a "yield", what it does is take the remainder of the function and package that up as a functor to get called after the async op that you're yielding on is done.

Coroutines from lambdas have a few disadvantages, aside from the scope-closers annoyance. It's ugly to do anything but simple linear control flow. The above example is the very simple case of "imperative, yield, imperative, yield" , but in real code you want to have things like :


if ( bool )
{
    YIELD
}

or

while ( some condition )
{

    YIELD
}

which, while probably possible with lambda-coroutines, gets ugly.
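For the record, one way to do the loop : express it as a self-re-arming helper. This is a sketch using the hypothetical future<>/AndThen API from above, plus an assumed make_ready_future for the immediate case; some_condition is a stand-in :

future<int> LoopBody( int x )
{
    if ( ! some_condition(x) )
        return make_ready_future(x); // loop done, return immediately

    future<int> f = AsyncFunc(x);
    return f.AndThen( [](int next){
        return LoopBody(next); // "YIELD" , then run another iteration
    } );
}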

An advantage of lambda-coroutines is if you're in a language where you have lambdas with variable-capture, then you get that in your coroutines.

12/18/2012

12-18-12 - Async-Await ; Microsoft's Coroutines

As usual I'm way behind in knowing what's going on in the world. Lo and behold, MS have done a coroutine system very similar to me, which they are now pushing as a fundamental technology of WinRT. Dear lord, help us all. (I guess this stuff has been in .NET since 2008 or so, but with WinRT it's now being pushed on C++/CX as well)

I'm just catching up on this, so I'm going to make some notes about things that took a minute to figure out. Correct me where I'm wrong.

For the most part I'll be talking in C# lingo, because this stuff comes from C# and is much more natural there. There are C++/CX versions of all this, but they're rather more ugly. Occasionally I'll dip into what it looks like in CX, which is where we start :

1. "hat" (eg. String^)

Hat is a pointer to a ref-counted object. The ^ means inc and dec ref in scope. In cbloom code String^ is StringPtr.

The main note : "hat" is a thread-safe ref count, *however* it implies no other thread safety. That is, the ref-counting and object destruction is thread safe / atomic , but derefs are not :


Thingy^ t = Get(); // thread safe ref increment here
t->var1 = t->var2; // non-thread safe var accesses!

There is no built-in mutex or anything like that for hat-objects.

2. "async" func keyword

Async is a new keyword that indicates a function might be a coroutine. It does not make the function into an asynchronous call. What it really is is a "structify" or "functor" keyword (plus a "switch"). Like a C++ lambda, the main thing the language does for you is package up all the local variables and function arguments and put them all in a struct. That is (playing rather loosely with the translation for brevity) :


async void MyFunc( int x )
{
    string y;

    stuff();
}

[ is transformed to : ]

struct MyFunc_functor
{
    int x;
    string y;

    void Do() { stuff(); }
};

void MyFunc( int x )
{
    // allocate functor object :
    MyFunc_functor * f = new MyFunc_functor();
    // copy in args :
    f->x = x;
    // run it :
    f->Do();
}

So obviously this functor that captures the function's state is the key to making this into an async coroutine.

It is *not* stack saving. However for simple usages it is the same. Obviously crucial to this is using a language like C# which has GC so all the references can be traced, and everything is on the heap (perhaps lazily). That is, in C++ you could have pointers and references that refer to things on the stack, so just packaging up the args like this doesn't work.

Note that in the above you didn't see any task creation or asynchronous func launching, because it's not. The "async" keyword does not make a function async, all it does is "functorify" it so that it *could* become async. (this is in contrast to C++11 where "async" is an imperative to "run this asynchronously").

3. No more threads.

WinRT is pushing very hard to remove manual control of threads from the developer. Instead you have an OS thread pool that can run your tasks.

Now, I actually am a fan of this model in a limited way. It's the model I've been advocating for games for a while. To be clear, what I think is good for games is : run 1 thread per core. All game code consists of tasks for the thread pool. There are no special-purpose threads; any thread can run any type of task. All the threads are equal priority (with only 1 per core this is irrelevant, as long as you don't add extra threads).

So, when a coroutine becomes async, it just enqueues to a thread pool.

There is this funny stuff about execution "context", because they couldn't make it actually clean (so that any task can run on any thread in the pool); a "context" is a set of one or more threads with certain properties; the main one is the special UI context, which only gets one thread, and which can therefore deadlock. This looks like a big mess to me, but as long as you aren't actually doing C# UI stuff you can ignore it.

See ConfigureAwait etc. There seems to be lots of control you might want that's intentionally missing. Things like how many real threads are in your thread pool; also things like "run this task on this particular thread" is forbidden (or even just "stay on the same thread"; you can only stay on the same context, which may be several threads).

4. "await" is a coroutine yield.

You can only use "await" inside an "async" func because it relies on the structification.

It's very much like the old C-coroutines using switch trick. await is given an Awaitable (an interface to an async op). At that point your struct is enqueued on the thread pool to run again when the Awaitable is ready.

"await" is a yield, so you may return to your caller immediately at the point that you await.

Note that because of this, "async/await" functions cannot have return values (* except for Task which we'll see next).

Note that "await" is the point at which an "async" function actually becomes async. That is, when you call an async function, it is *not* initially launched to the thread pool, instead it initially runs synchronously on the calling thread. (this is part of a general effort in the WinRT design to make the async functions not actually async whenever possible, minimizing thread switches and heap allocations). It only actually becomes an APC when you await something.

(aside : there is a hacky "await Task.Yield()" mechanism which kicks off your synchronous invocation of a coroutine to the thread pool without anything explicit to await)

I really don't like the name "await" because it's not a "wait" , it's a "yield". The current thread does not stop running, but the current function might be enqueued to continue later. If it is enqueued, then the current thread returns out of the function and continues in the calling context.

One major flaw I see is that you can only await one async; there's no yield_all or yield_any. Because of this you see people writing atrocious code like :

await x;
await y;
await z;
stuff(x,y,z);

Now they do provide a Task.WhenAll and Task.WhenAny , which create proxy tasks that complete when the desired condition is met, so it is possible to do it right (but much easier not to).
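(For reference, the C++ PPL side has the same thing : when_all / when_any in <ppltasks.h>, also exposed as operator && and || on tasks. A sketch of the joint continuation :)

#include <ppltasks.h>
#include <vector>
#include <cstdio>
using namespace concurrency;

int main()
{
    task<int> tx([]{ return 1; });
    task<int> ty([]{ return 2; });
    task<int> tz([]{ return 3; });

    // one joint continuation instead of three serialized awaits :
    (tx && ty && tz).then([](std::vector<int> v){
        printf("%d %d %d\n", v[0], v[1], v[2]);
    }).wait();

    return 0;
}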

Of course "await" might not actually yield the coroutine; if the thing you are awaiting is already done, your coroutine may continue immediately. If you await a task that's not done (and also not already running), it might be run immediately on your thread. They intentionally don't want you to rely on any certain flow control, they leave it up to the "scheduler".

5. "Task" is a future.

The Task< > template is a future (or "promise" if you like) that provides a handle to get the result of a coroutine when it eventually completes. Because of the previously noted problem that "await" returns to the caller immediately, before your final return, you need a way to give the caller a handle to that result.

IAsyncOperation< > is the lower level C++/COM version of Task< > ; it's the same thing without the helper methods of Task.

IAsyncOperation.Status can be polled for completion. IAsyncOperation.GetResults can only be called after completed. IAsyncOperation.Completed is a callback function you can set to be run on completion. (*)

So far as I can tell there is no simple way to just Wait on an IAsyncOperation (you can "await"). Obviously they are trying hard to prevent you from blocking threads in the pool. The method I've seen is to wrap it in a Task and then use Task.Wait().

(* = the .Completed member is a good example of a big annoyance : they play very fast-and-loose with documenting the thread safety semantics of the whole API. Now, I presume that for .Completed to make any sense it must be a thread-safe accessor, and it must be atomic with Status. Otherwise there would be a race where my completion handler would not get called. Presumably your completion handler is called once and only once. None of this is documented, and the same goes across the whole API. They just expect it all to magically work without you knowing how or why.)

(it seems that .NET used to have a Future< > as well, but that's gone since Task< > is just a future and having both is pointless (?))


So, in general if I read it as :


"async" = "coroutine"  (hacky C switch + functor encapsulation)

"await" = yield

"Task" = future

then it's pretty intuitive.


What's missing?

Well there are some places that are syntactically very ugly, but possible. (eg. working with IAsyncOperation/IAsyncInfo in general is super ugly; also the lack of simple "await x,y,z" is a mistake IMO).

There seems to be no way to easily automatically promote a synchronous function to async. That is, if you have something like :


int func1(int x) { return x+1; }

and you want to run it on a future of an int (Task< int >) , what you really want is just a simple syntax like :

future<int> x = some async func that returns an int

future<int> y = start func1( x );

which makes a coroutine that waits for its args to be ready and then runs the synchronous function. (maybe it's possible to write a helper template that does this?)
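(It is possible to write such a helper; here's a sketch in C++ PPL terms - "start_on" is my name for it, not part of any API :)

#include <ppltasks.h>
#include <cstdio>
using namespace concurrency;

int func1(int x) { return x+1; }

// make a task that waits for its arg to be ready, then runs the
// plain synchronous function on the value :
template <typename T, typename Func>
auto start_on( task<T> t, Func f ) -> task< decltype( f(T()) ) >
{
    return t.then( [f](T v){ return f(v); } );
}

int main()
{
    task<int> x([]{ return 41; }); // stand-in for "some async func"
    task<int> y = start_on( x, func1 );
    printf("%d\n", y.get()); // 42
    return 0;
}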

Now it's tempting to do something like :


future<int> x = some async func that returns an int

int y = func1( await x );

and you see that all the time in example code, but of course that is not the same thing at all and has many drawbacks (it waits immediately even though "y" might not be needed for a while, it doesn't allow you to create async dependency chains, it requires you are already running as a coroutine, etc).

The bigger issue is that it's not a real stackful coroutine system, which means it's not "composable", something I've written about before :
cbloom rants 06-21-12 - Two Alternative Oodles
cbloom rants 10-26-12 - Oodle Rewrite Thoughts

Specifically, a coroutine cannot call another function that does the await. This makes sense if you think of the "await" as being the hacky C-switch-#define thing, not a real language construct. The "async" on the func is the "switch {" and the "await" is a "case ". You cannot write utility functions that are usable in coroutines and may await.

To call functions that might await, they must be run as their own separate coroutine. When they await, they block their own coroutine, not your calling function. That is :


int helper( bool b , AsyncStream s )
{
    if ( b )
    {
        return 0;
    }
    else
    {
        int x = await s.Get<int>();
        return x + 10;
    }
}

async Task<int> myfunc1()
{
    AsyncStream s = open it;
    int x = helper( true, s );
    return x;
}

The idea here is that "myfunc1" is a coroutine, it calls a function ("helper") which does a yield; that yields out of the parent coroutine (myfunc1). That does not work and is not allowed. It is what I would like to see in a good coroutine-centric language. Instead you have to do something like :

async Task<int> helper( bool b , AsyncStream s )
{
    if ( b )
    {
        return 0;
    }
    else
    {
        int x = await s.Get<int>();
        return x + 10;
    }
}

async Task<int> myfunc1()
{
    AsyncStream s = open it;
    int x = await helper( true, s );
    return x;
}

Here "helper" is its own coroutine, and we have to block on it. Now it is worth noting that because WinRT is aggressive about delaying heap-allocation of coroutines and is aggresive about running coroutines immediately, the actual flow of the two cases is not very different.

To be extra clear : lack of composability means you can't just have something like "cofread" which acts like synchronous fread , but instead of blocking the thread when it doesn't have enough data, it yields the coroutine.

You also can't write your own "cosemaphore" or "comutex" that yield instead of waiting the thread. (does WinRT provide cosemaphore and comutex? To have a fully functional coroutine-centric language you need all that kind of stuff. What does the normal C# Mutex do when used in a coroutine? Block the whole thread?)


There are a few places in the syntax that I find very dangerous due to their (false) apparent simplicity.

1. Args to coroutines are often references. When the coroutine is packaged into a struct and delayed execution, what you get is a non-thread-safe pointer to some shared object. It's incredibly easy to write code like :


async void func1( SomeStruct^ s )
{

    s->DoStuff();
    MoreStuff( s );

}

where in fact every touch of 's' is potentially a race and bug.

2. There is no syntax required to start a coroutine. This means you have no idea if functions are async or not at the call site!


void func2()
{

DeleteFile("b");
CopyFile("a","b");

}

Does this code work? No idea! They might be coroutines, in which case DeleteFile might return before it's done, and then I would be calling CopyFile before the delete has finished. (if it is a coroutine, the fix is to call "await", assuming it returned a Task).

Obviously the problem arises from side effects. In this case the file system is the medium for communicating the side effects. To use coroutine/future code cleanly, you need to try to make all functions take all their inputs as arguments, and return all their effects as return values. Even if the return is not necessary, you must return some kind of proxy to the change as a way of expressing the dependency.

"async void" functions are probably bad practice in general; you should at least return a Task with no data (future< void >) so that the caller has something to wait on if they want to. async functions with side effects are very dangerous but also very common. The fantasy that we'll all write pure functions that only read their args (by value) and put all output in their return values is absurd.


It's pretty bold of them to make this the official way to write new code for Windows. As an experimental C# language feature, I think it's pretty decent. But good lord man. Race city, here we come. The days of software having repeatable outcomes are over!

As a software design point, the whole idea that "async improves responsiveness" is rather disturbing. We're gonna get a lot more trickle-in GUIs, which is fucking awful. Yes, async is great for making tasks that the user *expects* to be slow to run in the background. What it should not be used for is hiding the slowness of tasks that should in fact be instant. Like when you open a new window, it should immediately appear fully populated with all its buttons and graphics - if there are widgets in the window that take a long time to appear, they should be fixed or deleted, not made to appear asynchronously.

The way web pages give you an initial view and then gradually trickle in updates? That is fucking awful and should be used as a last resort. It does not belong in applications where you have control over your content. But that is exactly what is being heavily pushed by MS for all WinRT apps.

Having buttons move around after they first appeared, or having buttons appear after the window first opened - that is *terrible* software.

(Windows 8 is of course itself an example; part of their trick for speeding up startup is to put more things delayed until after startup. You now have to boot up, and then sit there and twiddle your thumbs for a few minutes while it actually finishes starting up. (there are some tricks to reduce this, such as using Task Scheduler to force things to run immediately at the user login event))


Some links :

Jerry Nixon @work Windows 8 The right way to Read & Write Files in WinRT
Task.Wait and "Inlining" - .NET Parallel Programming - Site Home - MSDN Blogs
CreateThread for Windows 8 Metro - Shawn Hargreaves Blog - Site Home - MSDN Blogs
Diving deep with WinRT and await - Windows 8 app developer blog - Site Home - MSDN Blogs
Exposing .NET tasks as WinRT asynchronous operations - Windows 8 app developer blog - Site Home - MSDN Blogs
Windows 8 File access sample in C#, VB.NET, C++, JavaScript for Visual Studio 2012
Futures and promises - Wikipedia, the free encyclopedia
Effective Go - The Go Programming Language
Deceptive simplicity of async and await
async (C# Reference)
Asynchronous Programming with Async and Await (C# and Visual Basic)
Creating Asynchronous Operations in C++ for Windows Store Apps
Asynchronous Programming - Easier Asynchronous Programming with the New Visual Studio Async CTP
Asynchronous Programming - Async Performance Understanding the Costs of Async and Await
Asynchronous Programming - Pause and Play with Await
Asynchronous programming in C++ (Windows Store apps) (Windows)
AsyncAwait Could Be Better - CodeProject
File Manipulation in Windows 8 Store Apps
SharpGIS Reading and Writing text files in Windows 8 Metro

12/06/2012

12-6-12 - Theoretical Oodle Rewrite Continued

So, continuing on the theme of a very C++-ish API with "futures" , ref-counted buffers, etc. :

cbloom rants 07-19-12 - Experimental Futures in Oodle
cbloom rants 10-26-12 - Oodle Rewrite Thoughts

It occurs to me that this could massively simplify the giant API.

What you do is treat "array data" as a special type of object that can be linearly broken up. (I noted previously about having RW locks in every object and special-casing arrays by letting them be RW-locked in portions instead of always locking the whole buffer).

Then arrays could have two special ways of running async :

1. Stream. A straightforward futures sequence to do something like read-compress-write would wait for the whole file read to be done before starting the compress. What you could do instead is have the read op immediately return a "stream future" which would be able to dole out portions of the read as it completed. Any call that processes data linearly can be a streamer, so "compress" could also return a stream future, and "write" would then be able to write out compressed bits as they are made, rather than waiting on the whole op.

2. Branch-merge. This is less of an architectural thing than just a helper (you can easily write it client-side with normal futures); it takes an array and runs the future on portions of it, rather than running on the whole thing. But having this helper in from the beginning means you don't have to write lots of special case branch-merges to do things like compress large files in several pieces.
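(For example, here's roughly the shape of that helper, sketched with PPL tasks rather than the hypothetical Oodle API; CompressPiece is a trivial stand-in for whatever per-piece op you want :)

#include <ppltasks.h>
#include <vector>
using namespace concurrency;

// stand-in for the real per-piece op :
std::vector<char> CompressPiece(const char * ptr, size_t len)
{
    return std::vector<char>(ptr, ptr+len);
}

// branch : make one task per piece ; merge : when_all joins them
task< std::vector< std::vector<char> > >
BranchMerge(const char * ptr, size_t len, size_t numPieces)
{
    std::vector< task< std::vector<char> > > tasks;
    size_t step = (len + numPieces - 1) / numPieces;
    for(size_t lo = 0; lo < len; lo += step)
    {
        const char * p = ptr + lo;
        size_t n = (lo + step <= len) ? step : (len - lo);
        tasks.push_back( task< std::vector<char> >( [p,n]{ return CompressPiece(p,n); } ) );
    }
    return when_all( tasks.begin(), tasks.end() );
}

// usage sketch :
//   auto all = BranchMerge(buf,size,4);
//   auto pieces = all.get(); // blocks ; or .then() a merge continuation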

So you basically just have a bunch of simple APIs that don't look particularly Async. Read just returns a buffer (future). ReadStream returns a buffer stream future. They look like simple buffer->buffer APIs and you don't have to write special cases for all the various async chains, because it's easy for the client to chain things together as they please.

To be redundant, the win is that you can write a function like Compress() and you write it just like a synchronous buffer-to-buffer function, but its arguments can be futures and its return value can be a future.

Compress() should actually be a stackful coroutine, so that if the input buffer is a Stream buffer, then when you try to access bytes that aren't yet available in that buffer, you Yield the coroutine (pending on the stream filling).


Functions take futures as arguments and return futures.

Every function is actually run as a stackful coroutine on the worker threadpool.

Functions just look like synchronous code, but things like file IO cause a coroutine Yield rather than a thread Wait.

All objects are ref-counted and create automatic dependency chains.

All objects have built-in RW locks, arrays have RW locks on regions.

Parallelism is achieved through generic Stream and Branch/Merge facilities.

While this all sounds very nice in theory, I'm sure in practice it wouldn't work. What I've found is that every parallel routine I write requires new hacky special-casing to make it really run at full efficiency.

12-6-12 - The Oodle Dilemma

Top two Oodle (potential) customer complaints :

1. The API is too big, it's too complicated. There are too many ways of doing basically the same thing, and too many arguments and options on the functions.

2. I really want to be able to do X but I can't do it exactly the way I want with the API, can you add another interface?

Neither one is wrong.
