#define macrogetc(_stream) (--(_stream)->_cnt >= 0 ? ((char)*(_stream)->_ptr++) : _filbuf(_stream))when you know that your FILE is only being used by one thread.
In general there's a nasty issue with multithreaded API design. When you have some object that you know might be accessed from various threads, should you serialize inside every access to that object? Or should you force the client to serialize on the outside?
If you serialize on the inside, you can have severe performance penalties, like with getc.
If you serialize on the outside, you make it very prone to bugs because it's easy to access without protection.
One solution is to introduce an extra "FileLock" object. So the "File" itself has no accessors except "Lock" which gives you a FileLock, and then the "getc" is on the FileLock. That way somebody can grab a FileLock and then locally do fast un-serialized access through that structure. eg:
instead of : c = File.getc(); you have : FileLock fl = File.lock(); c = fl.getc();
Another good solution would be to remove the buffering from the stdio FILE and introduce a "FileView" object which has a position and a buffer. Then you can have mutliple FileViews per FILE which are in different threads, but the FileViews themselves must be used only from one thread at a time. Accesses to the FILE are serialized, accesses to the FileView are not. eg :
eg. FILE * fp is shared. Thread1 has FileView f1(fp); Thread2 has FileView f2(fp); FileView contains the buffer you do : c = f1.getc(); it can get a char from the buffer without synchronization, but buffer refill locks.(of course this is a mess with readwrite files and such; in general I hate dealing with mutable shared data, I like the model that data is either read-only and shared, or writeable and exclusively locked by one thread).
(The FileView approach is basically what I've done for Oodle; the low-level async file handles are thread-safe, but the buffering object that wraps it is not. You can have as many buffering objects as you want on the same file on different threads).
Anyway, in practice, macrogetc is a perfectly good solution 99% of the time.
Stdio is generally very fast. You basically can't beat it except in the trivial way (trivial = use a larger buffer). The only things you can do to beat it are :
1. Double-buffer the streaming buffer filling and use async IO to fill the buffer (stdio uses synchronous IO and only fills when the buffer is empty).
2. Have an accessor like Peek(bytes) to the buffer that just gives you a pointer directly into the buffer and ensures it has bytes amount of data for you to read. This eliminates the branch to check fill on each byte input, and eliminates the memcpy from fread.
(BTW in theory you could avoid another memcpy, because Windows is doing IO into the disk cache pages, so you could just lock those and get a pointer and process from them directly. But they don't let you do this for obvious security reasons. Also if you knew in advance that your file was in the disk cache (and you were never going to use it again so you don't want it to get into the cache) you could do uncached IO, which is microscopically faster for data not in the cache, because it avoids that memcpy and page allocation. But again they don't let you ask "is this in the cache" and it's not worth sweating about).
Anyway, the point is you can't really beat stdio for byte-at-a-time streaming input, so don't bother. (on Windows, where the disk cache is pretty good, eg. it does sequential prefetching for you).