6/21/2008

06-21-08 - 3

Wrapping up my Unicode adventure :

I uploaded new versions of cblib and the ChukSH executables that should handle Unicode pretty decently. The main piece of code is a function in cblib/FileUtil.h called "MakeUnicodeNameFullMatch", which takes a char string and does a file search in each directory portion of the path, promoting the single-byte char strings to Unicode one path component at a time. (This fixes the flaw in the code posted earlier on this page.)
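To make the component-by-component idea concrete, here is a minimal sketch. It is not the cblib code: PromotePath, NarrowLossy and FakeDirs are hypothetical names, the directory listing is an in-memory map instead of a real file search, and the down-conversion just turns anything non-ASCII into '?' where real code would go through a code page. Rejecting ambiguous matches is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a real code page conversion:
// anything outside ASCII becomes '?'.
std::string NarrowLossy(const std::wstring& wide) {
    std::string out;
    for (wchar_t c : wide)
        out.push_back((c >= 0 && c < 128) ? static_cast<char>(c) : '?');
    return out;
}

// Fake file system for the sketch: wide directory path -> wide entry names.
// The empty path L"" is the root.
typedef std::map<std::wstring, std::vector<std::wstring> > FakeDirs;

// Promote a '/'-separated narrow path to wide, one component at a time.
// Each component is matched against the listing of the already-promoted
// prefix. Returns false if a component has no match or more than one.
bool PromotePath(const std::string& narrowPath, const FakeDirs& dirs,
                 std::wstring* widePath) {
    assert(widePath != nullptr);
    widePath->clear();
    std::size_t start = 0;
    while (start <= narrowPath.size()) {
        std::size_t end = narrowPath.find('/', start);
        if (end == std::string::npos) end = narrowPath.size();
        const std::string component = narrowPath.substr(start, end - start);

        // "Search" the directory promoted so far.
        FakeDirs::const_iterator dir = dirs.find(*widePath);
        if (dir == dirs.end()) return false;

        const std::wstring* match = nullptr;
        for (const std::wstring& entry : dir->second) {
            if (NarrowLossy(entry) == component) {
                if (match) return false;  // two names collapse to the same bytes
                match = &entry;
            }
        }
        if (!match) return false;

        if (!widePath->empty()) *widePath += L'/';
        *widePath += *match;
        start = end + 1;
    }
    return true;
}
```

The point of going one directory at a time is that each search runs inside a prefix that has already been promoted, so a lossy name deep in the path can still be resolved as long as every component above it was.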

I've straightened out a few things. For one, the "%S" (capital S) Unicode printing is just totally broken; don't use it. Console output is indeed "OEM", so in my console apps I am treating all single-byte char strings as OEM, and any time I do any file IO, I use the Unicode file IO functions (the single-byte char file IO functions expect the "A" code page, so you cannot use them if you treat your char strings as OEM). For example, I do FindFirst/FindNext in Unicode, then kick down to OEM single byte to do substring compares against the command line args, then promote back up to Unicode to do a file rename or copy.
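A small sketch of why OEM bytes can't be handed to the "A" functions: the same byte decodes to different characters in the two code pages. This assumes a US-English setup where OEM is code page 437 and "A" is code page 1252 (other locales use other pages). The tables below are tiny excerpts covering only two bytes, and DecodeOem437, DecodeAnsi1252 and Decode are hypothetical helpers, not Windows APIs.

```cpp
#include <cassert>
#include <string>

// Excerpt of code page 437 (assumed OEM page). Only two high bytes are
// covered; any other high byte becomes U+FFFD in this sketch.
char32_t DecodeOem437(unsigned char b) {
    switch (b) {
        case 0x82: return 0x00E9;  // e with acute accent
        case 0xE9: return 0x0398;  // Greek capital theta
        default:   return b < 0x80 ? static_cast<char32_t>(b) : 0xFFFD;
    }
}

// Excerpt of code page 1252 (assumed "A" page), same two bytes.
char32_t DecodeAnsi1252(unsigned char b) {
    switch (b) {
        case 0x82: return 0x201A;  // single low-9 quotation mark
        case 0xE9: return 0x00E9;  // e with acute accent
        default:   return b < 0x80 ? static_cast<char32_t>(b) : 0xFFFD;
    }
}

// Decode a single-byte string with whichever code page function is given.
std::u32string Decode(const std::string& bytes,
                      char32_t (*decodeByte)(unsigned char)) {
    assert(decodeByte != nullptr);
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(decodeByte(b));
    return out;
}
```

With these tables, the OEM bytes for a name ending in e-acute ("caf\x82") decode correctly as OEM but come out ending in a low quotation mark when read as "A", so a file lookup through the "A" functions would be asking for a different name.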

Some notes for people using consoles :

cmd.exe seems to do everything "right", in the sense that it displays Unicode by converting to the OEM code page. The dir autocomplete, "cd" and such all take OEM code page names.

4NT is just totally broken for unicode. You can't even "cd" into dirs with unicode names. That's sort of okay because 4NT is deprecated and you're supposed to use :

TCC is pretty weird with Unicode. "dir" appears to output names in the "A" code page rather than OEM, but the console is still in the OEM code page, so you get bad characters. I believe what's actually happening is that TCC puts Unicode chars in its screen buffer (you can drag-select-copy and get the true Unicode names in the clipboard), and Windows converts those Unicode chars to "A", not OEM, for display. That seems to just be a bug. If you drag-select a Unicode name and paste it to the TCC command line, it converts to OEM.
