cbloom rants: 06-04-11

I've been meaning to do this for a long time and finally got off my ass.

TR (text replace) and zren (rename) in ChukSH now support "keep case".

Keep case is pretty much what you always want when you do text replacement (especially in source code), and everybody should copy me. For example when I do a find-replace from "lzp1f" -> "lzp1g" what I want is :


lzp1f -> lzp1g  (lower->lower)
LZP1F -> LZP1G  (upper->upper)
Lzp1f -> Lzp1g  (first cap->first cap)
Lzp1G -> Lzp1G  (mixed -> mixed)

The kernel that does this is matchpat in cblib which will handle rename masks like : "poop*face" -> "shit*butt" with keep case option or not.

In a mixed-wild-literal renaming spec like that, the "keep case" applies only to the literal parts. That is, "poop -> shit" and "face -> butt" will be applied with keep-case independently , the "*" part will just get copied.

eg :


Poop3your3FACE -> Shit3your3BUTT

Also, because keep-case is applied to an entire chunk of literals, it can behave somewhat unexpectedly on file renames. For example if you rename


src\lzp* -> src\zzh*

the keep-case will apply to the whole chunk "src\lzp" , so if you have a file like "src\LZP" that will be considered "mixed case" not "all upper". Sometimes my intuition expects the rename to work on the file part, not the full path. (todo : add an option to separate the case-keeping units by path delims)

The way I handle "mixed case" is I leave it up to the user to provide the mixed case version they want. It's pretty impossible to get it right automatically. So the replacement text should be provided in the ideal mixed case capitalization. eg. to change "HelpSystem" to "QueryManager" you need to give me "QueryManager" as the target string, capitalized that way. All mixed case source occurances of "HelpSystem" will be changed to the same output, eg.


helpsystem -> querymanager
HELPSYSTEM -> QUERYMANAGER
Helpsystem -> Querymanager
HelpSystem -> QueryManager
HelpsYstem -> QueryManager
heLpsYsTem -> QueryManager
HeLPSYsteM -> QueryManager

you get it.

The code is trivial of course, but here it is for your copy-pasting pleasure. I want this in my dev-studio find/replace-in-files please !


// strcpy "putString" to "into"
//  but change its case to match the case in src
// putString should be mixed case , the way you want it to be if src is mixed case
void strcpyKeepCase(
        char * into,
        const char * putString,
        const char * src,
        int srcLen);

void strcpyKeepCase(
        char * into,
        const char * putString,
        const char * src,
        int srcLen)
{   
    // okay, I have a match
    // what type of case is "src"
    //  all lower
    //  all upper
    //  first upper
    //  mixed
    
    int numLower = 0;
    int numUpper = 0;
    
    for(int i=0;i<srcLen;i++)
    {
        ASSERT( src[i] != 0 );
        if ( isalpha(src[i]) )
        {
            if ( isupper(src[i]) ) numUpper++;
            else numLower++;
        }
    }
    
    // non-alpha :
    if ( numLower+numUpper == 0 )
    {
        strcpy(into,putString);
    }
    else if ( numLower == 0 )
    {
        // all upper :
        while( *putString )
        {
            *into++ = toupper( *putString ); putString++;
        }
        *into = 0;
    }
    else if ( numUpper == 0 )
    {
        // all lower :
        while( *putString )
        {
            *into++ = tolower( *putString ); putString++;
        }
        *into = 0;
    }
    else if ( numUpper == 1 && isalpha(src[0]) && isupper(src[0]) )
    {
        // first upper then low
        
        if( *putString ) //&& isalpha(*putString) )
        {
            *into++ = toupper( *putString ); putString++;
        }
        while( *putString )
        {
            *into++ = tolower( *putString ); putString++;
        }
        *into = 0;
    }
    else
    {
    
        // just copy putString - it should be mixed 
        strcpy(into,putString);
    }
}

ADDENDUM : on a roll with knocking off stuff I've been meaning to do for a while ...

ChukSH now also contains "fixhtmlpre.exe" which fixes any less-than signs that are found within a PRE chunk.

Hmm .. something lingering annoying going on here. Does blogger convert and-l-t into less thans?

ADDENDUM : yes it does. Oh my god the web is so fucked. I've been doing a bit of reading and it appears this is a common and atrocious hack. Basically the problem is that people use XML for the markup of the data transfer packets. Then they want to sent XML within those packets. So you have to form some shit like :


<data> I want to send <B> this </B> god help me </data>

but putting the less-thans inside the data packet is illegal XML (it's supposed to be plain text), so instead they send


<data> I want to send &-l-tB> this &-l-t/B> god help me </data>

but they want the receiver to see a less-than, not the characters &-l-t , so the receiver parses those codes back into less-than and then treats the data received as its own hunk of XML with internal markups.

Basically people use it as a way to send codes that the current parser will ignore, but the next parser will see. There are lots of pages about how this is against compliance standards but nobody cares and it seems to be widespread.

So anyway, the conclusion is : just changing less thans to &-l-t works fine if you are just posting html (eg. for rants.html it works fine) but for sending to Blogger (or probably any other modern XML-based app) it doesn't.

The method I use now which seems to work on Blogger is I convert less thans to


<code><</code>

How is there not a fucking "literal" tag ? (There is one called XMP but it's deprecated and causes line breaks, and it's really not just a literal tag around a bunch of characters, it's a browser format mode change)

cbloom rants

6/04/2011

06-04-11 - Keep Case

2 comments: