07-30-09 - cbloom.com

I just had a look at the cbloom.com stats for the first time in over a year, and it was quite a shock!!

A while ago I tried putting up a WordPress blog; I never really even got it off the ground, but apparently it's the most popular page on my site :

requests    page
24832       /wordpress/
15681       /wordpress/wp-comments-post.php
12585       /rants.html
10966       /robots.txt

And lots of bots are trying to make comments. If you look at the browsers used to visit the site it's obvious why :

reqs    pages   browser
8276    8082    Baiduspider (http://www.baidu.com/search/spider.htm)
6415    4186    Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; )
4299    3636    T-Mobile Dash Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; Smartphone; 320x240)
5062    3102    msnbot/2.0b (http://search.msn.com/msnbot.htm)
4420    3001    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
5902    2903    msnbot/1.1  (http://search.msn.com/msnbot.htm)
2700    2700    Mozilla/5.0 (Macintosh; Mac OS X AppleWebKit KHTML, like Gecko, Safari/528.16 OmniWeb/v622.8.0)
3236    2212    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
3081    2057    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Baiduspider, I guess, is a Chinese index bot, so I'm not sure exactly who's trying to autopost to my WordPress. #2 is the Yahoo search spider. The "T-Mobile Dash Mozilla" is apparently also a search bot, for MSN Mobile. Then we get three more search bots, and then we get Safari ! It's not until the 8th most used browser that we see just a normal Windows Firefox (correction : actually Mozilla/4.0 with "MSIE 6.0" in it means IE 6).

Not surprisingly, Googlebot seems to be way more polite about not abusing your site. Even though it indexes my site as well as anyone, it hits me far less. (MSN is actually the worst by far if you add up all its bots.) (Of course some of these could be spam bots pretending to be search bots.)

The other funny thing was the referring search words - the top nine are all poker related, it's not until #10 that you get "compression".

In semi-unrelated news, god damn all you ghetto fucking developers and your ghetto ass command line parsing skills. Any time I see a message like this I want to punch you in the nuts :

"Unknown argument -? ; use --help for help"

URG WTF ; 1. just go ahead and show me the help if my args are no good, and 2. how fucking hard is it to recognize a question mark? Just add it to your switch cases :

case 'h':
case '?':
    // show help

How hard was that?
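A slightly fuller sketch of the same idea in C++, assuming a made-up tool whose only other switch is -v: every common help spelling (-h, -H, -?, --help) goes straight to the usage text, and an unknown switch prints the usage too instead of a bare one-line complaint. The names is_help_flag, print_usage and check_args are hypothetical, not from cblib.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: true for any of the usual "show me help" spellings.
bool is_help_flag(const char * arg)
{
    assert(arg != nullptr);
    return std::strcmp(arg, "-h") == 0 || std::strcmp(arg, "-H") == 0 ||
           std::strcmp(arg, "-?") == 0 || std::strcmp(arg, "--help") == 0;
}

// Hypothetical usage text for a made-up tool with one other switch, -v.
void print_usage(const char * prog)
{
    std::printf("usage : %s [options] <from> [to]\n", prog);
    std::printf("  -h, -?, --help : show this help\n");
    std::printf("  -v             : verbose\n");
}

// Returns 0 if the args look fine, 1 if help was requested, 2 on an unknown
// switch. In both non-zero cases the usage text has already been printed,
// so the user never has to re-run with --help.
int check_args(int argc, char * argv[])
{
    for (int i = 1; i < argc; i++)
    {
        if (is_help_flag(argv[i]))
        {
            print_usage(argv[0]);
            return 1;
        }
        if (argv[i][0] == '-' && std::strcmp(argv[i], "-v") != 0)
        {
            std::printf("unknown argument %s\n", argv[i]);
            print_usage(argv[0]);
            return 2;
        }
    }
    return 0;
}
```

The help check runs before the unknown-switch check, so "-?" can never fall into the "unknown argument" path.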

The one that bugs me more is not recognizing switches unless they come in the right order. I was just using "NcFTP" today, and it says

Usage: ncftp [flags] <host>

So of course I type

ncftp www.cbloom.com -u xxx

And it's completely confused. WTF, how hard is that to handle ? (voice of Gob) COME ON!!

(BTW NcFTP is a handy light FTP client that supports recursive delete, get, and put.)

The standard way I use command line apps is to write some command line, hit up arrow to edit it, and tack more flags on the end. You should always handle flags occurring at any point in the command line.
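Handling flags anywhere just means making one pass over argv and deciding per argument whether it's a flag or a positional, instead of assuming all the flags come first. A minimal C++ sketch for an ncftp-like "[flags] <host>" usage; the -u option, the FtpArgs struct and parse_ftp_args are hypothetical and only stand in for a real tool's options.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parse result for "prog [flags] <host>".
struct FtpArgs
{
    std::string host;
    std::string user;
    std::vector<std::string> warnings;
};

// Single pass over argv: anything starting with '-' is a flag wherever it
// appears; everything else is a positional, assigned in order.
FtpArgs parse_ftp_args(int argc, char * argv[])
{
    assert(argc >= 1);
    FtpArgs out;
    for (int i = 1; i < argc; i++)
    {
        const char * a = argv[i];
        if (a[0] == '-' && a[1] != '\0')
        {
            if (a[1] == 'u')
            {
                // accept both "-uNAME" and "-u NAME"
                if (a[2] != '\0')
                    out.user = a + 2;
                else if (i + 1 < argc)
                    out.user = argv[++i];
                else
                    out.warnings.push_back("-u needs a value");
            }
            else
            {
                out.warnings.push_back(std::string("bad option ignored : ") + a);
            }
        }
        else if (out.host.empty())
        {
            out.host = a;
        }
        else
        {
            out.warnings.push_back(std::string("ignored extra arg : ") + a);
        }
    }
    return out;
}
```

With this, "ncftp www.cbloom.com -u xxx", "ncftp -u xxx www.cbloom.com" and "ncftp -uxxx www.cbloom.com" all give the same host and user.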

Another one that bugs me is people who expect either a space or no space before the payload of a flag ; eg. they require either "-s 1" or "-s1" but don't support the other. The big problem is that it's not standard, so my hands don't have a reflexive habit of which way to do it. Of course you should just support both, it's pretty fucking trivial.

In cblib I use this :

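#include <stdlib.h> // strtol, atoi
// skipwhitespace() and FAIL() are cblib helpers

int argint(int argc,char * argv[],int & i)
{
    // payload attached to the switch : "-k5", or "-k 5" passed as one arg
    char * cur = argv[i]+2;
    cur = skipwhitespace(cur);
    if ( *cur )
    {
        char * endptr;
        int ret = strtol(cur,&endptr,10);
        if ( endptr != cur )
            return ret;
    }

    // otherwise the payload is the next arg : -k 5
    if ( i == argc-1 )
        FAIL("no int value found for arg!");

    i++; // consume the value arg so the caller's loop skips it
    return atoi(argv[i]);
}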
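If you want to try argint outside cblib, here is a self-contained C++ sketch of the same logic. skip_ws is a stand-in for cblib's skipwhitespace, and since FAIL isn't available it returns a caller-supplied fallback value instead; both substitutions are assumptions, not cblib's actual code.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <cstdlib>

// Stand-in for cblib's skipwhitespace (assumed to skip leading blanks).
static const char * skip_ws(const char * s)
{
    while (*s && std::isspace((unsigned char)*s))
        s++;
    return s;
}

// Sketch of argint without cblib: reads the int payload of a switch like -k,
// accepting "-k5", "-k 5" (one quoted arg) and "-k" "5" (two args).
// If the value is taken from the next arg, i is advanced past it.
// Where cblib would FAIL, this version prints a warning and returns fallback.
int argint_sketch(int argc, char * argv[], int & i, int fallback)
{
    assert(i > 0 && i < argc);
    assert(argv[i][0] == '-' && argv[i][1] != '\0'); // must be a "-x..." switch

    // payload attached to the switch letter
    const char * cur = skip_ws(argv[i] + 2);
    if (*cur)
    {
        char * endptr;
        long ret = std::strtol(cur, &endptr, 10);
        if (endptr != cur)
            return (int)ret;
    }

    // otherwise the payload must be the next argument
    if (i == argc - 1)
    {
        std::fprintf(stderr, "no int value found for arg %s\n", argv[i]);
        return fallback;
    }
    i++; // consume the value so the caller's loop doesn't see it again
    return std::atoi(argv[i]);
}
```

The key detail is the reference parameter: when the value lives in the next argument, the parser's loop index moves past it, so the value is never mistaken for a filename.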

A standard cbloom arg parser looks like this :

#include <ctype.h>  // toupper
#include <stdlib.h> // NULL
// lprintf is cblib's printf-style logger; argint is defined above

int main(int argc,char *argv[])
{
    lprintf("newdct built %s %s\n",__DATE__,__TIME__);
    lprintf("usage : newdct [options] <from> [to]\n");
    lprintf("usage : newdct -h for help\n");

    int killBands = 0;
    const char * fmName = NULL;
    const char * toName = NULL;

    for(int argi=1; argi < argc ; argi++ )
    {
        if ( argv[argi][0] == '-' )
        {
            switch( toupper(argv[argi][1]) )
            {
            case '?':
            case 'H':
                // force the help path below and stop parsing
                fmName = NULL;
                argi = argc;
                break;

            case 'K':
                // handles -k N and -kN and "-k N"
                killBands = argint(argc,argv,argi);
                lprintf("got option : killBands = %d\n",killBands);
                break;

            default:
                lprintf("warning : bad option ignored : %s\n",argv[argi]);
                break;
            }
        }
        else
        {
            if( fmName == NULL ) { fmName = argv[argi]; }
            else if ( toName == NULL ) { toName = argv[argi]; }
            else { lprintf("warning : ignored extra arg : %s\n",argv[argi]); }
        }
    }

    // toName is optional
    if ( fmName == NULL )
    {
        lprintf("HELP : \n");
        // ...
    }

Do it my way god damn you.


nothings said...

I use this code to parse commandline options, at least when I remember it exists in stb.h.

It relies on the argv array being writeable -- it parses all the options and puts them in a separate array (one string per option), and updates argc/argv to only contain the non-option arguments. The third argument is a list of the options that take a parameter.
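A rough C++ sketch of the approach described here, written from this description rather than from stb.h itself (collect_options and its output format are hypothetical): options are pulled out into their own strings, argv is compacted in place to the non-option arguments, and the third argument lists the option letters that take a parameter.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical: returns one string per option. For letters listed in
// 'params' the parameter is appended to the letter ("-o out" -> "oout");
// if such an option is last with nothing after it, only the letter is stored.
// argv is rewritten in place so argv[1..*argc-1] are the non-option args.
std::vector<std::string> collect_options(int * argc, char ** argv, const char * params)
{
    assert(argc != nullptr && *argc >= 1);
    std::vector<std::string> opts;
    int out = 1; // next write slot for non-option args
    for (int i = 1; i < *argc; i++)
    {
        const char * a = argv[i];
        if (a[0] != '-' || a[1] == '\0')
        {
            argv[out++] = argv[i]; // keep positional args, in order
            continue;
        }
        for (const char * c = a + 1; *c; c++) // "-ab" is two options
        {
            std::string opt(1, *c);
            if (std::strchr(params, *c))
            {
                if (c[1] != '\0')
                    opt += c + 1;       // "-ofile"
                else if (i + 1 < *argc)
                    opt += argv[++i];   // "-o file"
                opts.push_back(opt);
                break;                  // rest of this arg was the payload
            }
            opts.push_back(opt);
        }
    }
    *argc = out;
    return opts;
}
```

Because argv is compacted in place, the rest of main can treat argv[1..argc-1] as plain filenames no matter where the switches were.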

MikeShlz said...

heh, this complaint has bugged me for years, possibly decades. And it's partly why I forsook Unixes, because every g-d thing you need to do runs through a command line, with unpredictable switches and parsing goofs. I guess if you've got the patience to figure each one out, you get to be a guru, but that isn't for me.

Tom said...

As far as the pressing issue of the argument separator goes, the GNU tools' standard seems to be that one char switches have a space between switch and argument since they can be specified several at once (e.g., "ls -alF" = "ls -a -l -F"). So if you're just looking for a standard, any standard, then you'd be in some company if you followed this. Though the same would go if you didn't.
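For what it's worth, POSIX getopt, which many Unix tools use for single-letter switches, handles the bundling described above and also accepts an option's argument either attached ("-w80") or as the next word ("-w 80"). A small C++ sketch; the "alFw:" option string and the ShortArgs struct are illustrative only, and it assumes a POSIX system (getopt is not part of MSVC's runtime).

```cpp
#include <cassert>
#include <string>
#include <unistd.h> // POSIX getopt, optarg, optind

// Illustrative result: which boolean switches were seen, and the -w value.
struct ShortArgs
{
    std::string flags;  // e.g. "alF" for "-alF" or "-a -l -F"
    std::string width;  // payload of -w, from "-w80" or "-w 80"
    int firstOperand;   // index of the first non-option argument
};

ShortArgs parse_short_args(int argc, char * argv[])
{
    assert(argc >= 1);
    ShortArgs out;
    optind = 1; // restart scanning; BSD/macOS may also need optreset = 1
    int c;
    // "alFw:" : -a -l -F are plain switches, -w takes an argument
    while ((c = getopt(argc, argv, "alFw:")) != -1)
    {
        if (c == 'w')
            out.width = optarg;   // set for both the attached and separate forms
        else if (c != '?')
            out.flags += (char)c; // unknown switches come back as '?'
    }
    out.firstOperand = optind;
    return out;
}
```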
