6/23/2008

06-23-08 - 1

I've been checking out the Better String Library.

Let me back up a bit. I'm not really delighted with my String. For ease of edits, I really like plain old char []. I even like the null-delimit of C that some people hate. The null delimit is really swank because you can take a big string and split it into substrings in place just by jamming a null, then you restore it by putting the original char back. This is a very common and sweet thing, for example if you want to split a full file spec into the path part and file part you just find the last slash and jam a null, blamo you have path part and file part, then you can get the full file spec back just by going filepart[-1] = '/'; how hot. I also really like the sprintf() way of making strings.
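Here's a minimal sketch of that split-and-restore trick. The helper names split_file_spec and unsplit_file_spec are made up for illustration, and it assumes '/' is the only separator you care about.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: splits a file spec in place at the last '/'.
// Returns a pointer to the file part, or nullptr if there is no slash.
// After a successful call, `spec` reads as just the path part.
char * split_file_spec(char * spec)
{
    assert(spec != nullptr);
    char * slash = std::strrchr(spec, '/');
    if ( slash == nullptr ) return nullptr;
    *slash = 0;          // jam a null over the slash
    return slash + 1;    // the file part starts right after it
}

// Hypothetical helper: undoes the split by putting the slash back
// in front of the file part.
void unsplit_file_spec(char * filepart)
{
    assert(filepart != nullptr);
    filepart[-1] = '/';
}
```

Nothing gets allocated or copied; both "substrings" live in the original buffer the whole time.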

So, what's the problem with char[] ? A few things. One is the static sizes. Mostly that's okay (assuming you use all the "n" versions of functions to prevent overruns, which you probably don't). Even if you do, it's ugly for cases of very highly variable lengths, like emails or something. The other big one is they don't go in containers well. sprintf is very unsafe, but if you use my safeprintf you get a lot of protection against the common errors there.
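For what it's worth, this is roughly what using an "n" version looks like with a static size. It's a sketch using the standard snprintf, not my safeprintf, and format_contact is a made-up helper. It won't overrun, but a long input just gets chopped, which is the ugly part for highly variable lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: formats "name <addr>" into a fixed-size buffer.
// Returns true if the whole string fit, false if snprintf truncated it.
// Either way the buffer ends up null-terminated and is never overrun.
bool format_contact(char * out, std::size_t outSize, const char * name, const char * addr)
{
    assert(out != nullptr && outSize > 0);
    // snprintf returns the length it *wanted* to write, not counting the null
    int needed = std::snprintf(out, outSize, "%s <%s>", name, addr);
    return needed >= 0 && static_cast<std::size_t>(needed) < outSize;
}
```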

Now the String I have in cblib is a COW string class which mainly doesn't provide edit functions. My design idea was that you use it for storage, and to do edits you get the cstring and stick it in a char[] and edit, then jam the result back in a String. Sort of like the Java readable string / writeable string. My String is not strictly read-only but it's just a bit of a pain to edit so I rarely do. COW is pretty far out of favor these days but I still think it has a lot going for it. For one thing it lets you just return things by value and pass by value in function args and not worry about making copies. That's very handy. The real reason COW is nice though is that it plays very well with STL containers; you can make a vector<String> and not have lots of unnecessary copying, and you can even just std::sort on it and it's all good. Of course you could get the same benefit by wrapping your string in a ptr, like vector<shared_ptr<std::string>> or something.
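A rough sketch of the pointer-wrapping alternative, assuming a C++11 compiler. This is not the cblib String; StringPtr and sort_by_value are names invented for the example. The point is just that copying or sorting moves pointers around and never duplicates the character data.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical alias: a shared, read-only string body.
// Copying a StringPtr bumps a refcount instead of copying characters.
using StringPtr = std::shared_ptr<const std::string>;

// Sorts by string contents; std::sort only shuffles the pointers.
void sort_by_value(std::vector<StringPtr> & v)
{
    std::sort(v.begin(), v.end(),
        [](const StringPtr & a, const StringPtr & b) { return *a < *b; });
}
```

The cost is an extra indirection and the `*a` noise everywhere, which is part of why a COW String that hides all this is nicer to use.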

Okay, so anyway, the idea of a string class that's very simple and easy to use like char[] but is packed up and safe and containerable like my String is pretty appealing.

"Better String" is pretty close. It's really not like a containerable string class unless you use shared_ptr< CBString > because if you just use vector< CBString > you get copies like crazy. It's really just a wrapped and cleaned up version of char[] which will resize instead of overrun and all that kind of good stuff. However :

It's an annoying thing that it doesn't treat null the same as C strings. If you want to track the length in a separate variable to accelerate strlen and strcat, that's fine, you can do that and still support null. Rather than having an operator[] that returns a char, you could have an operator[] that returns a CharProxy. When CharProxy is assigned a null, it updates the length. If CharProxy sets a null to non-null, it updates or invalidates the length. But this has a problem :
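To make the CharProxy idea concrete, here's a minimal sketch. This is not bstring's API; TrackedString, Length, CStr and Set are all hypothetical, and the fixed capacity with a guard null at the end is a simplification so a rescan can never run off the buffer. I've gone with "invalidate and rescan lazily" when a null gets stomped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical length-caching string that still honors C-style nulls.
class TrackedString
{
public:
    // Returned by operator[] so that writes can be observed.
    class CharProxy
    {
    public:
        CharProxy(TrackedString & owner, std::size_t index)
            : m_owner(owner), m_index(index) { }
        CharProxy & operator=(char c) { m_owner.Set(m_index, c); return *this; }
        operator char() const { return m_owner.m_buf[m_index]; }
    private:
        TrackedString & m_owner;
        std::size_t m_index;
    };

    // The buffer is zero-filled with one extra guard null past `capacity`.
    TrackedString(const char * init, std::size_t capacity)
        : m_buf(capacity + 1, 0), m_len(std::strlen(init)), m_lenValid(true)
    {
        assert(m_len <= capacity);
        std::memcpy(m_buf.data(), init, m_len);
    }

    CharProxy operator[](std::size_t i)
    {
        assert(i + 1 < m_buf.size()); // the guard null is not writable
        return CharProxy(*this, i);
    }

    // Cached length; rescans only if a write invalidated it.
    std::size_t Length()
    {
        if ( ! m_lenValid )
        {
            m_len = std::strlen(m_buf.data());
            m_lenValid = true;
        }
        return m_len;
    }

    const char * CStr() const { return m_buf.data(); }

private:
    void Set(std::size_t i, char c)
    {
        bool wasNull = ( m_buf[i] == 0 );
        m_buf[i] = c;
        if ( c == 0 )
        {
            // A null inside the known string shortens it.
            if ( m_lenValid && i < m_len ) m_len = i;
        }
        else if ( wasNull )
        {
            // A null became non-null: the length may have changed, rescan later.
            m_lenValid = false;
        }
    }

    std::vector<char> m_buf;
    std::size_t m_len;
    bool m_lenValid;
};
```

With this, `s[1] = 0;` on "abca" makes Length() return 1 with no rescan, and stomping the terminator just marks the cached length stale until someone asks for it.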

When we work on C strings, we often temporarily invalidate them and this is hard to translate to bstring or any other string class. The simplest example goes like this :

1. int len = strlen(string);
2. string[len] = 'a';
3. string[len+1] = 0;

This is code to stick an 'a' char on the end of string. It looks pretty normal (maybe), but it holds a trap. In between lines 2 and 3 the string is temporarily fucked. We stomped on the null and made the string of indeterminate length. We simply rely on the fact that during this phase of temporary fuckitude, we will be treating "string" only as a char array and not as an actual string. Obviously this is a silly example but this general pattern of temporarily fucking the string and treating it as a raw char array is very common to C-style string manipulation, and IMO is part of what makes it cool.
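The same three steps as a compilable sketch, with a capacity check added. append_char is a made-up helper, and the check is my addition, not part of the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: appends c to the C string in buf (bufSize bytes total).
// Returns false and leaves buf untouched if there is no room.
bool append_char(char * buf, std::size_t bufSize, char c)
{
    assert(buf != nullptr);
    std::size_t len = std::strlen(buf);
    if ( len + 2 > bufSize ) return false; // need room for c and the new null
    buf[len] = c;       // stomps the null: buf is only a raw char array here
    buf[len + 1] = 0;   // terminator restored, buf is a string again
    return true;
}
```

Between the two writes nothing may call strlen or printf on buf; the code just has to know that, and no wrapper class can see that promise.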

I dunno, I'm still kind of unhappy. It seems to me that maybe making a solid EditString and ConstString might be the way to go. That was kind of annoying in Java though.
