3/15/2007

03-15-07 - 2

Okay, so in case you don't know this C++/STL performance stuff I'll just list some random things you should be doing if you are expecting to use C++/STL and have it be fast. Once again, if you use C++/STL often you should read Alexandrescu, etc. and there are lots of good articles in CUJ, etc.

1. Use a custom allocator. The STL containers do lots of allocations (if you tell it to). Your app should already be using a customer allocator, like dlmalloc, but you still might want to turn on one of the node/pool allocators in the STL because things like map,list,etc. have a very reliable node style allocation pattern. You might also consider using arenas or the stack for temporary work. Just to be clear I'm not talking about the per-container allocator nonsense that the STL supports, I'm talking about changing the overall allocator in the STL which is a non-standard thing but pretty easy. In STLport this is in _site_config.h and _alloc.h

1.B. Most STL implementations come with some replacement allocators you can turn on. Also, most STL implementations are by default exception-safe and thread-safe. That requires some overhead, and if you don't care about that you can turn it off. There's usually a config.h for your STL implementation and you can go in there and hammer on the options to make it more performant. I like to disable iostreams and other things I don't use to speed up my compile. In STLport this is in stl_user_config.h and there's some more in _site_config.h

2. Don't use vector<> like an array. vector is a heavy thing that stores an ordered list. It shouldn't be what you pass around in function arguments - functions should take iterators, or just raw pointers. That allows the client code to just use a regular C array, or some kid of templated fixed-size array<> , or a stack array, etc. If you're going to use vector like an array, then go ahead and construct it to a set size using vector x(5); or whatever. A lot of game developers over-use vector because they think it's light-weight and efficient (it's not), so we'll do more points on :

3. vector<> can be really inefficient. When vector has to realloc it has to destruct all the originals and construct new copies, which can be really bad if the contained things do allocations. Most of the other containers don't have this flaw, so don't use vector<> on things that allocate. Or, make sure you reserve enough space so it doesnt have to realloc. People often write really bad code for building lists where they just start a vector and keep pushing onto it. vector<> also doubles when it has to grow which can be really dumb in some cases for games. A lot of people in games use vector<> when they're really just going to be adding things in one spot and never again, well vector's very heavy for that and the memory used could be close to 2X as much as actually necessary.

4. A pointer is a type of iterator, so all the cool algorithms that work on iterators work on pointers. So you can just use flat C arrays and still use the STL. In particular if you have a sorted array, you can use binary_search and such to do logN lookups and you don't need to bother with a map<> or whatever.

5. The string in the STL will do a ton of allocs, especially if you do something evil like vector< string >. If you're going to use strings much you really want a COW ref-counted string which will allows you to do things like sorting without a ton of allocs. BTW I personally still use COW strings in threaded apps & use the STL in non-threaded support mode in threaded apps. All communication between threads I do either with primitive types only or with manual protection. This is just because I'm a major threading paranoid lunatic and like to keep my threading as simple and contained as possible.

6. Override the std::swap and std::hash when appropriate. The STL algorithms for sorting and insertion and such make use of swap() to avoid allocs. The default swap() uses a temporary which is very evil if your object does allocs. Say for example you have a simple Buffer class which owns an m_ptr. The standard swap() will duplicate it, assign it, delete the temporary. You should replace it with a swap that just swaps the internal pointer. This is a huge performance issue if you ever try to sort or insert in a list of these things. Similarly for the non-standard hash_map , if you want to key your hash_map on anything other than an int it might do something very stupid unless you define your own hash() function which does something reasonable. (and even if you're just keying on an int, if you know the range the int takes you should use that info to define an optimal hash) (BTW see also Google's SparseHash and Super Fast Hash )

7. The STL containers and algorithms are very well implemented for the constraints that they are designed for, but if you don't care about those constrains you can obviously do better. Don't bother trying to replace them with some other totally generic container, but if you're doing something that doesn't match their constraints then by all means write your own thing. For example you might want some kind of container where you don't really care how long it takes to build but you want the lookups to be as fast as possible, well then you can easily beat the STL because their containers all limit insertion time.

8. If you're paranoid about performance like me, make all your constructors "explicit" so you don't get any implicit temporaries. (IMHO this is a good style thing to do regardless of performance, but the performance nazis also like it because it means there're no hidden constructors happening so they can sleep at night).

9. The return-value-optimization only happens when the function is inlined, so for functions where performance is important, either don't return by value, or use __forceinline and/or make sure the function is very simple. This goes for constructors as well. Also try to use the initializer list in constructors as much as possible, because it eliminates temporary initialization of those variables which may be expensive (eg. don't let things just default construct and then fix them in the code of your constructor). (BTW LTCG sort of makes this go away if you trust it to take care of all sorts of magic for you, but I prefer to not trust it and then if it can make things even better then that's awesome).

BTW , Won made an interesting reply and I made some revisions. He pointed me to this VList thing which is sort of interesting.

No comments:

old rants