2/02/2009

02-01-09 - Swap and Templates Part 2

So I'm back into the nightmare of std::swap and template overloading. I've been working on my cb::hash_table a bit, which I built on cb::vector so I'm seeing a lot of stuff with vector. One issue is when the vector changes capacity, you have to copy all the data over and delete the old ones. The normal way to implement vector is to do this with a bunch of uninitialized copy constructor calls, then call the destructor on all the old data. But that copy can be very expensive, and worse if your objects are large in memory it means you can temporarily double your memory use.

So I had the idea to use swap. When the vector capacity changes, first go ahead and default construct everything in the new space so it's valid. Then ::swap the new to the old, and then destruct the old vector. This is of course actually not ideal, what you really want is a single call that would "swap_construct" and object so that you don't have to default construct anything. But anyhoo, for many types of data it's easy to make a very cheap default constructor and a very fast swap.

Apparently this idea is old, and in fact newer versions of the Dinkum STL do this, they call it Swaptimization which you can find on that page if you search for swap. BTW it's also obvious that sometimes swapping is worse, eg. for vector< int > - but it's never much worse. If you wanted maximimum speed on silly cases like that, you could use a type-traits template to choose whether the class prefers swapping or copying. Actually in my old versions of my own vector I had a lot of type-traits to select the most efficient operation, but I've ripped that all out now. I find it too fragile and just also not important. (note that most STL implementations now use type traits for acceleration; I think this is really loathsome, because it's not exposed in a standard way, so you can't get to it for your own types - it makes them fact on basic types and their own types, and slow on your types, which blows - the whole idea of generic programming is that you can adapt it to work on your types, but the standards people have recently started giving special treatment to things in the STL, for example the STL gets different name lookup rules now (wtf)). (for type traits in the STL see for example stuff like "_IsOKToMemCpy" and "has_trivial_destructor" in STLport.)

Anyhoo, the problem now is that you need to make sure the right swap() is called. Ruh roh. You can't really correct overload std::swap in the std namespace ; see for example : STL defects #225-227 or PDF post about swap problems by Alan Griffiths . or C++ Standards - The "swap" Problem

So, what might you do instead? One option is to define your own swap() in your own namespace, and rely on the Koenig lookup to find your swap (cuz your objects are in your namespace). That doesn't work, partly because the STL calls std::swap explicitly. I'm not sure why they do that instead of just calling swap(). Apparently new MSVC uses ADL_Swap or some funny mumbo jumbo to try to fix this.

In any case, I have a hacky solution that I'm using for now.

I go into the STL headers to the definition of std::swap. It should look something like this :

   template < class _Tp >
   inline void swap(_Tp& __a, _Tp& __b) {
     _Tp __tmp = __a;
     __a = __b;
     __b = __tmp;
   }   

I change that to :

   template < class _Tp >
   inline void swap(_Tp& __a, _Tp& __b) {
     MySwap(a,b);
   }   

Now in my own code I have MySwap :

template < typename Type >
struct swap_functor
{
   void operator () (Type &a,Type &b)
   {
      Type c = a; a = b; b = c;
   }
};

template< typename Type > inline void MySwap(Type &a,Type &b)
{
   swap_functor< Type >()(a,b);
}

Which redirects MySwap to a functor, and the default implementation is the usual swap.

The reason to go to a functor is that you can take advantage of partial specialization due to the stronger support for class templates as opposed to function templates. Thus you can easily partial-specialize to all types of vectors (for example) :

template < class t_entry > 
struct swap_functor< cb::vector< t_entry > >
{
   void operator () ( cb::vector< t_entry > & _Left, cb::vector< t_entry > & _Right)
   {
      _Left.swap(_Right);
   }
};

The reason we do the redirects like this is that when you call anything in std::algorithms, they will call std::swap. First of all they will call the STL-provided overrides for std::swap for various STL types (like std::vector and std::string). Then they will call to the default, which calls MySwap. MySwap will go to the specializations that I wanted for my types. Finally it will fall back to the default swap.

This is the only way I know of to get specialization for both my types and the STL types. Note that to get the specializations on the std types, you have to call std::swap() - it's the "outermost" swap, and you always need to call to the leaves.

Now, for example, if I wanted to also get somebody else's library integrated, call it "somelib" that implements "somelib::swap" , I would do all the above, but the cb::swap_functor default implementation would be changed to call somelib::swap. Again everyone should call std::swap to get all the specializations.

Obviously this is not ideal because it involves digging into other people's libraries.

BTW using Swap in vectors is obviously semantically much better, even if it isn't an optimization. If you consider default constructed objects to be "nulls" and objects that have been set up and pushed into a vector to be "initialized", then the standard copy way of growing a vector temporarily makes a bunch of new "initialized" objects. The swap way just moves them and doesn't temporary change the number. A common case is stuffing something like a shared_ptr ref-counted pointer into a vector - the swap way keeps the ref counts all invariant, while the copy way temporarily bumps the refs up then back down again.

BTW one thing I noticed while testing this is that std::sort doesn't seem to use swap (!?). I always thought it did. Other algorithms, like std::reverse and std::random_permutation do call std::swap. sort seems to invoke the copy constructor and assignment a lot. The RAD version rr::sort does work entirely through swap, which means it should beat std::sort very trivially in bad cases, like vector< vector< int > >

ADDENDUM : thanks to the commenters; I think what I like best is to make my swap_functor default to doing a *byte* swap. Then any object which wants to be swapped and can't stand to be byte swapped (very rare) would override swap_functor. Doing it that way means almost nobody ever has to override swap_functor, it just uses the default (byte swap) almost always.

BTW the cool way to byte swap is something like this :


#pragma pack(push)
#pragma pack(1)
template < int n_count >
  struct Bytes { char bytes[n_count]; };
#pragma pack(pop)

template < typename t_type >
void ByteCopy(t_type * pTo,const t_type & from)
{
   typedef Bytes< sizeof(t_type) > t_bytes;

   *(reinterpret_cast< t_bytes * >(pTo)) = reinterpret_cast< const t_bytes & >(from);
}

template < typename t_type >
void ByteSwap(t_type & a,t_type & b)
{
   typedef Bytes< sizeof(t_type) > t_bytes;

   t_bytes c; // don't use t_type cuz we don't want a constructor or destructor
   ByteCopy(&c,reinterpret_cast< const t_bytes & >(a));
   ByteCopy(&a,b);
   ByteCopy(&b,reinterpret_cast< const t_type & >(c));
}

The nice thing about this is for small types the compiler does the right thing and just generates mov instructions. For large types you would want to call a memswap function, but you would never do this on large types so punt.

6 comments:

Arseny Kapoulkine said...

Back then when I had my own resizable vector (now I don't use one, except for tools where it's std::vector), I moved objects via simple memcpy (and did not call destructor on old chunk of memory). Sure, it's kind of UB, but I've never seen a real-world class that does not work with this approach on platforms I care about.

cbloom said...

Hmm.. interesting. At first I was like "wat" but you're right, just moving the bytes like that is going to work for 99% of data.

The only time it doesn't work is if somebody is keeping pointers into himself (weird and rare) or if someone has a pointer registered with someone else that points back at you. In those cases you usually have a RefCounted or some kind of Handle system, and the vector actually holds a shared pointer (which is copyable) not the actual object (whose address matters).

You could easily make that clean by using type traits. You could make the default option be "bit copy" in the type traits, and require people who want to be "assignment copied" to set up the traits for their type to indicate that.

MH said...

Yep, I just suggested this same solution to someone for their .erase(). Just memcopy or regular slow copy the item into a temp var, then memmove item + 1 through end into item + 0. Then destruct the temp var.

(The order of the vector mattered, but constructs, destructs were touching many random places in memory)

cbloom said...

Word.

I've been thinking about this; I used to have a type traits in galaxy to indicate that a type was "memcopyable" in which case I would use memcpy instead of operator =.

This is not quite the same thing, this is a question of whether the object is "relocatable". You can have lots of objects that are relocatable but not memcopyable. (for example auto_ptr,shared_ptr, etc. are relocatable but not memcopyable).

In fact I think if you are doing a custom game-STL type of thing it probably is reasonable to constrain your containers to only be able to hold relocatable objects.

Almost any time you actually have something that's not relocatable, you don't jam it in a container, you hold it with some kind of pointer.

MH said...

The code in .erase in a vector|T| is relocating each element above iterator it down by sizeof( T ), unless Im missing something.

cbloom said...

No, it's not just relocating down. According to the standard, it calls the destructor of the element being erased and then operator= to slide the others down.

STLport for example uses __type_traits::has_trivial_assignment_operator to switch to memmove in many cases.

old rants