2/16/2013

02-16-13 - The Reflection Visitor Pattern

A recent blog post by Maciej (Practical C++ RTTI for games) set me to thinking about the old "Reflect" visitor pattern.

"Reflect" is in my opinion clearly the best way to do member-enumeration in C++. And yet almost nobody uses it. A quick reminder : the reflection visitor pattern is that every class provides a member function named Reflect which takes a templated functor visitor and applies that visitor to all its members; something like :


class Thingy
{
type1 m_x;
type2 m_y;

template <typename functor>
void Reflect( functor visit )
{
    // (for all members)
    visit(m_x);
    visit(m_y);
}

};

with Reflect you can efficiently generate text IO, binary IO, tweak variable GUIs, etc.

(actually instead of directly calling "visit" you probably want to use a macro like #define VISIT(x) visit(x,#x))
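A rough sketch of the VISIT macro idea, assuming the visitor takes the variable name as a second argument. TextWriter, ToText and the "name=value" output format are made up for illustration, not part of any real library:

```cpp
#include <sstream>
#include <string>

// Passes the member and its stringized name to the visitor.
#define VISIT(x) visit(x, #x)

// Hypothetical visitor: writes "name=value" lines into a string stream.
struct TextWriter
{
    std::ostringstream & out;

    template <typename T>
    void operator () ( const T & value, const char * name )
    {
        out << name << "=" << value << "\n";
    }
};

struct Thingy
{
    int   m_x = 3;
    float m_y = 0.5f;

    template <typename functor>
    void Reflect( functor visit )
    {
        VISIT(m_x);
        VISIT(m_y);
    }
};

// Convenience helper for the example: dump a Thingy as text.
inline std::string ToText( Thingy & t )
{
    std::ostringstream out;
    t.Reflect( TextWriter{ out } );
    return out.str();
}
```

The name gives tagged text IO and tweaker GUIs something to label each value with, without any side table of member names.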

A typical visitor is something like a "ReadFromText" functor. You specialize ReadFromText for the basic types (int, float), and for any type that doesn't have a specialization, you assume it's a class and call Reflect on it. That is, the fallback specialization for every visitor should be :


struct ReadFromText
{
    template <typename visiting>
    void operator () ( visiting & v )
    {
        v.Reflect( *this );
    }
};
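To make that concrete, here is a minimal sketch of a complete visitor with handlers for int and float plus the fallback. The whitespace-separated text format, Vec2, and ParseThingy are assumptions for the example only:

```cpp
#include <sstream>
#include <string>

// Hypothetical text reader: pulls whitespace-separated tokens from a stream.
struct ReadFromText
{
    std::istringstream & in;

    // Handlers for the basic types.
    void operator () ( int & v )   { in >> v; }
    void operator () ( float & v ) { in >> v; }

    // Fallback: anything else is assumed to be a class with Reflect.
    template <typename visiting>
    void operator () ( visiting & v )
    {
        v.Reflect( *this );
    }
};

struct Vec2
{
    float x = 0, y = 0;

    template <typename functor>
    void Reflect( functor visit ) { visit(x); visit(y); }
};

struct Thingy
{
    int  m_count = 0;
    Vec2 m_pos;

    template <typename functor>
    void Reflect( functor visit ) { visit(m_count); visit(m_pos); }
};

inline Thingy ParseThingy( const std::string & text )
{
    std::istringstream in( text );
    Thingy t;
    ReadFromText reader{ in };
    reader( t );   // falls through to Reflect, then to the int/float handlers
    return t;
}
```

Here the basic-type handlers are plain overloads rather than template specializations; overload resolution prefers them over the template fallback. Visiting a type that has neither a handler nor a Reflect member (say a double here) fails at compile time inside the fallback, which is the error-reporting behavior you want.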

The standard alternative is to use some macros to mark up your variables and create a walkable set of extra data on the side. That is much worse in many ways, I contend. You have to maintain a whole type ID system, you have to have virtuals for each type of class IO (note that the Reflect pattern uses no virtuals). The Reflect method lets you use the compiler to create specializations, and get decent error messages when you try to use new visitors or new types that don't have the correct handlers.

Perhaps the best thing about the Reflect system is that it's code, not data. That means you can add arbitrary special case code directly where it's needed, rather than trying to make the macro-cvar system handle everything.

Of course you can go farther and auto-generate your Reflect function, but in my experience manual maintenance is really not a bad problem. See previous notes :

cbloom rants 04-11-07 - 1 - Reflection
cbloom rants 03-13-09 - Automatic Prefs
cbloom rants 05-05-09 - AutoReflect

Now, despite being pro-Reflect I thought I would look at some of the drawbacks.

1. Everything in headers. This is the standard C++ problem. If you truly want to be able to Reflect any class with any visitor, everything has to be in headers. That's annoying enough that in practice in a large code base you probably want to restrict to just a few types of visitor (perhaps just BinIO,TextIO), and provide non-template accessors for those.

This is a transformation that the compiler could do for you if C++ was actually well designed and friendly to programmers (grumble grumble). That is, we have something like

template <typename functor>
void Reflect( functor visit );

but we don't want to eat all that pain, so we tell the compiler which types can actually ever visit us :

void Reflect( TextIO & visit );
void Reflect( BinIO & visit );

and then you can put all the details in the body. Since C++ won't do it for you, you have to do this by hand, and it's annoying boiler-plate, but could be made easier with macros or autogen.
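One way to do that by hand, sketched in a single file for brevity. TextIO and BinIO here are toy stand-ins that I made up, and ReflectImpl is a hypothetical name; the header only needs forward declarations of the visitor types:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical visitors, stand-ins for the TextIO / BinIO mentioned above.
struct TextIO
{
    std::ostringstream out;
    template <typename T> void operator () ( T & v ) { out << v << ' '; }
};

struct BinIO
{
    std::vector<unsigned char> bytes;
    template <typename T> void operator () ( T & v )
    {
        const unsigned char * p = reinterpret_cast<const unsigned char *>( &v );
        bytes.insert( bytes.end(), p, p + sizeof(T) );
    }
};

// ---- what would live in Thingy.h ----
class Thingy
{
public:
    void Reflect( TextIO & visit );
    void Reflect( BinIO & visit );

private:
    template <typename functor>
    void ReflectImpl( functor & visit );   // defined in the .cpp only

    int   m_x = 1;
    float m_y = 2.5f;
};

// ---- what would live in Thingy.cpp ----
template <typename functor>
void Thingy::ReflectImpl( functor & visit )
{
    visit(m_x);
    visit(m_y);
}

void Thingy::Reflect( TextIO & visit ) { ReflectImpl( visit ); }
void Thingy::Reflect( BinIO & visit )  { ReflectImpl( visit ); }
```

The member list is still written once, in the template; the two non-template wrappers are the boiler-plate that a macro or autogen step could emit.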

2. No virtual templates in C++. To call the derived-class implementation of Reflect you need to get down there in some ugly way. If you are specializing to just a few possible visitors (as above), then you can just make those virtual functions and it's no problem. Otherwise you need a derived-class dispatcher (see cblib and previous discussions).
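A small sketch of the "few visitors, so just make them virtual" option. TextIO is again a toy stand-in, and ReflectMembers and Dump are hypothetical names for this example:

```cpp
#include <sstream>
#include <string>

// Hypothetical visitor standing in for one of the few supported IO types.
struct TextIO
{
    std::ostringstream out;
    template <typename T> void operator () ( T & v ) { out << v << ' '; }
};

struct Base
{
    virtual ~Base() {}

    // One virtual per supported visitor type.
    virtual void Reflect( TextIO & visit ) { ReflectMembers( visit ); }

    template <typename functor>
    void ReflectMembers( functor & visit ) { visit(m_id); }

    int m_id = 7;
};

struct Derived : Base
{
    void Reflect( TextIO & visit ) override
    {
        Base::ReflectMembers( visit );  // base members first
        ReflectMembers( visit );        // then our own
    }

    template <typename functor>
    void ReflectMembers( functor & visit ) { visit(m_scale); }

    float m_scale = 1.5f;
};

inline std::string Dump( Base & object )
{
    TextIO text;
    object.Reflect( text );   // reaches Derived::Reflect through the vtable
    return text.out.str();
}
```

Each additional visitor type costs one more virtual per class, which is why this only works when the visitor set is small and fixed.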

3. Versioning. First of all, versioning in this system is not really any better or worse than versioning in any other system. I've always found automatic versioning systems to be more trouble than they're worth. The fundamental issue is that you want to be able to incrementally do the version transition (you should still be able to load old versions during development), so the objects need to know how to load old versions and convert them to new versions. So you wind up having to write custom code to adapt the old variables to the new, stuff like :


if ( version == 1 )
{
    // used to have member m_angle
    double m_angle;
    visitor(m_angle);
    m_angleCos = cos(m_angle);
}
else
{
    visitor(m_angleCos);
}

Now, you can of course do this without explicit version numbers, which is my preference for simple changes. E.g. when I have some text prefs and decide I want to remove some values and add new ones, I can just leave code in to handle both ways for a while :

{

#ifndef FINAL
if ( visitor.IsRead() )
{
    double m_angle = 0;
    visitor(m_angle);
    m_angleCos = cos(m_angle);
}
#endif

visitor(m_angleCos);

}

where I'm using the assumption that my IO visitor is a NOP on variables that aren't in the stream. (eg. when loading an old stream, m_angleCos won't be found and so the value from m_angle will be loaded, and when loading a new stream the initial filling from m_angle will be replaced by the later load from m_angleCos).
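Here is a minimal sketch of that assumption in action. TaggedReader, Spinner and LoadSpinner are hypothetical, the "stream" is just a name-to-value map, and the #ifndef FINAL guard is left out to keep it short:

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical tagged reader: values are looked up by name, and a name
// that is not in the stream leaves the variable untouched (a NOP).
struct TaggedReader
{
    const std::map<std::string, double> & stream;

    bool IsRead() const { return true; }

    void operator () ( double & v, const char * name )
    {
        std::map<std::string, double>::const_iterator it = stream.find( name );
        if ( it != stream.end() )
            v = it->second;
    }
};

struct Spinner
{
    double m_angleCos = 1.0;

    template <typename functor>
    void Reflect( functor visitor )
    {
        if ( visitor.IsRead() )
        {
            // Old streams stored the angle itself; convert it if present.
            double m_angle = 0;
            visitor( m_angle, "m_angle" );
            m_angleCos = std::cos( m_angle );
        }

        // New streams store the cosine, which overrides the conversion.
        visitor( m_angleCos, "m_angleCos" );
    }
};

inline Spinner LoadSpinner( const std::map<std::string, double> & stream )
{
    Spinner s;
    s.Reflect( TaggedReader{ stream } );
    return s;
}
```

An old stream containing only m_angle ends up with the converted cosine; a new stream containing only m_angleCos first computes cos(0) from the missing angle and then overwrites it with the stored value.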

Anyway, the need for conversions like this has always put me off automatic versioning. But that also means that you can't use the auto-gen'ed reflection. I suspect that in large real-world code, you would wind up doing lots of little special case hacks which would prevent use of the simple auto-gen'ed reflection.

4. Note that macro-markup and Reflect() both could provide extra data, such as min & max value ranges, version numbers, etc. So that's not a reason to prefer one or the other.
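For the Reflect() side, the extra data can simply be more arguments on the visit call, which visitors that don't care about them ignore. A sketch with made-up names (ClampToRange, TextWriter, Light) and a made-up (value, name, min, max) argument order:

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Hypothetical tweak visitor: uses the range to clamp values.
struct ClampToRange
{
    template <typename T>
    void operator () ( T & v, const char * /*name*/, T lo, T hi )
    {
        v = std::min( std::max( v, lo ), hi );
    }
};

// Hypothetical text writer: accepts the same arguments but ignores the range.
struct TextWriter
{
    std::ostringstream & out;

    template <typename T>
    void operator () ( T & v, const char * name, T /*lo*/, T /*hi*/ )
    {
        out << name << "=" << v << "\n";
    }
};

struct Light
{
    float m_brightness = 5.0f;
    int   m_samples    = 64;

    template <typename functor>
    void Reflect( functor visit )
    {
        visit( m_brightness, "m_brightness", 0.0f, 1.0f );
        visit( m_samples,    "m_samples",    1,    16   );
    }
};
```

The same Reflect body serves both visitors; only the ones that understand ranges do anything with them.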

5. Reflect() can be abused to call the visitor on values that are on the stack or otherwise not actually data members. Mostly that's a big advantage, it lets you do conversions, and also serialize in a more human-friendly format (for text or tweakers) (eg. you might store a quaternion, but expose it to tweak/text prefs as euler angles) (*).

But, in theory with a macro-markup cvar method, you could use that for layout info of your objects, which would allow you to do more efficient binary IO (eg. by identifying big blocks of data that can be read in binary without any conversions).

(* = whenever you expose a converted version, you should also store the original form in binary so that write-then-read is a guaranteed nop ; this is of course true even for just floating point numbers that aren't printed to all their places, which is something I've talked about before).

I think this potential advantage of the cvar method is not a real advantage. Doing super-efficient binary IO should be as close to this :


void * data = Load( one file );
GameWorld * world = (GameWorld *) data;

as possible. That's going to require a whole different pathway for IO that's separate from the cvar/reflect pathway, so there's no need to consider that as part of the pro/con.

6. The End. I've never used the Reflect() pattern in the real world of a large production codebase, so I don't know how it would really fare. I'd like to try it.

13 comments:

PeterM said...

Hello,

Ron Pieket (Insomniac) covered a few game data loading approaches with respect to versioning amongst other things in his GDC 2012 talk "Developing Imperfect Software", and came up with/described what I think is a pretty good solution. It's probably worth a watch if you haven't already:

http://www.itshouldjustworktm.com/?p=652

Cheers,
Peter

Fabian 'ryg' Giesen said...

That talk focuses on serialization, and unless I misunderstood what he's saying, what he describes is basically the serialization system that RAD's Granny has been shipping since 2002 or so. :)

cbloom said...

Yeah, I just had a quick scan (god damn don't put information in video form. text text text) and it certainly sounds just like Granny.

The granny-style struct markup system is okay. I think the main advantage of it is that you can find the values that need to be endian-fixed and do an efficient pass of endian fixing and then just point at the struct blocks.

As for the Insomniac argument, I don't really see the win. To be clear, the alternative to the Insomniac way is :

You have some loose "source" data files (certainly not XML, wtf, but some kind of simple always-readable format).

You build fast-load binary variants from the source data. If the binary is up to date you "load and go" it, otherwise you load the source and parse it and generate a new binary (so that your next load will be fast).

The Insomniac way adds the ability to load out of date binaries without going all the way back to a source parse.

That sounds nice, but it also adds a whole extra pathway that you have to maintain, and you can introduce weird heisen-bugs due to the old-binary loading in a weird way (that you don't get when you bake new binaries from source data).

I dunno, maybe I could be convinced, but it certainly doesn't seem like a compelling win for so much added complexity.

Unknown said...

There was a proposed boost library (that sadly didn't make it very far) that was auto implementing this pattern, including taking care of inheritance. It's called boost.reflect (it did many other things as well). It can be found here https://github.com/bytemaster/boost_reflect.

All you had to do was call a macro in the form:

BOOST_REFLECT(Class, (base1)(base2)(...), (member1)(member2)(...))

And it would generate the visitor for the class. You could call it this way:

boost::reflect::visitor::visit( myVisitor ) ;


I used it in a full size project (~60k lines) and it worked like a charm. I guess it did impact compilation times but I never bothered measuring. I even wrote a small pre compiler to auto generate the macros in the simple cases, to make sure I wouldn't forget adding new members and to relieve the pain of having to write all those macros.

My favorite usage was to dump my objects to and from CSV files, and I had a lot of success with a visitor that converted endianness.


A last thing worth noting is that the C++ standard committee has appointed a study group on reflection (SG7). I can't wait to see what will come out of it.

NeARAZ said...

This is very much how Unity's "serialization" system works (we call them "Transfer" instead of reflect, but same idea). And indeed that is used for both saving/loading and automatically building UI for editing objects (the UI recognizes some built-in types, e.g. a color member will automatically get a color picker in the UI etc.).

Couple points:

"1. Everything in headers" - yeah that's annoying. We kind of work around it by routing through virtual calls, which kind of sucks but we accept the hit.

So we have several macros, in header you do DECLARE_OBJECT_SERIALIZE() which does virtual void VirtualRedirectTransfer(BinaryRead&); virtual void VirtualRedirectTransfer(BinaryWrite&); etc.

And then in the .cpp file you do IMPLEMENT_OBJECT_SERIALIZE which does implementation of each of these functions that calls into the template one. And that single template one is defined in .cpp file, plus the macro explicitly instantiates the template with the transfer functors.


"3. Versioning" - what we have is, we have a "safe binary read" serializer as well as "binary read" one. The "safe" one can handle missing or added fields, and is always used while developing (in the editor). So if you just want to add a new field to a class, you just add it and call "transfer" on it. The other reader function is much simpler, can't handle any missing/added fields and is used when reading final baked game data.

In addition to that, the classes can also have a version number, for things where you really change something significantly.

NeARAZ said...

...oh, and exactly same system is also used for things like "gather dependencies" - e.g. when building a game data file, we only include the assets that are actually referenced by something. That's just one more "transfer functor" that adds any "persistent pointer" members to the set of objects.

MH said...

Oh hey. I realized you could add an additional parameter to the macro that represent metadata.

So, you could have:
REFLECT( , AttVersion(3).AttNetwork() );

This allows lots of cool things that I had thought of as weaknesses of the C++ approach.

MH said...

Blogger ate part of my comment

REFLECT( {thing}, AttVersion( 3 ), AttNetwork() );

cbloom said...

@MH - yeah, I always see that discussed as an advantage of the macro-markup way (that you can add various bits of extra metadata), but of course they can just be more args on the reflection visitor that are just ignored when they don't apply.

cbloom said...

@Nearaz - good to know, that's how I imagined it would be used.

It sounds like you don't even have a "flat load" path, that you either load with slow/checked transfer or with fast/unchecked transfer?

For posterity, in case this wasn't clear :

In the main post I was assuming that IO through Reflection was "tagged"; that is you write something like [var name][var value] , so that rearranging variables and such doesn't break the file format. (of course var name can be a 4-byte hash or whatever, you don't have to do slow text IO).

But you can also do faster untagged/unchecked IO when you know the binary matches the code perfectly, just read/write the data without checking tags.

Per Vognsen said...

"And yet almost nobody uses it."

Except every game that has used any version of the Unreal Engine? They use it for everything from serialization to garbage collection.

NeARAZ said...

@cbloom: oh no, for checked vs. non-checked we do it differently.

In the checked/slow part, a data file also includes the "type tree" - kind of RTTI info for each type of object that is serialized in the file (basically name & type pairs). Then the data itself is just values (advantage: names&types only stored once per class, instead of in each data instance).

In the unchecked/fast part (used when you know that code that built the data is exactly the same version as code that's reading the data), this "type tree" information does not exist, and the data is just data. Reading the data, transfer(&myint,"myint") just ignores the name and fetches 4 bytes at the current read offset.

All the above is used for object-like data that has references to other objects etc. Large blob-like data (texture pixels, mesh data, audio etc.) go through another system that operates on just blobs of bytes.

Mark Lee said...

"The Insomniac way adds the ability to load out of date binaries without going all the way back to a source parse."

That's huge though. The promise of forward and backward compatibility between different versions of data and executables takes on more value as teams grow big and many people are in the code, potentially tweaking file formats. I'm not sure if insomniac implemented this system in the end, but the promise of not having to wait around for new builds of levels just because you sync'ed code does sound very appealing in theory. Granted, in practice may be another matter.

@NeARAZ: The description of the Unity system is starting to sound quite Granny-like - development-only schemas which are dumped for final builds.
