Statically Dispatched Boxing of Tagged Data

The use of “tagged data” to represent types in interpreters of dynamically-typed languages (like Python, Ruby, Perl) is commonplace. A piece of tagged data (“box”) is a structure that holds an integer (“type tag”) that represents the object’s type, and a void* that points to the object’s value in the heap. Using the type tag, the interpreter does typechecking at runtime.

struct box_t
{
  int tag;
  void* obj;
};

Cutely, the process of making the value indirect via a pointer is called “boxing”; the process of retrieving a value given a pointer is called “unboxing”.

Recall we mentioned that values come from the heap. This means that every box operation needs to invoke some kind of dynamic memory allocator; every unbox operation needs to dereference a pointer in order to get at the value. Deallocating the object requires calling some kind of dynamic memory “deallocate” function.

The policy of allocating large values on the/some heap makes sense. On the other hand, if an object takes up at most sizeof(void*) bytes there’s little sense in using the heap at all: to access an object we need to read sizeof(void*) bytes to get the object’s address on the heap, and read sizeof(value) bytes from that address. If our object takes at most sizeof(void*) bytes we can just cram it in the field used for the void* and save a heap allocation, a potential cache miss and/or page fault (per access), and a heap deallocation. This has the effect of allocating objects that are “small enough” on the stack.

Since the average program uses many small objects (you’re much more likely to allocate small values like booleans and integers than not), the above optimization saves the interpreter a lot of work.

Now that we’ve gone over the basics, let’s see how all of this can be implemented in an interpreter, in C++.

Representing a box with a type tag and a union of a void*, a bool, etc. is an annoying solution — a box should know nothing about the types it may contain, and unions in C++ have some interesting (and dangerous) characteristics that we should ideally avoid. Even if we didn’t care about these issues, using a union only solves a typecasting problem; we’d also like the C++ compiler to automatically decide where to allocate a given type based on size.

This can be done with a bit of template metaprogramming. The term sounds fancy, but it really (roughly) means “use templates to make the compiler generate some boilerplate code.” Let’s start with a rudimentary template that allocates all objects on the heap:

template <class obj_type>
struct boxing_t
{
  static box_t box(const obj_type& obj)
  {
    box_t b;
    // set dynamic type tag...
    b.obj = new obj_type(obj);
    return b;
  }

  static const obj_type* unbox(const box_t& b)
  {
    return reinterpret_cast<const obj_type*>(b.obj);
  }
};

(Note that the above code makes the assumption that our type is copyable. In practice this is a bit restrictive, but irrelevant for the purposes of illustration.)

The above template does the basic task of boxing up a value by allocating it from the heap. It also has the nice property that if you feel like specializing the template on some type T, your specialization will have to implement both box and unbox functions; it is impossible for the compiler to “accidentally” choose box/unbox functions from different classes for any given type. This is static (compile-time) polymorphism.

Now let’s make this automatically generate box/unbox pairs for all types that are “small enough” and can be allocated on the stack. A variant of template specialization called “partial” specialization is used. The idea is that sometimes you want to specialize a class template for some but not all template parameters. We use partial specialization on a template parameter that decides whether or not the object can be allocated on the stack. The “true” case generates box/unbox functions that work with the stack, the “false” case generates box/unbox functions that work with the heap:

template <class obj_type,
          bool can_go_on_stack =
            sizeof(obj_type) <= sizeof(void*)>
struct boxing_t;

template <class obj_type>
struct boxing_t <obj_type, true> // can go on stack case
{
  static box_t box(const obj_type& obj)
  {
    box_t b;
    // set dynamic type tag...
    new (&b.obj) obj_type(obj); // placement-new to put object on stack
    return b;
  }

  static const obj_type* unbox(const box_t& b)
  {
    return reinterpret_cast<const obj_type*>(&b.obj);
  }
};

template <class obj_type>
struct boxing_t <obj_type, false> // can't go on stack case
{
  // same as body of heap-template given before.
};

Phew! That wasn’t so bad, hmm? From now on, whenever a new type is added to the interpreter (and to the language it is interpreting), the C++ compiler will automatically figure out where objects of that type should be allocated and generate the appropriate box/unbox functions — and we don’t have to do any work.

Before we celebrate there’s one more caveat: data alignment. By cramming small objects into the void* field we’re breaking strict aliasing rules, since it’s not guaranteed that the address of the void* field is aligned properly for all possible types we can cram into it. Therefore you’ll have to turn off strict aliasing in your compiler.

Finally, you’ll have to make sure that objects you cram into the void* field are “plain old data.” This can be checked at compile-time using TR1’s is_pod function (see TR1 for more details.) If you want to be able to cram nontrivial types into the void* field you’ll have to figure out a way to call the type’s destructor when your box gets destroyed.

To see Bee’s detailed implementation of this, look here.

Happy dispatching.

Bee Development