Thursday, August 22, 2013

Mapped Types

I started working on implementing annotations shortly after making my previous post.  I implemented a couple of simple ones first, then started working on the Array annotation.  While getting it to work for wrapped types, I realized that it would interact rather strangely with mapped types.  I decided I would probably save myself some headaches if I implemented mapped types now rather than later, and so I implemented them (also, I meant to do it earlier and forgot...)

The idea behind mapped types is that there are some C++ types that the library uses that Python programmers shouldn't need to worry about.  The big, obvious example is C++ strings.  So, the bindings should silently convert such types to and from Python types.  Of course, there is no way to automatically create the code to convert the mapped types; it has to be supplied to the generator.  The existing wxPython has hand-written .sip files that contain the code.  Since the cffi generator doesn't have an intermediary format like the .sip files, it will have cffi-specific tweaker scripts that include the code.

For sip, the actual conversion code can be much simpler than it can for the cffi generator.  CPython's API means that one block of code can manipulate both the C++ objects and the Python objects.  It's not really possible to do the same thing in the cffi bindings.  Code for interacting with the mapped types directly from Python isn't generated because the user won't need to interact with the objects and because no information about the interface of the mapped type is provided to the generator.  So instead, the solution I came up with was to split the conversion (in each direction) into two parts: a to-C conversion and a from-C conversion.  The idea is that, using cffi, C data types can be an intermediary between the Python and C++ types.  This gives us Python->C and C->Python code that is written in Python, and C++->C and C->C++ code that is written in C++.  Additionally, the to/from Python code is called only from Python and the to/from C++ code is called only from C++ (with one exception), reducing the total number of boundary crossings.

An example of what the conversion code can look like for wxString:

// C++ -> C
// malloc must be used instead of new so that this data can be freed from Python.
// The extra byte leaves room for the terminating NUL that strcpy writes.
char *cstr = (char*)malloc(cpp_obj->length() + 1);
strcpy(cstr, cpp_obj->c_str());
return cstr;

// C -> C++
// We don't have to free the cdata here because it was allocated from
// Python with ffi.new
return new wxString(cdata);

# Py2C
cdata = ffi.new('char[]', obj)
# Py2C always returns two values: the actual cdata and a keepalive object. The
# latter is needed when creating, for example, an array of strings
return (cdata, None)

# C2Py
obj = ffi.string(cdata)
# Explicit freeing is necessary here unfortunately
return obj

A couple of comments about the above code:  C->C++ code should always return a pointer to a heap-allocated object so that the same block of code can be used even when the library will expect to take ownership of the new object.  While using ffi.new in the Python->C code means C->C++ doesn't have to free any memory, the C->Python code will always have to clean up after the C++->C code.
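To make the Python half of the split a bit more concrete, here is a minimal sketch of a Py2C/C2Py pair.  It uses the standard library's ctypes as a stand-in for cffi so that it runs without extra dependencies, and the function names (py2c_string, py2c_string_array, c2py_string) are hypothetical, not what the generator actually emits.  It also shows why a keepalive object is returned: the array only stores raw addresses, so something else has to keep the individual string buffers alive.

```python
import ctypes


def py2c_string(obj):
    """Python -> C: copy a bytes object into a NUL-terminated char buffer."""
    # create_string_buffer plays the role of ffi.new('char[]', obj) here: the
    # memory is owned by Python and released when cdata is garbage collected.
    cdata = ctypes.create_string_buffer(obj)
    return (cdata, None)  # a single string needs no separate keepalive


def py2c_string_array(objs):
    """Python -> C: build an array of char pointers from several strings."""
    buffers = [py2c_string(obj)[0] for obj in objs]
    addresses = [ctypes.addressof(buf) for buf in buffers]
    cdata = (ctypes.c_void_p * len(addresses))(*addresses)
    # The array holds only raw addresses, so the buffers are returned as the
    # keepalive; the caller must hold on to them while the array is in use.
    return (cdata, buffers)


def c2py_string(address):
    """C -> Python: copy a NUL-terminated C string into a bytes object."""
    return ctypes.string_at(address)


cdata, keepalive = py2c_string_array([b"spam", b"eggs"])
print([c2py_string(cdata[i]) for i in range(len(cdata))])
```

If the caller dropped the keepalive list while still using the array, the addresses in it would dangle, which is the same hazard the cffi version has to guard against.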

Of course, there are a few issues with this approach.  First of all, there is some overhead associated with allocating objects on the heap and then almost immediately freeing them.  But, compared to the other alternative I came up with, this one involves fewer boundary crossings and performs much better.  The second problem arises from virtual methods messing things up a little.  In all other circumstances, the Python->C code can use ffi.new to allocate the objects that get passed to the native function, but in a virtual function, the object has to be returned rather than passed as a parameter.  This prevents us from doing the C->C++ conversion in the C++ virtual function: by the time control is back in C++, the Python-created cdata will be out of scope and potentially garbage collected.  So in the virtual method handler, calling the C->C++ conversion from Python code is necessary (this is the exception I mentioned).  The last issue, also related to virtual methods, is one that I fear is simply unsolvable.  When a virtual method returns a pointer to a mapped type, there is no way of making sure that the object we allocate in the C->C++ conversion code is ever deleted.  I think this is a problem sip has too, though.

So, now that I have mapped types taken care of, I plan to continue working on implementing the various annotations.  Expect my next post to say a bunch about them.

Thursday, August 8, 2013


The last week has felt a lot more productive than the previous few.  I've added a number of small things to the generator, including methods with custom C++ code, (working) protected methods, and overloaded methods.  Overloaded methods presented kind of an interesting problem.  A bit of background:  the way I set up my multimethod code, overloads have their types specified in a decorator.  So the declaration of a multimethod with one overload looks like:
def func():
    """A multimethod."""

def func(s):
    print s
(Side note: I'd like to point out how awesome inspect.getargspec is. It made writing the multimethod support code way easier.)
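For readers who haven't seen this style of dispatch, here is a minimal sketch of what decorator-based multimethod support could look like.  The Multimethod class and its overload method are hypothetical names made up for illustration, not the generator's real support code, and it uses inspect.getfullargspec (the Python 3 counterpart of getargspec) to match the keyword types in the decorator to the function's parameter names.

```python
import inspect


class Multimethod(object):
    """Dispatches a call to the first overload whose types match."""

    def __init__(self, func):
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self._overloads = []

    def overload(self, **types):
        def register(impl):
            # Use the parameter names to put the declared types in order.
            names = inspect.getfullargspec(impl).args
            signature = tuple(types[name] for name in names)
            self._overloads.append((signature, impl))
            # Return the multimethod so the module-level name keeps
            # referring to it rather than to the last overload.
            return self
        return register

    def __call__(self, *args):
        for signature, impl in self._overloads:
            if len(signature) == len(args) and all(
                    isinstance(arg, tp) for arg, tp in zip(args, signature)):
                return impl(*args)
        raise TypeError("no overload of %s matches the arguments"
                        % self.__name__)


@Multimethod
def func():
    """A multimethod."""


@func.overload(s=str)
def func(s):
    return "str: " + s


@func.overload(n=int)
def func(n):
    return n + 1
```

Note that the types passed to the decorator are evaluated at the moment the overload is declared, which is what causes the ordering problem described next.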

The problem:  the type for each variable has to already exist by the time the overload is created.  Now you might be thinking, like I was when I first realized this was going to be a problem, that you probably can sort things so that the declaration for every class comes before it's required by some function.  But there is one situation that such a solution could never solve: copy constructors.  The copy constructor must have its own class as the type of its parameter, and the class doesn't exist yet while its own body is being executed.  So, what I ended up doing was to move the actual overloads of the multimethods to the end of the module, after every class has been created.  While this does impair the readability of the code some, I don't think anyone is likely to care much about it in this case.
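Here is a rough sketch of that deferral, using a stripped-down dispatcher that takes its types positionally.  All of the names here (Multimethod, Point, _init and the two overload functions) are hypothetical and only meant to illustrate the ordering; the generated code is organized differently.

```python
class Multimethod(object):
    """Tiny dispatcher: overloads are (types, implementation) pairs."""

    def __init__(self, name):
        self.name = name
        self.overloads = []

    def overload(self, *types):
        def register(impl):
            self.overloads.append((types, impl))
            return impl
        return register

    def __call__(self, *args):
        for types, impl in self.overloads:
            if len(types) == len(args) and all(
                    isinstance(arg, tp) for arg, tp in zip(args, types)):
                return impl(*args)
        raise TypeError("no overload of %s matches" % self.name)


class Point(object):
    # Only the empty multimethod is created inside the class body.  Writing
    # overload(Point, Point) here would fail, because the name Point is not
    # bound until the class statement has finished.
    _init = Multimethod("Point.__init__")

    def __init__(self, *args):
        Point._init(self, *args)


# End of module: every class exists now, so the overloads can be attached.
@Point._init.overload(Point, int, int)
def _point_from_coords(self, x, y):
    self.x, self.y = x, y


@Point._init.overload(Point, Point)
def _point_copy(self, other):
    self.x, self.y = other.x, other.y


p = Point(1, 2)
q = Point(p)  # dispatches to the copy constructor
```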

So anyway, ...  As I add more functionality to the generator, I keep finding more things that I'll need to add later.  Something that dawned on me fairly recently, and that I'm slightly dreading, is the prospect of adding support for all of the function and parameter sip annotations that wxPython uses.  I'll probably write more about this later when I actually start adding support for them, but I'm predicting that dealing with them and the interactions among several of them could be a very frustrating experience.