Thursday, August 22, 2013

Mapped Types

I started working on implementing annotations shortly after making my previous post.  I implemented a couple of simple ones first, then started working on the Array annotation.  While getting it to work for wrapped types, I realized that it would interact rather strangely with mapped types.  I decided I would probably save myself some headaches if I implemented mapped types now rather than later, so I did (also, I meant to do it early and forgot...).

The idea behind mapped types is that there are some C++ types that the library uses that Python programmers shouldn't need to worry about.  The big, obvious example is C++ strings.  So, the bindings should silently convert such types to and from Python types.  Of course, there is no way to automatically create the code to convert the mapped types; it has to be supplied to the generator.  The existing wxPython has hand-written .sip files that contain the code.  Since the cffi generator doesn't have an intermediary format like the .sip files, it will have cffi-specific tweaker scripts that include the code.

For sip, the actual conversion code can be much simpler than it can for the cffi generator.  CPython's API means that one block of code can manipulate both the C++ objects and the Python objects.  It's not really possible to do the same thing in the cffi bindings.  Code for interacting with the mapped types directly from Python isn't generated because the user won't need to interact with the objects and because no information about the interface of the mapped type is provided to the generator.  So instead, the solution I came up with was to split the conversion (in each direction) into two parts: a to-C conversion and a from-C conversion.  The idea is that, using cffi, C data types can be an intermediary between the Python and C++ types.  This gives us Python->C and C->Python code that is written in Python, and C++->C and C->C++ code that is written in C++.  Additionally, the to/from Python code is called only from Python and the to/from C++ code is called only from C++ (with one exception), reducing the total number of boundary crossings.
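To make the split easier to picture, here is a small sketch that models the two routes as data. It is only an illustration: ConversionStep, ROUTES, boundary_crossings and the step labels are hypothetical names made up for this sketch, not anything the generator actually defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionStep:
    name: str      # hypothetical label for the snippet
    source: str    # representation being converted from
    target: str    # representation being converted to
    runs_in: str   # language the snippet is written in and called from


# One pair of steps per direction; C is always the intermediary.
ROUTES = {
    "python_to_cpp": (
        ConversionStep("Py2C", "Python", "C", runs_in="Python"),
        ConversionStep("C2Cpp", "C", "C++", runs_in="C++"),
    ),
    "cpp_to_python": (
        ConversionStep("Cpp2C", "C++", "C", runs_in="C++"),
        ConversionStep("C2Py", "C", "Python", runs_in="Python"),
    ),
}


def boundary_crossings(steps):
    """Count how often consecutive steps run in different languages."""
    return sum(a.runs_in != b.runs_in for a, b in zip(steps, steps[1:]))


for direction, steps in ROUTES.items():
    chain = " -> ".join([steps[0].source] + [s.target for s in steps])
    print(f"{direction}: {chain}, crossings: {boundary_crossings(steps)}")
```

Each route switches languages exactly once, which is the point of keeping every step on the side of the boundary it is written in.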

An example of what the conversion code can look like for wxString:

// C++->C
// malloc must be used instead of new so that this data can be freed from Python.
// The extra byte holds the terminating NUL that strcpy writes.
char *cstr = (char*)malloc(cpp_obj->length() + 1);
strcpy(cstr, cpp_obj->c_str());
return cstr;

// C->C++
// We don't have to free the cdata here because it was allocated from ffi.new
return new wxString(cdata);

# Py2C
cdata = ffi.new('char[]', obj)
# Py2C always returns two values: the actual cdata and a keepalive object. The
# latter is needed when creating, for example, an array of strings
return (cdata, None)

# C2Py
obj = ffi.string(cdata)
# Explicit freeing is necessary here unfortunately
return obj

A couple of comments about the above code:  C->C++ code should always return a pointer to a heap-allocated object so that the same block of code can be used even when the library expects to take ownership of the new object.  While using ffi.new in the Python->C code means the C->C++ code doesn't have to free any memory, the C->Python code will always have to clean up after the C++->C code.
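The keepalive value returned by the Python->C code matters once one piece of cdata points at others, as with an array of strings: the array only stores pointers, so something has to keep the individual string buffers alive until the native call is done. Here is a minimal sketch of that idea. It uses the standard library's ctypes as a stand-in for cffi so it runs without extra packages, and the function names are hypothetical, not the ones the generator emits.

```python
import ctypes

CHAR_PTR = ctypes.POINTER(ctypes.c_char)


def py2c_string(obj):
    """Python -> C: copy a str into a NUL-terminated char buffer."""
    cdata = ctypes.create_string_buffer(obj.encode("utf-8"))
    return cdata, None  # a single buffer needs no separate keepalive


def py2c_string_array(strings):
    """Python -> C: build a char*[]; the keepalive owns the string buffers."""
    buffers = [py2c_string(s)[0] for s in strings]
    array_type = CHAR_PTR * len(buffers)
    cdata = array_type(*(ctypes.cast(b, CHAR_PTR) for b in buffers))
    # The caller must hold on to `buffers` for as long as `cdata` is in use.
    return cdata, buffers


def c2py_string(cdata):
    """C -> Python: read a NUL-terminated C string back into a str."""
    return ctypes.string_at(cdata).decode("utf-8")


cdata, keepalive = py2c_string_array(["spam", "eggs"])
names = [c2py_string(cdata[i]) for i in range(len(keepalive))]
print(names)
```

ctypes happens to track some of these references internally, so the keepalive is partly redundant in this stand-in; with cffi the inner buffers are not kept alive by the outer array, which is why the conversion code hands the keepalive back explicitly.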

Of course, there are a few issues with this approach.  First of all, there is some overhead associated with allocating objects on the heap and then almost immediately freeing them.  But compared to the other alternative I came up with, this one involves fewer boundary crossings and performs much better.  The second problem arises from virtual methods messing things up a little.  In all other circumstances, the Python->C code can use ffi.new to allocate the objects that get passed to the native function, but in a virtual function, the object has to be returned rather than passed as a parameter.  This prevents us from doing the C->C++ conversion in the C++ virtual function: by then, the Python-created cdata will be out of scope and potentially garbage collected.  So in the virtual method handler, calling the C->C++ conversion from Python code is necessary (this is the exception I mentioned).  The last issue, also related to virtual methods, is one that I fear is simply unsolvable.  When a virtual method returns a pointer to a mapped type, there is no way of making sure that the object we allocate in the C->C++ conversion code is ever deleted.  I think this is a problem sip has too, though.

So, now that I have mapped types taken care of, I plan to continue working on implementing the various annotations.  Expect my next post to cover a bunch of them.
