Sunday, September 22, 2013

Wrapping up the summer

Unfortunately, I have been unable to complete the entire project.  Here's where things stand right now:  Enough of the ETG (tweaker) scripts have been converted for the main module to be imported and for a number of the simple demos to run.  That said, a bunch of functionality is missing from the main module and none of the submodules (wx.html, wx.grid, etc.) work.  Converting the remainder of the ETG scripts is something that is just going to take more time.  Many of them are simple enough, but some of them have a lot of additions that use the C-API and will require a significant amount of time and work.

I've spent the last couple of days working on some documentation.  I've decided to focus on writing documentation for converting etg scripts rather than for the internals of the generator.  Although there are doubtlessly bugs in the generator yet to be fixed (I already know of at least a couple), I think most of the work that needs to be done and that is likely to be done by someone else is working with the ETG scripts.

I don't plan on abandoning this project. I want to see this project through to its completion. I've asked permission to continue working on this project as a part of a class this semester.  I haven't gotten a response just yet, but I'm hopeful.

Wednesday, September 11, 2013

Sip annotations

I've finally finished implementing all of the sip annotations that are currently used by wxPython Phoenix.  Barring some unexpected requirement (and doubtless there will be at least one), the generator itself should be complete.

To talk about sip annotations a bit: sip uses what it calls annotations, specified by extra code added to .sip files, to allow the default behavior of the generated bindings to be changed.  Annotations can be applied to basically any C++ definition.  The ones that are relevant here are annotations for classes, methods, method parameters, and variables.

An example of what an annotated function definition would look like in a .sip file (taken from sip's documentation):

void exec(QWidget * /Transfer/) /ReleaseGIL, PyName=call_exec/;

Since the existing wxPython code is pretty heavily tailored to sip, the tweaker scripts reference these annotations directly.  This also makes sense, I suppose, since there isn't really a good, predefined way to specify the various behaviors the annotations represent.  The problem, however, was that some of the annotations are poorly documented.

The worst example of an annotation being poorly documented is the Transfer annotation.  When applied to a function, Transfer indicates that the ownership of the return value is given to C++.  That much is pretty clear from the documentation anyway.  What is not documented is that when Transfer is applied to a Ctor, sip increments the refcount for the new object an extra time.  The effect of this is that the (Python) object will not be garbage collected until either its (hopefully virtual) C++ Dtor is called or the ownership of the object is somehow changed.

For a milder example, I don't know how many times I read the documentation for the KeepReference annotation before I figured out how its 'keys' work. (And only after I implemented that functionality did I realize that wxPython never specifies the key to use...)


Moving forward, I now need to modify the tweaker scripts to make them compatible with the new generator.  My plan for this is to remove all of the scripts, and add them back one by one.  As I add them back, I plan on splitting any sip-specific and cffi-specific code into separate files which can be imported by the shared script depending on the generator being used.

To be perfectly honest, I don't expect to finish modifying all of the scripts before the 16th.  For now, I've settled on trying to have most of the core module working by then.  I wanted to be at this point about a month ago, but things always seem to take me longer than I plan for them to.

Thursday, August 22, 2013

Mapped Types

I started working on implementing annotations shortly after making my previous post.  I implemented a couple of simple ones first, then started working on the Array annotation.  While getting it to work for wrapped types, I realized that it would interact rather strangely with mapped types. I decided it would probably save me some headaches if I implemented mapped types now rather than later, and so I implemented them (also, I meant to do it early and forgot...)

The idea behind mapped types is that there are some C++ types that the library uses that Python programmers shouldn't need to worry about.  The big, obvious example is C++ strings.  So, the bindings should silently convert such types to and from Python types.  Of course, there is no way to automatically create the code to convert the mapped types; it has to be supplied to the generator.  The existing wxPython has hand written .sip files that contain the code.  Since the cffi generator doesn't have an intermediary format like the .sip files, it will have cffi-specific tweaker scripts that include the code.

For sip, the actual conversion code can be much simpler than it can for the cffi generator.  CPython's API means that one block of code can manipulate both the C++ objects and the Python objects.  It's not really possible to do the same thing in the cffi bindings.  Code for interacting with the mapped types directly from Python isn't generated because the user won't need to interact with the objects and because no information about the interface of the mapped type is provided to the generator.  So instead, the solution I came up with was to split the conversion (in each direction) into two parts: a to C conversion and a from C conversion.  The idea is that, using cffi, C data types can be an intermediary between the Python and C++ types.  This gives us Python->C and C->Python code that is written in Python and C++->C and C->C++ code that is written in C++.  Additionally, the to/from Python code is called only from Python and the to/from C++ code is called only from C++ (with one exception), reducing the total number of boundary crossings.


An example of what the conversion code can look like for wxString:

//Cpp2C
//malloc must be used instead of new so that this data can be freed from Python
//+1 leaves room for the terminating null that strcpy writes
char *cstr = (char*)malloc(cpp_obj->length() + 1);
strcpy(cstr, cpp_obj->c_str());
return cstr;
//C2Cpp
//We don't have to free the cdata here because it was allocated from ffi.new
return new wxString(cdata);
# Py2C
cdata = ffi.new('char[]', obj)
# Py2C always returns two values: the actual cdata and a keepalive object. The
# latter is needed when creating, for example an array of strings
return (cdata, None)
# C2Py
obj = ffi.string(cdata)
# Explicit freeing is necessary here unfortunately
clib.free(cdata)
return obj



A couple of comments about the above code:  C->C++ code should always return a pointer to a heap allocated object so that the same block of code can be used even when the library will expect to take ownership of the new object.  While using ffi.new in the Python->C code will allow C->C++ to not have to do any freeing of memory, the C->Python code will always have to clean up after C++->C code.

Of course, there are a few issues with this approach.  First of all, there is some overhead associated with allocating objects on the heap and then almost immediately freeing them.  But, compared to the other alternative I came up with, this one involves fewer boundary crossings and performs much better.  The second problem arises from virtual methods messing things up a little.  In all other circumstances, the Python->C code can use ffi.new to allocate the objects that get passed to the native function, but in a virtual function, the object has to be returned rather than passed as a parameter. This prevents us from doing the C->C++ conversion in the C++ virtual function.  By then, the Python-created cdata will be out of scope and potentially garbage collected.  So in the virtual method handler, calling the C->C++ conversion from Python code is necessary (this is the exception I mentioned).  The last issue, also related to virtual methods, is one that I fear is simply unsolvable.  When a virtual method returns a pointer to a mapped type, there is no way of making sure that the object we allocate in the C->C++ conversion code is ever deleted.  I think this is a problem sip has too, though.

So, now that I have mapped types taken care of, I plan to continue working on implementing the various annotations.  Expect my next post to be a bunch about them.

Thursday, August 8, 2013

Multimethods

The last week has felt a lot more productive than the previous few.  I've added a number of small things to the generator, including methods with custom C++ code, (working) protected methods, and overloaded methods.  Overloaded methods presented kind of an interesting problem.  A bit of background:  the way I set up my multimethod code, overloads have their types specified in a decorator.  So the declaration of a multimethod with one overload looks like:
@wrapper_lib.MultiMethod
def func():
    """A multimethod."""

@func.overload(s=str)
def func(s):
    print s
(Side note: I'd like to point out how awesome inspect.getargspec is. It made writing the multimethod support code way easier.)

The problem:  the type for each variable has to already exist by the time the overload is created.  Now you might be thinking, like I was when I first realized this was going to be a problem, that you probably can sort things so that the declaration for every class comes before it's required by some function.  But there is one situation that such a solution could never solve: copy constructors.  The copy constructor must have its own class as a type for its parameter.  So, what I ended up doing was to move the actual overloads of the multimethods to the end of the module, after every class has been created.  While this does impair the readability of the code some, I don't think anyone is likely to care much about it in this case.
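To make the copy constructor case concrete, here is a minimal Python 3 sketch.  The MultiMethod class below is a simplified, hypothetical stand-in (it takes positional types rather than the keyword form shown above) and is not the actual wrapper_lib code.  The point is only that the overloads can be registered after the class statement has finished, when the name Point finally exists:

```python
class MultiMethod:
    """Simplified, hypothetical stand-in for wrapper_lib.MultiMethod."""

    def __init__(self, func):
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__
        self._overloads = []  # list of (tuple of types, implementation)

    def overload(self, *types):
        def register(impl):
            self._overloads.append((types, impl))
            return impl
        return register

    def __get__(self, instance, owner=None):
        # Accessed on the class: hand back the multimethod itself.
        if instance is None:
            return self
        # Accessed on an instance: bind it as the first argument.
        return lambda *args: self(instance, *args)

    def __call__(self, *args):
        # Pick the first overload whose types match every argument.
        for types, impl in self._overloads:
            if len(args) == len(types) and all(
                isinstance(arg, t) for arg, t in zip(args, types)
            ):
                return impl(*args)
        raise TypeError("no matching overload for %r" % (args,))


class Point:
    @MultiMethod
    def __init__(self):
        """Point(x, y) or Point(other)"""


# End of the module: Point now exists, so it can be used as a type,
# including as the parameter type of its own copy constructor.
@Point.__init__.overload(Point, int, int)
def _point_init_xy(self, x, y):
    self.x, self.y = x, y


@Point.__init__.overload(Point, Point)
def _point_init_copy(self, other):
    self.x, self.y = other.x, other.y


original = Point(1, 2)
copy = Point(original)
```

Registering the (Point, Point) overload inside the class body would fail with a NameError, since Point is not bound until the class statement completes.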

So anyway, ...  As I add more functionality to the generator, I keep finding more things that I'll need to add later.  Something that dawned on me fairly recently that I'm slightly dreading is the prospect of adding support for all of the function and parameter sip annotations that wxPython uses.  I'll probably write more about this later when I start actually adding support for them, but I'm predicting that dealing with them and the interactions between multiple of them could be a very frustrating experience.

Wednesday, July 31, 2013

Progress update, virtual methods

I've been slowly adding features to the generator over the last week.  It's far from complete, but at the moment it works for generating classes and methods/functions.  Most recently I've been working on implementing overriding virtual methods with Python methods.  The implementation of this is interesting, so I thought I'd talk about it a little.

The place to start, I suppose, is to explain why this is something that is useful.  The idea is that a subclass of a C++ class created in Python should be as useful as a regular C++ subclass.  Important to this is that when the user overrides a method in the Python subclass, they should be able to expect it to act as if they had overridden that method in a C++ subclass.  Which is to say that, provided the method is virtual, when the method is called on the C++ instance from a base class pointer, the new implementation should be called.  An example is wxPython's App.OnInit (aka wxApp::OnInit).  This method is not (typically) called by users themselves, but instead by the library.  It is intended to be overridden in a subclass to allow the user to perform some setup before the application starts.

The first step to implementing virtual methods is to generate a subclass of the C++ class we're wrapping.  When we create instances of the Python wrapper class, we'll create an instance of this subclass instead of the original C++ class.  The reason we want this subclass is so that we can re-implement every virtual method.  These re-implementations can then call Python functions via function pointers created using cffi.  The Python functions look up the Python wrapper object for the `this` pointer from the virtual method, convert the arguments into Python types, and then call the corresponding method on the Python wrapper.  The value returned by the Python callback is then converted as necessary and returned to the virtual method.

That seems simple enough, but we have a major problem: there is a bunch of overhead associated with calling a Python callback from C/C++ and with converting the arguments.  In fact, most of the time the virtual methods aren't overridden and all of that overhead is for nothing.  Since we're already creating a subclass, we can add a set of flags - one for each virtual method - to track whether or not a method has been overridden in Python.  A virtual re-implementation can then simply call the base implementation of the method if the corresponding flag has not been set.  We can then use a metaclass to figure out which methods have been overridden by a Python subclass and build a set of default flags for instances of that subclass.
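A rough Python 3 model of the metaclass part, under some loud assumptions: the `virtual` marker, the VirtualFlagsMeta name, and the _default_flags dictionary are all invented for illustration, and in the real bindings the flags would live on the generated C++ subclass rather than in a Python dict.

```python
def virtual(func):
    """Hypothetical marker for generated wrappers of C++ virtual methods."""
    func._is_virtual = True
    return func


class VirtualFlagsMeta(type):
    """Works out which virtual methods a Python class has overridden."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Collect every virtual method declared anywhere in the hierarchy.
        names = set()
        for klass in cls.__mro__:
            for attr, value in vars(klass).items():
                if getattr(value, "_is_virtual", False):
                    names.add(attr)
        # A flag is set when the name now resolves to something other
        # than the generated wrapper, i.e. a Python override.
        cls._default_flags = {
            attr: not getattr(getattr(cls, attr), "_is_virtual", False)
            for attr in names
        }


class App(metaclass=VirtualFlagsMeta):
    @virtual
    def OnInit(self):
        return True

    @virtual
    def OnExit(self):
        return 0


class MyApp(App):
    def OnInit(self):  # plain Python override, no marker
        return False
```

With this, App._default_flags has both flags cleared, while MyApp._default_flags has only OnInit set, so only OnInit would pay for the trip into Python.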

That handles the more C++-like use case of a subclass overriding a method from its superclass, but what about something a bit more Pythonic: replacing a method on a single instance. We already have most of what we need in place for this. We just need to update the flag when a Python method corresponding to a virtual method gets changed.  The solution that I ended up using was to use a descriptor to wrap the virtual methods.  When the descriptor's __set__ method is triggered, it sets the flag on the object, but only if the object was created by Python.  The reason for that last check is because we can't guarantee that an object that wasn't created by Python will have the flags field and if it doesn't, trying to set a flag could cause a segfault or could silently corrupt data.
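Here is a small Python 3 sketch of such a descriptor.  The names VirtualMethod, _py_created, and _overridden are hypothetical, and a Python set stands in for the flags field of the C++ object:

```python
class VirtualMethod:
    """Hypothetical data descriptor wrapping one virtual method."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Prefer a per-instance replacement if one has been assigned.
        override = instance.__dict__.get(self.name)
        if override is not None:
            return override
        return self.func.__get__(instance, owner)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        # Only touch the flags of objects that Python created; others
        # may not have a flags field at all.
        if instance._py_created:
            instance._overridden.add(self.name)


class Window:
    def __init__(self, py_created=True):
        self._py_created = py_created
        self._overridden = set()  # stands in for the C++ flags

    @VirtualMethod
    def OnClose(self):
        return "base"


mine = Window()
mine.OnClose = lambda: "custom"      # triggers VirtualMethod.__set__

foreign = Window(py_created=False)   # pretend C++ created this one
foreign.OnClose = lambda: "custom"
```

Because the descriptor defines __set__, it takes priority over the instance dictionary, which is what lets it intercept the assignment in the first place.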

The last thing, which I don't have implemented yet, is to handle when a method is changed on the Python class itself.  This is a little more complicated than the last case since we have to update the flag on every single instance of the class.  The solution here, I think, is to use a WeakSet to keep track of the instances.  When a method is changed, we could iterate over the WeakSet and change the flag on every active instance.  We can detect that a virtual method is being changed by using a __setattr__ method on the metaclass.  I haven't tested it yet, but I think that the overhead of adding instances to the set when they're created shouldn't be too large considering the other things that go on in the constructor anyway, such as calling the C++ constructor.
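As an untested idea rather than working generator code, the plan could look roughly like this in Python 3.  TrackingMeta, _virtual_names, and _overridden are hypothetical names, and the set again stands in for the C++ flags:

```python
import weakref


class TrackingMeta(type):
    """Hypothetical metaclass that tracks live instances of each class."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = weakref.WeakSet()

    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        # Register with every tracked class in the MRO so that a change
        # on a base class also reaches instances of its subclasses.
        for klass in cls.__mro__:
            if isinstance(klass, TrackingMeta):
                klass._instances.add(obj)
        return obj

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        # A virtual method was replaced on the class itself: update the
        # flag on every instance that is still alive.
        if name in getattr(cls, "_virtual_names", ()):
            for obj in list(cls._instances):
                obj._overridden.add(name)


class App(metaclass=TrackingMeta):
    _virtual_names = frozenset({"OnInit"})

    def __init__(self):
        self._overridden = set()  # stands in for the C++ flags

    def OnInit(self):
        return True


first, second = App(), App()
App.OnInit = lambda self: False  # goes through TrackingMeta.__setattr__
```

The WeakSet only holds weak references, so tracking an instance does not keep it alive after the rest of the program is done with it.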

So, the next week or two, I'm going to be continuing to work on the generator.  I think after I put a few finishing touches on the virtual method handling code, I'll work on a few of the easier things like enumerators and global variables.

Tuesday, July 23, 2013

Progress Update

I've been working on the generator code this week, though I haven't made quite as much progress in the last week as I had hoped.  I ended up restarting a few times before I came up with a plan that I liked.  The problem was that, the way the build system is set up, each etg script is run separately, without interacting with the others at all.  This means that when, for example, the generator handles the wxFrame class, it doesn't have any information about the wxWindow class (from which wxFrame derives).  That's fine for the sip generator since the generator outputs intermediary files which the sip executable processes.  It's a problem for the cffi generator because it should output the actual C++ and Python files for the bindings.  So the solution I settled on was to pickle the output of the etg scripts, and do the actual generation in a separate step.  By loading the pickles from scripts that are referenced in the one currently being processed, the generator can look up information about classes handled in other scripts or even other modules.
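The two-step idea can be sketched in a few lines of Python 3.  Everything about the data layout here is an assumption made up for the example (the "imports" and "classes" keys, the .pkl file naming, the helper function names); the real pickles hold whatever the etg scripts actually produce.

```python
import os
import pickle
import tempfile


def save_script_output(directory, script_name, output):
    """Step 1: each script dumps its output independently."""
    with open(os.path.join(directory, script_name + ".pkl"), "wb") as f:
        pickle.dump(output, f)


def load_script_output(directory, script_name):
    with open(os.path.join(directory, script_name + ".pkl"), "rb") as f:
        return pickle.load(f)


def lookup_class(directory, script_name, class_name):
    """Step 2: search a script's pickle, then the scripts it references."""
    output = load_script_output(directory, script_name)
    if class_name in output["classes"]:
        return output["classes"][class_name]
    for referenced in output["imports"]:
        found = lookup_class(directory, referenced, class_name)
        if found is not None:
            return found
    return None


workdir = tempfile.mkdtemp()
save_script_output(workdir, "window", {
    "imports": [],
    "classes": {"wxWindow": {"bases": []}},
})
save_script_output(workdir, "frame", {
    "imports": ["window"],
    "classes": {"wxFrame": {"bases": ["wxWindow"]}},
})

# While generating wxFrame, its base class can now be resolved even
# though wxWindow was handled by a different script.
frame = lookup_class(workdir, "frame", "wxFrame")
base = lookup_class(workdir, "frame", frame["bases"][0])
```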

The next couple of weeks are going to be spent working on the generator script, I think.  The only real unsolved problem there, I think, is figuring out how to correctly order the definition of classes in the Python code for inheritance and default parameters, among other things.  I'll probably write about that in more detail next week when I know what I'll do about it.

Friday, July 12, 2013

Makeup of wxPython

This week I've started working on the generator script and I decided that for a bit of fun and to get a better idea of what the library looks like, I'd collect some statistics about the contents of the library:


Total Classes: 596
Total Methods: 8508
Virtual Methods: 1233
Virtual Methods (not Dtor): 1138
Overloaded Methods: 705
Overloaded Methods (not Ctor): 328
Average Number of overloads: 2.3914893617
Average Number of overloads (not Ctor): 2.46951219512
Global Variables:  590
Defines:  2544


A few notes about the numbers:

This data is based on the output of the tweaker scripts, which get their data from wxWidgets documentation.  As a result, undocumented classes and methods are not included.  I've also intentionally excluded everything that the tweaker scripts mark as 'ignored'.  I have not included pure Python classes or methods.

"Methods" in this context includes C++ methods that are added by the tweaker scripts and are not present in wxWidgets (for example wx.Menu.FindItemById.)

Virtual methods are somewhat underreported here.  The tweaker scripts turn off the virtual flag on a large number of methods that aren't likely to be overridden from Python. This helps keep the size of the bindings down since extra code needs to be generated for virtual methods.


This basically confirms what I had originally suspected about overloaded methods: the majority of them are constructors.  I did expect the average number of overloads to be at least three, though I'm not sure why.  The defines count is on the order of magnitude I expected, but also a little high.  The number of defines is so high because they include the definitions for the default window ids, the default events, and window style options, among other things.

I'll probably be working on the generator script for the next couple of weeks.  I'll probably write something about that next week, I suppose.