Wednesday, July 31, 2013

Progress update, virtual methods

I've been slowly adding features to the generator over the last week.  It's far from complete, but at the moment it works for generating classes and methods/functions.  Most recently I've been working on letting Python methods override virtual methods.  The implementation is interesting, so I thought I'd talk about it a little.

The place to start, I suppose, is to explain why this is useful.  The idea is that a subclass of a C++ class created in Python should be as useful as a regular C++ subclass.  In particular, when the user overrides a method in the Python subclass, they should be able to expect it to act as if they had overridden that method in a C++ subclass.  That is, provided the method is virtual, when the method is called on the C++ instance through a base class pointer, the new implementation should be called.  An example is wxPython's App.OnInit (aka wxApp::OnInit).  This method is not (typically) called by users themselves, but by the library.  It is intended to be overridden in a subclass to let the user perform some setup before the application starts.

The first step to implementing virtual methods is to generate a subclass of the C++ class we're wrapping.  When we create instances of the Python wrapper class, we'll create an instance of this subclass instead of the original C++ class.  We want this subclass so that we can re-implement every virtual method.  These re-implementations can then call Python functions through function pointers created using cffi.  The Python functions look up the Python wrapper object for the virtual method's `this` pointer, convert the arguments into Python types, and then call the corresponding method on the Python wrapper.  The value returned by the Python callback is then converted as necessary and returned from the virtual method.

That seems simple enough, but we have a major problem: there is a lot of overhead in calling a Python callback from C/C++ and in converting the arguments.  In fact, most of the time the virtual methods aren't overridden, and all of that overhead is for nothing.  Since we're already creating a subclass, we can add a set of flags, one for each virtual method, to track whether or not a method has been overridden in Python.  A virtual re-implementation can then simply call the base implementation of the method if the corresponding flag has not been set.  We can then use a metaclass to figure out which methods have been overridden by a Python subclass and build a set of default flags for instances of that subclass.
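A minimal sketch of the metaclass half, assuming one bit per virtual method in a plain integer.  `VirtualMeta`, `_virtual_methods`, `_default_flags`, and `is_overridden` are hypothetical names for illustration, and the `_flags` attribute only stands in for the flags field that would live in the generated C++ subclass.

```python
class VirtualMeta(type):
    """Hypothetical metaclass: computes one override bit per virtual method."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        if "_virtual_methods" in namespace:
            # The generated wrapper itself: remember its default implementations.
            cls._defaults = {m: namespace[m] for m in cls._virtual_methods}
        flags = 0
        for bit, method in enumerate(cls._virtual_methods):
            # Anything other than the wrapper's own function is a Python override.
            if getattr(cls, method) is not cls._defaults[method]:
                flags |= 1 << bit
        cls._default_flags = flags
        return cls


class App(metaclass=VirtualMeta):
    # Order fixes each method's bit position in the flags field.
    _virtual_methods = ("OnInit", "OnExit")

    def __init__(self):
        # Stand-in for copying the defaults into the C++ subclass's flags field.
        self._flags = type(self)._default_flags

    def OnInit(self):
        return True

    def OnExit(self):
        return 0


def is_overridden(obj, method):
    """What the C++ re-implementation would test before calling into Python."""
    bit = obj._virtual_methods.index(method)
    return bool(obj._flags & (1 << bit))


class MyApp(App):
    def OnExit(self):
        return 1
```

Here `MyApp()` starts with `_flags == 2`, so a re-implemented OnInit would go straight to the C++ base implementation while OnExit would call into Python.  Computing the flags once per class keeps the per-instance cost to a single copy.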

That handles the more C++-like use case of a subclass overriding a method from its superclass, but what about something a bit more Pythonic: replacing a method on a single instance?  We already have most of what we need in place for this.  We just need to update the flag when a Python method corresponding to a virtual method gets changed.  The solution I ended up using was to wrap the virtual methods in a descriptor.  When the descriptor's __set__ method is triggered, it sets the flag on the object, but only if the object was created by Python.  That last check is necessary because we can't guarantee that an object that wasn't created by Python will have the flags field, and if it doesn't, trying to set a flag could cause a segfault or silently corrupt data.
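A minimal sketch of such a descriptor, assuming the replacement is stored in the instance `__dict__` and that a `_created_by_python` attribute records where the object came from.  `VirtualMethod`, `_created_by_python`, and `_flags` are hypothetical names; in the real bindings the flag would be written into the C++ object rather than a Python attribute.

```python
class VirtualMethod:
    """Hypothetical data descriptor wrapping one virtual method."""

    def __init__(self, func, bit):
        self.func = func
        self.bit = bit
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # A per-instance replacement wins over the class's method.
        if self.name in obj.__dict__:
            return obj.__dict__[self.name]
        return self.func.__get__(obj, objtype)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
        # Only objects created from Python are guaranteed to have a flags field.
        if obj._created_by_python:
            obj._flags |= 1 << self.bit


class App:
    def __init__(self, created_by_python=True):
        self._created_by_python = created_by_python
        if created_by_python:
            self._flags = 0  # stand-in for the C++ subclass's flags field

    def OnInit(self):
        return True

    OnInit = VirtualMethod(OnInit, bit=0)


app = App()
app.OnInit = lambda: False          # replace the method on this instance only
print(app.OnInit(), app._flags)     # False 1

foreign = App(created_by_python=False)
foreign.OnInit = lambda: False      # allowed, but no flag is touched
```

Because the descriptor defines `__set__`, it takes priority over the instance dictionary, so every instance assignment passes through it.  Assigning to `App.OnInit` instead would replace the descriptor itself rather than call `__set__`, which is why changes on the class need separate handling.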

The last thing, which I haven't implemented yet, is handling a method being changed on the Python class itself.  This is a little more complicated than the last case since we have to update the flag on every single instance of the class.  The solution here, I think, is to use a WeakSet to keep track of the instances.  When a method is changed, we could iterate over the WeakSet and change the flag on every live instance.  We can detect that a virtual method is being changed by defining a __setattr__ method on the metaclass.  I haven't tested it yet, but I think the overhead of adding instances to the set when they're created shouldn't be too large, considering the other things that go on in the constructor anyway, such as calling the C++ constructor.
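Since this part isn't implemented yet, here is only a minimal sketch of the idea under assumptions of my own: a per-class `weakref.WeakSet` filled in the constructor, a hypothetical `_virtual_bits` table mapping method names to bit positions, and a metaclass `__setattr__` that also walks subclasses, because their instances inherit the changed method too.  Restoring the original method (clearing a flag) is not handled.

```python
import weakref


class VirtualMeta(type):
    """Hypothetical metaclass that propagates class-level method changes."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Per-class set of live instances; entries vanish when instances die.
        cls._instances = weakref.WeakSet()

    def _live_instances(cls):
        # Instances of subclasses see the changed method too.
        yield from cls._instances
        for sub in cls.__subclasses__():
            yield from sub._live_instances()

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        bit = getattr(cls, "_virtual_bits", {}).get(name)
        if bit is not None:
            for inst in list(cls._live_instances()):
                inst._flags |= 1 << bit


class App(metaclass=VirtualMeta):
    _virtual_bits = {"OnInit": 0, "OnExit": 1}

    def __init__(self):
        self._flags = 0  # stand-in for the C++ flags field
        type(self)._instances.add(self)

    def OnInit(self):
        return True

    def OnExit(self):
        return 0


class MyApp(App):
    pass


a = App()
b = MyApp()
App.OnExit = lambda self: 1      # changes the method for every instance
print(a._flags, b._flags)        # 2 2
```

A WeakSet is used so that tracking an instance never keeps it alive; once the last reference to an instance goes away, it simply drops out of the set.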

So, for the next week or two, I'm going to continue working on the generator.  After I put a few finishing touches on the virtual method handling code, I think I'll work on some of the easier things like enumerators and global variables.

Tuesday, July 23, 2013

Progress Update

I've been working on the generator code this week, though I haven't made quite as much progress as I had hoped.  I ended up restarting a few times before I came up with a plan that I liked.  The problem was that, the way the build system is set up, each etg script is run separately, without interacting with the others at all.  This means that, for example, when the generator handles the wxFrame class, it doesn't have any information about the wxWindow class (from which wxFrame derives).  That's fine for the sip generator, since it outputs intermediate files which the sip executable processes.  It's a problem for the cffi generator because it should output the actual C++ and Python files for the bindings.  So the solution I settled on was to pickle the output of the etg scripts and do the actual code generation in a separate step.  By loading the pickles from the scripts that are referenced in the one currently being processed, the generator can look up information about classes handled in other scripts or even other modules.
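A minimal sketch of the two-step flow, assuming plain dicts stand in for the objects the etg scripts produce.  The pickle layout, the file naming, and the module names `frame` and `toplevel` are all hypothetical.  Only directly referenced scripts are loaded here; following references transitively would need a recursive walk.

```python
import pickle
import tempfile
from pathlib import Path


def save(module, outdir):
    """Step 1: each etg script pickles what it extracted."""
    with open(Path(outdir) / f"{module['name']}.pickle", "wb") as f:
        pickle.dump(module, f)


def load(name, outdir):
    with open(Path(outdir) / f"{name}.pickle", "rb") as f:
        return pickle.load(f)


def class_lookup(name, outdir):
    """Step 2: the generator loads the current script's pickle plus those of
    the scripts it references, giving it one table of known classes."""
    module = load(name, outdir)
    classes = dict(module["classes"])
    for dep in module["imports"]:
        classes.update(load(dep, outdir)["classes"])
    return classes


with tempfile.TemporaryDirectory() as outdir:
    save({"name": "toplevel", "imports": [],
          "classes": {"wxTopLevelWindow": {"bases": ["wxNonOwnedWindow"]}}}, outdir)
    save({"name": "frame", "imports": ["toplevel"],
          "classes": {"wxFrame": {"bases": ["wxTopLevelWindow"]}}}, outdir)
    classes = class_lookup("frame", outdir)

print(classes["wxTopLevelWindow"]["bases"])  # ['wxNonOwnedWindow']
```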

I think the next couple of weeks are going to be spent working on the generator script.  The only real unsolved problem there is figuring out how to correctly order the class definitions in the Python code, for inheritance and default parameters, among other things.  I'll probably write about that in more detail next week, once I know what I'll do about it.
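A Python class is created when its `class` statement runs, and default argument values are evaluated when each `def` in the class body runs, so base classes and default values must already exist at that point.  That makes this a dependency-ordering problem, and one standard way to frame such problems is a topological sort.  This is a minimal sketch using the standard library's `graphlib` (Python 3.9+), not necessarily what the generator will end up using; the class names and dependencies in the example are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def definition_order(deps):
    """deps maps each class to the names its definition needs first:
    its bases, plus anything used in default arguments. Names defined
    elsewhere (e.g. another module) are dropped. Raises graphlib.CycleError
    if no valid order exists."""
    graph = {name: [d for d in needs if d in deps] for name, needs in deps.items()}
    return list(TopologicalSorter(graph).static_order())


order = definition_order({
    "Frame": ["TopLevelWindow"],
    "TopLevelWindow": ["Window"],
    "Window": ["EvtHandler", "Point"],   # e.g. pos=DefaultPosition, a Point
    "Point": [],
    "EvtHandler": ["Object"],            # Object lives in another module here
})
print(order)
```

Any order it returns puts every class after the names it depends on; a `CycleError` would flag a case that needs special handling, such as two classes that use each other in default arguments.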

Friday, July 12, 2013

Makeup of wxPython

This week I've started working on the generator script and I decided that for a bit of fun and to get a better idea of what the library looks like, I'd collect some statistics about the contents of the library:

Total Classes: 596
Total Methods: 8508
Virtual Methods: 1233
Virtual Methods (not Dtor): 1138
Overloaded Methods: 705
Overloaded Methods (not Ctor): 328
Average Number of overloads: 2.3914893617
Average Number of overloads (not Ctor): 2.46951219512
Global Variables: 590
Defines: 2544

A few notes about the numbers:

This data is based on the output of the tweaker scripts, which get their data from the wxWidgets documentation.  As a result, undocumented classes and methods are not included.  I've also intentionally excluded everything that the tweaker scripts mark as 'ignored'.  I have not included pure Python classes or methods.

"Methods" in this context includes C++ methods that are added by the tweaker scripts and are not present in wxWidgets (for example wx.Menu.FindItemById.)

Virtual methods are somewhat underreported here.  The tweaker scripts turn off the virtual flag on a large number of methods that aren't likely to be overridden from Python.  This helps keep the size of the bindings down, since extra code needs to be generated for virtual methods.

This basically confirms what I had originally suspected about overloaded methods: the majority of them are constructors.  I did expect the average number of overloads to be at least three, though I'm not sure why.  The defines count is on the order of magnitude I expected, but a little high.  The number of defines is so high because it includes the definitions for the default window IDs, the default events, and window style options, among other things.
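The two average lines also let us back out a few numbers that aren't listed, assuming each average is taken over the overloaded methods counted on the line just above it (705 and 328).  The arithmetic:

```python
overloaded = 705                # overloaded methods (sets of overloads)
overloaded_not_ctor = 328
avg = 2.3914893617
avg_not_ctor = 2.46951219512

overloaded_ctors = overloaded - overloaded_not_ctor                    # 377
total_overloads = round(overloaded * avg)                              # 1686
total_overloads_not_ctor = round(overloaded_not_ctor * avg_not_ctor)   # 810
ctor_overloads = total_overloads - total_overloads_not_ctor            # 876

print(f"constructors: {overloaded_ctors / overloaded:.1%} of overloaded methods")
print(f"average overloads per constructor: {ctor_overloads / overloaded_ctors:.2f}")
```

Under that assumption, constructors are a majority, but a slim one (about 53.5%), and on their own they average about 2.32 overloads, slightly below the non-constructor figure.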

I'll probably be working on the generator script for the next couple of weeks, and I'll likely write something about it next week.

Friday, July 5, 2013

Event Handling in wxPython

For the last several days I've been working on getting event handling working, so I figured I could tell you a bit about how event handling is implemented in wxPython.

wxWidgets Background

First of all, some background about wxWidgets for those unfamiliar with it.  (You can skip this section if you are at all familiar with wxWidgets or wxPython.)

Event handling in wxWidgets is more or less the same as in other GUI toolkits.  Events can be generated by the library, for example when the user clicks on a button or a timer fires, or they can be created by the programmer.  The actual handling of the events occurs in wxEvtHandler, from which most of the widgets in wx are derived.  When a wxEvtHandler instance is created, a parent can be specified.  This allows the parent to handle any events generated by the child (or its children) that it doesn't handle itself.  Events are represented as C++ objects deriving from wxEvent, which may hold details about the event that has occurred.

The programmer can specify an action for a wxEvtHandler to take when it encounters an event of a particular type (and optionally from a particular widget).  This is described as connecting the handler to the event.  The action for the handler to take is specified in the form of a C++ callback.  Originally, the way to connect a handler to events was using compile-time "event tables," which are limited to calling methods on the wxEvtHandler object they are defined on.  In modern versions of wxWidgets, it is possible to dynamically connect and disconnect a wxEvtHandler to an event at runtime.  Additionally, arbitrary methods as well as functions and functors may be used as callbacks for these dynamic connections.
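To make the parent chain and dynamic connection concrete, here is a toy model in Python.  It is only a sketch: `Event`, `EvtHandler`, `bind`, `unbind`, `process_event`, and `skip` are hypothetical names that loosely mirror wxWidgets, and the model ignores details of the real library, for example that by default only command events propagate up to parent windows.

```python
class Event:
    def __init__(self, event_type, source=None):
        self.event_type = event_type
        self.source = source
        self.skipped = False

    def skip(self):
        # Like wxEvent::Skip(): let processing continue after this handler.
        self.skipped = True


class EvtHandler:
    def __init__(self, parent=None):
        self.parent = parent
        self._bindings = []  # (event_type, source or None, callback)

    def bind(self, event_type, callback, source=None):
        """Dynamic connection: any Python callable can be the callback."""
        self._bindings.append((event_type, source, callback))

    def unbind(self, event_type, callback, source=None):
        self._bindings.remove((event_type, source, callback))

    def process_event(self, event):
        """Try this handler's bindings, then walk up the parent chain."""
        handler = self
        while handler is not None:
            for event_type, source, callback in handler._bindings:
                if event_type == event.event_type and source in (None, event.source):
                    event.skipped = False
                    callback(event)
                    if not event.skipped:
                        return True  # handled; stop propagating
            handler = handler.parent
        return False  # nobody handled it


frame = EvtHandler()
button = EvtHandler(parent=frame)
log = []
frame.bind("button_clicked", lambda evt: log.append("frame saw the click"), source=button)
print(button.process_event(Event("button_clicked", source=button)), log)
```

The button has no handler of its own, so the event climbs to the frame, whose binding matches both the event type and the source widget.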

For a better, more detailed explanation, see the wxWidgets documentation (or, for the C++ averse, the wxPython version).

In wxPython

For events to be useful to Python programmers, wxPython must make two things possible: using arbitrary Python callables to handle events, and creating new events.

The first feature is actually relatively straightforward for the existing wxPython bindings.  A C++ function is used as the callback that is passed to the library, and it calls the Python callable.  To get the pointer to the Python object into the C++ function, a C++ object that wraps the pointer is given as user data for the event.  wxWidgets makes this user data object available to the C++ callback that handles the event and takes ownership of the object, deleting it if/when the event is disconnected.

The second feature is a little more complex.  We want events created in Python to retain their Python attributes when they reach their callbacks.  (You can see an example of this here.)  This sounds easy, but wxWidgets internally makes copies of the event objects.  When the C++ event object is passed to the callback, it may be at a different address than the original object, which makes it difficult to find the correct Python object to pass to the Python callable.

wxPython's solution is a PyEvent class that is Python-aware and able to carry its attributes through C++ unmolested.  It does this by storing a pointer to a Python dict object inside the C++ PyEvent object.  When the event object gets cloned, the pointer gets copied as well (and the dict's refcount gets incremented).  The Python PyEvent object defines __{set,get,del}attr__ methods that redirect to that dict. [1]  This way, even if two Python PyEvent objects are different and/or wrap different C++ objects, they will have the same attributes as far as the user is concerned.
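The shared-dict trick can be modelled in pure Python.  This is a toy sketch, not wxPython's implementation: the `_attrs` attribute plays the role of the dict pointer held by the C++ object, `clone` stands in for the copy wxWidgets makes internally, and Python's own reference counting replaces the manual refcount increment.

```python
class PyEvent:
    """Toy model: `_attrs` plays the role of the dict pointer stored in the
    C++ object; clones share it, so attributes survive copying."""

    def __init__(self, _attrs=None):
        # object.__setattr__ bypasses our own __setattr__ below.
        object.__setattr__(self, "_attrs", {} if _attrs is None else _attrs)

    def clone(self):
        # Like wxEvent::Clone(): a new object that shares the same dict.
        return PyEvent(self._attrs)

    def __getattr__(self, name):  # only called when normal lookup fails
        try:
            return self._attrs[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._attrs[name] = value

    def __delattr__(self, name):
        try:
            del self._attrs[name]
        except KeyError:
            raise AttributeError(name) from None


evt = PyEvent()
evt.payload = "hello"
cloned = evt.clone()                 # e.g. the copy wxWidgets hands to the callback
print(cloned is evt, cloned.payload)  # False hello
```

Deleting or setting an attribute through either object is visible through the other, because both redirect to the same dict.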

Differences for wxPython-cffi

For the first feature, my solution is pretty much the same, with one small twist: PyPy doesn't have a C API that allows a Python object to be called from C++ like CPython does.  To cope with this, a couple of small changes need to be made.  First, instead of passing a pointer to the Python callable, we pass a handle created with ffi.new_handle() [2].  Second, the C++ callback that is connected to the event calls a single, constant Python callback.  In this Python function we look up the Python callable with ffi.from_handle(), create the event object, and finally call the Python callable.

This is an example of what has been, and I suspect will continue to be, a theme in the project: replacing C++ code with equivalent Python code.

The second feature is a bit more difficult, and I don't have a perfect solution right now.  What I have in place starts out similar to wxPython: we hold a pointer inside the C++ PyEvent object.  Where it differs is that the pointer is in fact a wxSharedPtr, and it points to a wrapper around a handle to a Python PyEvent object.  The idea is that the wrapper has a destructor that calls a Python callback to release the reference to the handle so it can be garbage collected.  wxSharedPtr is a reference-counted smart pointer, which makes sure that the wrapper's destructor is only called once every PyEvent object pointing to it has been deleted.  The problem with this plan is that the handle keeps the Python PyEvent object it represents alive, which in turn keeps the last C++ PyEvent object alive, so the count never reaches zero.  My thinking was that I wanted to be able to pass the original PyEvent object to the callbacks, bypassing the need for a separate dictionary and the attribute access methods.

I'm still working out exactly how I'll handle this, but once I have, the only part of event handling left to work on is releasing handles to Python callables once they've been disconnected.  I'll probably write another post once I know how that will work.

1. It's useful to note that accessing the dictionary like this is possible because PyEvent, like the vast majority of wxPython types, is implemented in C and so can directly access the data members of the C++ objects it wraps.

2.  ffi.new_handle() is new in CFFI 0.7, which has yet to be released at the time of writing.  See near the bottom of "Misc methods on ffi" in the CFFI documentation for more information.