Django Diaries / 9th May 2013

Starting Off

After a very successful Kickstarter, I unfortunately had a couple of back-to-back trips abroad, so initial work has been delayed more than I would have liked. However, now that I've secured more time each week to work on the project, progress should be faster than planned from here on.

The plan is that these diaries will contain a rough summary of the work I've been doing; they're here both to help engage you (the slightly-too-interested public) in the work I'm doing and to provide some transparency.

If you want to hear more about a certain issue, feel free to get in touch with me - see the About page for my contact details. I'd love to explain as much as I can to those who are interested!

Laying the Groundwork

The first task I faced was to go back to my original Django branch and get it up-to-date with the changes in trunk. The only change that affected the schema work was Aymeric Augustin's transaction changes - he's gone in and fixed a lot of the transaction API and cross-database differences with things like autocommit.

As a result, I got to simplify my code somewhat: https://github.com/andrewgodwin/django/commit/6e21a594

After that, the next step was to go in and fix the issues other core developers had with AppCache in the previous release - in particular, the way I was abusing it to make new models at runtime. But first, let me explain a little bit about how AppCache works, for the uninitiated.

AppCache

Ask a core developer what part of Django they dislike most, and chances are good that AppCache will appear somewhere in that list (other responses may include "templates", "the URL dispatcher" or possibly just "everything"). It's a very old part of Django, and it's responsible for knowing both which apps are available to the project and which models are available.

Django depends far too heavily on it - anything app-related in Django generally touches it, even if it has nothing to do with the ORM. That's a problem being solved by the app-loading branch, which has been going for quite a while but is ever so close to landing.

However, my issues lie elsewhere. The main problem is that any schema migration design is going to have to be able to make historical versions of models - if you have a data migration to run before a schema migration, that data migration needs old model classes as the tables won't yet match the schema your project currently has.

Alas, every time you make a new models.Model subclass in Django, an entry gets placed into the AppCache for that model. This is very useful - it's how ForeignKeys know how to find the other end of their relation, for example - but it means that if we're making three or four old versions of an Author model it's going to trample all over the AppCache and mess everything up.
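To make the collision concrete, here is a toy, pure-Python sketch (no Django involved) of a registry that, like AppCache, records every new model class at creation time, keyed by app label and model name. `REGISTRY`, `ToyModelBase` and the `app_label` class attribute are hypothetical stand-ins for illustration, not Django's real internals.

```python
# Toy illustration only: a global registry filled in by a metaclass,
# loosely mimicking how each new model class gets recorded when it is
# created. This is not Django's actual implementation.

REGISTRY = {}  # hypothetical global cache: (app_label, model_name) -> class


class ToyModelBase(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        app_label = attrs.get("app_label")
        if app_label is not None:
            # Keyed purely by name, so a later class with the same app
            # label and name silently replaces the earlier one.
            REGISTRY[(app_label, name.lower())] = cls
        return cls


class ToyModel(metaclass=ToyModelBase):
    pass


# The "current" Author model.
class Author(ToyModel):
    app_label = "books"
    fields = ["name", "birthday"]


current_author = Author


# A historical version of Author, built for an old data migration.
class Author(ToyModel):
    app_label = "books"
    fields = ["name"]


# The historical class has trampled the current one in the shared registry.
print(REGISTRY[("books", "author")].fields)  # ['name']
print(REGISTRY[("books", "author")] is current_author)  # False
```

Anything that later looks up `("books", "author")`, such as a foreign key resolving its target, now gets the old schema.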

Resistance is... fine, actually

Even more excitingly, the AppCache class uses what's known as the "Borg Pattern" - any instance of that class will share state. That means we can't just make a second AppCache to put temporary models in! (For those completely unaware, the Borg are an alien race in Star Trek who all share a single hive mind.)
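For anyone who hasn't met it, the Borg pattern is commonly written in Python by pointing every instance's `__dict__` at a single class-level dictionary. A minimal sketch (the `SharedCache` name is made up for illustration):

```python
class SharedCache:
    # Every instance's attribute dictionary is this one shared dict.
    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state
        # setdefault, so creating another instance doesn't wipe the state.
        self.__dict__.setdefault("models", {})


first = SharedCache()
second = SharedCache()

first.models["author"] = "Author"
print(second.models)  # {'author': 'Author'}
print(first is second)  # False: distinct objects, one shared state
```

Because `second` sees everything `first` registered, simply instantiating the class again never gives you an isolated cache.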

The work I did was in two parts: de-borgify AppCache, and allow a per-model app_cache option.

AppCache actually still uses the Borg pattern; I've just moved all the logic down into a BaseAppCache (along with a setting which means additional caches don't try to load models from every app). This means that my code can now just call:

new_app_cache = BaseAppCache()

I might rename the class to something more suitable; we'll see.
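A toy, pure-Python sketch of that split might look like the following: the registry logic lives on a plain base class whose instances each keep their own state, and the Borg behaviour is layered on only in the subclass that represents the global cache. `ToyBaseAppCache`, `ToyAppCache` and the `load_all_apps` flag are hypothetical stand-ins for the real classes and setting, not their actual code.

```python
class ToyBaseAppCache:
    """All the registry logic; each instance keeps its own state."""

    def __init__(self, load_all_apps=False):
        self.models = {}
        # Hypothetical flag standing in for the setting that stops extra
        # caches from importing models from every installed app.
        self.load_all_apps = load_all_apps

    def register(self, app_label, name, model):
        self.models[(app_label, name.lower())] = model

    def get_model(self, app_label, name):
        return self.models.get((app_label, name.lower()))


class ToyAppCache(ToyBaseAppCache):
    """The global cache: still a Borg, so every instance shares state."""

    _shared_state = {}

    def __init__(self):
        self.__dict__ = self._shared_state
        if not self._shared_state:
            # First instance only: initialise the shared state once.
            super().__init__(load_all_apps=True)


global_a = ToyAppCache()
global_b = ToyAppCache()
global_a.register("books", "Author", "current Author")
print(global_b.get_model("books", "author"))  # current Author

# An independent cache for runtime-built historical models.
sandbox = ToyBaseAppCache()
sandbox.register("books", "Author", "historical Author")
print(sandbox.get_model("books", "author"))  # historical Author
print(global_a.get_model("books", "author"))  # current Author
```

Keeping the Borg behaviour in the subclass means existing code that expects one global cache is unaffected, while new code can opt into isolation by instantiating the base class.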

The second change is an app_cache option for models:

from django.db import models
from django.db.models.loading import BaseAppCache

new_app_cache = BaseAppCache()

class Author(models.Model):
    class Meta:
        app_cache = new_app_cache

This means you can now assign models to something other than the default AppCache when they're created. Obviously this isn't meant for end-users to develop against; it's there so we can create models at runtime inside a separate, sandboxed AppCache, where ForeignKey resolution between them still works but the global cache isn't polluted.
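To show why foreign key resolution keeps working inside the sandbox, here is a self-contained toy version of the per-model option: a metaclass reads an `app_cache` attribute from an inner `Meta` (falling back to a global default), registers the class there, and string references to other models are resolved only against that same cache. It mimics the idea rather than Django's code; `ToyCache`, `ToyForeignKey` and `resolve()` are invented for the example.

```python
class ToyCache:
    def __init__(self):
        self.models = {}  # lowercased model name -> class


DEFAULT_CACHE = ToyCache()


class ToyForeignKey:
    def __init__(self, to):
        self.to = to  # target model name as a string, e.g. "Author"

    def resolve(self, cache):
        # Look the target up in the owning model's own cache only.
        return cache.models[self.to.lower()]


class ToyModelBase(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        if bases:  # skip the ToyModel base class itself
            meta = attrs.get("Meta")
            # Per-model option: use Meta.app_cache if given, else the default.
            cls._cache = getattr(meta, "app_cache", DEFAULT_CACHE)
            cls._cache.models[name.lower()] = cls
        return cls


class ToyModel(metaclass=ToyModelBase):
    pass


# Current models live in the default cache.
class Author(ToyModel):
    pass


class Book(ToyModel):
    author = ToyForeignKey("Author")


current_book = Book

# Historical versions go into a separate, sandboxed cache.
sandbox = ToyCache()


class Author(ToyModel):
    class Meta:
        app_cache = sandbox


class Book(ToyModel):
    author = ToyForeignKey("Author")

    class Meta:
        app_cache = sandbox


historical_book = Book

# Each Book's foreign key finds the Author from its own cache...
print(historical_book.author.resolve(historical_book._cache) is sandbox.models["author"])  # True
print(current_book.author.resolve(current_book._cache) is DEFAULT_CACHE.models["author"])  # True
# ...and the default cache still holds the current Author.
print(DEFAULT_CACHE.models["author"] is not sandbox.models["author"])  # True
```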

You can see most of the changes here: https://github.com/andrewgodwin/django/commit/104ad050 and https://github.com/andrewgodwin/django/commit/75bf394d

Graphs, Graphs Everywhere

Now that the groundwork is laid and models can easily be created at runtime, the next step is to move on to the migrator itself. This will eventually do three main jobs: parsing the available migrations into a big dependency graph, building up versioned models from those migration files, and running the migrations to change the database schema.

It's best to start at the base of all this, which is the dependency graph. This is what migration files get fed into as they're read off disk, and how we work out which migrations to apply to achieve our end goal.
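As a rough sketch of what such a graph might look like (the `MigrationGraph` class, its methods and the `(app_label, name)` node keys are my assumptions for illustration, not a settled design), each migration is a node with explicit edges to the migrations it depends on, and working out what to apply to reach a target is a depth-first walk that emits every dependency before the migration that needs it:

```python
class MigrationGraph:
    def __init__(self):
        # node -> set of nodes it depends on; nodes are (app_label, name)
        self.dependencies = {}

    def add_node(self, node):
        self.dependencies.setdefault(node, set())

    def add_dependency(self, child, parent):
        self.add_node(child)
        self.add_node(parent)
        self.dependencies[child].add(parent)

    def forwards_plan(self, target):
        """Migrations to apply, dependencies first, ending at target."""
        plan, visited = [], set()

        def visit(node):
            if node in visited:
                return
            visited.add(node)
            # sorted() just makes the order deterministic between runs.
            for parent in sorted(self.dependencies[node]):
                visit(parent)
            plan.append(node)

        visit(target)
        return plan


graph = MigrationGraph()
graph.add_dependency(("books", "0002_add_isbn"), ("books", "0001_initial"))
graph.add_dependency(("books", "0002_add_isbn"), ("authors", "0001_initial"))
graph.add_dependency(("books", "0003_add_price"), ("books", "0002_add_isbn"))

plan = graph.forwards_plan(("books", "0003_add_price"))
print(plan)
```

Note that cross-app dependencies fall out naturally here, since `authors` is just another node. This sketch does not detect dependency cycles; a real implementation would need to.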

South just takes the filenames, ASCII-sorts them, and uses that ordering as the dependency graph for an app.
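That sort-based approach fits in a couple of lines. The sketch below (the `implicit_dependencies` helper is made up, not South's code) shows it, and also why zero-padded numeric prefixes matter under a plain ASCII sort:

```python
def implicit_dependencies(filenames):
    """Sort names, then make each migration depend on the one before it."""
    ordered = sorted(filenames)
    return {later: earlier for earlier, later in zip(ordered, ordered[1:])}


deps = implicit_dependencies(["0002_add_isbn", "0001_initial", "0003_add_price"])
print(deps)  # {'0002_add_isbn': '0001_initial', '0003_add_price': '0002_add_isbn'}

# Without zero padding, ASCII sorting puts "10" before "9":
print(sorted(["9_add_isbn", "10_add_price"]))  # ['10_add_price', '9_add_isbn']
```

Nothing about the dependencies is stored anywhere; they are entirely a side effect of the names.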

I'm making a few changes compared to South's original model of this graph; in particular, there won't be implicit dependencies between adjacent numbers (the fact that 0004 depends on 0003 will be recorded in 0004's file) and it'll be possible to "rebase" an app's migrations (throw away historical ones and start afresh).

The decision to stop inferring dependencies from the numbering is so VCS merges can be handled more gracefully - rather than just trying to spot a "hole" in the dependency history, it'll be possible to detect that an app has two topmost migrations and prompt the user for action (either an automatic rearrangement to get a linear history, or a manual merge).
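With dependencies stored explicitly, spotting a merge conflict comes down to finding an app's leaf nodes: migrations in that app that no other migration in the same app depends on. A minimal sketch using plain dicts keyed by `(app_label, name)` (the `leaf_nodes` helper and the migration names are hypothetical):

```python
def leaf_nodes(dependencies, app_label):
    """Migrations in app_label that no other migration in that app depends on."""
    nodes = {n for n in dependencies if n[0] == app_label}
    depended_on = {
        parent
        for child in nodes
        for parent in dependencies[child]
        if parent[0] == app_label
    }
    return sorted(nodes - depended_on)


# Two developers each added a migration on top of 0002 in separate branches.
dependencies = {
    ("books", "0001_initial"): set(),
    ("books", "0002_add_isbn"): {("books", "0001_initial")},
    ("books", "0003_add_price"): {("books", "0002_add_isbn")},
    ("books", "0003_add_subtitle"): {("books", "0002_add_isbn")},
}

leaves = leaf_nodes(dependencies, "books")
if len(leaves) > 1:
    print("Conflicting migrations:", leaves)
```

A single leaf means a linear history; more than one is exactly the "two topmost migrations" case where the user would be prompted.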

The "rebase" operation allows an app with a large number (say, 100) of historical migrations to get a new initial migration added at point 100, in such a way that old installations that are still below the new migration continue to run the old migrations, while any new installation comes straight in at migration 100 and runs the new initial migration (and then perhaps continues up to 101, 102, etc.).
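One way to express that choice, sketched under my own assumptions (the names `OLD`, `NEW_INITIAL`, `AFTER` and `choose_plan` are illustrative, not a settled design): if the database has applied none of the old migrations, it takes the new initial one; if it is partway through the old history, it finishes that history instead.

```python
OLD = [f"{i:04d}_old" for i in range(1, 101)]  # 0001_old .. 0100_old
NEW_INITIAL = "0100_new_initial"  # hypothetical replacement for OLD
AFTER = ["0101_add_tags", "0102_add_index"]  # later migrations for everyone


def choose_plan(applied):
    """Return the migrations still to run, given the set already applied."""
    if NEW_INITIAL in applied:
        # Installed after the new initial migration existed.
        todo = []
    elif any(m in applied for m in OLD):
        # Existing install: keep running the old history it started on.
        todo = [m for m in OLD if m not in applied]
    else:
        # Fresh install: skip the history and start at migration 100.
        todo = [NEW_INITIAL]
    return todo + [m for m in AFTER if m not in applied]


print(choose_plan(set()))  # ['0100_new_initial', '0101_add_tags', '0102_add_index']
print(choose_plan(set(OLD[:98]))[:2])  # ['0099_old', '0100_old']
```

Both paths end up at the same schema at 100, which is what lets 101 and onwards be shared.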

Confusingly, the VCS-merge-automatic-inlining mechanism I outlined above is analogous to what git rebase does, while the rebase command does nothing like it. It's probably worth thinking of a better name for "adding a new initial migration to make tests and new installs faster" - suggestions welcome to @andrewgodwin!

Update: since publication, and after some suggestions, I've settled on "squash" as the name for this command.

Work on this is going on right now - I've taken a break from it to write this diary - so next time we'll revisit it to see how it has progressed and whether any problems have appeared (I'm sure some will).

Also, I'll be giving a talk at DjangoCon EU next week titled "Migrating The Future", with all this kind of detail and more - I hope to see some of you there!