Django Diaries / 30th May 2013

What an Operation

Much of the work on migrations so far has been laying a good bit of groundwork, but now everything is starting to come together into a working whole.

The most important thing to land this week is Operations - the things which migrations will be structured around. Here's an example of what a migration would look like:

from django.db import migrations, models

class Migration(migrations.Migration):

    dependencies = [("myapp", "0001_initial")]

    operations = [
        migrations.AddField("Author", "rating", models.IntegerField(default=0)),

        migrations.CreateModel(
            "Book",
            [
                ("id", models.AutoField(primary_key=True)),
                ("author", models.ForeignKey("myapp.Author", null=True)),
            ],
        )
    ]

As you can see, the Operations define what the migration does, and serve as the top-level element that allows more declarative migrations - as opposed to those in South, which were an opaque block of procedural code.

What's in a name?

But what exactly do Operations do? That's a relatively easy question to answer - they have an interface of exactly three methods.

The first, state_forwards, takes a project state (that's an in-memory representation of all of your models and fields at a point in time) and modifies it to reflect the type of change it makes - AddField, for example, would add a field to the in-memory model state.

The other two are database_forwards and database_backwards, which take a SchemaEditor (an object allowing database changes) and two project states - the before and after states for the operation, the after state having come from state_forwards above.
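To make the three-method interface concrete, here is a minimal sketch that runs without Django. Everything in it is an assumption for illustration: ToyAddField is a hypothetical class, the project state is a plain dict of model name to fields, and the SchemaEditor is replaced by a list that records commands. The real methods may take further arguments, and the order in which the two states are passed to database_backwards is assumed here, not taken from the actual code.

```python
import copy


class ToyAddField:
    """Toy stand-in for an AddField-style operation (not Django's class)."""

    def __init__(self, model, name, field_type):
        self.model, self.name, self.field_type = model, name, field_type

    def state_forwards(self, state):
        # Mutate the in-memory project state so it includes the new field.
        state[self.model][self.name] = self.field_type

    def database_forwards(self, schema_editor, from_state, to_state):
        # The field definition is read from the "after" state.
        field_type = to_state[self.model][self.name]
        schema_editor.append(("add_column", self.model, self.name, field_type))

    def database_backwards(self, schema_editor, from_state, to_state):
        # Assumed convention: from_state still has the field, to_state does not.
        schema_editor.append(("drop_column", self.model, self.name))


before = {"Author": {"id": "AutoField"}}
after = copy.deepcopy(before)
operation = ToyAddField("Author", "rating", "IntegerField")
operation.state_forwards(after)

commands = []  # hypothetical stand-in for a SchemaEditor
operation.database_forwards(commands, before, after)
operation.database_backwards(commands, after, before)
print(commands)
```

The point of the split is that state_forwards never touches the database, so the "after" state can be computed cheaply before any schema change is attempted.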

When the code runs a migration, it calculates a project state for every interim step between operations by applying successive state_forwards functions down the entire dependency tree. It can then supply each database run with both the state it's working from and the state it's working towards, which helps greatly with some of the more complex operations.
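The state threading described here can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the actual runner: ToySetFields and plan_states are hypothetical names, the project state is a dict, and the "database" is a list that logs which fields each step would add.

```python
import copy


class ToySetFields:
    """Hypothetical operation: ensure a model has the given fields."""

    def __init__(self, model, fields):
        self.model, self.fields = model, fields

    def state_forwards(self, state):
        state.setdefault(self.model, {}).update(self.fields)

    def database_forwards(self, log, from_state, to_state):
        # Compare the two states to work out what this step has to add.
        old = from_state.get(self.model, {})
        new = to_state[self.model]
        log.append((self.model, sorted(set(new) - set(old))))


def plan_states(initial_state, operations):
    """Return (operation, from_state, to_state) for each operation, in order."""
    steps, state = [], initial_state
    for operation in operations:
        new_state = copy.deepcopy(state)  # keep the "from" state untouched
        operation.state_forwards(new_state)
        steps.append((operation, state, new_state))
        state = new_state
    return steps


operations = [
    ToySetFields("Author", {"id": "AutoField"}),
    ToySetFields("Author", {"rating": "IntegerField"}),
]
log = []
for operation, from_state, to_state in plan_states({}, operations):
    operation.database_forwards(log, from_state, to_state)
print(log)
```

Because every interim state is a separate copy, each database step can compare its own before and after without caring what later operations will do.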

A good example to look at is the field operations - they use both the "from" and "to" project states, but are still relatively simple.

Tying it all together

Of course, Operations live in migrations, so if we want to run them we need something that understands migrations. Fortunately, that's now in place - a class known as the Executor uses the existing loading and graph-resolving pieces to provide an end-to-end way of running migrations.

All you need to do is call migrate() with a list of target migrations, and it handles the rest. There's also a migration_plan() method if you want to know what it's about to do - useful for some tests and some user commands.
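As a rough picture of what planning and running involve, the sketch below resolves dependencies depth-first and skips anything already applied. It is a toy under assumptions: the graph is a plain dict, the function signatures are invented for the example and are not the Executor's real ones, and cycle detection is left out.

```python
def migration_plan(graph, applied, targets):
    """Order the unapplied migrations needed to reach the targets.

    graph maps a migration key to the list of keys it depends on.
    """
    plan = []
    seen = set(applied)

    def visit(key):
        if key in seen:
            return
        seen.add(key)
        for dependency in graph[key]:
            visit(dependency)
        plan.append(key)  # dependencies always land before their dependents

    for target in targets:
        visit(target)
    return plan


def migrate(graph, applied, targets):
    """Apply the plan in order; here 'applying' just records the key."""
    for key in migration_plan(graph, applied, targets):
        applied.add(key)
    return applied


graph = {
    ("myapp", "0001_initial"): [],
    ("myapp", "0002_rating"): [("myapp", "0001_initial")],
    ("otherapp", "0001_initial"): [("myapp", "0002_rating")],
}
applied = {("myapp", "0001_initial")}
plan = migration_plan(graph, applied, [("otherapp", "0001_initial")])
print(plan)
```

Keeping the plan separate from the run is what makes it usable for tests and for commands that only want to report what would happen.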

In fact, user commands are the next step. While the Executor certainly offers most of the functionality of running migrations, it's not exactly in an easy-to-use CLI format.

User commands

Traditionally, South has had the migrate command to allow users to interact with the migration system. The issue here is, of course, that Django has traditionally used syncdb to let users create their database and add any new models.

South overrides syncdb so it doesn't touch the migrated apps, but that then leaves you having to go and run migrate yourself, and if you run migrate before syncdb it'll fail with an error.

My plan to fix this involves deprecating syncdb and structuring everything around a much improved migrate command, which handles both unmigrated and migrated apps.

I'll also be introducing coloured output, since it's such an easy win in terms of readability, and I hope to introduce a "smart" mode, which will give you estimates about how long each migration will take and what sort of locking it will do, so you can plan which migrations to run when. It is tempting to just make that mode print "DON'T DO IT" if you're on MySQL, though.

I started a discussion on django-developers about these proposed command changes; you should read it and reply if you have opinions on how this should work in the future.

Next time on Migrations

Work will now shift to getting a reasonable command-line client up and running for applying and unapplying migrations, and when that's done, the final pillar needs tackling: autodetection.

Although some people find it hard to believe, South shipped without autodetection for quite a while, and consisted just of a schema backend and a migration runner (essentially more primitive versions of what I've built so far).

These days, however, autodetection is possibly the most important feature. Anyone can throw some schema code in a file and have it run; having your framework work out that code and write it for you is the key step in making something like this easy to use.

The Field API work from last time provides a good basis for this, though I'm still unsure how exactly to structure the detection logic - especially because there's call for fuzzy matching for things like field renames.

I imagine it's going to end up being a score-based system where the detector works out all possible approaches to get from schema A to schema B and picks the best one, but I'll have more thoughts on that next time.
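For a feel of what score-based rename matching could look like, here is a small sketch using difflib from the Python standard library. It is purely illustrative and not the planned design: detect_changes, the 0.6 threshold, and the rule that only fields of the same type can be renames are all assumptions, and it picks matches greedily by score instead of weighing every possible route from schema A to schema B.

```python
from difflib import SequenceMatcher


def detect_changes(old_fields, new_fields, threshold=0.6):
    """Split the differences between two field dicts into renames, removals, additions.

    Both arguments map field name to field type (plain strings here).
    """
    removed = set(old_fields) - set(new_fields)
    added = set(new_fields) - set(old_fields)

    # Score every plausible (old, new) pair by name similarity.
    candidates = []
    for old_name in removed:
        for new_name in added:
            if old_fields[old_name] != new_fields[new_name]:
                continue  # assumption: a rename keeps the field type
            score = SequenceMatcher(None, old_name, new_name).ratio()
            if score >= threshold:
                candidates.append((score, old_name, new_name))

    # Greedily accept the best-scoring pairs, using each field at most once.
    renames = []
    for score, old_name, new_name in sorted(candidates, reverse=True):
        if old_name in removed and new_name in added:
            renames.append((old_name, new_name))
            removed.discard(old_name)
            added.discard(new_name)
    return sorted(renames), sorted(removed), sorted(added)


old = {"title": "CharField", "isbn": "CharField"}
new = {"book_title": "CharField", "pages": "IntegerField"}
renames, removed, added = detect_changes(old, new)
print(renames, removed, added)
```

A real detector would presumably want a human to confirm anything it guesses is a rename, since a wrong guess silently turns a drop-and-add into a data-preserving change or vice versa.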