Django Diaries / 25th Sep 2013

Down and Dirty

It's been a while since my previous Django blog post - sorry about that! A combination of DjangoCon in Chicago, and Lanyrd being acquired, means that I've spent rather a lot of time away from home in the last month.

Fortunately, I got some good work done at the DjangoCon sprints and I'm back on my regular schedule again, so let's take a look at what's gone on since last time (when I'd just merged to Django master).

Raw SQL

One of the questions I've been asked by numerous people is "will the new migrations support raw SQL files?"

It's an idea that I bandied around a year or two ago - that you could have both .py and .sql files in a directory, with consistent numbering, and the framework would run them in order seamlessly.

Unfortunately, because migration files now need dependency information in them, and because I don't want some nasty hack involving reading comments at the top of files, raw SQL files by themselves are no longer feasible. However, you can still run raw SQL in migrations with the new RunSQL operation!

It looks like this:

migrations.RunSQL(
    "CREATE TABLE postgres_rocks (id int, i_love_arrays int[])",
    "DROP TABLE postgres_rocks",
)

You just have to provide a forwards and backwards SQL snippet, and you're away. But that's not all - you can omit the backwards half if you want, and it'll become an irreversible operation (and be handled gracefully), and you can even set multiple=True to pass in multi-line statements:

migrations.RunSQL(
    """
        CREATE TABLE foo (id int, bar text);
        CREATE INDEX ON foo USING gin(bar);
    """,
    multiple = True,
)

This means it's now very easy to just add custom column or index types, or alter columns using a CAST.

But what about the state? If you remember, migrations now use a state built from the migrations themselves in order to calculate what models look like. Without descending into the seventh circle of Hell, there's no way Django can parse your SQL and work out what your models should be.

That's why there's also the state_operations parameter. While some RunSQL calls won't change state - for example, adding secondary indexes or populating a column - some will, and for those you can accompany the call with a set of other operations that represent how your models should react to this operation:

migrations.RunSQL(
    "CREATE UNIQUE INDEX CONCURRENTLY ON foo (bar)",
    state_operations = [
        migrations.AlterField("Foo", "bar", models.IntegerField(unique=True)),
    ],
)

I'll be writing up some full documentation for this as we near release, but it should give you some idea of how you can drop down to custom definitions while still maintaining autodetection.

SQL in, SQL out

But wait, there's more! It's all well and good being able to use SQL in migrations, but what if you work in an environment where your database is closely managed by operations people or DBAs? They're unlikely to just let Python code by some no-good, fresh-faced Django core developer run against their production database; they want the changes as SQL files!

Fortunately, I have the solution for you: the new sqlmigrate command. Pass it a migration and it'll print out the SQL that represents those operations:

$ ./manage.py sqlmigrate books 0001
CREATE TABLE books_book ("id" integer NOT NULL PRIMARY KEY, "title" varchar(255));

There are caveats; some operations, like RunPython (see below) can't be represented (it'll place a big comment warning you stuff might be missing), and if you're doing things like deleting indexes (which Django has to look up the names of) you need to run sqlmigrate against a connection to a copy of the production database (which is usually easy to achieve).

Python for all

The final change is the companion operation to RunSQL: RunPython. This operation takes arbitary Python code, gives it access to a models object which contains versioned models and a schema_editor object for direct database manipulation, and lets you do whatever you like:

operation = migrations.RunPython(
    """
    Book = models.get_model("books", "Book")
    Book.objects.create(title="Migrations Are Awesome", year=2014)
    Book.objects.filter(year=2006).delete()
    """,
)

This is the new way of doing data migrations, and should be a bit more robust than the old method, as the models object is a true Django AppCache object with relationships and dependencies all solved correctly with the same version models.

You still have to be careful not to import the real models directly from their files, however. In addition, if you change the database schema inside RunPython that won't affect the in-memory model states; there is, however, a SeparateDatabaseAndState operation that allows you to combine two sets of operations - one for database changes and one for state changes - that you can use to make things work as expected.

UPDATE: This now also takes callables (functions, lambdas, etc.) as well as code strings, after some feedback on Twitter.

What's next?

A decent selection of minor bugs are being discovered, which I'll be looking at fixing soon, as well as starting on the squash functionality - where you can take a big set of existing migrations and reduce them down to just one or two, optimising away things in the process.

Squash is the last major feature left to be implemented, so once that's done focus will turn to documentation and polish, as well as starting to look at South 2 - the backport of migrations for older Django versions. Exciting!