2013-02-26

South migration with dynamically generated data

When you add a new non-null field to a Django model and use South to migrate the schema, South usually gives you two options: add a default value to the field in models.py, or specify a simple one-off value to use as the default for existing rows. This works as long as the new field is non-unique. If you are dealing with a unique field, you cannot provide a simple one-off value, because every existing row would get the same value and the uniqueness constraint would be broken immediately. A permanent default on a unique field doesn't make sense either. Fortunately, South migrations are written in Python, and it's trivial to add custom, dynamically generated data. This post demonstrates the process of adding a unique slug field to an existing model and migrating the schema using a customized migration.

Note that this article assumes that you will be using Django 1.5, which has just been released.

Preparing the project

Let's start by creating a new project called 'foobar'. You can skip this section if you want to get right to the juicy bits. It's only meant as a recap for people (relatively) new to South. We'll go ahead and create a virtualenv and a project directory.

code/tests % virtualenv foobarvenv
New python executable in foobarvenv/bin/python2.7
Also creating executable in foobarvenv/bin/python.exe
Installing setuptools............done.
Installing pip...............done.
code/tests % mkdir foobar
code/tests % cd foobar
tests/foobar % source ../foobarvenv/bin/activate
(foobarvenv)tests/foobar % 

Next, install the Django and South packages using pip.

(foobarvenv)tests/foobar % pip install django South
Downloading/unpacking django
  Downloading Django-1.5.tar.gz (8.0MB): 8.0MB downloaded
  Running setup.py egg_info for package django

Downloading/unpacking South
  Downloading South-0.7.6.tar.gz (91kB): 91kB downloaded
  Running setup.py egg_info for package South

Installing collected packages: django, South
  Running setup.py install for django
    changing mode of build/scripts-2.7/django-admin.py from 644 to 755

    changing mode of /home/branko/code/tests/foobarvenv/bin/django-admin.py to 755
  Running setup.py install for South

Successfully installed django South
Cleaning up...

Don't forget to install the appropriate database driver. You absolutely must use the same database engine you intend to use in production if you want reliable migrations. In my case, I'm using Postgres, so I also need the psycopg2 package. If you are using Postgres or MySQL, prepare a database as well, and be sure to edit the database settings to match it once the project has been created. This is all outside the scope of this article, so I won't go into details.

Start a new project:

(foobarvenv)tests/foobar % django-admin.py startproject foobar .

Edit settings.py and add 'south' to INSTALLED_APPS, and then run the syncdb command once so that the South tables are created.

(foobarvenv)tests/foobar % python manage.py syncdb
Syncing...
Creating tables ...
Creating table auth_permission
Creating table auth_group_permissions
Creating table auth_group
Creating table auth_user_user_permissions
Creating table auth_user_groups
Creating table auth_user
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table south_migrationhistory

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): no
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

Synced:
 > django.contrib.auth
 > django.contrib.contenttypes
 > django.contrib.sessions
 > django.contrib.sites
 > django.contrib.messages
 > django.contrib.staticfiles
 > south

Not synced (use migrations):
 -
(use ./manage.py migrate to migrate these)
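For reference, the settings.py edits behind the syncdb run above might look roughly like the fragment below. This is only a sketch: the database name, user, and password are hypothetical placeholders, and the engine shown assumes Postgres with psycopg2, as in my setup.

```python
# Fragment of foobar/settings.py (sketch; credentials are hypothetical placeholders)
DATABASES = {
    'default': {
        # Postgres backend; requires the psycopg2 package
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'foobar',       # hypothetical database name
        'USER': 'foobar',       # hypothetical database user
        'PASSWORD': 'secret',   # hypothetical password
        'HOST': 'localhost',
        'PORT': '',             # empty string means the default port
    }
}

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'south',  # adds the schemamigration and migrate commands
)
```

The list of contrib apps matches the "Synced" section of the syncdb output above.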

Our model and initial migration

We need an app and a model for this project, so let's create an app called 'foos', and a model 'Foo' with a single field called 'name'.

(foobarvenv)tests/foobar % python manage.py startapp foos

Edit the models.py to look like this:

from django.db import models

class Foo(models.Model):
    name = models.CharField(max_length=100)

Add the new app to INSTALLED_APPS, and create the first (initial) migration.

(foobarvenv)tests/foobar % python manage.py schemamigration foos --init
Creating migrations directory at '/home/branko/code/tests/foobar/foos/migrations'...
Creating __init__.py in '/home/branko/code/tests/foobar/foos/migrations'...
 + Added model foos.Foo
Created 0001_initial.py. You can now apply this migration with: ...[SNIP]

Let's migrate the schema as instructed:

(foobarvenv)tests/foobar % python manage.py migrate foos
Running migrations for foos:
 - Migrating forwards to 0001_initial.
 > foos:0001_initial
 - Loading initial data for foos.
Installed 0 object(s) from 0 fixture(s)

We will also need some data for this demonstration, so let's drop into the shell, and add some manually.

(foobarvenv)tests/foobar % python manage.py shell
Python 2.7.3 (default, Dec 18 2012, 13:50:09)
[GCC 4.5.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from foos.models import Foo
>>> for name in ['foo', 'bar', 'fam', 'bam']:
...     Foo.objects.create(name=name)
...
<Foo: Foo object>
<Foo: Foo object>
<Foo: Foo object>
<Foo: Foo object>
>>> Foo.objects.count()
4
>>>

Adding the slug field and migration

We now have four records in the database, and we want to add the slug field non-destructively. The obvious thing to take care of first is the actual model. So let's edit it and add the slug field. The new model should look like this:

from django.db import models
from django.template.defaultfilters import slugify

class Foo(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100, unique=True)

    def save(self, *args, **kwargs):
        self.slug = slugify(self.name)
        super(Foo, self).save(*args, **kwargs)

Let's first create a migration normally, and specify a one-off default.

(foobarvenv)tests/foobar % python manage.py schemamigration foos --auto
 ? The field 'Foo.slug' does not have a default specified, yet is NOT NULL.
 ? Since you are adding this field, you MUST specify a default
 ? value to use for existing rows. Would you like to:
 ?  1. Quit now, and add a default to the field in models.py
 ?  2. Specify a one-off value to use for existing columns now
 ? Please select a choice: 2
 ? Please enter Python code for your one-off default value.
 ? The datetime module is available, so you can do e.g. datetime.date.today()
 >>> 'foo'
 + Added field slug on foos.Foo
Created 0002_auto__add_field_foo_slug.py. You can now apply this migration with: ...

What would happen if we just ran this? You can probably imagine, but let's do it anyway:

(foobarvenv)tests/foobar % python manage.py migrate foos
Running migrations for foos:
 - Migrating forwards to 0002_auto__add_field_foo_slug.
 > foos:0002_auto__add_field_foo_slug
FATAL ERROR - The following SQL query failed: ALTER TABLE "foos_foo" ADD COLUMN "slug" varchar(100) NOT NULL UNIQUE DEFAULT 'foo';
The error was: could not create unique index "foos_foo_slug_key"
DETAIL:  Key (slug)=(foo) is duplicated.

Error in migration: foos:0002_auto__add_field_foo_slug
Traceback (most recent call last):
    ...[SNIP]...
    return self.cursor.execute(query, args)
django.db.utils.IntegrityError: could not create unique index "foos_foo_slug_key"
DETAIL:  Key (slug)=(foo) is duplicated.

The database doesn't like it. So what do we do? We need to manually edit the migration to generate the slug. Open the file foos/migrations/0002_auto__add_field_foo_slug.py and first add an import:

from django.template.defaultfilters import slugify

Next, change unique=True to unique=False in the db.add_column() call, which is the second line of the forwards method (not counting the comment). Make it look like this:

db.add_column(u'foos_foo', 'slug',
    self.gf('django.db.models.fields.SlugField')(
        default='foo',
        unique=False,
        max_length=100
    ),
    keep_default=False)

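We will add the uniqueness constraint later, once we've populated the slug field with appropriate data, which is what we'll do next. Keep in mind that the orm object gives you South's frozen copy of the model, which does not carry custom methods such as our save() override, so the migration has to set the slug itself. Add the following lines below the db.add_column() call: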

foos = orm['foos.Foo'].objects.all()
for foo in foos:
    foo.slug = slugify(foo.name)
    foo.save()

Finally, we need to add the uniqueness constraint. Add the following line to the end of the forwards method:

db.create_unique(u'foos_foo', ['slug'])

Save the file. We are now ready to run the migration:

(foobarvenv)tests/foobar % python manage.py migrate foos
Running migrations for foos:
 - Migrating forwards to 0002_auto__add_field_foo_slug.
 > foos:0002_auto__add_field_foo_slug
 - Loading initial data for foos.
Installed 0 object(s) from 0 fixture(s)
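If you want to see why the order of the three steps matters without touching a real project, the same sequence can be replayed against an in-memory SQLite database. This is only a sketch of the idea, not what South executes: it uses the standard library's sqlite3 module instead of Postgres, and the slugify function here is a simplified stand-in for Django's, not the real one.

```python
import re
import sqlite3


def slugify(value):
    """Simplified stand-in for Django's slugify (not the real implementation)."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foos_foo (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL)"
)
conn.executemany(
    "INSERT INTO foos_foo (name) VALUES (?)",
    [("foo",), ("bar",), ("fam",), ("bam",)],
)

# Step 1: add the column WITHOUT the unique constraint, using the one-off default.
conn.execute(
    "ALTER TABLE foos_foo ADD COLUMN slug VARCHAR(100) NOT NULL DEFAULT 'foo'"
)

# Step 2: replace the placeholder default with a slug generated from each name.
rows = conn.execute("SELECT id, name FROM foos_foo").fetchall()
for row_id, name in rows:
    conn.execute(
        "UPDATE foos_foo SET slug = ? WHERE id = ?", (slugify(name), row_id)
    )

# Step 3: only now, with distinct values in place, add the unique index.
conn.execute("CREATE UNIQUE INDEX foos_foo_slug_uniq ON foos_foo (slug)")
conn.commit()

slugs = [row[0] for row in conn.execute("SELECT slug FROM foos_foo ORDER BY id")]

# A second row with an existing slug should now be rejected.
try:
    conn.execute("INSERT INTO foos_foo (name, slug) VALUES ('foo', 'foo')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Creating the index before step 2 would fail for the same reason the unedited migration failed: every row would still hold the placeholder value.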

Double-check if everything worked

Drop into shell again, and test the new schema:

(foobarvenv)tests/foobar % python manage.py shell
Python 2.7.3 (default, Dec 18 2012, 13:50:09)
[GCC 4.5.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from foos.models import Foo
>>> [f.slug for f in Foo.objects.all()]
[u'foo', u'bar', u'fam', u'bam']
>>> # Next line should raise an exception if everything went well
>>> Foo.objects.create(name='foo')
Traceback (most recent call last):
    ...[SNIP]...
    return self.cursor.execute(query, args)
IntegrityError: duplicate key value violates unique constraint "foos_foo_slug_uniq"
DETAIL:  Key (slug)=(foo) already exists.

This tells us that a unique index has been successfully created on the slug column, and that all slugs for existing records have been successfully created.

You can verify this with your preferred database tool by looking up all indices that are created for the foos_foo table, too, but the test from the shell is sufficient.

Conclusion

The example isn't nearly as complete as real-life usage would be. For example, we are creating slugs from a non-unique field, which means that, with real-life data, two records can end up with the same slug and break the unique index. There could be other problems as well, but this article is not about such problems. We have successfully demonstrated a migration which populates a new column with dynamically created data. Since South migrations are Python scripts, the data doesn't need to come only from existing columns. It can come from anywhere, including web services and other scripts.
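One possible way to deal with the duplicate-slug problem is to append a numeric suffix to repeated slugs before saving them. The sketch below is an assumption of mine rather than part of the migration shown earlier: unique_slugs is a hypothetical helper, the slugify function is a simplified stand-in for Django's, and the max_length=100 limit of the slug field is not enforced.

```python
import re


def slugify(value):
    """Simplified stand-in for Django's slugify (not the real implementation)."""
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[-\s]+", "-", value)


def unique_slugs(names):
    """Return one slug per name, appending -2, -3, ... to repeated slugs.

    Hypothetical helper; it does not truncate results to the field's max_length.
    """
    seen = set()
    result = []
    for name in names:
        base = slugify(name)
        candidate = base
        suffix = 2
        # Keep bumping the suffix until the candidate has not been used yet.
        while candidate in seen:
            candidate = "%s-%d" % (base, suffix)
            suffix += 1
        seen.add(candidate)
        result.append(candidate)
    return result


slugs = unique_slugs(["Foo", "foo", "Foo Bar", "foo"])
```

In a migration, you would compute the slugs for all existing rows this way before assigning them, so that the unique constraint can be created afterwards. New records saved through the model would need a similar check in save().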