Migrating GAE Datastore Schema
The inevitable fate of every web app is evolution (the only other option being death). During the lifetime of a web app, you will constantly modify it, add new features, remove old ones, and reconsider bad choices you’ve made in the past. Those choices may not even have been bad; they may simply reflect the different constraints you were working under at the time. As the application evolves, so does its database schema. In most cases, you’ve already deployed earlier versions, and the live application already holds data that must remain intact. That’s why developers migrate their schema rather than blowing up the database and creating a new one.
In Google App Engine, you define your database schema as classes in the language of your choice (currently either Python or Java). Since you’re reading this, I assume you already know that part of the story.
So let me give you an example of a kind that needs to be migrated:
from google.appengine.ext import db
class User(db.Model):
    username = db.StringProperty(required=True)
    password = db.StringProperty(required=True)  # encrypted
    salt = db.StringProperty(required=True)
Our schema is good enough for basic authentication. We deploy the app, and after a few months we have a few hundred users, who, it turns out, want avatars on their profile pages as well as a link to their website. So we add these properties to our class:
class User(db.Model):
    # ... username, password and salt as before ...
    gravatar = db.StringProperty(required=True)
    website = db.StringProperty(required=False)
We decided to use Gravatar for avatars to simplify things, and we decided not to require the website. However, the User entities already stored by our live app don’t have these properties, so if we try, for example, to fetch their gravatars, the application, operating on stale data, will inevitably break. We therefore need to migrate the live data to match the new schema.
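To make the failure concrete without the App Engine SDK, here is a toy sketch in plain Python. It is not the db API: load, BadValueError, and the schema dictionaries are hypothetical stand-ins that mimic how a required property rejects an old record that lacks it, and how relaxing the constraint lets the same record load.

```python
class BadValueError(Exception):
    """Toy stand-in for App Engine's db.BadValueError (hypothetical)."""


def load(record, schema):
    """Load a stored record against a schema, enforcing required properties.

    `schema` maps each property name to its `required` flag. Properties
    missing from the record come back as None; a missing *required*
    property raises, which is what trips up the old User entities.
    """
    loaded = {}
    for name, required in schema.items():
        value = record.get(name)
        if value is None and required:
            raise BadValueError("Property %s is required" % name)
        loaded[name] = value
    return loaded


# An entity written before gravatar and website existed (hypothetical data).
OLD_USER = {"username": "alice@example.com", "password": "<hash>", "salt": "<salt>"}

# The new schema, and a temporary copy with gravatar no longer required.
NEW_SCHEMA = {"username": True, "password": True, "salt": True,
              "gravatar": True, "website": False}
RELAXED_SCHEMA = dict(NEW_SCHEMA, gravatar=False)

try:
    load(OLD_USER, NEW_SCHEMA)
except BadValueError as exc:
    print("strict schema:", exc)

print("relaxed schema:", load(OLD_USER, RELAXED_SCHEMA))
```

The strict schema rejects the old record, while the relaxed one loads it with gravatar and website set to None, ready to be filled in.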
To migrate the data, we need two things: a URL mapping for the mapreduce library and a mapper that performs the migration.
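To add the mapping for mapreduce, add the following to app.yaml under the handlers: heading. App Engine uses the first handler whose pattern matches a request, so put it above any catch-all entry (such as url: /.*):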
handlers:
# ... your other handlers ...
- url: /mapreduce(/.*)?
  script: $PYTHON_LIB/google/appengine/ext/mapreduce/main.py
  login: admin
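The url value is a regular expression that, as I understand App Engine’s routing, has to match the whole request path. A quick check with Python’s re module shows which paths the pattern sends to the mapreduce handler; routes_to_mapreduce is just an illustrative helper, not part of App Engine.

```python
import re

# The url pattern from the app.yaml handler. The whole path must match,
# so fullmatch() is the closest analogue to App Engine's routing.
MAPREDUCE_URL = re.compile(r"/mapreduce(/.*)?")


def routes_to_mapreduce(path):
    """Return True if `path` would be handled by the mapreduce handler."""
    return MAPREDUCE_URL.fullmatch(path) is not None


for path in ["/mapreduce", "/mapreduce/", "/mapreduce/status", "/mapreducefoo"]:
    print(path, routes_to_mapreduce(path))
```

The optional group is what lets both the bare /mapreduce URL and everything beneath it reach the same script, while a path such as /mapreducefoo falls through to later handlers.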
Once this mapping is in place, you can visit your-app.appspot.com/mapreduce to see the mapreduce interface. If you don’t have a login handler, though, you will get an HTTP 404 for the /_ah/login_required URL. To get around this, you need to add a login handler.
Here is one I’ve written using the bottle web framework:
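#!/usr/bin/env python
from google.appengine.api import users
from bottle import route, request, run, redirect
@route('/_ah/login_required')
def login_required():
    # The page the user originally asked for arrives as ?continue=...;
    # fall back to the site root if it is missing.
    continue_url = request.params.get('continue', '/')
    # Send the user to the Google login page, then back to continue_url.
    redirect(users.create_login_url(continue_url))
if __name__ == '__main__':
    run(server='gae')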
Save this file as login.py and point to it from app.yaml:
handlers:
# ... your other handlers ...
- url: /_ah/login_required
  script: login.py
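If you’d rather not pull in bottle, the handler does little more than read the continue parameter and redirect. The following framework-free WSGI sketch shows the same flow. build_login_url is a hypothetical stand-in for users.create_login_url(), which only exists inside the App Engine runtime, and the /_ah/login path it returns is an assumption; in a real deployment you would call users.create_login_url() instead.

```python
from urllib.parse import parse_qs, urlencode


def build_login_url(continue_url):
    """Hypothetical stand-in for users.create_login_url().

    The "/_ah/login" path is an assumption made for illustration only.
    """
    return "/_ah/login?" + urlencode({"continue": continue_url})


def login_required_app(environ, start_response):
    """WSGI app that redirects /_ah/login_required to the login page."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # Fall back to the site root if no continue URL was supplied.
    continue_url = params.get("continue", ["/"])[0]
    start_response("302 Found", [("Location", build_login_url(continue_url))])
    return [b""]
```

To try it outside App Engine, call login_required_app with a dict containing QUERY_STRING and a start_response function that records the status and headers it receives.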
Now you need to add a mapper. A mapper is a function that takes a single entity, modifies it, and yields an operation (here, a datastore put) back to the mapreduce framework. I keep my mappers in a file aptly named mappers.py. Let’s write the mapper for our example:
from hashlib import md5
from google.appengine.ext import db
from google.appengine.ext.mapreduce import operation as op
class User(db.Model):
    username = db.StringProperty(required=True)
    password = db.StringProperty(required=True)  # encrypted
    salt = db.StringProperty(required=True)
    gravatar = db.StringProperty(required=False)  # note: required is False here
    website = db.StringProperty(required=False)
def migrate_user(entity):
    # StringProperty values are unicode; encode before hashing so that
    # non-ASCII usernames don't raise UnicodeEncodeError.
    entity.gravatar = md5(entity.username.strip().lower().encode('utf-8')).hexdigest()
    entity.website = None
    # Handlers must yield operations; a bare op.db.Put(entity) call does nothing.
    yield op.db.Put(entity)
Why is there another model definition here? Because gravatar is required in the real class, mapreduce would complain that the property is missing from the live entities already in the datastore, throwing a BadValueError. So we create a temporary class that drops the required constraint on gravatar until the data has been migrated. Since it is also named User, it maps to the same datastore kind. It lives in mappers.py so that we don’t accidentally break our real model class.
The migrate_user function takes a single argument, entity, which is an entity of the User kind. We can modify it however we like and then yield op.db.Put(entity). Instead of calling entity.put() immediately, this hands a put operation to the mapreduce framework, which, as far as I can tell, collects such writes and commits them in batches.
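Because migrate_user only runs inside App Engine, it can help to keep the migration logic in a plain function you can test locally. The sketch below works on dictionaries rather than db.Model entities; migrate_record and gravatar_hash are hypothetical names. It also differs from the mapper in one deliberate way: it only fills in values that are missing. The mapper sets website to None unconditionally, which would wipe out a URL a user has already entered if the new code is live, and, as far as I know, mapreduce may process an entity more than once when work is retried, so an idempotent migration is the safer choice.

```python
from hashlib import md5


def gravatar_hash(address):
    """MD5 hex digest of a trimmed, lower-cased address, as Gravatar expects."""
    return md5(address.strip().lower().encode("utf-8")).hexdigest()


def migrate_record(record):
    """Plain-dict version of migrate_user(), for testing outside App Engine.

    Only fills in values that are missing, so running it twice leaves the
    record unchanged. Returns True if the record was modified.
    """
    changed = False
    if not record.get("gravatar"):
        record["gravatar"] = gravatar_hash(record["username"])
        changed = True
    if "website" not in record:
        # Only old records lack the key; never overwrite a user's website.
        record["website"] = None
        changed = True
    return changed
```

A second call on an already-migrated record returns False, which makes it easy to assert that re-running the migration is harmless.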
Next, we need to tell the mapreduce library to list migrate_user in its UI. For this we use the mapreduce.yaml configuration file:
mapreduce:
- name: Migrate user schema
  mapper:
    input_reader: google.appengine.ext.mapreduce.input_readers.DatastoreInputReader
    handler: mappers.migrate_user
    params:
    - name: entity_kind
      default: mappers.User
You don’t need to worry about the particulars of this configuration. In most cases it should look exactly like this, with the obvious parts changed to suit your app: the handler (mappers.migrate_user) and the entity kind (mappers.User).
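The handler and entity_kind values are dotted names: a module path followed by an attribute defined in that module. As I understand it, the library imports the module part and looks up the attribute, so mappers.py has to be importable by your app. The sketch below shows that kind of resolution with importlib; resolve_dotted_name is a hypothetical helper, not part of the mapreduce library, but it is handy for checking that a name in mapreduce.yaml actually points at something.

```python
import importlib


def resolve_dotted_name(dotted):
    """Resolve 'package.module.attr' to the object it names.

    Splits on the last dot: everything before it is imported as a module,
    and the final component is looked up as an attribute of that module.
    """
    module_name, _, attr = dotted.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted name like 'module.attr': %r" % dotted)
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

In your own app, resolve_dotted_name("mappers.migrate_user") should return the mapper function; an ImportError or AttributeError points to a typo in mapreduce.yaml.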
That’s it. When you visit the /mapreduce URL, you will see a page with a drop-down list of your mappers; choose one and click the Run button to launch it.
The mapreduce library is general-purpose, so it isn’t limited to schema migration. You can, obviously, perform other calculations during a mapreduce and store the computed values (the gravatar property is one such example, albeit a very basic one).
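Storing the hash once means rendering an avatar only needs string formatting. A small helper along these lines builds the image URL from the stored gravatar value; gravatar_url and its defaults are illustrative choices, while s (size in pixels) and d (fallback image, here identicon) are standard Gravatar query parameters. Note that Gravatar expects the MD5 hash of the user’s trimmed, lower-cased email address, so hashing username only works if usernames are email addresses.

```python
from urllib.parse import urlencode

GRAVATAR_BASE = "https://www.gravatar.com/avatar/"


def gravatar_url(gravatar_hash, size=80, default="identicon"):
    """Build an avatar URL from the hash stored in User.gravatar.

    `size` maps to Gravatar's `s` parameter (pixels) and `default` to its
    `d` parameter (image shown when no avatar is registered for the hash).
    """
    query = urlencode({"s": size, "d": default})
    return GRAVATAR_BASE + gravatar_hash + "?" + query
```

In a template you would pass user.gravatar straight to this helper, so no hashing happens at request time.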