Two GAE Datastore Query Optimization Tips
I’ve done some adjustments to the Herd Hound app to make it run a bit faster, and here are two tips you can use to optimize performance of your datastore queries.
Use of limit with count method
There are two reasons (that I can think of right now) you might call count. One is when you want to know the total number of items in a query result, and one is when you want to see if there are any items (or more than X items) in the query result. In the latter case, you may do something like this:
q = MyKind.all().filter(...)
if q.count():
# do something
or
q = MyKind.all().filter(...)
if q.count() > 2:
# do something
In these cases, it’s far more efficient to call count with a limit argument. It turns out that count can be quite an expensive call, and the difference between a call with the limit, and and one without it can be measured in hundreds of milliseconds. So, to fix the above two examples:
q = MyKind.all().filter(...)
if q.count(1): # at least 1
# do something
or
q = MyKind.all().filter(...)
if q.count(3) > 2:
# do something
Use fetch when you need fixed number of items
As you probably know, when you want to do some work on every entity in the query result, you just iterate over the query object as an iterable. For example:
q = MyKind.all().filter(...)
props = [e.someprop for e in q]
You can also use subscript notation to fetch one item from the query results:
q = MyKind.all().filter(...)
first = q[0]
In the latter case, if you need the first item, you could do this instead:
q = MyKind.all().filter(...)
first = q.fetch(1)[0]
So what’s the difference? Well, except from the latter being more to type, it’s also more efficient. What fetch does is, it converts the query object to a list of model instances containing the specified number of items. This, it turns out, is much more efficient than treating the query object as an iterable.
So what if you wanted to do q[2] instead of q[0]? Well, you’re fetching the 3rd entity from the query results, so you’ll use an offset of 2:
third = q.fetch(1, 2)[0]
The second argument to fetch is the offset, of course. One thing to remeber is that performance penalty raises with both limit and offset. So fetch(2) is twice as slow as fetch(1), and fetch(2) is just as fast/slow as fetch(1, 1).
Finally, you can also use get() instead of fetch(1), but I think it may be slower by a close shave because it’s a shortcut for fetch(1).
Combo (the bonus track)
Finally, the combination of the two tips. If you want to work on a single entity from the datastore, and you want to make sure it’s there, just call fetch, and do a len() on the results. So, instead of:
q = MyKind.all().filter(...)
if q.count(1):
e = q.fetch(1)[0]
else:
return "Couldn't find any"
you would do:
e = MyKind.all().filter(...).fetch(1)
if not e:
return "Couldn't find any"




