Query datastore for the set of property values present

Go To StackoverFlow.com


I have a property column which can have a subset of the following values at any point in time: { a | b | c | d | e }. By this, I mean that sometimes it can be any of { a | d | e }, or at another time it can even be { x | y | z }. How do I query the datastore, so that I can find out what subset is present at that point in time, without having to dig into each entity?

Presently I'm doing it this way:

people = Person.all().fetch(100)
city = set()
for p in people:

I want to get the set of property values that are present at this point in time (i.e. no duplicates). For example, at one point in time all 5,000,000 people have an address of { Manila | Cebu | Davao }, then I want the set(Manila, Cebu, Davao).

At another point in time, all 5,000,000 people will have an address of { Iloilo | Laoag }, then I want the set(Iloilo, Laoag).

Before any query, I would not know what the set should compose of.

My present method requires that I dig through all the entities. It's terribly inefficient, any better way?

2012-04-04 03:09
by Kit
I don't understand what you're trying to do. It sounds like you have a model that has one property that can hold many values. Is this implemented as a list property? Paste your model. What data are you trying to get out? Only those entities with some exact subset of values in the above list property? The example you posted is unclear and not helpful - mjibson 2012-04-04 03:52
@mjibson Sorry about that, I've edited the question to make it more clear - Kit 2012-04-04 04:28


In AppEngine, it's almost always better to generate and store what you might need, during write time.

So in your use case, every time you add or edit a person entity, you add the city they are in to another model that lists all the cities, and then store that cities entity as well.

class Cities(db.Model):
    list_of_cities = db.TextProperty(default="[]") #we'll use a stringified json list of cities

#when creating a new person / or when editing
person = Person(city = city)
cities = Cities.all().get() #there's only one model that we'll use.
list_of_cities = simplejson.loads(cities.list_of_cities)
if city not in list_of_cities:
    list_of_cities.append(city) #add to the list of cities
    cities.list_of_cities = simplejson.dumps(list_of_cities)


You may want to use memcache on your cities entity to speed things up a bit. If you are also expecting to add more than one person in bursts of more than 1 write / second, then you might need to also consider sharding your list of cities.

2012-04-04 07:04
by Albert
Thanks @Albert. So, meta-entities are the way to go, then - Kit 2012-04-04 07:19
Note that this won't remove components that are no longer used from the list. It will also run into contention issues very, very quickly - Nick Johnson 2012-04-05 06:02


An alternative to the approach suggested by Albert is to compute these values periodically using a mapreduce. The App Engine Mapreduce library makes this fairly straightforward. Your mapper will output the city (for instance) for each record, while the reducer will output the value and the number of times it occurs for each.

2012-04-05 06:03
by Nick Johnson