Will Google fulfill the promise of WinFS?

OLPC’s datastore, Sugar’s persistent data storage subsystem for applications, is essentially an object database, and when considering object persistence, it’s wise to remember Ken Thompson’s quip:

We have persistent objects — they’re called files.

In other words, don’t write a filesystem replacement where a filesystem will do. OLPC had some compelling reasons to roll their own, revolving around the richness of data expression and manipulation as well as security. And from published papers, we know Google rolled their own internal data persistence system for reasons at the other end of the spectrum: massive fault-tolerant scalability.

I’ve been thinking about the general problem of rich object persistence for several years now, so I found the release of Google’s App Engine very interesting. And while App Engine itself was reasonably widely predicted in the industry, I wasn’t expecting that the SDK’s development web server would come with a complete implementation of a mock in-memory backend for Google’s datastore, along with a simple parser for GQL, its alternative SQL-like interface.

If you’d like to explore the data persistence aspect of App Engine, grab the SDK and look at:

  • google/appengine/api/datastore.py which defines entities, queries and iterators and provides methods for actually operating on the datastore,
  • google/appengine/api/datastore_file_stub.py which is the mock in-memory datastore backend,
  • google/appengine/api/datastore_entities.py which defines some useful entity kinds from GData, and
  • google/appengine/ext/gql/__init__.py which implements an LL(1) parser for GQL.
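
To get a feel for what a mock in-memory backend has to do, here is a minimal sketch of a toy entity store with filtered queries. It is not the SDK's code or API: InMemoryStore, put, get, query and the (property, operator, value) filter tuples are all names I made up for illustration, and the real stub certainly handles far more (keys, indexes, transactions).

```python
import operator

# Hypothetical names throughout: this is NOT the App Engine SDK API.
_OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


class InMemoryStore:
    """Toy datastore: entities are dicts grouped by kind, keyed by integer id."""

    def __init__(self):
        self._kinds = {}    # kind -> {id -> properties dict}
        self._next_id = 1

    def put(self, kind, **properties):
        """Store a new entity and return its (kind, id) key."""
        entity_id = self._next_id
        self._next_id += 1
        self._kinds.setdefault(kind, {})[entity_id] = dict(properties)
        return (kind, entity_id)

    def get(self, key):
        """Look up an entity's properties by its (kind, id) key."""
        kind, entity_id = key
        return self._kinds[kind][entity_id]

    def query(self, kind, filters=(), order=None):
        """Return entities of `kind` that satisfy every filter.

        Each filter is a (property, operator, value) tuple; entities
        lacking the property never match.
        """
        results = [
            props
            for props in self._kinds.get(kind, {}).values()
            if all(
                prop in props and _OPS[op](props[prop], value)
                for prop, op, value in filters
            )
        ]
        if order is not None:
            # Entities without the sort property are dropped from the result.
            results = [p for p in results if order in p]
            results.sort(key=lambda p: p[order])
        return results


store = InMemoryStore()
store.put("Person", name="Ana", city="Acapulco", photos=120)
store.put("Person", name="Ben", city="Boston", photos=300)
store.put("Person", name="Cruz", city="Acapulco", photos=40)

matches = store.query(
    "Person",
    filters=[("city", "=", "Acapulco"), ("photos", ">", 100)],
    order="name",
)
print([p["name"] for p in matches])
```

Note that entities of the same kind need not share a schema; each is just a bag of properties, which is a large part of what makes this style of store feel different from a relational table.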

Datastore requests are marshaled into protocol buffers en route to the backend; the marshaling code itself is in google/net/proto/ProtocolBuffer.py. To understand this aspect of things, you’ll want to read the (very short) section 4 of Google’s Sawzall paper. Finally, the Bigtable paper is excellent reading on Google’s general approach to persistence.
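
As a small taste of what that marshaling involves, here is a sketch of the variable-length integer ("varint") encoding that the protocol buffer wire format uses for integers: seven bits of payload per byte, with the high bit set on every byte except the last. The function names encode_varint and decode_varint are my own, and this is written from the wire format's general description rather than taken from ProtocolBuffer.py.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (low bits first)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low seven bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode a varint starting at data[pos]; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


encoded = encode_varint(300)
print(encoded.hex())
print(decode_varint(encoded))
```

Small numbers, which are by far the most common in practice, take a single byte this way.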

The most fascinating unanticipated effect of App Engine might be that the code release, combined with Google’s general reach and developer mindshare, entrenches their datastore API far more widely than they expected. Microsoft’s next-generation WinFS filesystem promised rich structured data storage on your disk, with deep query and search abilities far exceeding those of plain filesystems: Wikipedia’s example is querying for “the phone numbers of all persons who live in Acapulco and each have more than 100 appearances in my photo collection and with whom I have had an e-mail exchange within the last month”.

But WinFS was never delivered.

And now, as people begin to implement their own Google datastore API-compatible backends on top of everything from Apache CouchDB to (awkwardly-fitting) relational databases like MySQL and Postgres — which, make no mistake, will happen very soon — we might yet see the long-unfulfilled WinFS dream of rich metadata and tremendously powerful search come to life.

App Engine is making this happen on the web right now. The technology’s march to the desktop is all but inevitable.