Schematization and Google Base

This is sort of yesterday’s news, but it looks like Google’s building a big honkin’ schematized storage system that you can stick all sorts of stuff into. Dare says it reminds him of Amazon’s Simple Queue Service. Personally, it strikes me as being a little like WinFS-in-the-sky. Of course, nobody on the outside really knows what the heck Google Base is…

From the screenshots I’ve seen floating around the Web, it seems like Google Base is trying to do three basic things:

1) Provide a backend storage solution for structured data.

2) Schematize that data and paint it with rich metadata.

3) Provide search over both the data and the metadata, so stored information can be put to use in interesting ways.

All of this fits in well with Google’s apparent strategy: “own as much information as possible, surface it through broad-reaching apps and APIs, and monetize it through advertising”.

(2) is a pretty fundamental step toward letting others build applications on top of this system. Unifying data models is one of the really hard parts of integration work, and there’s a lot of power in being able to look at everything you know about in terms of a common entity model. Once your information universe (infoverse?) is framed in terms of these common entities, building new apps on top of that data store becomes much easier, because the integration costs have been drastically lowered. In essence, you spend more of your time figuring out how to make your application cooler and less time worrying about how to make your Customer look like my Customer.
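To make the “your Customer vs. my Customer” point a bit more concrete, here’s a minimal Python sketch. It assumes a hypothetical shared Customer entity and two hypothetical apps (a CRM and a storefront) that each store customers in their own layout. None of this is Google Base’s actual API; the class, functions, and field names are invented purely for illustration.

```python
from dataclasses import dataclass


# Hypothetical shared "Customer" entity that every app agrees to speak.
@dataclass(frozen=True)
class Customer:
    name: str
    email: str


def from_crm(record: dict) -> Customer:
    # Hypothetical CRM app: first/last name stored separately, email field called "mail".
    return Customer(
        name=f"{record['first']} {record['last']}",
        email=record["mail"].lower(),
    )


def from_shop(record: dict) -> Customer:
    # Hypothetical storefront app: a single full-name field.
    return Customer(
        name=record["full_name"],
        email=record["email_address"].lower(),
    )


crm_row = {"first": "Ada", "last": "Lovelace", "mail": "Ada@Example.com"}
shop_row = {"full_name": "Ada Lovelace", "email_address": "ada@example.com"}

# Once both records are expressed in the common model, they compare directly.
print(from_crm(crm_row) == from_shop(shop_row))  # True
```

The payoff is that each app writes one adapter into the shared model, rather than one adapter for every other app it wants to talk to.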

One thing that’s pretty apparent is that solving (2) is really hard. Building generalized, canonical schemas for things like Customers, Contacts, Events, [insert data entity here] is tremendously challenging from a knowledge representation perspective (it’s hard to find a “one size fits all” solution). Google appears to tackle this by letting users schematize their information themselves (see this screenshot – note the “define a new type” option). It will be interesting to see whether they try to apply a tagging/folksonomy approach to this – I define my version of a Recipe, and you can use it if you want. Maybe they’re looking to “the community” to grow a set of canonical schemas in an ad hoc manner. I suspect that if they go down this route they’ll have to solve one of the big problems with tagging today: the proliferation of synonyms, where everyone uses different terms for the same thing.
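One naive way to attack the synonym problem is a shared lookup table that folds user-chosen type names onto a canonical one. The sketch below is a hypothetical Python illustration of that idea, not anything Google has described; the table contents and the `canonicalize` helper are made up for the example.

```python
# Hypothetical synonym table: user-chosen type names -> one canonical type name.
CANONICAL_TYPES = {
    "recipe": "Recipe",
    "recipes": "Recipe",
    "recipe card": "Recipe",
    "cooking instructions": "Recipe",
    "event": "Event",
    "happening": "Event",
}


def canonicalize(user_type: str) -> str:
    """Map a user-defined type name to its canonical form, if one is registered."""
    # Normalize case and collapse runs of whitespace before looking it up.
    key = " ".join(user_type.strip().lower().split())
    # Fall back to the user's own label when no synonym is known.
    return CANONICAL_TYPES.get(key, user_type.strip())


labels = ["Recipe", "recipe card", "Cooking  Instructions", "Happening", "Car Part"]
print([canonicalize(t) for t in labels])
# ['Recipe', 'Recipe', 'Recipe', 'Event', 'Car Part']
```

The code is the easy part; the hard part is deciding who maintains the table and who gets to say that two labels really mean the same thing.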

So where’s the money in this for Google? First off, it gives people yet another reason to visit a Google-owned site. More eyeballs == more ads == more money. I’m also wondering whether they’re trying to make something of a standards play: leverage the reach of Google Base and turn it into the de facto data model that every application speaks (why invent your own schema when everybody else already uses Google’s?). I don’t know how they’d monetize that leverage directly, though, since it doesn’t cost anything to use the format...

We'll see where this goes.