Thoughts on Data Portability [Chiming In]
February 12, 2008

In the last several months, the Web community has engaged in a discussion about the portability of your personal data into and out of the services you use. While most of this conversation is productive and interesting, some participants have limited their vision to imports and exports. This narrow focus may stem from concerns over the retention of personal data, or from fears that a given service will shut down, turn evil, or otherwise misbehave. Some people simply don’t want to be locked in to a specific service. Whatever the motivation, the actual definition of portability is commonly disputed.

Those concerned with portability often use the term data to include relationships like friends, groups, subscriptions, and attention. However, relationship data may be better served not by “portability” but by logical “queryability”. From our perspective, users are best served by keeping data in its natural habitat, where updates, additions, deletions, and rule changes can be processed as they happen. A fine example is the Social Graph API. Built on Google’s crawlers and indexes, the API lets you query friendships across your various public profiles by reading annotated hyperlinks in their markup. Services that support this API could let you easily find existing friends across the Web and connect to them on a new service. Rather than trying to move data, these approaches let developers perform queries and glean information for their users. Beyond the benefits to users, this lets platforms define clear policies and, like it or not, retain competitive advantages.
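To make “queryability” concrete, here is a minimal sketch of reading relationships from annotated hyperlinks where they live, instead of exporting them. It assumes XFN-style `rel` values (`me` for another profile owned by the same person, `friend`/`contact`/etc. for relationships). The class and function names are hypothetical, and this is a local illustration of the idea, not the Social Graph API itself.

```python
from html.parser import HTMLParser

# Assumed subset of XFN relationship values treated as "friendship" links.
# rel="me" is handled separately: it links profiles owned by the same person.
FRIEND_RELS = {"friend", "contact", "acquaintance", "co-worker"}


class XFNLinkParser(HTMLParser):
    """Collects hrefs from <a> tags whose rel attribute carries XFN values."""

    def __init__(self):
        super().__init__()
        self.same_person = []  # rel="me": other profiles of the page owner
        self.friends = []      # rel="friend" etc.: outbound relationships

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel is a space-separated token list; a valueless attribute yields None.
        rels = set((attr_map.get("rel") or "").lower().split())
        if not href or not rels:
            return
        if "me" in rels:
            self.same_person.append(href)
        if rels & FRIEND_RELS:
            self.friends.append(href)


def query_profile(html: str) -> dict:
    """Query one public profile page for identity and friendship links."""
    parser = XFNLinkParser()
    parser.feed(html)
    parser.close()
    return {"me": parser.same_person, "friends": parser.friends}
```

Because the relationships are read from each profile on demand, the answer reflects whatever the owner has published right now; nothing is copied out and left to go stale.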

Freshness, Control and Context

There is good reason to prefer queries over exports: to have fresh data, to respect changes in the rules of ownership and control, and to leave raw data where it belongs, with the context in which it was generated. The need for up-to-date information and activity is self-evident, but ownership, control, and context are less clear. For example, blogger Robert Scoble, equipped with a tool developed by Plaxo, crawled and screen-scraped the profiles of his friends on Facebook. Facebook promptly disabled his account. Legal issues aside, and regardless of what was being done with the data, some argue that he could not assume permission to move his friends’ contact information out of Facebook; others argue that he had the right to export it. The episode confirmed that companies like Facebook retain control and enforce their terms of service. In an age when we give up personal data freely, it is important to be selective and careful, and to leave data with those we trust.

Queries also respect context, which is significant. Social networking sites co-exist, and thousands of niche social networks spring up continuously. They serve different purposes and different communities, and while there may be common data you’d like to share among them, you can imagine how little sense it would make to expose your MySpace profile to all your LinkedIn contacts. A telling example is when Google Reader got snuggly with Google Talk. The Reader team directly imported your Talk contacts, treated them as your friends, and subscribed you to their shared items in Google Reader. Talk contacts span several contexts, including friends, family, and business, which may not be appropriate when blindly sharing stories. This made a good number of people uncomfortable, largely because of confused contexts and a lack of control.

Moving forward, control and automation are paramount. To enable them, new systems belong above the data, where they can perform logical queries and implement policy. At best, data exports can be filtered, but they do not meet the new needs of an intelligent, automated, distributed Web.

Updates, Synchronization and Automation

A significant factor driving portability is the notion of social network fatigue: the tiresome act of signing up for, entering information into, and managing a continuously growing sea of accounts on various Web sites and networks. Wouldn’t it be great to sign in to a new service, click a button, and have it fully configured automatically? The intention of technologies like OpenID and OPML is to improve this by reducing friction and stickiness when moving between new services. Of course, this isn’t a one-shot deal: the information needs to stay fresh, and things like activity need to be updated continuously. Currently the best way to distribute activity is via RSS feeds. However, the optimal state of the service-oriented Web is to support event-triggered updates across distributed networks and products, which we’ll discuss further. This will require a new kind of publish-subscribe framework, not unlike blog pings, but one that cuts out the middleman.
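The difference between polling a feed and event-triggered updates can be sketched in a few lines. This is an in-process stand-in, assuming that a subscribing service registers a callback directly with the publishing service (no ping server in between); in a real distributed system each callback would be a network delivery, such as an HTTP request to a URL the subscriber registered. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Event = dict
Callback = Callable[[Event], None]


class ActivityPublisher:
    """A service that pushes activity events straight to subscribed services."""

    def __init__(self) -> None:
        # Maps a user to the callbacks of services interested in their activity.
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, user: str, callback: Callback) -> None:
        # A subscribing service registers interest in one user's activity.
        self._subscribers[user].append(callback)

    def publish(self, user: str, event: Event) -> int:
        # Deliver the event to every subscriber the moment it happens:
        # no polling interval and no intermediate ping server.
        # .get() avoids creating empty entries for users nobody follows.
        callbacks = self._subscribers.get(user, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)
```

Usage: a reader service calls `publisher.subscribe("dm", inbox.append)` once, and every later `publisher.publish("dm", {...})` lands in `inbox` immediately, whereas an RSS reader would only see the item on its next poll.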

Still, there is a limited case where exportable relationships work: moving social graph data into a personal address book. Of course, this defeats our requirements of freshness, ownership, and context. However, an address book export is just a list of names and addresses, and can be treated as a single, portable object.

Portability and Objects

Real portability is reserved for objects and media, though this is where legal and international issues of privacy and copyright come into play. Portable photos, documents, videos, and blog posts allow the authors and owners of those objects to move them among services. Your identity would qualify as a portable object in this model. The standard example is a photo service that grants a printing service access to a user’s photos so it can print them. An example from Streamy is dragging and dropping a Facebook event to add it to your Google Calendar. Of course, changes to an object in one place should be reflected in the others, which creates a freshness requirement that gets very complex very quickly.
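As an illustration of an event traveling between services as a self-describing object, here is a minimal sketch that serializes it as iCalendar (RFC 5545), a format calendar services widely accept. The post does not say which format Streamy uses, so this choice is an assumption. A stable `UID` plus a `SEQUENCE` revision number gives every copy a simple way to tell whether it is stale, which touches the freshness problem above. The names are hypothetical, and line folding for lines over 75 octets is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PortableEvent:
    uid: str            # stable identifier shared by every copy of the event
    summary: str
    start: datetime     # expected to be timezone-aware
    sequence: int = 0   # bumped on each revision so copies can detect staleness


def escape_text(value: str) -> str:
    # RFC 5545 TEXT escaping: backslash first, then semicolon, comma, newline.
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def utc_stamp(moment: datetime) -> str:
    # UTC date-time form, e.g. 20080212T180000Z.
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def to_icalendar(event: PortableEvent, now: datetime) -> str:
    """Serialize one event as a minimal VCALENDAR with CRLF line endings."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//portable-event-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{event.uid}",
        f"DTSTAMP:{utc_stamp(now)}",
        f"DTSTART:{utc_stamp(event.start)}",
        f"SEQUENCE:{event.sequence}",
        f"SUMMARY:{escape_text(event.summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"


def is_newer(incoming: PortableEvent, stored: PortableEvent) -> bool:
    # Freshness check: same object (same UID) at a higher revision number.
    return incoming.uid == stored.uid and incoming.sequence > stored.sequence
```

A revision counter only settles which copy is newer; reconciling edits made independently in two places still needs a policy on top, which is where the complexity mentioned above begins.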

The true object-oriented Web will free media and documents from the shackles of closed systems and clearly establish ownership and control. While DRM may not be the answer, protecting your rights as an author and artist will play a big part on the open Web. We’ll talk more about copyright later, though there are technical problems to address first, including authentication and delegation with technologies like OAuth.

Visions of an Open Web

The DataPortability workgroup’s strongest voice, Chris Saad, noted in the DataPortability Design Principles that the mission ought to be less “fight the man” and more about practical, useful goals. It will be interesting to see whether the group’s finalized vision is limited to the import/export use cases, because these surely have the least potential. Portability itself may not matter much to the average user, but the benefits to the user experience surely do, and I think selling DataPortability means selling the huge benefits of an efficient, distributed Web experience.

The group’s larger vision, though some question it, will hopefully guide services to standardize their data interfaces. After all, most innovation on the Web owes its success to interoperability. The group recently met to nail down its logistics and is working to outline its future. With Streamy, we’re forging ahead with what’s available to create interesting experiences for our users. We’d like to remove friction between services, consolidate your Web experience, and bring you news from the sources you care about. Forward-thinkers are creating innovative interfaces to your distributed data, and products are going to evolve quickly. Figuring out what portability means, which technologies are best, and how it all looks from the user’s perspective are pieces of a large puzzle that will strongly affect the future of the Web.

How do you feel about this? What, in your opinion, are the most critical aspects of the open Web?

DM
