
Waiting to be seated for what turned out to be just elegant sushi at Haruki East in Providence last Friday night, Hilary Mason and I were chatting about my current interest in creating an open and distributed Twitter-like microblogging design. “I’m concerned that might destroy the Twitter community,” she said. I was somewhat taken aback by that notion, having not thought much about the unintended consequences of what I was doing. But I see now that the concern is well-grounded.

Twitter is a place and a culture, a “scene” you might say, as much as it is a tool or an application. There are norms of behavior, customs and vernacular, all of which foster a real sense of community amongst Twitterers. We think of ourselves as belonging to that scene, and we’re proud of it. We have bonded with it and each other and, silly as it seems, we are drawn closer together by the shared hardship of dealing with Twitter’s many outages, slowdowns and bugs.

I’m giving a talk this Saturday at BarCampBoston about Distributed Twitter, and hope to recruit some developers and perhaps designers to work together to create one or more prototypes. But it seems I have to add an item to my agenda now, which is “how to maintain the Twitter sense of community” even when people by design will be using different UIs and server implementations to connect into the microblogging cloud, as it were.

This should cause us to ask some really interesting questions. What causes people to have such allegiance to a community? Twitter isn’t just one community, it’s thousands of overlapping sub-communities of friends and friends-of-friends. Could those sub-communities break off and form their own microblogging server center? Could there be another “community” site that just kept track of the binding-together of people and left the status updates, SMS interface, and database crunching to the “medium”?

Kinda makes you say, “Hmmmm”. Thanks, Hilary, for the whack upside my head on that one. :)

If you don’t read all of this admittedly long post, please do skip to the end and check out the BarCampBoston info. I’ll be holding a session there on the topic of Distributed Microblogging.

Ok, so let’s talk about the hard bits of doing Distributed Microblogging. It’s easy to envision a multitude of servers exchanging microblog posts, and a UI that simply arranges the posts in chronological order. By the way, chronological order is easy to do if all the servers are synched with a time standard like nist.gov, and most are. But the hard bit is making it work in a way that performs well and scales to large populations of both microblog posters and readers. I’ve been thinking of some different alternatives for this, which I’ll lay out here. As always, your thoughts are welcome.

Performance

So, what do I mean by “performs well”? Well, microblog (ie, Twitter) updates happen much more frequently than what we’d consider traditional blog posts, but not quite as fast as instant messenger or chat updates. And microblogs don’t have the notion of presence attached to them. You don’t worry about whether a Twitter poster is “on line” at any particular time, although you may deduce that from the rapidity of their responses to you.

To put a number on it, I’d say that a microblog post notification should be transmitted in less than a minute, ideally in a few seconds, although a delay of up to 5 minutes is not that objectionable. I frequently will ignore my Twitter feed for many minutes, sometimes hours. I have no expectation that I will see someone’s posts immediately (i.e., less than a second) nor do I care to. Microblogging is a river of updates. You don’t expect to see every single one.

Size

Now let’s talk numbers of followers and following. A typical Twitter user has 100 or fewer people that they follow, and a similar number that are following them. But the edge cases are far bigger. Someone like Robert Scoble (@scobleizer) or Chris Brogan (@ChrisBrogan) follows literally thousands of other users, and has even more following them.

Ok, so we need to be able to update thousands of users in nominally less than a minute, but at the extreme less than five minutes. If you don’t buy that, please leave a comment explaining why these limits are not realistic.

Alternatives

RSS feed polling

The simplest solution would be polling, which is reader initiated. Followers would poll the feeds of the microblogs they were interested in. It seems to me that polling has to be discarded as a solution except for occasional or retrospective uses. I think a solution should include RSS feeds, but it seems obvious to me that having someone with 100 friends poll those RSS feeds every minute, or ideally faster, just because one of them might have posted something is a stupendous waste of bandwidth.

On the sending side, if that same person had 100 followers, each of those followers would be polling the person’s RSS feed every few seconds causing a very high server load. Now multiply that by however many people’s accounts are hosted on that server and the problem quickly blows up.
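To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figures (100 followers, a one-minute polling interval, 1,000 accounts per server, 20 posts a day) are illustrative assumptions, not measurements from Twitter or any real server.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_requests_per_day(followers: int, poll_interval_s: int) -> int:
    """Requests one feed receives per day if every follower polls it."""
    return followers * (SECONDS_PER_DAY // poll_interval_s)


def notification_messages_per_day(followers: int, posts_per_day: int) -> int:
    """Messages sent per day if followers are notified only on new posts."""
    return followers * posts_per_day


# Assumed workload: 100 followers polling once a minute.
per_account = polling_requests_per_day(followers=100, poll_interval_s=60)
# Assumed hosting density: 1,000 such accounts on one server.
per_server = per_account * 1000

print("polling, per account:", per_account)
print("polling, per server: ", per_server)
print("notify, per account: ", notification_messages_per_day(100, 20))
```

Under these assumptions polling costs 144,000 requests a day per account, almost all of which return nothing new, while notifying the same 100 followers of 20 posts costs 2,000 messages.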

Notification

A much more efficient solution would be for senders (microblog posters) to notify followers when a new post has been made, and perhaps to proactively send the new content in the notification message. Then, resources are used only when there is actual traffic to send. Both senders and receivers are quiescent when no one is posting. So what are the possibilities for implementing notification? I see the following.

  1. RSS cloud API - a little-known part of the RSS specification, the cloud element allows a feed to publish a web-service address that readers of the feed can register with to be notified of changes using a SOAP or XML-RPC call.
  2. Jabber (XMPP) channels between DMB servers to carry notifications and content.
  3. UDP notification with HTTP callback. UDP is lightweight for both senders and receivers. No open connections are required between senders and receivers. It’s sort of like RSS cloud, but narrowly and specifically designed for DMB, as opposed to generalized RSS.

RSS cloud API

The cloud API was designed specifically for this purpose of notifying readers of content updates. Its original intent, judging from the RSS 2.0 spec, was to allow actual client feed readers to register with the cloud. In the case of DMB, it would be cooperating servers that would register for notification with each other.
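For anyone who hasn’t seen the cloud element, here is a small Python sketch that reads it out of a feed. The attribute names (domain, port, path, registerProcedure, protocol) are the ones the RSS 2.0 spec defines; the host name, procedure name and the feed itself are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder feed: dmb.example and cloud.pleaseNotify are invented values.
FEED = """<rss version="2.0">
  <channel>
    <title>Example microblog</title>
    <link>http://dmb.example/alice</link>
    <description>Placeholder feed for illustration</description>
    <cloud domain="dmb.example" port="80" path="/RPC2"
           registerProcedure="cloud.pleaseNotify" protocol="xml-rpc" />
  </channel>
</rss>"""


def find_cloud(feed_xml: str):
    """Return the cloud element's attributes as a dict, or None if absent."""
    cloud = ET.fromstring(feed_xml).find("channel/cloud")
    return dict(cloud.attrib) if cloud is not None else None


cloud = find_cloud(FEED)
# A following server would register with this endpoint to be notified.
print(cloud["domain"], cloud["port"], cloud["path"], cloud["protocol"])
```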

The problem with RSS cloud is overhead. Microblog entries are tiny and frequent compared with blog entries or traditional site updates. To require a follower server to read an entire RSS document to get 140 characters of content, and to have this happen every few minutes when the poster updates, would be inefficient to say the least. In addition, there is experience with the cloud API that indicates just the HTTP session overhead for notifying many users becomes intolerable, although this was from the perspective of actual clients being notified as opposed to clients’ servers being notified.

Jabber (XMPP)

Jabber is a very tempting candidate for this application, and has been getting quite a bit of discussion in the development community of late. Here’s an example. The advantage to Jabber is that it maintains open sessions between servers, which eliminates the session setup/teardown overhead, and allows for almost instantaneous notification of all “following” parties.

But this may also be a disadvantage in situations where there are hundreds or thousands of “followers” for a single sender. IM or chat is typically one-to-one, or one to a few, but microblogging is frequently one to hundreds or one to thousands. I am not familiar enough with Jabber servers in actual practice to know what their performance or connection limitations are. Anyone with Jabber implementation or operational experience is strongly encouraged to comment.

Jabber’s concept of presence could be used to keep the number of messages down by only requesting message updates when a user is actually logged into his/her microblog system. What’s interesting about this notion is what “logged in” means. Microblogging, at least the way Twitter works, does not really require the concept of presence. For instance, you can be “logged in” to Twitter, but dormant for hours, not getting any updates until you request a page refresh.

In fact, one key aspect of microblogs that differentiates them from IM or chat is that they don’t typically “auto-update”. And rather than this being a disadvantage, I, and I think many others, find this on-demand update to be much more useful than a streaming IM or chat window. It’s really more like reading small blog posts. I read when I want to, not when someone else decides to say something. So, using the presence capability without auto-updating will require a little clever UI design to, in a sense, auto-logoff the user when their web page hasn’t been refreshed in a certain period of time. This, then, will also require the ability of followers to query the senders they follow for “back posts”, so they can see what happened in the past without keeping a client logged in all the time to save all the posts.
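Here is one way the auto-logoff idea might look, as a minimal Python sketch. The class name, the 15-minute timeout and the injectable clock are all my own assumptions for illustration; none of this comes from Jabber or any existing server.

```python
import time


class PresenceTracker:
    """Treats a user as present only if their page was refreshed recently."""

    def __init__(self, idle_timeout_s: float = 15 * 60, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s  # assumed value, tune to taste
        self._clock = clock                   # injectable so it can be tested
        self._last_refresh = {}

    def page_refreshed(self, user: str) -> None:
        """Call whenever the user's client asks for an update."""
        self._last_refresh[user] = self._clock()

    def is_present(self, user: str) -> bool:
        """True while the last refresh is newer than the idle timeout."""
        last = self._last_refresh.get(user)
        return last is not None and self._clock() - last < self.idle_timeout_s
```

A server could check is_present before requesting updates on a user’s behalf, and fall back to a “back posts” query the next time that user refreshes.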

UDP for notification

In the past year, I did some work on a revamped email protocol I called IMTP that uses small UDP messages to notify receiving servers that the sender has some traffic for them. The receiver then calls back to the sender using TCP to get the message body. This was based on Prof. Daniel Bernstein’s Internet Mail 2000 proposal several years ago. He proposed, quite rightly in my opinion, that mail senders should bear the burden of storing the message contents, not the receivers, and that mail content should only be sent when a receiver actually wants to read it.

The advantage to UDP is that it is very lightweight for both the sender and the receiver, not requiring any session overhead or setup/teardown. If used for microblogging, UDP notification messages could take the place of the continuously open TCP sessions that Jabber employs, thus reducing the session resource allocation load on both ends.

Now, of course, the issue with UDP is that it is not guaranteed to be delivered. The IMTP service we built would retry the UDP message at some frequency until the receiver called back to either fetch or reject the message.
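The retry idea can be sketched in a few lines of Python. This is not the IMTP code; the function name, the attempt limit and the 30-second interval are assumptions, and is_acknowledged stands in for whatever records that the receiver has called back.

```python
import time
from typing import Callable, Tuple


def notify_with_retry(
    sock,
    address: Tuple[str, int],
    payload: bytes,
    is_acknowledged: Callable[[], bool],
    max_attempts: int = 5,
    retry_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Resend a UDP notification until the receiver calls back.

    Returns the number of datagrams sent. The caller should check
    is_acknowledged() afterwards to learn whether delivery succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        sock.sendto(payload, address)  # fire and forget, no session setup
        sleep(retry_interval_s)        # give the receiver time to call back
        if is_acknowledged():
            return attempt
    return max_attempts
```

In real use, sock would be a socket.socket(socket.AF_INET, socket.SOCK_DGRAM), and is_acknowledged would be set by the handler that serves the receiver’s TCP callback.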

Conceptually, using UDP messages with a 140 character payload and a message number would fit a microblogging application very well. Because microblogging has no expectation or requirement of presence or real-time delivery like chat and IM do, a dropped UDP message is not a tragedy. If the sender retried even once or twice per minute, that would be plenty of timeliness for microblogging. Plus, if the UDP messages carry a monotonically increasing message number, the receiver can know if they’ve missed a message and simply call back to the sender to get it. The microblogging UI can reconstruct the sequence easily when the missing pieces, if any, finally come through.
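To make that concrete, here is a sketch of a possible datagram layout and the receiver-side gap detection. The wire format (an 8-byte microblog id, a 4-byte message number, then UTF-8 text) is purely an assumption for illustration, as is numbering messages from 1.

```python
import struct
from typing import List, Tuple

# Assumed header: microblog id (8 bytes) + message number (4 bytes),
# both unsigned, in network byte order.
HEADER = struct.Struct("!QI")


def encode_notification(blog_id: int, msg_number: int, text: str) -> bytes:
    if len(text) > 140:
        raise ValueError("microblog posts are limited to 140 characters")
    return HEADER.pack(blog_id, msg_number) + text.encode("utf-8")


def decode_notification(datagram: bytes) -> Tuple[int, int, str]:
    blog_id, msg_number = HEADER.unpack_from(datagram)
    return blog_id, msg_number, datagram[HEADER.size:].decode("utf-8")


class GapTracker:
    """Tracks message numbers for one microblog, assuming they start at 1."""

    def __init__(self) -> None:
        self.highest = 0
        self.missing = set()

    def receive(self, msg_number: int) -> List[int]:
        """Record a message; return numbers newly discovered to be missing."""
        if msg_number > self.highest:
            newly_missing = list(range(self.highest + 1, msg_number))
            self.missing.update(newly_missing)
            self.highest = msg_number
            return newly_missing
        self.missing.discard(msg_number)  # a late or re-fetched message
        return []
```

Whatever receive returns is the list of posts the receiver would call back to the sender for.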

A notification system using UDP would seem to minimize the resource requirements on both senders and receivers, and can perform the same kind of message fan-in/fan-out optimization that Jabber could. In other words, a given microblog update body needs to be sent only once to any server, regardless of how many followers there may be on that destination server.
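The fan-out optimization amounts to grouping followers by destination server before sending anything. A minimal Python sketch, assuming followers are addressed email-style as user@server (an addressing scheme I’m making up here for illustration):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_followers_by_server(followers: Iterable[str]) -> Dict[str, List[str]]:
    """Map each destination server to the local users who follow the blog."""
    by_server = defaultdict(list)
    for follower in followers:
        user, _, server = follower.rpartition("@")
        by_server[server].append(user)
    return dict(by_server)


followers = ["alice@a.example", "bob@a.example", "carol@b.example"]
for server, users in group_followers_by_server(followers).items():
    # One notification per server, however many followers live there.
    print(server, "->", users)
```

Three followers on two servers means two notifications, and the receiving server handles delivery to its own users.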

Comments, please!

I am very interested in what others think and have to say about these issues. This problem of efficient and timely-enough notification, it seems to me, is the tough nut to crack for a good solution to microblogging.

By the way, I am going to do a session on Distributed Twitter (or Microblogging) at BarCampBoston3 on May 17, 18. We are going to try to bring in some mobile technology folks to discuss the other really interesting issue with Distributed Twitter, the SMS connection.

Also, at BarCampBoston, I am going to try to organize some kind of group to try implementing a Distributed Microblogging application going forward.

I want to quickly set down a high-altitude view of how I see a Distributed Twitter working. This should give you the basic concept, which I’ll then elaborate in more detail in subsequent posts.

First of all, let’s call it something more generic. I like Distributed MicroBlogging or DMB. The “Distributed” part is really the key. Unlike a centralized, proprietary walled garden system, DMB would be spread out over hundreds or thousands of different servers over the internet.

Just like email or Jabber, anyone could run a DMB server. People would register on a particular server with their OpenID and create or contribute to microblogs that other people could follow, ala Twitter. Note that people are different than microblogs as entities in the architecture. This is somewhat different than Twitter, in which there are only user accounts. This allows a form of the long-sought groups feature, implemented as microblogs that many people can contribute to.
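A rough Python sketch of that separation between people and microblogs follows. The class, its fields and the example OpenID URLs are hypothetical, just one way the idea could be modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Microblog:
    """A microblog is its own entity; people are only its contributors."""

    name: str
    contributors: Set[str] = field(default_factory=set)  # OpenID URLs
    posts: List[str] = field(default_factory=list)

    def post(self, author_openid: str, text: str) -> None:
        if author_openid not in self.contributors:
            raise PermissionError(f"{author_openid} may not post to {self.name}")
        if len(text) > 140:
            raise ValueError("microblog posts are limited to 140 characters")
        self.posts.append(text)


# A "group" is simply a microblog with more than one contributor.
team = Microblog(
    "team-updates",
    contributors={"https://alice.example/", "https://bob.example/"},
)
team.post("https://alice.example/", "Deploy is done.")
team.post("https://bob.example/", "Confirmed, all green.")
```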

So, people contribute to microblogs that are followed by other people. When someone updates a microblog, anyone on any server that is following (ie subscribing to) that microblog will get the update in whatever client they have running when the client fetches it. A Twitter-like client will display the posts from many different users interleaved in chronological order. Some clients could maintain a “live” real-time update, where other clients could display only on demand from the user, like the Twitter.com home page.
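Interleaving is cheap if each microblog’s posts already arrive in time order. A small Python sketch, where the Post type and its fields are assumptions for illustration:

```python
import heapq
from typing import Iterable, List, NamedTuple


class Post(NamedTuple):
    timestamp: float  # seconds since the epoch, per the posting server
    author: str
    text: str


def merge_timelines(feeds: Iterable[List[Post]]) -> List[Post]:
    """Interleave feeds that are each already sorted by timestamp."""
    return list(heapq.merge(*feeds, key=lambda post: post.timestamp))


alice = [Post(1.0, "alice", "good morning"), Post(4.0, "alice", "lunch?")]
bob = [Post(2.0, "bob", "morning!"), Post(3.0, "bob", "coffee first")]

for post in merge_timelines([alice, bob]):
    print(post.timestamp, post.author, post.text)
```

This relies on the servers’ clocks agreeing reasonably well, which is the time-standard assumption mentioned in the notification post.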

That’s another important point. The DMB architecture, like Jabber and SMTP, does not specify any particular user interface. How the microblogs are presented is up to a UI designer. The architecture only specifies what data are interchanged, not how it is presented.

So, that’s a very simple explanation of how I see Distributed MicroBlogging working. There could be large public servers like Google or small private servers for individual companies or groups of people. A server could host hundreds or thousands of microblogs and users, or just one microblog with a single user.

I can envision a given individual’s domain delegating its microblogging functions to a larger server, much as an individual’s home site can delegate its OpenID functions to a large identity service company.

Next, I’ll talk about the single most challenging implementation problem for DMB - notification. How does a DMB server notify other following servers that a change has taken place on one of its microblogs?

Distributed Twitter

One of the things that drives everyone nuts about Twitter is its unreliability. In fact, it’s having a little case of the no-updates this morning (2008.04.20). This is a less frequently discussed but ultimately, I think, more important disadvantage to walled gardens. If you rely on a service and it goes down for maintenance or failure, or it is subjected to any one of several denial-of-service attacks by hackers, you’re out of luck. You’re subject to how much, or more properly, how little time and money the service’s administrators or financial backers have put into site dependability.

So, what could be done about this problem? The answer would be to design a service that is distributed instead of centralized. A distributed Twitter service would operate like email. There’s no single point of failure for email because there’s no single super email server or service through which everything flows. Email is based on a protocol, not a single service, or even group of services. Anyone can run an email server and exchange mail with anyone else running a server. I think sometimes people would like email to go down for a few days because they feel overwhelmed by it, but that’s a different blog post entirely. :)

Of course, you might say it all flows through the Internet, but the net itself is distributed. Yes, there are backbone subnets that carry huge amounts of traffic and would degrade your service if they went down (and they sometimes do), but there are well-known ways to re-route traffic around failed links.

Twitter could most definitely be distributed. Over the past few months there have been some real developments toward this end, and conversation in the development community about ways to do it. The single most complete development done so far is Prologue, a WordPress theme that allows several people to “co-author” blog posts and share their “update stream” via RSS. It’s been most talked about as a Twitter for corporate co-workers that in a certain sense gives the long-desired groups capability to Twitter.

Prologue is a step in the right direction and hints at what is possible, but is not a general solution that would scale to accommodate thousands of users. The guys at WordPress that developed Prologue have said they’re not particularly interested in developing a fully distributed Twitter-clone, but in the hopes that someone else might pick up the ball and advance it, they’ve made the code open and available under an open-source license.

I’ve been thinking a lot about this problem myself and in this introductory post I’d like to outline what I think are the major requirements and architectural issues to be solved in creating a fully distributed microblogging platform. Do you have any thoughts to offer on this? Please comment!

  1. Open
    Although it’s certainly possible to design a closed microblogging architecture (ie, distributed but a proprietary implementation), it’s certainly in the general user’s interest to make it open, which would allow for competing implementations. Why break down a walled garden only to create another one?
  2. Protocol-defined
    In order to be open and distributed, the architecture must be defined by a protocol, not by a server program or database schema, or other implementation artifact.
  3. Minimal
    Occam’s Razor should be the guideline. A new microblogging architecture should specify as little as possible in terms of UI, security techniques, etc. OpenID is an excellent example of a standard that leaves many important features (such as authentication technique) to the various implementations.
  4. Privacy-capable
    I think there is a rising concern and general wariness of the completely open and public nature of many social media and social networking applications. Users want to be sure they know who sees their posts and control who can see what “friends” they have. So, there should be a general capability to control access to your microblog and its meta-data.
  5. Extensible, Forward-Compatible
    It should be possible for an implementation to add new capabilities without rendering previous versions inoperative or non-interoperable.
  6. Scalable
    The architecture should be scalable to the entire internet. A given user should be able to subscribe to or service subscriptions from tens of thousands of other users.
  7. Efficient
    In order to fulfill #6, the architecture must be efficient in its use of network and computing resources. In particular, this would argue against polling RSS feeds as a way to collect input. Polling a blog once a day works, but a microblog stream like Twitter needs to react almost immediately.
  8. Standards-based
    Whenever possible, new open architectures should utilize existing standardized protocols and data formats. This does not mean that a new protocol or format is out of the question, but there had better be a very good reason why an existing standard cannot be used.
  9. Open-source
    While proprietary implementations of a standard cannot be prevented, open source implementations should be favored, especially with respect to security issues. Only by inspecting the source can one be assured there are no easter-egg back doors or other weaknesses.


Holy Moly! This coming Friday is PodCampNYC 2. I’m doing a session called “OpenID for Newbies” and I have to get cracking on getting the PowerPoint together. Fortunately, I have much of the source material already in the form of an OpenID preso I did at PodCampBoston2 last fall.

I’m really looking forward to this PodCamp because I’ve met so many people online since the first PodCampNYC about a year ago, and I want to meet them in person. In particular, I can’t wait to “virtureal” (meet in person someone one knows only online) my co-podcasters on PushMyFollow, Annie Boccio (@banannie), Christine Cavalier (@purplecar) and Michael Gaines (@istarman), all of whom live in the Jersey, Philly area.

Two more virtureals I’ve been wanting to make for a long time are video editor extraordinaire Bill Cammack (@billcammack) and video podcast goddess Roxanne Darling (@roxannedarling). I’ve gotten to know Bill through mutual friends online. If it works out time-wise, I’d love to meet up with Bill at the beer mecca Burp Castle in the East Village, where I’ve seen him in so many Flickr photos.

I’ve known Roxanne in the online sense for well over a year. Her Beachwalks with Rox is one of the longest running videoblogs/podcasts on the net. I have to say, Beachwalks was one of my inspirations to get into vlogging and social media. We’ve been promising each other for months that one day we’d meet up and I’ve even gotten a proxy-hug from Rox through Laura Fitton (@pistachio), but now it’s finally going to happen! Yay!

I’m also looking forward to renewing in-person friendships with NYC peeps like Kathryn Jones, Jesse Chenard, Grace Piper and Charles Hope that I haven’t seen in a while. A special ex-Boston-now-NYC friend I haven’t seen since her farewell party last November is Julia Roy (@juliaroy).

One technically oriented meetup I want to make is with David Recordon, the young distributed applications guru and OpenID advocate. I have a lot of the same interests in trying to break down the walled gardens of social media and social networking, and David is a leader in that movement.

But yeah, although there is plenty of cool content to consume in the sessions, it’s pretty obvious that the real reason I go to podcamps is for the friendships. I’m thankful that I don’t have to justify the “business value” of going to podcamp to some finance department troll in Corporate America. I’m the CEO of me, and I say it’s worth it.

Intro

This will be my personal blog that will most likely carry things related to technology, coding and life in general. The videos will stay over on blogspot for the moment at Joe’s Video, Etc.

