Bloglines and SQL

I moved from my own personal RSS reader (written in Perl by yours truly) to Bloglines about a year ago. The main reason is that Bloglines did everything my homegrown reader did and was free, both in dollars and in the time needed to maintain it.

But with over 1 billion articles served as of Jan 2006, I've always wondered why Bloglines doesn’t do more collaborative filtering. They do have a ‘related feeds’ tab, but it doesn’t seem all that smart (though it does seem to get somewhat better for feeds with more subscribers). I can think of a number of possible reasons:

  • It’s already easy to find feeds that look like they’d be worth reading (I have 180 feeds that I attempt to keep track of)
  • Blogrolls provide much of this kind of filtering at the user level
  • Privacy concerns?
  • No demand from users

But this article, one of a series about data management in well-known web applications, gives another possible answer: the infrastructure isn’t set up for easy querying. Sayeth Mark Fletcher of Bloglines:

As evidenced by our design, traditional database systems were not appropriate (or at least the best fit) for large parts of our system. There’s no trace of SQL anywhere (by definition we never do an ad hoc query, so why take the performance hit of a SQL front-end?), we resort to using external (to the databases at least) caches, and a majority of our data is stored in flat files.
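To make the connection concrete, here is a rough sketch of the kind of cross-user question a smarter ‘related feeds’ feature would have to answer: which feeds share the most subscribers with a given feed? This is not how Bloglines does it. The `subscriptions` data, the feed names, and the `related_feeds` function are all made up for illustration, and Jaccard similarity is just one simple scoring choice I picked.

```python
def related_feeds(subscriptions, feed, top_n=3):
    """Rank other feeds by Jaccard similarity of their subscriber sets.

    `subscriptions` maps a user name to the set of feeds that user reads.
    All names here are hypothetical.
    """
    # Invert user -> feeds into feed -> subscribers.
    subscribers = {}
    for user, feeds in subscriptions.items():
        for f in feeds:
            subscribers.setdefault(f, set()).add(user)

    target = subscribers.get(feed, set())
    scores = []
    for other, users in subscribers.items():
        if other == feed:
            continue
        overlap = len(target & users)
        if overlap == 0:
            continue  # no shared subscribers, so not related
        union = len(target | users)
        scores.append((other, overlap / union))

    # Highest similarity first; break ties alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Made-up sample data.
subscriptions = {
    "alice": {"perl-news", "db-stories", "web-ops"},
    "bob": {"perl-news", "db-stories"},
    "carol": {"db-stories", "web-ops"},
    "dave": {"cooking"},
}

for name, score in related_feeds(subscriptions, "perl-news"):
    print(f"{name}: {score:.2f}")
```

Note that the function has to touch every user's subscription list to answer a question about a single feed. With a SQL store that is one self-join; with per-user flat files and no ad hoc query layer, it means a separate batch job over all the data.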

Incidentally, all of the articles in the ‘Database War Stories’ series are worth reading.