Wednesday, June 17, 2009

Keeping it simple with flatula

Paul has blogged about overcoming mnesia performance issues in the past, but I don't think we've talked much about the ultimate strategy -- keeping data out of mnesia altogether.

When we first started serving ads, we stored information about every single ad impression in a huge mnesia database, for retrieval on click, and for building behavioral profiles. Almost needless to say, this didn't scale very far. We spent many a day last summer delving into mnesia internals, fixing corrupted table fragments after node crashes, bemoaning how long it took new nodes to join the schema under heavy load, and so on.

One of the simplest and most effective changes that got us out of this mess was not to store any per-impression data in mnesia at all -- instead, we started logging the data to flat files on disk, and storing a small pointer to the data in a cookie so we could read it back the next time we saw the user. Hardly a revolutionary solution . . . it's well-known that disk seeking is the enemy of performance. The hardest part was coming to realizations like, "Hmm, I guess we don't really care if a node goes down and we lose part of that data!"

We've open-sourced one of the main components that enabled this strategy: flatula, an Erlang application that manages write-once "tables" that are really just collections of flat files. It looks a bit like dets, except that it doesn't support deletions, updates, or iteration, and you can't make up the keys. But when you don't need those things, it's hard to imagine a more efficient way to store data.

If you'd like to learn more, there's a brief tutorial on the Google Code site.

3 comments:

  1. That has to be one of the least auspicious project names ever.

    ReplyDelete
  2. Perhaps number two on the list, right after mnesia? :)

    ReplyDelete
  3. Well, third: the original name was "amnesia", or so Klacke has told me :-)

    ReplyDelete