Wednesday, June 17, 2009

Keeping it simple with flatula

Paul has blogged about overcoming mnesia performance issues in the past, but I don't think we've talked much about the ultimate strategy -- keeping data out of mnesia altogether.

When we first started serving ads, we stored information about every single ad impression in a huge mnesia database, for retrieval on click, and for building behavioral profiles. Almost needless to say, this didn't scale very far. We spent many a day last summer delving into mnesia internals, fixing corrupted table fragments after node crashes, bemoaning how long it took new nodes to join the schema under heavy load, and so on.

One of the simplest and most effective changes that got us out of this mess was not to store any per-impression data in mnesia at all -- instead, we started logging the data to flat files on disk, and storing a small pointer to the data in a cookie so we could read it back the next time we saw the user. Hardly a revolutionary solution . . . it's well-known that disk seeking is the enemy of performance. The hardest part was coming to realizations like, "Hmm, I guess we don't really care if a node goes down and we lose part of that data!"

We've open-sourced one of the main components that enabled this strategy: flatula, an Erlang application that manages write-once "tables" that are really just collections of flat files. It looks a bit like dets, except that it doesn't support deletions, updates, or iteration, and you can't make up the keys. But when you don't need those things, it's hard to imagine a more efficient way to store data.

If you'd like to learn more, there's a brief tutorial on the Google Code site.

Wednesday, June 10, 2009

Let parse transform

So the problem of intermediate variable naming came up again on erlang questions.


Subject:Versioned variable names
From: Attila Rajmund Nohl
Date: Tue, 9 Jun 2009 17:12:34 +0200

Hello!

I think there wasn't any grumbling this month about the
immutable local variables in Erlang, so here's real world
code I've found just today:

% Take away underscore and replace with hyphen
MO1 = re:replace(MO, "_", "-", [{return, list}, global]),
MO2 = toupper(MO1),
% Replace zeros
MO3 = re:replace(MO2,
"RX0",
"RXO",
[{return, list}, global]),
% Insert hyphen if missing
MO5 = case re:run(MO3, "-", [{capture, none}]) of
nomatch ->
insert_hyphen(MO3);
match ->
MO3
end,

...


Mikael Pettersson pointed out that this really has less to do with immutable local variables and more to do with the lack of a let expression. That was insightful, and since a let expression can be considered syntactic sugar for a lambda expression, I realized that a parse transform could provide let like functionality. Let is a reserved keyword in Erlang so I used lyet instead.

Essentially the parse transform rewrites
lyet:lyet (A = B, C)
as
(fun (A) -> C end) (B)
so the above code could be rewritten as

Result = lyet:lyet (
% Take away underscore and replace with hyphen
MO = re:replace(MO, "_", "-", [{return, list}, global]),
MO = toupper(MO),
% Replace zeros
MO = re:replace(MO,
"RX0",
"RXO",
[{return, list}, global]),
% Insert hyphen if missing
case re:run(MO, "-", [{capture, none}]) of
nomatch ->
insert_hyphen(MO);
match ->
MO
end),

You must provide at least one argument to lyet:lyet. All but the last argument to lyet:lyet must be an assignment, and the last argument has to be a single expression (but you can use begin and end for a block of expressions inside the lyet). As you can see above, you can reuse a variable name across the assignment arguments to lyet:lyet. You can even use lyet:lyet on the right hand side of the assignments, or as part of the expression argument. Some examples of usage are present in the unit test.

Update: per Ulf's suggestion, the parse transform also recognizes the local call let_ in addition to the remote call lyet:lyet. It definitely looks nicer with let_.

The software is available on Google code.