Wednesday, June 17, 2009

Keeping it simple with flatula

Paul has blogged about overcoming mnesia performance issues in the past, but I don't think we've talked much about the ultimate strategy -- keeping data out of mnesia altogether.

When we first started serving ads, we stored information about every single ad impression in a huge mnesia database, for retrieval on click, and for building behavioral profiles. Almost needless to say, this didn't scale very far. We spent many a day last summer delving into mnesia internals, fixing corrupted table fragments after node crashes, bemoaning how long it took new nodes to join the schema under heavy load, and so on.

One of the simplest and most effective changes that got us out of this mess was not to store any per-impression data in mnesia at all -- instead, we started logging the data to flat files on disk, and storing a small pointer to the data in a cookie so we could read it back the next time we saw the user. Hardly a revolutionary solution . . . it's well-known that disk seeking is the enemy of performance. The hardest part was coming to realizations like, "Hmm, I guess we don't really care if a node goes down and we lose part of that data!"

We've open-sourced one of the main components that enabled this strategy: flatula, an Erlang application that manages write-once "tables" that are really just collections of flat files. It looks a bit like dets, except that it doesn't support deletions, updates, or iteration, and you can't make up the keys. But when you don't need those things, it's hard to imagine a more efficient way to store data.
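To make the idea concrete, here is a minimal sketch of a write-once flat-file store of the kind described above. It is not flatula's API: the module name flat_sketch, the functions append/2 and read/2, and the {Offset, Size} pointer format are all assumptions made up for illustration. It also assumes a single writer per file.

```erlang
%% flat_sketch.erl
%% Hypothetical illustration of a write-once flat-file store.
%% This is NOT flatula's API; every name here is invented.
-module(flat_sketch).
-export([append/2, read/2]).

%% Append Term to the flat file at Path and return a small pointer.
%% The store picks the key (byte offset plus record size); callers
%% cannot make one up, and nothing is ever updated or deleted.
%% Assumes one writer per file: concurrent writers would race
%% between the position call and the write.
append(Path, Term) ->
    Bin = term_to_binary(Term),
    %% read+write opens without truncating and creates the file if missing.
    {ok, Fd} = file:open(Path, [read, write, raw, binary]),
    try
        {ok, Offset} = file:position(Fd, eof),
        ok = file:write(Fd, Bin),
        {ok, {Offset, byte_size(Bin)}}
    after
        file:close(Fd)
    end.

%% Fetch the record a pointer refers to with a single positioned read.
read(Path, {Offset, Size}) ->
    case file:open(Path, [read, raw, binary]) of
        {ok, Fd} ->
            try file:pread(Fd, Offset, Size) of
                {ok, Bin} when byte_size(Bin) =:= Size ->
                    {ok, binary_to_term(Bin)};
                _ShortOrEof ->
                    not_found
            after
                file:close(Fd)
            end;
        {error, _Reason} ->
            not_found
    end.
```

In this sketch the pointer returned by append/2 is the only thing that would need to travel in the cookie, and reading a record back costs one open and one pread. If the file is gone because a node went down, read/2 simply returns not_found.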

If you'd like to learn more, there's a brief tutorial on the Google Code site.

Wednesday, June 10, 2009

Let parse transform

So the problem of intermediate variable naming came up again on erlang-questions.


Subject:Versioned variable names
From: Attila Rajmund Nohl
Date: Tue, 9 Jun 2009 17:12:34 +0200

Hello!

I think there wasn't any grumbling this month about the
immutable local variables in Erlang, so here's real world
code I've found just today:

% Take away underscore and replace with hyphen
MO1 = re:replace(MO, "_", "-", [{return, list}, global]),
MO2 = toupper(MO1),
% Replace zeros
MO3 = re:replace(MO2,
                 "RX0",
                 "RXO",
                 [{return, list}, global]),
% Insert hyphen if missing
MO5 = case re:run(MO3, "-", [{capture, none}]) of
          nomatch ->
              insert_hyphen(MO3);
          match ->
              MO3
      end,

...


Mikael Pettersson pointed out that this really has less to do with immutable local variables and more to do with the lack of a let expression. That was insightful, and since a let expression can be considered syntactic sugar for a lambda expression, I realized that a parse transform could provide let-like functionality. Let is a reserved keyword in Erlang, so I used lyet instead.

Essentially the parse transform rewrites
lyet:lyet (A = B, C)
as
(fun (A) -> C end) (B)
so the above code could be rewritten as

Result = lyet:lyet (
    % Take away underscore and replace with hyphen
    MO = re:replace(MO, "_", "-", [{return, list}, global]),
    MO = toupper(MO),
    % Replace zeros
    MO = re:replace(MO,
                    "RX0",
                    "RXO",
                    [{return, list}, global]),
    % Insert hyphen if missing
    case re:run(MO, "-", [{capture, none}]) of
        nomatch ->
            insert_hyphen(MO);
        match ->
            MO
    end),

You must provide at least one argument to lyet:lyet. All but the last argument to lyet:lyet must be an assignment, and the last argument has to be a single expression (but you can use begin and end for a block of expressions inside the lyet). As you can see above, you can reuse a variable name across the assignment arguments to lyet:lyet. You can even use lyet:lyet on the right-hand side of the assignments, or as part of the expression argument. Some examples of usage are present in the unit test.
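To see why reusing a variable name works, it can help to write the expansion out by hand. The sketch below is plain Erlang with no parse transform involved. It assumes that several assignments expand to nested funs, one per assignment, which is the natural reading of the two-argument rewrite rule given earlier. The module name lyet_sketch and the function normalize/1 are made up for illustration.

```erlang
%% lyet_sketch.erl
%% Hand-written equivalent of what one would expect from
%%   lyet:lyet(MO = re:replace(MO, "_", "-", [{return, list}, global]),
%%             MO = string:to_upper(MO),
%%             {normalized, MO})
%% assuming each assignment becomes one enclosing fun.
-module(lyet_sketch).
-export([normalize/1]).

%% Shadowing a variable in a fun head is legal but normally warned about;
%% here it is the whole point, so silence the warning.
-compile(nowarn_shadow_vars).

normalize(MO) ->
    %% The argument expression at the bottom is evaluated in the outer
    %% scope, so it sees the original MO. Inside each fun, the parameter
    %% MO shadows the previous binding.
    (fun (MO) ->
        (fun (MO) ->
            {normalized, MO}
        end)(string:to_upper(MO))
    end)(re:replace(MO, "_", "-", [{return, list}, global])).
```

Calling lyet_sketch:normalize("rx_0a") first replaces the underscore, then upper-cases the result, and each step sees only the most recent MO. No MO1, MO2, ... names are needed, because every fun head introduces a fresh binding.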

Update: per Ulf's suggestion, the parse transform also recognizes the local call let_ in addition to the remote call lyet:lyet. It definitely looks nicer with let_.

The software is available on Google Code.

Tuesday, May 19, 2009

Automatic .app file generation

At our startup, we have our own build system (framewerk) and our own deployment framework (erlrc), which play well together. As I learned at the Bay Area Erlang Factory, most people already have their own processes, so what they want is to extract some of the useful functionality and incorporate it into their way of doing things. This led to exposing automatic .appup file generation from erlrc in a reusable fashion; that technique requires .app files to be correct, which is why we automatically generate them in framewerk. To complement that, I've isolated our automatic .app file generation and released it in a standalone form.

The escript is called fwte-makeappfile, and it basically takes the set of Erlang source files that make up an application and does three things for you: 1) attempts to automatically discover registered processes, 2) attempts to automatically discover all the module names, and 3) attempts to automatically discover the start module for the application (if present). You can override these choices if you don't like them. Here's an example of how it works:

% ./fwte-makeappfile --application nitrogen --description 'Nitrogen Web Framework' --version '0.2009.05.11' ~/src/nitrogen-git/src/**/*.erl
{application,nitrogen,
[{description,"Nitrogen Web Framework"},
{vsn,"0.2009.05.11"},
{modules,[action_add_class,action_alert,action_animate,
action_appear,action_buttonize,action_comet_start,
action_confirm,action_disable_selection,action_effect,
action_event,action_fade,action_hide,
action_jquery_effect,action_remove_class,
action_script,action_show,action_toggle,
action_validate,action_validation_error,element_bind,
element_body,element_br,element_button,
element_checkbox,element_datepicker_textbox,
element_draggable,element_dropdown,element_droppable,
element_file,element_flash,element_google_chart,
element_gravatar,element_h1,element_h2,element_h3,
element_h4,element_hidden,element_hr,element_image,
element_inplace_textbox,element_label,
element_lightbox,element_link,element_list,
element_listitem,element_literal,element_p,
element_panel,element_password,element_placeholder,
element_radio,element_radiogroup,
element_rounded_panel,element_singlerow,
element_sortblock,element_sortitem,element_span,
element_spinner,element_table,element_tablecell,
element_tableheader,element_tablerow,element_template,
element_textarea,element_textbox,element_upload,
element_value,element_windex,element_wizard,mirror,
nitrogen,nitrogen_file,nitrogen_inets_app,
nitrogen_mochiweb_app,nitrogen_project,
nitrogen_yaws_app,sync,validator_confirm_password,
validator_custom,validator_is_email,
validator_is_integer,validator_is_required,
validator_js_custom,validator_max_length,
validator_min_length,web_x,wf,wf_bind,wf_cache,
wf_cache_server,wf_comet,wf_continuation,wf_convert,
wf_counter,wf_email,wf_handle,wf_handle_firstrequest,
wf_handle_postback,wf_handle_postback_multipart,
wf_http_basic_auth,wf_inets,wf_init,wf_mochiweb,
wf_multipart,wf_path,wf_platform,wf_platform_inets,
wf_platform_mochiweb,wf_platform_yaws,wf_query,
wf_redirect,wf_render,wf_script,wf_session,
wf_session_server,wf_session_sup,wf_state,wf_tags,
wf_utils,wf_validation,wf_yaws]},
{registered,[wf_session_server,wf_session_sup]},
{applications,[kernel,stdlib]},
{env,[]}]}
.
We use this in our build system, where the .app file is always automatically generated, but the developer can set overrides if the autodetection is f-ing up. I recommend this as a general strategy: you need to be able to manually specify for edge cases, but you don't want to count on your developers maintaining the routine cases correctly.
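The "autodetect, but allow overrides" strategy can be sketched in a few lines. This is not how fwte-makeappfile is implemented. The module appfile_sketch and its functions are hypothetical, and only the easy part of the detection is shown, deriving module names from file names. Discovery of registered processes and the start module is left out; in this sketch they would have to be supplied as overrides.

```erlang
%% appfile_sketch.erl
%% Hypothetical sketch: build an application resource term from a list
%% of .erl paths, then let explicit overrides win over detected values.
-module(appfile_sketch).
-export([app_spec/5, format/1]).

app_spec(App, Description, Vsn, ErlFiles, Overrides) ->
    %% Module names are taken from the file names, sorted and deduplicated.
    Modules = lists:usort([list_to_atom(filename:basename(F, ".erl"))
                           || F <- ErlFiles]),
    Detected = [{description, Description},
                {vsn, Vsn},
                {modules, Modules},
                {registered, []},            % not auto-detected in this sketch
                {applications, [kernel, stdlib]},
                {env, []}],
    %% Each override replaces the detected entry with the same key,
    %% or is appended if that key was not detected at all.
    Props = lists:foldl(
              fun({Key, _} = Override, Acc) ->
                      lists:keystore(Key, 1, Acc, Override)
              end,
              Detected,
              Overrides),
    {application, App, Props}.

%% Render the term the way it would appear in a .app file.
format(Spec) ->
    lists:flatten(io_lib:format("~p.~n", [Spec])).
```

For example, appfile_sketch:app_spec(demo, "Demo", "0.1", ["src/b.erl", "src/sub/a.erl"], [{registered, [demo_sup]}]) keeps the detected module list but takes the registered names from the override, so the routine case needs no manual upkeep and the edge case stays under the developer's control.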