Wednesday, August 26, 2009

Dynamically Loading Webmachine Resources

I've been using webmachine lately, which is fabulous for REST server development. It has a modular concept of resources and the design could be summarized as a "RESTlet container". Because Erlang has hot code loading I was interested in doing this dynamically; in other words, I want to write a module which provides a webmachine resource, hot code load it into the system, and have webmachine start dispatching requests to it.

The good news is, you can change the dispatch list at any time, via
application:set_env (webmachine, dispatch_list, NewDispatchList).
This does not upset webmachine and it will see the change immediately. So for a single application leveraging webmachine, this seemed as simple as having a code_change handler execute something like
application:set_env
(webmachine,
dispatch_list,
lists:foldl (fun (Mod, Acc) ->
case catch Mod:dispatch_rules () of
{ 'EXIT', _ } -> Acc;
X -> X ++ Acc
end
end,
[],
element (2, application:get_key (drurly, modules)))).
(drurly is the name of the application in this case). Now my modules export a function like

dispatch_rules () ->
[ { [ "clip" ], ?MODULE, [] },
{ [ "clip", id ], ?MODULE, [] }
].
if they are webmachine resources.

The problem is that order is important: the webmachine_dispatch_list is consulted in order, and the first match is executed. To solve this I created a sort function which orders the dispatch rules by "specificity"
path_spec_priority ('*') -> 3;
path_spec_priority (X) when is_atom (X) -> 2;
path_spec_priority (X) when is_list (X) -> 1.

dispatch_specificity ({ PathSpecA, _, _ },
{ PathSpecB, _, _ }) ->
case erlang:length (PathSpecA) - erlang:length (PathSpecB) of
X when X > 0 ->
true;
X when X < 0 ->
false;
_ ->
PrioPathSpecA = [ path_spec_priority (X) || X <- PathSpecA ],
PrioPathSpecB = [ path_spec_priority (X) || X <- PathSpecB ],

case PrioPathSpecA =< PrioPathSpecB of
false ->
false;
true ->
FullPathSpecA = [ { path_spec_priority (X), X } || X <- PathSpecA ],
FullPathSpecB = [ { path_spec_priority (X), X } || X <- PathSpecB ],

FullPathSpecA =< FullPathSpecB
end
end.
Basically:
  • Longer dispatch paths come first.
  • If two dispatch paths have equal length, the more specific one comes first, where specificity is defined by examining the elements left-to-right, with
    • string literals are most specific
    • atoms except '*' are the next most specific
    • '*' is the least specific
  • If two dispatch paths have equal length and specificity, sort by Erlang term order (effectively, break ties arbitrarily)
The code change handler now becomes:
application:set_env
(webmachine,
dispatch_list,
lists:sort
(fun dispatch_specificity/2,
lists:foldl (fun (Mod, Acc) ->
case catch Mod:dispatch_rules () of
{ 'EXIT', _ } -> Acc;
X -> X ++ Acc
end
end,
[],
element (2, application:get_key (drurly, modules))))).
This has proven sufficient for a single application running webmachine.

For multiple applications that want to run under the same webmachine, I suspect the right way to go is to have a gen_server which
  • contains webmachine dispatch configuration "fragments" per key, where the key is intended to be the application name;
  • accepts commands to replace or delete a particular fragment by key;
  • for any replace or delete, rebuilds the complete dispatch list by concatenating all fragments together and sorting by specificity, and then updates the webmachine application env var.
For multiple applications that want to run under different webmachines, well unfortunately webmachine uses some global application settings under the fixed atom webmachine and thus currently, like the highlander, there can be only one. (In fact, you can only listen on one ip/port combination with webmachine right now. I might have to patch it to accept multiple ip/port combinations to listen to, since a standard trick of mine is to have nginx handle both regular and ssl connections and connect to a different back-end port to indicate whether or not the connection is secure.)

Saturday, August 22, 2009

Metaprogramming with ct_expand

Yesterday I posted about putting arbitrary Erlang terms into HTTP cookies. I suggested that constructing the base64 codec at compile time was an excellent application for ct_expand, but I didn't provide any details. Therefore, here is a follow-up.

ct_expand provides a simple interface to Erlang metaprogramming: evaluating an arbitrary Erlang term at compile time and substituting the results into the source code during compilation. This is especially useful for initializing a data structure at compile time, and in particular, it can be used to construct the forward and inverse maps for the base64 codec. We will specify our codec by providing a list of 64 unique characters, and ct_expand will do the rest. The following code is functionally identical to the termcookie module presented previously.

-module (termcookie2).
-compile ({ parse_transform, ct_expand }).
-export ([ decode/2,
encode/2 ]).

-define (CODEC, "abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"0123456789.,").

%
% Public
%

decode (Encoded, Secret) when is_binary (Encoded) ->
<<Signature:28/binary, Payload/binary>> = Encoded,
Signature = to_base64 (crypto:sha ([ Payload, Secret ])),
erlang:binary_to_term (from_base64 (Payload)).

encode (Term, Secret) ->
Payload =
to_base64
(erlang:term_to_binary (Term,
[ compressed,
{ minor_version, 1 } ])),
Signature = to_base64 (crypto:sha ([ Payload, Secret ])),
<<Signature/binary, Payload/binary>>.

%
% Private
%

to_base64 (Bin) when (8 * byte_size (Bin)) rem 6 =:= 0 ->
to_base64_padded (Bin);
to_base64 (Bin) when (8 * byte_size (Bin)) rem 6 =:= 2 ->
to_base64_padded (<<Bin/binary, 0:16>>);
to_base64 (Bin) when (8 * byte_size (Bin)) rem 6 =:= 4 ->
to_base64_padded (<<Bin/binary, 0:8>>).

to_base64_padded (Bin) ->
<< <<(element (N + 1,
ct_expand:term (
begin
64 = length (?CODEC),
64 = length (lists:usort (?CODEC)),
list_to_tuple (?CODEC)
end
)
)
):8>>
|| <<N:6>> <= Bin >>.

from_base64 (Bin) ->
<< <<(element
(N + 1,
ct_expand:term
(element
(2,
lists:foldl
(fun (X, { K, T }) ->
{ K + 1, setelement (X + 1, T, K) }
end,
{ 0, erlang:make_tuple (256, -1) },
?CODEC)
)
)
)
):6>>
|| <<N:8>> <= Bin >>.

Some notes:
  • Lines 6-8 contain the specification of the codec.
  • Lines 43-44 are compile-time assertions on the codec specification, namely that it consists of 64 unique characters. If you modify the specification to violate these assertions, the module will not compile, although the resulting error message will be nearly unintelligible (try it!).
  • Line 45 constructs the forward mapping from the specification; the result is a (constant) tuple which is consulted at run time.
  • Lines 56-65 construct the inverse mapping from the specification; the result is a (constant) tuple which is consulted at run time.
Note you cannot use any functions from the current module inside the ct_expand:term/1 argument, because the current module has not been compiled yet! If you need to do something really complicated and don't like an unwieldy inline expression you can place helper code in a separate module which is compiled first.

The resulting software is easier to maintain than the original version, because if the codec needs to be changed, only the specification is modified. For instance we can replace lines 5-8 with

-define (CHARS, "abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"0123456789.,").
-define (CODEC,
ct_expand:term (
fun () ->
random:seed (1, 2, 3),
[ X ||
{ _, X } <-
lists:sort (
[ { random:uniform (), Y }
|| Y <- ?CHARS
]
)
]
end ()
)).

which results in a proper codec utilizing a permutation of the original codec. Note the call to random:seed/3 is happening at compile time, and is setting the random seed of the compilation process. Therefore this is a stable codec definition. (Unfortunately, it doesn't really increase the opacity of the encoding scheme; the bits in an encoded Erlang term are highly degenerate so any interested party would be able to deduce the permutation given enough cookies).

Friday, August 21, 2009

Erlang Terms in Cookies

Erlang is an increasingly popular choice for web development. Such projects tend to heavily leverage HTTP cookies, and because Erlang defines an external format for any Erlang term, it turns out to be very easy to store an arbitrary term with a cookie (within cookie size limitations), which can be a useful trick.

The technique outlined here generates a signed but not encrypted cookie. That means it's fairly simple for anyone possessing one of your cookies to determine the contents, but difficult for them to forge a novel cookie. The latter is typically important for the application, but has additional importance here because we will be calling erlang:binary_to_term/1 on the cookie value and passing arbitrary data to erlang:binary_to_term/1 is a bad idea (for example, this could cause an extremely large memory allocation).

Here's the code:

-module (termcookie).
-export ([ decode/2,
encode/2 ]).

%
% Public
%

decode (Encoded, Secret) when is_binary (Encoded) ->
<<Signature:28/binary, Payload/binary>> = Encoded,
Signature = to_base64 (crypto:sha ([ Payload, Secret ])),
erlang:binary_to_term (from_base64 (Payload)).

encode (Term, Secret) ->
Payload =
to_base64
(erlang:term_to_binary (Term,
[ compressed,
{ minor_version, 1 } ])),
Signature = to_base64 (crypto:sha ([ Payload, Secret ])),
<<Signature/binary, Payload/binary>>.

%
% Private
%

to_base64 (Bin) when (8 * byte_size (Bin)) rem 6 =:= 0 ->
to_base64_padded (Bin);
to_base64 (Bin) when (8 * byte_size (Bin)) rem 6 =:= 2 ->
to_base64_padded (<<Bin/binary, 0:16>>);
to_base64 (Bin) when (8 * byte_size (Bin)) rem 6 =:= 4 ->
to_base64_padded (<<Bin/binary, 0:8>>).

to_base64_padded (Bin) ->
<< <<(to_base64_char (N)):8>> || <<N:6>> <= Bin >>.

to_base64_char (N) when N >= 0, N =< 25 -> $a + N;
to_base64_char (N) when N >= 26, N =< 51 -> $A + (N - 26);
to_base64_char (N) when N >= 52, N =< 61 -> $0 + (N - 52);
to_base64_char (62) -> $.;
to_base64_char (63) -> $,.

from_base64 (Bin) ->
<< <<(from_base64_char (N)):6>> || <<N:8>> <= Bin >>.

from_base64_char (N) when N >= $a, N =< $z -> N - $a;
from_base64_char (N) when N >= $A, N =< $Z -> 26 + (N - $A);
from_base64_char (N) when N >= $0, N =< $9 -> 52 + (N - $0);
from_base64_char ($.) -> 62;
from_base64_char ($,) -> 63.
We are taking advantage of the fact that erlang:binary_to_term/1 will ignore extra bytes at the end, which allows us to mindlessly pad for base 64 encoding.

If you really like to squeeze the last few drops of efficiency out of code, you can change those to_base64_char/1 and from_base64_char/1 functions into tuple lookups. If you are extra cool you can use Ulf Wiger's ct_expand parse transform to construct the tuples at compile time from a specified character list.

This code will throw an exception if anything is amiss with the input, including a signature fail.

% erl
Erlang (BEAM) emulator version 5.6.5 [source] [async-threads:0] [kernel-poll:false]

Eshell V5.6.5 (abort with ^G)
1> crypto:start ().
ok
2> termcookie:encode ({ "omg", erlang, rulz }, "wazzup").
<<"nQCwmuMgeK3bTPzBqKDSmSylIciaG2GdAWadB21NzaagzxjSyw5NzaaeCNvSEGaa">>
3> termcookie:decode (termcookie:encode ({ "omg", erlang, rulz }, "wazzup"), "wazzup").
{"omg",erlang,rulz}
4> termcookie:decode (termcookie:encode ({ "omg", erlang, rulz }, "huh"), "wazzup").
** exception error: no match of right hand side value <<"nQCwmuMgeK3bTPzBqKDSmSylIcia">>
in function termcookie:decode/2

Saturday, August 8, 2009

Implementing a DSL in Erlang

If you are willing to abide by Erlang syntax, then you can leverage erl_parse, erl_scan, and erl_eval to quickly whip up a domain specific language (DSL). You can manipulate the semantics via a combination of transformations on the parse tree (with the help of erl_syntax) and interception of function calls (which comes with erl_eval).

For instance, suppose we want a domain specific language which is just like Erlang, except that it has destructive assignment. This can be done in three steps: 1) parse the input using the Erlang parser, 2) transform the parse tree so that match expressions are rewritten as a special local function call, and 3) eval the result, intercept the special local function call and implement the destructive assignment by recursive eval.

Here's the code:

-module (dsl).
-export ([ compile/1,
run/2 ]).

%-========================================================-
%- Public -
%-========================================================-

compile (String) ->
{ ok, Scanned, _ } =
erl_scan:string (maybe_append_dot (String)),
{ ok, Parsed } = erl_parse:parse_exprs (Scanned),
{ dsl, transform (Parsed) }.

run ({ dsl, Parsed }, Bindings) ->
eval (Parsed, Bindings).

%-========================================================-
%- eval callbacks -
%-========================================================-

local ('_assign', [ Pattern, Expr ], Bindings) ->
{ value, Rhs, NewBindings } =
eval ([ Expr ], Bindings),
MatchExpr =
erl_syntax:match_expr (Pattern,
erl_syntax:abstract (Rhs)),
{ value, _Lhs, MatchBindings } =
eval ([ erl_syntax:revert (MatchExpr) ],
erl_eval:new_bindings ()),
{ value,
Rhs,
destructive_merge (NewBindings, MatchBindings) }.

%-========================================================-
%- Private -
%-========================================================-

destructive_merge (Bindings, MatchBindings) ->
lists:foldl
(fun ({ Key, Val }, Acc) ->
erl_eval:add_binding
(Key,
Val,
erl_eval:del_binding (Key, Acc))
end,
Bindings,
erl_eval:bindings (MatchBindings)).

eval (Parsed, Bindings) ->
erl_eval:exprs (Parsed,
Bindings,
{ eval, fun local/3 }).

maybe_append_dot (String) ->
case lists:last (string:strip (String,
right)) of
$. -> String;
_ -> String ++ "."
end.

postorder (F, Tree) ->
F (case erl_syntax:subtrees (Tree) of
[] ->
Tree;
List ->
erl_syntax:update_tree
(Tree,
[[ postorder (F, Subtree)
|| Subtree <- Group ]
|| Group <- List ])
end).

transform (Parsed) ->
F = fun (Node) ->
case erl_syntax:type (Node) of
% match_expr: rewrite as (destructive) assignment
match_expr ->
erl_syntax:application
(none,
erl_syntax:atom ("_assign"),
[ erl_syntax:match_expr_pattern (Node),
erl_syntax:match_expr_body (Node) ]);
_ ->
Node
end
end,

[ erl_syntax:revert (postorder (F, X))
|| X <- Parsed ].

dsl:compile/1 scans and parses the string in the standard way, but transforms the resulting parse tree rewriting match expressions as calls to a fictitious local function _assign/2. This local function call is intercepted by specifying a local function handler to erl_eval:exprs/3. At that point (lines 22-33), the sequence is: 1) evaluate the right hand side using the current bindings to reduce the right hand side to a constant, 2) evaluate a match of the left hand side to the constant expression given an empty set of bindings, and 3) merge the new bindings with the existing bindings, overwriting as necessary. By proceeding in this fashion, we can still error out if we have a structural mismatch, or if an unbound variable is used on a right hand side.

% erl
Erlang (BEAM) emulator version 5.6.5 [source] [async-threads:0] [kernel-poll:false]

Eshell V5.6.5 (abort with ^G)
1> c (dsl).
{ok,dsl}
2> dsl:run (dsl:compile ("A = 7, A = A + 3"), erl_eval:new_bindings ()).
{value,10,[{'A',10}]}
3> dsl:run (dsl:compile ("B = 7, A = A + 3"), erl_eval:new_bindings ()).
** exception error: variable 'A' is unbound
4> dsl:run (dsl:compile ("A = 7, { _, A } = A + 3"), erl_eval:new_bindings ()).
** exception error: no match of right hand side value 10