
erldis – an Erlang Redis Client

Since it’s now featured on the redis homepage, I figure I should tell people about my fork of erldis, an erlang redis client focused on synchronous operations.

Synchronicity

The original client, which still exists as erldis_client.erl, implements asynchronous pipelining: you send a bunch of redis commands, then collect all the results at the end. This didn't work for me, as I needed a client that could handle parallel synchronous requests from multiple concurrent processes. So I copied erldis_client.erl to erldis_sync_client.erl and modified it to send replies back as soon as they are received from redis (in FIFO order). Many thanks to dialtone_ for writing the original erldis app, as I'm not sure I would've created the synchronous client without it. And thanks to cstar for patches, such as making erldis_sync_client the default client for all functions in erldis.erl.

Extras

In addition to the synchronous client, I’ve added some extra functions and modules to make interfacing with redis more erlangy. Here’s a brief overview…

erldis_sync_client:transact

erldis_sync_client:transact is analogous to mnesia:transaction in that it does a unit of work against a redis database, like so:

  1. starts erldis_sync_client
  2. calls your function with the client PID as the argument
  3. stops the client
  4. returns the result of your function

The goal is to reduce boilerplate start/stop code.
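
For example, here's a minimal sketch of a transact call. It assumes a reachable redis server, and that erldis:set/3 and erldis:get/2 wrap the corresponding redis commands with the client PID as the first argument (check erldis.erl for the exact signatures):

example() ->
    erldis_sync_client:transact(fun(Client) ->
        % the client is started before this fun runs
        % and stopped after it returns
        erldis:set(Client, <<"foo">>, <<"bar">>),
        erldis:get(Client, <<"foo">>)
    end).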

erldis_dict module

erldis_dict provides semantics similar to the dict module in stdlib, using redis key-value commands.
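
Here's a hypothetical sketch of what usage might look like; the function names and argument order are my assumption, mirroring dict:store/3 and dict:fetch/2 with the client PID in place of the dict term, so check the module's exports:

dict_example() ->
    erldis_sync_client:transact(fun(Client) ->
        % assumed to mirror dict:store/3 and dict:fetch/2
        erldis_dict:store(<<"key">>, <<"value">>, Client),
        erldis_dict:fetch(<<"key">>, Client)
    end).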

erldis_list module

erldis_list provides a number of functions operating on redis lists, inspired by the array, lists, and queue modules in stdlib. You must pass in both the client PID and a redis list key.

erldis_sets module

erldis_sets works like the sets module, but you have to provide both the client PID and a redis set key.
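
Here's a hypothetical sketch of the client-PID-plus-key convention that erldis_list and erldis_sets share; the exact function names and argument order are my guess at how the sets API maps over, so check the module exports before relying on them:

sets_example(Client) ->
    % assumed to parallel sets:add_element/2 and sets:is_element/2,
    % with a redis set key identifying the set
    erldis_sets:add_element(<<"a">>, Client, <<"myset">>),
    true = erldis_sets:is_element(<<"a">>, Client, <<"myset">>).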

Usage

Despite the low version numbers, I've been successfully using erldis as a component in parallel/distributed information retrieval (in conjunction with plists), and for accessing data shared with python/django apps. It's a fully compliant erlang application that you can include in your target system release structure.

If also you’re using erldis for your redis needs, I’d love to hear about it.

How to Fix Erlang Out of Memory Crashes When Using Mnesia

If you’re getting erlang out of memory crashes when using mnesia, chances are you’re doing it wrong, for various values of it. These out of memory crashes look something like this:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 999999999 bytes of memory (of type "heap")

Possible Causes

  1. You’re doing it wrong
  2. Someone else is doing it wrong
  3. You don’t have enough RAM

While it’s possible that the crash is due to not having enough RAM, or that some other program or process is using too much RAM for itself, chances are it’s your fault.

One of the reasons these crashes can catch you by surprise is that the erlang VM uses a lot more memory than you might think. Erlang is a functional language with single assignment and no shared memory. A major consequence is that when you modify a term or send a message to another process, a new copy of the term is created. So an operation as simple as dict:update_counter("foo", 1, Dict1) consumes twice the memory of Dict1, since Dict1 is copied to create the return value.

And anything you do with ets, dets, or mnesia will result in at least 2 copies of every term: 1 copy for your process, and 1 copy for each table. This is because mnesia uses ets and/or dets for storage, which both use 1 process per table, so every table operation results in a message pass, sending your term to the table or vice versa. That's why erlang may be running out of memory. Here's how to fix it.

Use Dirty Operations

If you're doing anything in a transaction, try to figure out how to do it dirty, or at least move as many operations as possible out of the transaction. Mnesia transactions are separate processes with their own temporary ets tables. That means a single transaction consumes memory for:

  1. the original term(s) that must be passed in to the transaction or read from other tables
  2. any updated copies that your code creates
  3. copies of terms that are written to the temporary ets table
  4. the final copies of terms that are written to the actual table(s) at the end of the transaction
  5. copies of any terms that are returned from the transaction process

Here's an example to illustrate:

example() ->
    T = fun() ->
        % read returns a list of matching records
        [Var1] = mnesia:read(example_table, "foo"),
        Var2 = update(Var1), % a user-defined function to update Var1
        ok = mnesia:write(Var2),
        Var2
    end,
    {atomic, Var2} = mnesia:transaction(T),
    Var2.

First off, we already have a copy of Var1 in example_table. It gets sent to the transaction process when you do mnesia:read, creating a second copy. Var1 is then updated, resulting in Var2, which I'll assume has the same memory footprint as Var1. So now we have 3 copies. Var2 is then written to a temporary ets table because mnesia:write is called within a transaction, creating a fourth copy. The transaction ends, sending Var2 back to the original process and overwriting Var1 with Var2 in example_table. That's 2 more copies, for a total of 6. Let's compare that to a dirty operation.

example() ->
    % dirty_read also returns a list of matching records
    [Var1] = mnesia:dirty_read(example_table, "foo"),
    Var2 = update(Var1),
    ok = mnesia:dirty_write(Var2),
    Var2.

Doing it dirty results in only 4 copies: the original Var1 in example_table, the copy sent to your process, the updated Var2, and the copy sent to mnesia to be written. Dirty operations like this will generally have 2/3 the memory footprint of operations done in a transaction.

Reduce Record Size

Figuring out how to reduce your record size by using different data structures can create huge gains by drastically reducing the memory footprint of each operation, and possibly removing the need for transactions at all. For example, let's say you're storing a large record in mnesia and using transactions to update it. If the size of the record grows by 1 byte, then each transactional operation like the above will require an additional 5 bytes of memory, and dirty operations will require an additional 3 bytes. For multi-megabyte records, this adds up very quickly. The solution is to figure out how to break that record up into many small records. Mnesia can use any term as a key, so if you're storing a record with a dict in mnesia such as {dict_record, "foo", Dict}, you can split it up into many records like [{tuple_record, {"foo", Key1}, Val1}, {tuple_record, {"foo", Key2}, Val2}]. Each of these small records can be accessed independently, which could eliminate the need for transactions, or at least drastically reduce the memory footprint of each transaction.
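
For instance, here's a sketch of that split, assuming a tuple_record table already exists and is keyed on the {Name, Key} tuple:

split_dict_record({dict_record, Name, Dict}) ->
    % write one small record per dict entry
    % instead of one multi-megabyte record
    [ok = mnesia:dirty_write({tuple_record, {Name, Key}, Val})
     || {Key, Val} <- dict:to_list(Dict)],
    ok.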

Iterate in Batches

Instead of getting a whole bunch of records from mnesia all at once with mnesia:dirty_match_object or mnesia:dirty_select, iterate over the records in batches, analogous to how the lists module traverses a list one element at a time. The match_object functions may return a huge number of records, and all of those records have to be sent from the table process to your process, doubling the amount of memory required. By operating on batches of records, you only access a portion at a time, reducing the amount of memory in use at once. Here are some code examples that access 1 record at a time. Note that if the table changes during iteration, the behavior is undefined. You could also use the select operations to process records in batches of NObjects at a time, as in the sketch after the foreach example below.

Dirty Mnesia Foldl

dirty_foldl(F, Acc0, Table) ->
    dirty_foldl(F, Acc0, Table, mnesia:dirty_first(Table)).

dirty_foldl(_, Acc, _, '$end_of_table') ->
    Acc;
dirty_foldl(F, Acc, Table, Key) ->
    Acc2 = lists:foldl(F, Acc, mnesia:dirty_read(Table, Key)),
    dirty_foldl(F, Acc2, Table, mnesia:dirty_next(Table, Key)).

Dirty Mnesia Foreach

dirty_foreach(F, Table) ->
    dirty_foreach(F, Table, mnesia:dirty_first(Table)).

dirty_foreach(_, _, '$end_of_table') ->
    ok;
dirty_foreach(F, Table, Key) ->
    lists:foreach(F, mnesia:dirty_read(Table, Key)),
    dirty_foreach(F, Table, mnesia:dirty_next(Table, Key)).
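
Batched Mnesia Select

Here's a sketch of batched iteration using mnesia:select/4, which returns a continuation along with each batch of up to BatchSize records. The whole loop runs inside a single async_dirty activity, and the same caveat about the table changing during iteration applies:

select_foreach(F, Table, MatchSpec, BatchSize) ->
    mnesia:activity(async_dirty, fun() ->
        select_loop(F, mnesia:select(Table, MatchSpec, BatchSize, read))
    end).

select_loop(_, '$end_of_table') ->
    ok;
select_loop(F, {Records, Cont}) ->
    % process one batch, then fetch the next with the continuation
    lists:foreach(F, Records),
    select_loop(F, mnesia:select(Cont)).

For example, select_foreach(F, example_table, [{'_', [], ['$_']}], 100) would apply F to every record in example_table, fetching 100 records at a time.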

Conclusion

  1. It’s probably your fault
  2. Do as little as possible inside transactions
  3. Use dirty operations instead of transactions
  4. Reduce record size
  5. Iterate in small batches