Category Archives: erlang

Mnesia Records to MongoDB Documents

I recently migrated about 50k records from mnesia to MongoDB using my fork of emongo, which adds supervisors with transparent connection restarting, for reasons I’ll explain below.

Why Mongo instead of Mnesia

mnesia is great for a number of reasons, but here’s why I decided to move weotta’s place data into MongoDB:

Converting Records to Docs and vice versa

First, I needed to convert records to documents. In erlang, mongo documents are basically proplists. Keys going into emongo can be atoms, strings, or binaries, but keys coming out will always by binaries. Here’s a simple example of record to document conversion:

record_to_doc(Record, Attrs) ->
    % tl will drop record name
    lists:zip(Attrs, tl(tuple_to_list(Record))).

This would be called like record_to_doc(MyRecord, record_info(fields, my_record)). If you have nested dicts then you’ll have to flatten them using dict:to_list. Also note that list values are coming out of emongo are treated like yaws JSON arrays, i.e. [{key, {array, [val]}}]. For more examples, check out the emongo docs.

Heavy Write Load

To do the migration, I used etable:foreach to insert each document. Bulk insertion would probably be more efficient, but etable makes single record iteration very easy.

I started using the original emongo with a pool size of 10, but it was crashy when I dumped records as fast as possible. So initially I slowed it down with timer:sleep(200), but after adding supervised connections, I was able to dump with no delay. I’m not exactly sure what I fixed in this case, but I think the lesson is that using supervised gen_servers will give you reliability with little effort.

Read Performance

Now that I had data in mongo to play with, I compared the read performance to mnesia. Using timer:tc, I found that mnesia:dirty_read takes about 21 microseconds, whereas emongo:find_one can take anywhere from 600 to 1200 microseconds, querying on an indexed field. Without an index, read performance ranged from 900 to 2000 microseconds. I also tested only requesting specific fields, as recommended on the MongoDB Optimiziation page, but with small documents (<10 fields) that did not seem to have any effect. So while mongodb queries are pretty fast at 1ms, mnesia is about 50 times faster. Further inspection with fprof showed that nearly half of the cpu time of emongo:find is taken by BSON decoding.

Heavy Read Load

Under heavy read load (thousands of find_one calls in less than second), emongo_conn would get into a locked state. Somehow the process had accumulated unparsable data and wouldn’t reply. This problem went away when I increased the size of the pool size to 100, but that’s a ridiculous number of connections to keep open permanently. So instead I added some code to kill the connection on timeout and retry the find call. This was the main reason I added supervision. Now, every pool is locally registered as a simple_one_for_one supervisor that supervises every emongo_server connection. This pool is in turn supervised by emongo_sup, with dynamically added child specs. All this supervision allowed me to lower the pool size back to 10, and made it easy to kill and restart emongo_server connections as needed.

Why you may want to stick with Mnesia

Now that I have experience with both MongoDB and mnesia, here’s some reasons you may want to stick with mnesia:

Despite all that, I’m very happy with MongoDB. Installation and setup were a breeze, and schema-less data storage is very nice when you have variable fields and a high probability of adding and/or removing fields in the future. It’s simple, scalable, and as mentioned above, it’s very easy to access from many different languages. emongo isn’t perfect, but it’s open source and will hopefully benefit from more exposure.

erldis – an Erlang Redis Client

Since it’s now featured on the redis homepage, I figure I should tell people about my fork of erldis, an erlang redis client focused on synchronous operations.

Synchronicity

The original client, which still exists as erldis_client.erl, implements asynchronous pipelining. This means you send a bunch of redis commands, then collect all the results at the end. This didn’t work for me, as I needed a client that could handle parallel synchronous requests from multiple concurrent processes. So I copied erldis_client.erl to erldis_sync_client.erl and modified it to send replies back as soon as they are received from redis (in FIFO order). Many thanks to dialtone_ for writing the original erldis app as I’m not sure I would’ve created the synchronous client without it. And thanks to cstar for patches, such as making erldis_sync_client the default client for all functions in erldis.erl.

Extras

In addition to the synchronous client, I’ve added some extra functions and modules to make interfacing with redis more erlangy. Here’s a brief overview…

erldis_sync_client:transact

erldis_sync_client:transact is analagous to mnesia:transaction in that it does a unit of work against a redis database, like so:

  1. starts erldis_sync_client
  2. calls your function with the client PID as the argument
  3. stops the client
  4. returns the result of your function

The goal being to reduce boilerplate start/stop code.

erldis_dict module

erldis_dict provides similar semantics as the dict module in stdlib, using redis key-value commands.

erldis_list module

erldis_list provides a number of functions operating on redis lists, inspired by the array, lists, and queue modules in stdlib. You must pass in both the client PID and a redis list key.

erldis_sets module

erldis_sets works like the sets module, but you have to provide both the client PID and a redis set key.

Usage

Despite the low version numbers, I’ve been successfully using erldis as a component in parallel/distributed information retrieval (in conjunction with plists), and for accessing data shared with python / django apps. It’s a fully compliant erlang application that you can include in your target system release structure.

If also you’re using erldis for your redis needs, I’d love to hear about it.

Erlang Webserver & Database Links

Erlang databases:

Erlang web servers and frameworks:

Erlang Release Handling with Fab and Reltools

You’ve already got a first target system installed, and now you’ve written some new code and want to deploy it. This article will show you how to setup make and fab commands that use reltools to build & install new releases.

Appup

Your code should be part of an OTP application structure. Additionally, you will need an appup file in the ebin/ directory for each application you want to upgrade. There’s a lot you can do in an appup file:

Once you’ve updated app files with the newest version and configuration and created appup files with all the necessary commands, you’re ready to create a new release.

Note: The app configuration will always be updated to the newest version, even if you have no appup commands.

Release

To create a new release, you’ll need a new rel file, which I’ll refer to as NAME-VSN.rel. VSN should be greater than your previous release version. My usual technique is to copy my latest rel file to NAME-VSN.rel, then update the release VSN and all the application versions.

Note: reltools assumes that the rel file will be in $ROOTDIR/releases/, where $ROOTDIR defaults to code:root_dir(). This path is also used below in the make and fab commands. You can pass a different value for $ROOTDIR, but releases/ is hard coded. This may change in the future, but for now your rel files must be in $ROOTDIR/releases/ if you want to use reltools.

Reltools

Before you finalize the new release, make sure reltools is in your code path. There 2 ways to do this:

  1. Make a copy of reltools and add it to your application.
  2. Clone elib and add it to your code path with erl -pa PATH/TO/elib/ebin.

If you choose option 1, be sure to include reltools in your app modules, and add it to your appup file with {add_module, reltools}.

But I’ll assume you want option 2 because it provides cleaner code separation and easier release handling. Keeping elib external means you can easily pull new code, and only need to add the elib application to your rel file with the latest vsn.

Make Upgrade

Now that you have a new release defined, and elib is in your code path, you’re ready to build release upgrade packages. Below is the make command I use to call reltools:make_upgrade("NAME-VSN"). Be sure to update PATH/TO/ to your particular code paths.

ERL=erl # use default erl command, but can override path on command line

src: FORCE
    @$(ERL) -pa lib/*/ebin -make # requires an Emakefile

upgrade: src
    @$(ERL) -noshell \                          # run erlang with no shell
        -pa lib/*/ebin \                        # include your local code repo
        -pa PATH/TO/elib/ebin \                 # include elib
        -pa PATH/TO/erlang/lib/*/ebin \         # include local erlang libs
        -run reltools make_upgrade $(RELEASE) \ # run reltools:make_upgrade
        -s init stop                            # stop the emulator when finished

FORCE: # empty rule to force run of erl -make

Using the above make rules, you can do make upgrade RELEASE=PATH/TO/releases/NAME-VSN to build a release upgrade package. Once you can do this locally, you can use fab to do remote release builds and installs. But in order to build a release remotely, you need to get the code onto the server. There are various ways to do this, the simplest being to clone your repo on the remote server(s), and push your updates to each one.

fab release build install

Below is an example fabfile.py for building and installing releases remotely using fab. Add your own hosts and roles as needed.

PATH/TO/TARGET should be the path to your first target system.

release is a separate command so that it you are only asked for NAME-VSN once, no matter how many hosts you build and install on.

build will run make upgrade RELEASE=releases/NAME-VSN on the remote system, using the target system’s copy of erl. Theoretically, you could build a release package once, then distribute it to each target system’s releases/ directory. But that requires each target system being exactly the same, with all the same releases and applications installed. If that’s the case, modify the above recipe to run build on a single build server, have it put the release package into all the other node’s releases/ directory, then run install on each node.

install uses _rpcall to run rpc:call(NODE@HOST, reltools, install_release, ["NAME-VSN"]). I’ve kept _rpcall separate so you can see how to define your own fab commands by setting env.mfa.

from fabric.api import env, prompt, require, run

env.erl = 'PATH/TO/TARGET/bin/erl'

def release():
    '''Prompt for release NAME-VSN. rel file must be in releases/.'''
    prompt('Specify release as NAME-VERSION:', 'release',
        validate=r'^\w+-\d+(\.\d+)*$')

def build():
    '''Build upgrade release package.'''
    require('release')
    run('cd PATH/TO/REPO && hg up && make upgrade ERL=%s RELEASE=releases/%s' % (env.erl, env.release))

def install():
    '''Install release to running node.'''
    require('release')
    env.mfa = 'reltools,install_release,["%s"]' % env.release
    _rpccall()

def _rpccall():
    require('mfa')
    evalstr = 'io:format(\"~p~n\", [rpc:call(NODE@%s, %s)])' % (env.host, env.mfa)
    # NOTE: local user must have same ~/.erlang.cookie as running nodes
    run("%s -noshell -sname fab -eval '%s' -s init stop" % (env.erl, evalstr))

Workflow

Once you’ve updated your Makefile and created fabfile.py, your workflow can be something like this:

  1. Write new application code.
  2. Update the app and appup files for each application to upgrade.
  3. Create a new rel file as releases/NAME-VSN.rel.
  4. Commit and push your changes.
  5. Run fab release build install.
  6. Enter NAME-VSN for your new release.
  7. Watch your system hot upgrade in real-time :)

Troubleshooting

Sometimes reltools:install_release(NAME-VSN) can fail, usually when the release_handler can’t find an older version of your code. In this case, your new release will be unpacked but not installed. You can see the state of all the known releases using release_handler:which_releases().. This can usually be fixed by removing old releases and trying again. Shell into your target system and do something like this (where OLDVSN is the VSN of a release marked as old):

See the release_handler manual for more information.

release_handler:remove_release("OLDVSN"). % repeat as necessary
release_handler:install_release("VSN").
release_handler:make_permanent("VSN").

How to Create an Erlang First Target System

One of the neatest aspects of erlang is the OTP release system, which allows you to do real-time upgrades of your application code. But before you can take advantage of it, you need to create a embedded first target system. Unfortunately, the documentation can be quite hard to follow, so this is my attempt at clearly explaining how to create your own first target system. At 12 steps, it’s definitely not simple, but you only have to get it right once :)

Assumptions

  • You’re running linux/unix, probably Ubuntu.
  • You already have the desired version of erlang installed. I’ll refer to the install dir as $ERLDIR, which should be the same as code:root_dir(). The latest release, as of 6/1/2009, is R13B01, with erts-5.7.2.
  • You have your own application code that you want to include in the target system. These apps are located in $REPODIR/lib/, follow the OTP directory structure, and have app files in ebin/.

Steps

  1. Create the initial release resource file , which I’ll refer to as FIRST.rel. I’ll also assume the release version is 1.0. The rel file should include your own applications as well as any OTP applications your code depends on. Put FIRST.rel in the directory you want to use for creating your target system, such as /tmp/build/. Warning: do not put this file in $REPODIR/releases/. Otherwise step 5 will not work because systools will have issues creating the package.
  2. Optional: Create sys.config in the same directory as FIRST.rel. sys.config can be used to override the default application configuration for any application include in the release.
  3. Open an erlang console in the same directory as FIRST.rel. This directory is where the target system will be created.
  4. Call systools:make_script("FIRST", [no_module_tests, {path, ["$REPODIR/lib/*/ebin"]}]). This will create a boot script for the target system. The script file must be created for the next step to work.
  5. Call systools:make_tar("FIRST", [no_module_tests, {path, ["$REPODIR/lib/*/ebin"]}, {dirs, [include, src]}, {erts, "$ERLDIR"}]). This will create a release package containing your code and include files, plus all the .beam files for the included OTP applications. Noteno_module_tests will ignore errors that don’t matter, such as missing src code, which is common for OTP apps.
  6. Exit the console. You should find FIRST-1.0.tar.gz in your current directory. Ideally, this would be the last step, but more likely, you’ll need to do the customizations covered below. Unpack the tarball into your target directory and cd into it. For a different take on these first steps, check out An Introduction to Releases with Erlybank.
  7. Copy erts-5.7.2/bin/start into bin/ (if bin/ doesn’t exist, create it). Edit bin/start and set the ROOTDIR to your target directory (which should also be your current directory). This is the same $ROOTDIR referred to below. Also copy erts-5.7.2/bin/run_erl and erts-5.7.2/bin/start_erl into bin/, then do mkdir log (or change the paths at the bottom of bin/start). At this point, you may also want to add your own emulator flags, such as -sname NODE -smp auto -setcookie MYCOOKIE +A 128.
  8. Copy erts-5.7.2/bin/erl into bin/ and set the same ROOTDIR as you did in bin/start.
  9. Copy $ERLDIR/bin/start_clean.boot or $ERLDIR/bin/start_sasl.boot to bin/start.boot. I like using start_sasl.boot since it provides more logging. But if you don’t want extra logging, use start_clean.boot.
  10. Run echo "5.7.2 1.0" > releases/start_erl.data. This tells erlang which version of erts to run, and which release version to use at startup.
  11. Run bin/erl and call release_handler:create_RELEASES("$ROOTDIR", "$ROOTDIR/releases/", "$ROOTDIR/releases/FIRST.rel", []). Exit the console, and there should be a file releases/RELEASES containg a spec.
  12. That’s it, you’re done! At this point you should be able to run bin/start, then use bin/to_erl to get the console (Ctrl-D to exit). If you want to deploy to other nodes, you can repack the target system, distribute it to each node, then unpack it and run bin/start. If you do distribute to other nodes, make sure to unpack in the same location on each node, otherwise you’ll have to go back to step 7 and modify ROOTDIR.

Fin

At this point you should have customized, self-contained erlang target system that you can distribute and run on all your nodes. Now you can finally take advantage of release handling with hot code swapping. In an upcoming article, I’ll cover how to deploy release upgrades using reltools and fab.

mapfilter

Have you ever mapped a list, then filtered it? Or filtered first, then mapped? Why not do it all in one pass with mapfilter?

mapfilter?

mapfilter is a function that combines the traditional map & filter of functional programming by using the following logic:

  1. if your function returns false, then the element is discarded
  2. any other return value is mapped into the list

Why?

Doing a map and then a filter is O(2N), whereas mapfilter is O(N). That’s twice as a fast! If you are dealing with large lists, this can be a huge time saver. And for the case where a large list contains small IDs for looking up a larger data structure, then using mapfilter can result in half the number of database lookups.

Obviously, mapfilter won’t work if you want to produce a list of boolean values, as it would filter out all the false values. But why would you want to map to a list booleans?

Erlang Code

Here’s some erlang code I’ve been using for a while:

mapfilter(F, List) -> lists:reverse(mapfilter(F, List, [])).

mapfilter(_, [], Results) ->
        Results;
mapfilter(F, [Item | Rest], Results) ->
        case F(Item) of
                false -> mapfilter(F, Rest, Results);
                Term -> mapfilter(F, Rest, [Term | Results])
        end.

Has anyone else done this for themselves? Does mapfilter exist in any programming language? If so, please leave a comment. I think mapfilter is a very simple & useful concept that should be a included in the standard library of every (functional) programming language. Erlang already has mapfoldl (map-reduce in one pass), so why not also have mapfilter?

Static Analysis of Erlang Code with Dialyzer

Dialyzer is a tool that does static analysis of your erlang code. It’s great for identifying type errors and unreachable code. Here’s how to use it from the command line.

dialyzer -r PATH/TO/APP -I PATH/TO/INCLUDE

Pretty simple! PATH/TO/APP should be an erlang application directory containing your ebin/ and/or src/ directories. PATH/TO/INCLUDE should be a path to a directory that contains any .hrl files that need to be included. The -I is optional if you have no include files. You can have as many -r and -I options as you need. If you add -q, then dialyzer runs more quietly, succeeding silently or reporting any errors found.

If you have a test/ directory with Common Test suites, then you’ll want to add “-I /usr/lib/erlang/lib/test_server*/include/” and “-I /usr/lib/erlang/lib/common_test*/include/”. I’ve actually set this up in my Makefile to run as make check. It’s been great for catching bad return types, misspellings, and wrong function parameters.

How to Fix Erlang Out of Memory Crashes When Using Mnesia

If you’re getting erlang out of memory crashes when using mnesia, chances are you’re doing it wrong, for various values of it. These out of memory crashes look something like this:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 999999999 bytes of memory (of type "heap")

Possible Causes

  1. You’re doing it wrong
  2. Someone else is doing it wrong
  3. You don’t have enough RAM

While it’s possible that the crash is due to not having enough RAM, or that some other program or process is using too much RAM for itself, chances are it’s your fault.

One of the reasons these crashes can catch you by surprise is that the erlang VM is using a lot more memory than you might think. Erlang is a functional language with single assignment and no shared memory. A major consequence is that when you change a variable or send a message to another process, a new copy of the variable is created. So an operation as simple as dict:update_counter(“foo”, 1, Dict1) consumes twice the memory of Dict1 since Dict1 is copied to create the return value. And anything you do with ets, dets, or mnesia will result in at least 2 copies of every term: 1 copy for your process, and 1 copy for each table. This is because mnesia uses ets and/or dets for storage, which both use 1 process per table. That means every table operation results in a message pass, sending your term to the table or vice-versa. So that’s why erlang may be running out of memory. Here’s how to fix it.

Use Dirty Operations

If you’re doing anything in a transaction, try to figure out how to do it dirty, or at least move as many operations as possible out of the transaction. Mnesia transactions are separate processes with their own temporary ets tables. That means there’s the original term(s) that must be passed in to the transaction or read from other tables, any updated copies that your code creates, copies of terms that are written to the temporary ets table, the final copies of terms that are written to actual table(s) at the end of the transaction, and copies of any terms that are returned from the transaction process. Here’s an example to illustrate:

example() ->
    T = function() ->
        Var1 = mnesia:read(example_table, "foo"),
        Var2 = update(Var2), % a user-defined function to update Var1
        ok = mnesia:write(Var2),
        Var2
    end,
    {atomic, Var2} = mnesia:transaction(T),
    Var2.

First off, we already have a copy of Var1 in example_table. It gets sent to the transaction process when you do mnesia:read, creating a second copy. Var1 is then updated, resulting in Var2, which I’ll assume has the same memory footprint of Var1. So now we have 3 copies. Var2 is then written to a temporary ets table because mnesia:write is called within a transaction, creating a fourth copy. The transaction ends, sending Var2 back to the original process, and also overwriting Var1 with Var2 in example_table. That’s 2 more copies, resulting in a total of 6 copies. Let’s compare that to a dirty operation.

example() ->
    Var1 = mnesia:dirty_read(example_table, "foo"),
    Var2 = update(Var1),
    ok = mnesia:dirty_write(Var2),
    Var2.

Doing it dirty results in only 4 copies: the original Var1 in example_table, the copy sent to your process, the updated Var2, and the copy sent to mnesia to be written. Dirty operations like this will generally have 2/3 the memory footprint of operations done in a transaction.

Reduce Record Size

Figuring out how to reduce your record size by using different data structures can create huge gains by drastically reducing the memory footprint of each operation, and possibly removing the need to use transaction. For example, let’s say you’re storing a large record in mnesia, and using transactions to update it. If the size of the record grows by 1 byte, then each transactional operation like the above will require an additional 5 bytes of memory, or dirty operations will require an additional 3 bytes. For multi-megabyte records, this adds up very quickly. The solution is to figure how to break that record up into many small records. Mnesia can use any term as a key, so for example, if you’re storing a record with a dict in mnesia such as {dict_record, “foo”, Dict}, you can split that up into many records like [{tuple_record, {“foo”, Key1}, Val1}]. Each of these small records can be accessed independently, which could eliminate the need to use transactions, or at least drastically reduce the memory footprint of each transaction.

Iterate in Batches

Instead of getting a whole bunch of records from mnesia all at once, using mnesia:dirty_match_object or mnesia:dirty_select, iterate over the records in batches. This is analagous to using lists operations on mnesia tables. The match_object methods may return a huge number of records, and all those records have to be sent from the table process to your process, doubling the amount of memory required. By iteratively doing operations on batches of records, you’re only accessing a portion at a time, reducing the amount of memory being used at once. Here’s some code examples that only access 1 record at a time. Note that if the table changes during iteration, the behavior is undefined. You could also use the select operations to process records in batches of NObjects at a time.

Dirty Mnesia Foldl

dirty_foldl(F, Acc0, Table) ->
    dirty_foldl(F, Acc0, Table, mnesia:dirty_first(Table)).

dirty_foldl(_, Acc, _, '$end_of_table') ->
    Acc;
dirty_foldl(F, Acc, Table, Key) ->
    Acc2 = lists:foldl(F, Acc, mnesia:dirty_read(Table, Key)),
    dirty_foldl(F, Acc2, Table, mnesia:dirty_next(Table, Key)).

Dirty Mnesia Foreach

dirty_foreach(F, Table) ->
    dirty_foreach(F, Table, mnesia:dirty_first(Table)).

dirty_foreach(_, _, '$end_of_table') ->
    ok;
dirty_foreach(F, Table, Key) ->
    lists:foreach(F, mnesia:dirty_read(Table, Key)),
    dirty_foreach(F, Table, mnesia:dirty_next(Table, Key)).

Conclusion

  1. It’s probably your fault
  2. Do as little as possible inside transactions
  3. Use dirty operations instead of transactions
  4. Reduce record size
  5. Iterate in small batches

How to Eliminate Mnesia Overload Events

If you’re using mnesia disc_copies tables and doing a lot of writes all at once, you’ve probably run into the following message

=ERROR REPORT==== 10-Dec-2008::18:07:19 ===
Mnesia(node@host): ** WARNING ** Mnesia is overloaded: {dump_log, write_threshold}

This warning event can get really annoying, especially when they start happening every second. But you can eliminate them, or at least drastically reduce their occurance.

Synchronous Writes

The first thing to do is make sure to use sync_transaction or sync_dirty. Doing synchronous writes will slow down your writes in a good way, since the functions won’t return until your record(s) have been written to the transaction log. The alternative, which is the default, is to do asynchronous writes, which can fill transaction log far faster than it gets dumped, causing the above error report.

Mnesia Application Configuration

If synchronous writes aren’t enough, the next trick is to modify 2 obscure configuration parameters. The mnesia_overload event generally occurs when the transaction log needs to be dumped, but the previous transaction log dump hasn’t finished yet. Tweaking these parameters will make the transaction log dump less often, and the disc_copies tables dump to disk more often. NOTE: these parameters must be set before mnesia is started; changing them at runtime has no effect. You can set them thru the command line or in a config file.

dc_dump_limit

This variable controls how often disc_copies tables are dumped from memory. The default value is 4, which means if the size of the log is greater than the size of table / 4, then a dump occurs. To make table dumps happen more often, increase the value. I’ve found setting this to 40 works well for my purposes.

dump_log_write_threshold

This variable defines the maximum number of writes to the transaction log before a new dump is performed. The default value is 100, so a new transaction log dump is performed after every 100 writes. If you’re doing hundreds or thousands of writes in a short period of time, then there’s no way mnesia can keep up. I set this value to 50000, which is a huge increase, but I have enough RAM to handle it. If you’re worried that this high value means the transaction log will rarely get dumped when there’s very few writes occuring, there’s also a dump_log_time_threshold configuration variable, which by default dumps the log every 3 minutes.

How it Works

I might be wrong on the theory since I didn’t actually write or design mnesia, but here’s my understanding of what’s happening. Each mnesia activity is recorded to a single transaction log. This transaction log then gets dumped to table logs, which in turn are dumped to the table file on disk. By increasing the dump_log_write_threshold, transaction log dumps happen much less often, giving each dump more time to complete before the next dump is triggered. And increasing dc_dump_limit helps ensure that the table log is also dumped to disk before the next transaction dump occurs.