Tag Archives: database

Mnesia Records to MongoDB Documents

I recently migrated about 50k records from mnesia to MongoDB using my fork of emongo, which adds supervisors with transparent connection restarting, for reasons I’ll explain below.

Why Mongo instead of Mnesia

mnesia is great for a number of reasons, but here’s why I decided to move weotta’s place data into MongoDB:

Converting Records to Docs and vice versa

First, I needed to convert records to documents. In erlang, mongo documents are basically proplists. Keys going into emongo can be atoms, strings, or binaries, but keys coming out will always by binaries. Here’s a simple example of record to document conversion:

record_to_doc(Record, Attrs) ->
    % tl will drop record name
    lists:zip(Attrs, tl(tuple_to_list(Record))).

This would be called like record_to_doc(MyRecord, record_info(fields, my_record)). If you have nested dicts then you’ll have to flatten them using dict:to_list. Also note that list values are coming out of emongo are treated like yaws JSON arrays, i.e. [{key, {array, [val]}}]. For more examples, check out the emongo docs.

Heavy Write Load

To do the migration, I used etable:foreach to insert each document. Bulk insertion would probably be more efficient, but etable makes single record iteration very easy.

I started using the original emongo with a pool size of 10, but it was crashy when I dumped records as fast as possible. So initially I slowed it down with timer:sleep(200), but after adding supervised connections, I was able to dump with no delay. I’m not exactly sure what I fixed in this case, but I think the lesson is that using supervised gen_servers will give you reliability with little effort.

Read Performance

Now that I had data in mongo to play with, I compared the read performance to mnesia. Using timer:tc, I found that mnesia:dirty_read takes about 21 microseconds, whereas emongo:find_one can take anywhere from 600 to 1200 microseconds, querying on an indexed field. Without an index, read performance ranged from 900 to 2000 microseconds. I also tested only requesting specific fields, as recommended on the MongoDB Optimiziation page, but with small documents (<10 fields) that did not seem to have any effect. So while mongodb queries are pretty fast at 1ms, mnesia is about 50 times faster. Further inspection with fprof showed that nearly half of the cpu time of emongo:find is taken by BSON decoding.

Heavy Read Load

Under heavy read load (thousands of find_one calls in less than second), emongo_conn would get into a locked state. Somehow the process had accumulated unparsable data and wouldn’t reply. This problem went away when I increased the size of the pool size to 100, but that’s a ridiculous number of connections to keep open permanently. So instead I added some code to kill the connection on timeout and retry the find call. This was the main reason I added supervision. Now, every pool is locally registered as a simple_one_for_one supervisor that supervises every emongo_server connection. This pool is in turn supervised by emongo_sup, with dynamically added child specs. All this supervision allowed me to lower the pool size back to 10, and made it easy to kill and restart emongo_server connections as needed.

Why you may want to stick with Mnesia

Now that I have experience with both MongoDB and mnesia, here’s some reasons you may want to stick with mnesia:

Despite all that, I’m very happy with MongoDB. Installation and setup were a breeze, and schema-less data storage is very nice when you have variable fields and a high probability of adding and/or removing fields in the future. It’s simple, scalable, and as mentioned above, it’s very easy to access from many different languages. emongo isn’t perfect, but it’s open source and will hopefully benefit from more exposure.

Scalable Database Links

Redis:
Cassandra:
Performance Tradeoffs:
Other:

Erlang Webserver & Database Links

Erlang databases:

Erlang web servers and frameworks:

How to Fix Erlang Out of Memory Crashes When Using Mnesia

If you’re getting erlang out of memory crashes when using mnesia, chances are you’re doing it wrong, for various values of it. These out of memory crashes look something like this:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 999999999 bytes of memory (of type "heap")

Possible Causes

  1. You’re doing it wrong
  2. Someone else is doing it wrong
  3. You don’t have enough RAM

While it’s possible that the crash is due to not having enough RAM, or that some other program or process is using too much RAM for itself, chances are it’s your fault.

One of the reasons these crashes can catch you by surprise is that the erlang VM is using a lot more memory than you might think. Erlang is a functional language with single assignment and no shared memory. A major consequence is that when you change a variable or send a message to another process, a new copy of the variable is created. So an operation as simple as dict:update_counter(“foo”, 1, Dict1) consumes twice the memory of Dict1 since Dict1 is copied to create the return value. And anything you do with ets, dets, or mnesia will result in at least 2 copies of every term: 1 copy for your process, and 1 copy for each table. This is because mnesia uses ets and/or dets for storage, which both use 1 process per table. That means every table operation results in a message pass, sending your term to the table or vice-versa. So that’s why erlang may be running out of memory. Here’s how to fix it.

Use Dirty Operations

If you’re doing anything in a transaction, try to figure out how to do it dirty, or at least move as many operations as possible out of the transaction. Mnesia transactions are separate processes with their own temporary ets tables. That means there’s the original term(s) that must be passed in to the transaction or read from other tables, any updated copies that your code creates, copies of terms that are written to the temporary ets table, the final copies of terms that are written to actual table(s) at the end of the transaction, and copies of any terms that are returned from the transaction process. Here’s an example to illustrate:

example() ->
    T = function() ->
        Var1 = mnesia:read(example_table, "foo"),
        Var2 = update(Var2), % a user-defined function to update Var1
        ok = mnesia:write(Var2),
        Var2
    end,
    {atomic, Var2} = mnesia:transaction(T),
    Var2.

First off, we already have a copy of Var1 in example_table. It gets sent to the transaction process when you do mnesia:read, creating a second copy. Var1 is then updated, resulting in Var2, which I’ll assume has the same memory footprint of Var1. So now we have 3 copies. Var2 is then written to a temporary ets table because mnesia:write is called within a transaction, creating a fourth copy. The transaction ends, sending Var2 back to the original process, and also overwriting Var1 with Var2 in example_table. That’s 2 more copies, resulting in a total of 6 copies. Let’s compare that to a dirty operation.

example() ->
    Var1 = mnesia:dirty_read(example_table, "foo"),
    Var2 = update(Var1),
    ok = mnesia:dirty_write(Var2),
    Var2.

Doing it dirty results in only 4 copies: the original Var1 in example_table, the copy sent to your process, the updated Var2, and the copy sent to mnesia to be written. Dirty operations like this will generally have 2/3 the memory footprint of operations done in a transaction.

Reduce Record Size

Figuring out how to reduce your record size by using different data structures can create huge gains by drastically reducing the memory footprint of each operation, and possibly removing the need to use transaction. For example, let’s say you’re storing a large record in mnesia, and using transactions to update it. If the size of the record grows by 1 byte, then each transactional operation like the above will require an additional 5 bytes of memory, or dirty operations will require an additional 3 bytes. For multi-megabyte records, this adds up very quickly. The solution is to figure how to break that record up into many small records. Mnesia can use any term as a key, so for example, if you’re storing a record with a dict in mnesia such as {dict_record, “foo”, Dict}, you can split that up into many records like [{tuple_record, {“foo”, Key1}, Val1}]. Each of these small records can be accessed independently, which could eliminate the need to use transactions, or at least drastically reduce the memory footprint of each transaction.

Iterate in Batches

Instead of getting a whole bunch of records from mnesia all at once, using mnesia:dirty_match_object or mnesia:dirty_select, iterate over the records in batches. This is analagous to using lists operations on mnesia tables. The match_object methods may return a huge number of records, and all those records have to be sent from the table process to your process, doubling the amount of memory required. By iteratively doing operations on batches of records, you’re only accessing a portion at a time, reducing the amount of memory being used at once. Here’s some code examples that only access 1 record at a time. Note that if the table changes during iteration, the behavior is undefined. You could also use the select operations to process records in batches of NObjects at a time.

Dirty Mnesia Foldl

dirty_foldl(F, Acc0, Table) ->
    dirty_foldl(F, Acc0, Table, mnesia:dirty_first(Table)).

dirty_foldl(_, Acc, _, '$end_of_table') ->
    Acc;
dirty_foldl(F, Acc, Table, Key) ->
    Acc2 = lists:foldl(F, Acc, mnesia:dirty_read(Table, Key)),
    dirty_foldl(F, Acc2, Table, mnesia:dirty_next(Table, Key)).

Dirty Mnesia Foreach

dirty_foreach(F, Table) ->
    dirty_foreach(F, Table, mnesia:dirty_first(Table)).

dirty_foreach(_, _, '$end_of_table') ->
    ok;
dirty_foreach(F, Table, Key) ->
    lists:foreach(F, mnesia:dirty_read(Table, Key)),
    dirty_foreach(F, Table, mnesia:dirty_next(Table, Key)).

Conclusion

  1. It’s probably your fault
  2. Do as little as possible inside transactions
  3. Use dirty operations instead of transactions
  4. Reduce record size
  5. Iterate in small batches

How to Eliminate Mnesia Overload Events

If you’re using mnesia disc_copies tables and doing a lot of writes all at once, you’ve probably run into the following message

=ERROR REPORT==== 10-Dec-2008::18:07:19 ===
Mnesia(node@host): ** WARNING ** Mnesia is overloaded: {dump_log, write_threshold}

This warning event can get really annoying, especially when they start happening every second. But you can eliminate them, or at least drastically reduce their occurance.

Synchronous Writes

The first thing to do is make sure to use sync_transaction or sync_dirty. Doing synchronous writes will slow down your writes in a good way, since the functions won’t return until your record(s) have been written to the transaction log. The alternative, which is the default, is to do asynchronous writes, which can fill transaction log far faster than it gets dumped, causing the above error report.

Mnesia Application Configuration

If synchronous writes aren’t enough, the next trick is to modify 2 obscure configuration parameters. The mnesia_overload event generally occurs when the transaction log needs to be dumped, but the previous transaction log dump hasn’t finished yet. Tweaking these parameters will make the transaction log dump less often, and the disc_copies tables dump to disk more often. NOTE: these parameters must be set before mnesia is started; changing them at runtime has no effect. You can set them thru the command line or in a config file.

dc_dump_limit

This variable controls how often disc_copies tables are dumped from memory. The default value is 4, which means if the size of the log is greater than the size of table / 4, then a dump occurs. To make table dumps happen more often, increase the value. I’ve found setting this to 40 works well for my purposes.

dump_log_write_threshold

This variable defines the maximum number of writes to the transaction log before a new dump is performed. The default value is 100, so a new transaction log dump is performed after every 100 writes. If you’re doing hundreds or thousands of writes in a short period of time, then there’s no way mnesia can keep up. I set this value to 50000, which is a huge increase, but I have enough RAM to handle it. If you’re worried that this high value means the transaction log will rarely get dumped when there’s very few writes occuring, there’s also a dump_log_time_threshold configuration variable, which by default dumps the log every 3 minutes.

How it Works

I might be wrong on the theory since I didn’t actually write or design mnesia, but here’s my understanding of what’s happening. Each mnesia activity is recorded to a single transaction log. This transaction log then gets dumped to table logs, which in turn are dumped to the table file on disk. By increasing the dump_log_write_threshold, transaction log dumps happen much less often, giving each dump more time to complete before the next dump is triggered. And increasing dc_dump_limit helps ensure that the table log is also dumped to disk before the next transaction dump occurs.