Mnesia Records to MongoDB Documents

I recently migrated about 50k records from mnesia to MongoDB using my fork of emongo, which adds supervisors with transparent connection restarting, for reasons I’ll explain below.

Why Mongo instead of Mnesia

mnesia is great for a number of reasons, but here’s why I decided to move weotta’s place data into MongoDB:

Converting Records to Docs and vice versa

First, I needed to convert records to documents. In Erlang, MongoDB documents are basically proplists. Keys going into emongo can be atoms, strings, or binaries, but keys coming out will always be binaries. Here’s a simple example of record to document conversion:

record_to_doc(Record, Attrs) ->
    % tl will drop record name
    lists:zip(Attrs, tl(tuple_to_list(Record))).

This would be called like record_to_doc(MyRecord, record_info(fields, my_record)). If you have nested dicts then you’ll have to flatten them using dict:to_list. Also note that list values coming out of emongo are treated like yaws JSON arrays, i.e. [{key, {array, [val]}}]. For more examples, check out the emongo docs.
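
Going the other direction works the same way in reverse. Here’s a minimal sketch (doc_to_record is not part of emongo, just an illustration, and it assumes a flat document with atom field names on the record side):

doc_to_record(Doc, RecName, Attrs) ->
    % keys coming back from emongo are binaries, so convert each record
    % attribute name to a binary before looking up its value
    Values = [case lists:keyfind(atom_to_binary(Attr, utf8), 1, Doc) of
                  {_Key, Val} -> Val;
                  false -> undefined
              end || Attr <- Attrs],
    % put the record name back on the front of the tuple
    list_to_tuple([RecName | Values]).

Like before, this would be called as doc_to_record(Doc, my_record, record_info(fields, my_record)).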

Heavy Write Load

To do the migration, I used etable:foreach to insert each document. Bulk insertion would probably be more efficient, but etable makes single record iteration very easy.
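
The whole loop amounted to something like the sketch below (the pool id, collection name, and exact etable:foreach signature are placeholders here, so treat it as a rough outline rather than working code):

migrate(Table, Attrs) ->
    etable:foreach(
        fun(Record) ->
            Doc = record_to_doc(Record, Attrs),
            % insert into a hypothetical "places" collection via an
            % emongo pool registered as place_pool
            emongo:insert(place_pool, "places", Doc)
        end, Table).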

I started using the original emongo with a pool size of 10, but it was crashy when I dumped records as fast as possible. So initially I slowed it down with timer:sleep(200), but after adding supervised connections, I was able to dump with no delay. I’m not exactly sure what I fixed in this case, but I think the lesson is that using supervised gen_servers will give you reliability with little effort.

Read Performance

Now that I had data in mongo to play with, I compared the read performance to mnesia. Using timer:tc, I found that mnesia:dirty_read takes about 21 microseconds, whereas emongo:find_one can take anywhere from 600 to 1200 microseconds when querying on an indexed field. Without an index, read performance ranged from 900 to 2000 microseconds. I also tested requesting only specific fields, as recommended on the MongoDB Optimization page, but with small documents (<10 fields) that did not seem to have any effect. So while MongoDB queries are pretty fast at around 1ms, mnesia is about 50 times faster. Further inspection with fprof showed that nearly half of the CPU time of emongo:find is taken by BSON decoding.
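
The comparison itself was just timer:tc calls along these lines (the table, pool, collection, and key names are made up for illustration):

% timer:tc returns {Microseconds, Result}
{MnesiaTime, _} = timer:tc(mnesia, dirty_read, [place, PlaceId]),
{MongoTime, _} = timer:tc(emongo, find_one,
                          [place_pool, "places", [{"place_id", PlaceId}]]),
io:format("mnesia: ~p us, mongo: ~p us~n", [MnesiaTime, MongoTime]).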

Heavy Read Load

Under heavy read load (thousands of find_one calls in less than a second), emongo_conn would get into a locked state. Somehow the process had accumulated unparsable data and wouldn’t reply. This problem went away when I increased the pool size to 100, but that’s a ridiculous number of connections to keep open permanently. So instead I added some code to kill the connection on timeout and retry the find call. This was the main reason I added supervision. Now, every pool is locally registered as a simple_one_for_one supervisor that supervises every emongo_server connection. Each pool is in turn supervised by emongo_sup, with dynamically added child specs. All this supervision allowed me to lower the pool size back to 10, and made it easy to kill and restart emongo_server connections as needed.
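
For a rough idea of the structure, a simple_one_for_one pool supervisor looks something like this stripped-down sketch (not the actual emongo code; see the fork for the real implementation):

-module(emongo_pool_sup).
-behaviour(supervisor).
-export([start_link/1, start_connection/2, init/1]).

% one supervisor per pool, locally registered under the pool id
start_link(PoolId) ->
    supervisor:start_link({local, PoolId}, ?MODULE, []).

% dynamically add an emongo_server connection to the pool
start_connection(PoolId, Args) ->
    supervisor:start_child(PoolId, Args).

init([]) ->
    % transient emongo_server children are restarted after being
    % killed on timeout, which is what makes retrying a find call cheap
    {ok, {{simple_one_for_one, 10, 10},
          [{emongo_server, {emongo_server, start_link, []},
            transient, 5000, worker, [emongo_server]}]}}.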

Why you may want to stick with Mnesia

Now that I have experience with both MongoDB and mnesia, here are some reasons you may want to stick with mnesia:

Despite all that, I’m very happy with MongoDB. Installation and setup were a breeze, and schema-less data storage is very nice when you have variable fields and a high probability of adding and/or removing fields in the future. It’s simple, scalable, and as mentioned above, it’s very easy to access from many different languages. emongo isn’t perfect, but it’s open source and will hopefully benefit from more exposure.

  • http://www.mongodb.org/ dm

    Hi, what driver were you using when you had the glitch with find_one?

  • http://streamhacker.com/ Jacob Perkins

    I was using the original emongo before forking it here.

  • http://www.mongodb.org/ dm

    ok we’ll look into it – could be something with the driver (first thought).

  • http://streamhacker.com/ Jacob Perkins

    Cool, thanks. My guess is that it happens in the packet accumulation in emongo_conn line 70 (or in my equivalent emongo_server). When the response can’t be decoded, it gets appended to the Leftover (and keeps looping), and somehow the result never becomes a decodable packet.

  • Casey

    Have you considered using mnesia as a local cache for data that’s accessed a lot? If you know what the data is, it may help your performance if you start running into slowdowns.

  • http://streamhacker.com/ Jacob Perkins

    Thanks Casey, I’ve definitely been thinking about doing that. I think mnesia would be a great distributed in-memory cache.

  • Tkor

    The big “FOLLOW ME” frame to the left leaves your blog all but unusable. Really, you’d be better off removing it completely.

  • Tkor

    Uh, turns out it’s only a problem when you zoom in, like I have to, because I’m half blind. So, never mind.

  • http://streamhacker.com/ Jacob Perkins

    I’m not sure how much good it was doing anyway, so I removed it.

  • JLM

    Hi
    I’m really new to Erlang and I find it a really exciting language and platform, though it is a little bit rough and sometimes low-level for newbies.
    I understand almost all the terms you used in this post, but without seeing code it stays too theoretical. Would you mind sharing your code for newbies like me? :-)
    Sorry if my comment comes a little bit late.

  • http://streamhacker.com/ Jacob Perkins

    I don’t have the code anymore, but if you’re interested in using Erlang with MongoDB, check out emongo: https://github.com/JacobVorreuter/emongo. Or for some mnesia helper code, there’s my etable module in elib: https://github.com/japerk/elib/blob/master/src/etable.erl. The transfer from mnesia to mongo was really just pattern matching & record conversion using both of those.

  • DG

    Hi Jacob,
    I’m a newbie to Erlang (but quite solid in C and DBMSs like sqlite, postgres, db2) and I’m trying to use mnesia for a project at university. I found your comments here really interesting, and your elib code even more so… the problem is I’m not able to figure out how to use it :(
    My intention is to have parallel processes (read, math functions, write) working on data from one or more tables (possibly partitioned across several nodes). I’ve seen you implemented ptable in elib using plists (by you and Stephen Marsh)…

    I searched a while for someone using it, but it seems difficult; surely I’m missing something…
    Could you post an (even extremely) simple example of the combination ptable-mnesia?

  • http://streamhacker.com/ Jacob Perkins

    The idea with plists is to be a drop-in replacement for functions in lists, with the option of customizing the parallelization by specifying a Malt. ptable then expands on that, but instead of providing a list, you give a mnesia table name and match specification. So for the ptable:foreach functions, F is the function to call on each record in mnesia, Table and Spec are passed through to mnesia:select in order to select the objects, and Malt is passed to plists:foreach along with F and the list of objects from mnesia. I don’t have an example handy since I haven’t used this code in years, so I recommend reading up on plists & Malt, and mnesia match specifications.
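
    That said, off the top of my head a call would look roughly like this (untested, and the argument order may not be exact, so double check against the ptable source):

    % select every record in the place table (the record must be defined locally)
    Spec = [{#place{_ = '_'}, [], ['$_']}],
    % the function to run on each selected record, in parallel
    F = fun(Place) -> io:format("~p~n", [Place]) end,
    % malt option telling plists to split the work across 4 processes
    ptable:foreach(F, place, Spec, {processes, 4}).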