I recently migrated about 50k records from mnesia to MongoDB using my fork of emongo, which adds supervisors with transparent connection restarting, for reasons I’ll explain below.
Why Mongo instead of Mnesia
mnesia is great for a number of reasons, but here’s why I decided to move weotta’s place data into MongoDB:
- easy to access from python and other languages
- schema-less data, so you’re not constrained to records and never have to run mnesia:transform_table again
- don’t have to keep everything in memory (or everything only on disk, as the case may be)
- simple & flexible indexing & querying
Converting Records to Docs and vice versa
First, I needed to convert records to documents. In erlang, mongo documents are basically proplists. Keys going into emongo can be atoms, strings, or binaries, but keys coming out will always be binaries. Here’s a simple example of record to document conversion:
record_to_doc(Record, Attrs) ->
    % tl drops the record name from the tuple
    lists:zip(Attrs, tl(tuple_to_list(Record))).
This would be called like record_to_doc(MyRecord, record_info(fields, my_record)).
If you have nested dicts, you’ll have to flatten them first with dict:to_list. Also note that list values coming out of emongo are treated like yaws JSON arrays, i.e. [{key, {array, [val]}}]. For more examples, check out the emongo docs.
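Going the other way takes slightly more work, since keys come back as binaries and list values are wrapped in {array, ...}. Here’s a minimal sketch of the reverse conversion (my own helper, not part of emongo); it assumes every record field is a top-level key in the document:

    doc_to_record(Doc, RecordName, Attrs) ->
        Values = [doc_value(atom_to_binary(Attr, utf8), Doc) || Attr <- Attrs],
        list_to_tuple([RecordName | Values]).

    % missing fields become undefined; {array, List} values are unwrapped
    doc_value(Key, Doc) ->
        case lists:keyfind(Key, 1, Doc) of
            {Key, {array, List}} -> List;
            {Key, Value} -> Value;
            false -> undefined
        end.

You’d call it like doc_to_record(Doc, my_record, record_info(fields, my_record)).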
Heavy Write Load
To do the migration, I used etable:foreach to insert each document. Bulk insertion would probably be more efficient, but etable makes single record iteration very easy.
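As a rough sketch, the migration loop looked something like this; the pool and collection names here are made up, and I’m assuming etable:foreach takes a fun and a table:

    % convert each record to a doc and insert it into mongo
    migrate(Table, Attrs) ->
        etable:foreach(fun(Record) ->
            emongo:insert(places_pool, "places", record_to_doc(Record, Attrs))
        end, Table).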
I started with the original emongo and a pool size of 10, but it crashed when I dumped records as fast as possible. Initially I worked around that by throttling with timer:sleep(200), but after adding supervised connections, I was able to dump with no delay. I’m not exactly sure what I fixed in this case, but I think the lesson is that supervised gen_servers give you reliability with very little effort.
Read Performance
Now that I had data in mongo to play with, I compared read performance to mnesia. Using timer:tc, I found that mnesia:dirty_read takes about 21 microseconds, whereas emongo:find_one takes anywhere from 600 to 1200 microseconds when querying on an indexed field. Without an index, reads ranged from 900 to 2000 microseconds. I also tried requesting only specific fields, as recommended on the MongoDB Optimization page, but with small documents (<10 fields) that didn’t seem to have any effect. So while mongodb queries are pretty fast at around 1ms, mnesia is about 50 times faster. Further inspection with fprof showed that nearly half the cpu time of emongo:find is spent on BSON decoding.
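For reference, here’s roughly how to time both reads with timer:tc; the table, pool, and key names are illustrative:

    % timer:tc returns {Microseconds, Result}
    {MnesiaTime, _} = timer:tc(mnesia, dirty_read, [place, PlaceId]),
    {MongoTime, _} = timer:tc(emongo, find_one,
                              [places_pool, "places", [{"place_id", PlaceId}]]),
    io:format("mnesia: ~p us, mongo: ~p us~n", [MnesiaTime, MongoTime]).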
Heavy Read Load
Under heavy read load (thousands of find_one calls in less than a second), emongo_conn would get into a locked state. Somehow the process had accumulated unparsable data and wouldn’t reply. The problem went away when I increased the pool size to 100, but that’s a ridiculous number of connections to keep open permanently. So instead I added code to kill the connection on timeout and retry the find call. This was the main reason I added supervision. Now every pool is locally registered as a simple_one_for_one supervisor that supervises each emongo_server connection. The pools are in turn supervised by emongo_sup, with dynamically added child specs. All this supervision let me lower the pool size back to 10, and made it easy to kill and restart emongo_server connections as needed.
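To give an idea of the shape of this, here’s a minimal simple_one_for_one supervisor for a pool; this is a sketch under my own module name and restart values, not the actual emongo code:

    -module(emongo_pool_sup).
    -behaviour(supervisor).
    -export([start_link/1, start_connection/2, init/1]).

    % one of these runs per pool, locally registered under the pool id
    start_link(PoolId) ->
        supervisor:start_link({local, PoolId}, ?MODULE, []).

    % dynamically start a connection; ExtraArgs might be [Host, Port]
    start_connection(PoolId, ExtraArgs) ->
        supervisor:start_child(PoolId, ExtraArgs).

    init([]) ->
        % transient: connections that die abnormally get restarted
        {ok, {{simple_one_for_one, 10, 10},
              [{emongo_server, {emongo_server, start_link, []},
                transient, 2000, worker, [emongo_server]}]}}.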
Why you may want to stick with Mnesia
Now that I have experience with both MongoDB and mnesia, here are some reasons you may want to stick with mnesia:
- very fast in-memory reads
- transactional
- simple master-master replication
- great for distributed read-heavy applications
Despite all that, I’m very happy with MongoDB. Installation and setup were a breeze, and schema-less data storage is very nice when you have variable fields and a high probability of adding and/or removing fields in the future. It’s simple, scalable, and as mentioned above, it’s very easy to access from many different languages. emongo isn’t perfect, but it’s open source and will hopefully benefit from more exposure.