erldis – an Erlang Redis Client
Since it's now featured on the Redis homepage, I figure I should tell people about my fork of erldis, an Erlang Redis client focused on synchronous operations.
Synchronicity
The original client, which still exists as erldis_client.erl, implements asynchronous pipelining. This means you send a bunch of Redis commands, then collect all the results at the end. This didn't work for me, as I needed a client that could handle parallel synchronous requests from multiple concurrent processes. So I copied erldis_client.erl to erldis_sync_client.erl and modified it to send replies back as soon as they are received from Redis (in FIFO order). Many thanks to dialtone_ for writing the original erldis app, as I'm not sure I would've created the synchronous client without it. And thanks to cstar for patches, such as making erldis_sync_client the default client for all functions in erldis.erl.
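A rough way to picture the synchronous approach: Redis answers the commands sent over one connection in the order it received them, so a client only needs to remember who is waiting, oldest first, and hand each reply to the oldest waiter as soon as it arrives. The Python sketch below is not erldis code (erldis is Erlang, and FifoReplyDispatcher and its methods are hypothetical names); it just models that FIFO bookkeeping without a real socket.

```python
from collections import deque


class FifoReplyDispatcher:
    """Hypothetical sketch: match replies to callers in the order requests were sent.

    Redis answers commands on a single connection in the order it received
    them, so the oldest pending caller always owns the next reply.
    """

    def __init__(self):
        self.pending = deque()  # callers waiting for a reply, oldest first
        self.sent = []          # commands "written" to the connection

    def send(self, caller, command):
        # Remember who is waiting, then write the command to the connection.
        self.pending.append(caller)
        self.sent.append(command)

    def on_reply(self, reply):
        # Deliver each reply as soon as it arrives, instead of collecting
        # all results at the end as a pipelining client would.
        caller = self.pending.popleft()
        return caller, reply


dispatcher = FifoReplyDispatcher()
dispatcher.send("proc_a", ("GET", "x"))
dispatcher.send("proc_b", ("GET", "y"))
print(dispatcher.on_reply(b"1"))  # ('proc_a', b'1')
print(dispatcher.on_reply(b"2"))  # ('proc_b', b'2')
```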
Extras
In addition to the synchronous client, I've added some extra functions and modules to make interfacing with Redis more erlangy. Here's a brief overview...
erldis_sync_client:transact
erldis_sync_client:transact is analogous to mnesia:transaction in that it does a unit of work against a Redis database, like so:
- starts erldis_sync_client
- calls your function with the client PID as the argument
- stops the client
- returns the result of your function
The goal is to reduce boilerplate start/stop code.
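For comparison, here's the same start/call/stop/return pattern as a small Python sketch. It assumes a hypothetical FakeClient in place of erldis_sync_client, and uses try/finally so the client is stopped even if your function raises; that cleanup-on-error behavior is a choice made in this sketch, not a claim about how erldis handles crashes.

```python
class FakeClient:
    """Hypothetical stand-in for a started client connection."""

    def __init__(self):
        self.running = True
        self.data = {}

    def stop(self):
        self.running = False


def transact(fun, connect=FakeClient):
    """Start a client, call fun(client), stop the client, return fun's result.

    `connect` is a hypothetical factory standing in for starting
    erldis_sync_client.
    """
    client = connect()
    try:
        return fun(client)
    finally:
        # Stop the client even if fun raises, so connections never leak.
        client.stop()


def count_keys(client):
    client.data["greeting"] = "hello"
    return len(client.data)


print(transact(count_keys))  # 1
```

The caller never touches start or stop directly, which is the boilerplate the Erlang version removes.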
erldis_dict module
erldis_dict provides semantics similar to the dict module in stdlib, using Redis key-value commands.
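To make "dict semantics over key-value commands" concrete, here's a Python sketch of the mapping (SET for store, GET for find, EXISTS for is_key, DEL for erase), modeled on Erlang's dict function names. RedisDict and InMemoryRedis are hypothetical; the stand-in only mimics the redis-py get/set/exists/delete calls so the example runs without a server, and this is not erldis_dict's actual implementation.

```python
class InMemoryRedis:
    """Minimal stand-in for the redis-py key-value calls used below."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def exists(self, key):
        return int(key in self.data)

    def delete(self, key):
        return int(self.data.pop(key, None) is not None)


class RedisDict:
    """Hypothetical dict-style wrapper; `r` is any redis-py style client."""

    def __init__(self, r):
        self.r = r

    def store(self, key, value):  # like dict:store/3, via SET
        self.r.set(key, value)

    def find(self, key):  # like dict:find/2, via GET
        value = self.r.get(key)
        return ("ok", value) if value is not None else "error"

    def is_key(self, key):  # like dict:is_key/2, via EXISTS
        return bool(self.r.exists(key))

    def erase(self, key):  # like dict:erase/2, via DEL
        self.r.delete(key)


d = RedisDict(InMemoryRedis())
d.store("color", "blue")
print(d.find("color"))  # ('ok', 'blue')
d.erase("color")
print(d.find("color"))  # error
```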
erldis_list module
erldis_list provides a number of functions operating on Redis lists, inspired by the array, lists, and queue modules in stdlib. You must pass in both the client PID and a Redis list key.
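A similar Python sketch for lists, assuming queue-style operations map to RPUSH (append to tail) and LPOP (remove from head), and array-style access maps to LINDEX. The function names and the InMemoryRedisLists stand-in are hypothetical, not erldis_list's API; as with erldis_list, every call takes both the client and the list key.

```python
class InMemoryRedisLists:
    """Minimal stand-in for the redis-py list calls used below."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        lst = self.lists.setdefault(key, [])
        lst.extend(values)
        return len(lst)

    def lpop(self, key):
        lst = self.lists.get(key)
        return lst.pop(0) if lst else None

    def llen(self, key):
        return len(self.lists.get(key, []))

    def lindex(self, key, index):
        lst = self.lists.get(key, [])
        try:
            return lst[index]  # negative indexes count from the tail, as in Redis
        except IndexError:
            return None


def queue_in(r, key, item):
    """Like queue:in/2: append to the tail of the list (RPUSH)."""
    r.rpush(key, item)


def queue_out(r, key):
    """Like queue:out/1: remove from the head (LPOP); None means empty."""
    return r.lpop(key)


def array_get(r, key, index):
    """Like array:get/2: 0-based random access (LINDEX)."""
    return r.lindex(key, index)


r = InMemoryRedisLists()
queue_in(r, "jobs", "a")
queue_in(r, "jobs", "b")
print(array_get(r, "jobs", 1))  # b
print(queue_out(r, "jobs"))     # a
```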
erldis_sets module
erldis_sets works like the sets module, but you have to provide both the client PID and a Redis set key.
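The same idea applies to sets: SADD, SISMEMBER, SCARD, and SMEMBERS line up naturally with sets:add_element/2, sets:is_element/2, sets:size/1, and sets:to_list/1. Again, this is a hypothetical Python sketch with an in-memory stand-in for the redis-py set commands, not erldis_sets itself.

```python
class InMemoryRedisSets:
    """Minimal stand-in for the redis-py set calls used below."""

    def __init__(self):
        self.sets = {}

    def sadd(self, key, *members):
        s = self.sets.setdefault(key, set())
        before = len(s)
        s.update(members)
        return len(s) - before  # number of members actually added

    def sismember(self, key, member):
        return member in self.sets.get(key, set())

    def scard(self, key):
        return len(self.sets.get(key, set()))

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def add_element(r, key, element):
    """Like sets:add_element/2 (SADD); re-adding a member is a no-op."""
    r.sadd(key, element)


def is_element(r, key, element):
    """Like sets:is_element/2 (SISMEMBER)."""
    return bool(r.sismember(key, element))


def size(r, key):
    """Like sets:size/1 (SCARD)."""
    return r.scard(key)


def to_list(r, key):
    """Like sets:to_list/1 (SMEMBERS); order is unspecified."""
    return list(r.smembers(key))


r = InMemoryRedisSets()
add_element(r, "tags", "erlang")
add_element(r, "tags", "redis")
add_element(r, "tags", "erlang")
print(size(r, "tags"))                 # 2
print(is_element(r, "tags", "redis"))  # True
print(sorted(to_list(r, "tags")))      # ['erlang', 'redis']
```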
Usage
Despite the low version numbers, I've been successfully using erldis as a component in parallel/distributed information retrieval (in conjunction with plists), and for accessing data shared with Python/Django apps. It's a fully compliant Erlang application that you can include in your target system release structure.
If you're also using erldis for your Redis needs, I'd love to hear about it.
Scalable Database Links
Redis:
- Redis vs MySQL vs Tokyo Tyrant (on EC2) « Colin Howe’s Blog
- Key-Value Stores for Ruby (Part 4): To Redis or Not To Redis? | Engine Yard Blog
Cassandra:
- Jonathan Ellis's Programming Blog - Spyced: Why I like the Cassandra distributed database
- ieure's python-cassandra at master - GitHub
- digg's lazyboy at master - GitHub
Performance Tradeoffs:
- Debunking a Myth: Column-Stores vs. Indexes - The Database Column
- Debunking Another Myth: Column-Stores vs. Vertical Partitioning - The Database Column
- Code Monkeyism: Essential storage tradeoff: Simple Reads vs. Simple Writes
Other:
- A Few Database Links
Building an NLTK FreqDist on Redis
Say you want to build a frequency distribution of many thousands of samples with the following characteristics:
- fast to build
- persistent data
- network accessible (with no locking requirements)
- can store large sliceable index lists
The only solution I know of that meets all those requirements is Redis. NLTK's FreqDist is not persistent, shelve is far too slow, BerkeleyDB is not network accessible (and is generally a PITA to manage), and AFAIK there's no other key-value store that makes sliceable lists really easy to create & access. So far I've been quite pleased with Redis, especially given how new it is. It's quite fast, network accessible, and very easy to configure; its atomic operations make locking unnecessary; and it supports sortable and sliceable list structures.
Classification
Building a FreqDist allows you to create a ProbDist, which in turn can be used for classification. Having it be persistent lets you examine the data later. And the ability to create sliceable lists allows you to make sorted indexes for paging through your samples.
Here are some more concrete use cases for persistent frequency distributions:
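Going from a FreqDist to a ProbDist can be as simple as a maximum-likelihood estimate, which is what NLTK's MLEProbDist uses: each sample's probability is its count divided by the total number of outcomes N. Here's a pure-Python sketch of that estimate, using collections.Counter in place of a FreqDist so it runs without NLTK or Redis (mle_prob is a hypothetical helper name).

```python
from collections import Counter


def mle_prob(freqs, sample):
    """P(sample) = count(sample) / N, the maximum-likelihood estimate.

    Returns 0.0 when nothing has been counted yet.
    """
    total = sum(freqs.values())
    return freqs[sample] / total if total else 0.0


freqs = Counter("the cat saw the dog near the house".split())
print(mle_prob(freqs, "the"))  # 0.375
```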
RedisFreqDist
I put the code I've been using to build frequency distributions over large sets of words up at BitBucket. probablity.py contains RedisFreqDist, which works just like the NLTK FreqDist, except it stores samples and frequencies as keys and values in Redis. That means samples must be strings. Internally, RedisFreqDist also stores a set of all the samples under the key __samples__ for efficient lookup and sorting. Here's some example code for using it. For more info, check out the wiki, or read the code.
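```python
# RedisFreqDist lives in probablity.py from BitBucket (import path assumed)
from probablity import RedisFreqDist

def make_freq_dist(samples, host='localhost', port=6379, db=0):
    freqs = RedisFreqDist(host=host, port=port, db=db)
    for sample in samples:
        freqs.inc(sample)  # samples must be strings
    return freqs
```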
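To show what that key layout looks like in practice, here's a hypothetical Python sketch: one counter key per sample, incremented with INCRBY, plus a __samples__ set so all samples can be listed and sorted. This is not the RedisFreqDist code from BitBucket; SketchRedisFreqDist and the InMemoryRedis stand-in (which mimics redis-py's incr/get/sadd/smembers with decoded string responses) are illustrative names so the example runs without a server.

```python
SAMPLES_KEY = "__samples__"


class InMemoryRedis:
    """Minimal stand-in for the redis-py calls used below (no server needed)."""

    def __init__(self):
        self.strings = {}
        self.sets = {}

    def incr(self, key, amount=1):
        self.strings[key] = self.strings.get(key, 0) + amount
        return self.strings[key]

    def get(self, key):
        value = self.strings.get(key)
        return None if value is None else str(value)

    def sadd(self, key, *members):
        s = self.sets.setdefault(key, set())
        before = len(s)
        s.update(members)
        return len(s) - before

    def smembers(self, key):
        return set(self.sets.get(key, set()))


class SketchRedisFreqDist:
    """Hypothetical frequency distribution stored in Redis.

    Each sample is a key whose value is its count; every sample is also
    added to the __samples__ set so all samples can be listed and sorted.
    """

    def __init__(self, r):
        self.r = r

    def inc(self, sample, count=1):
        self.r.incr(sample, count)        # atomic on the server, no locking
        self.r.sadd(SAMPLES_KEY, sample)  # idempotent, repeats are harmless

    def __getitem__(self, sample):
        value = self.r.get(sample)
        return int(value) if value is not None else 0

    def samples(self):
        return self.r.smembers(SAMPLES_KEY)

    def N(self):
        """Total number of sample outcomes counted."""
        return sum(self[s] for s in self.samples())

    def sorted_samples(self):
        """Most frequent first; ties broken alphabetically."""
        return sorted(self.samples(), key=lambda s: (-self[s], s))


fd = SketchRedisFreqDist(InMemoryRedis())
for word in "the cat saw the dog near the house".split():
    fd.inc(word)
print(fd["the"], fd.N())         # 3 8
print(fd.sorted_samples()[:2])  # ['the', 'cat']
```

Each INCRBY and SADD is atomic on the server, so several processes can feed the same distribution without locking. The two commands aren't atomic as a pair, though; if that matters, a MULTI/EXEC transaction (a redis-py pipeline) is one option.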
Unfortunately, I had to muck about with some of FreqDist's internal implementation to remain compatible, so I can't promise the code will work beyond NLTK version 0.9.9. probablity.py also includes ConditionalRedisFreqDist for creating ConditionalProbDists.
Lists
How you create lists of samples depends very much on your use case, but here's some example code for doing so. r is a Redis object, key is the index key for storing the list, and samples is assumed to be a sorted list. The get_samples function demonstrates how to get a slice of samples from the list.
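```python
def index_samples(r, key, samples):
    r.delete(key)  # start from an empty list
    for sample in samples:
        # RPUSH appends to the tail, keeping the sorted order
        # (older redis-py releases spelled this push(key, sample, tail=True))
        r.rpush(key, sample)

def get_samples(r, key, start, end):
    # Redis LRANGE includes the element at index `end`
    return r.lrange(key, start, end)
```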
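Since Redis's LRANGE treats its end index as inclusive (unlike Python slices), paging through a sorted index needs end = start + page_size - 1. Here's a hypothetical sketch of that page arithmetic on top of the same LRANGE call get_samples uses, with an in-memory stand-in (InMemoryRedisList) that imitates LRANGE's inclusive and negative-index behavior so it runs without a server.

```python
class InMemoryRedisList:
    """Minimal stand-in for redis-py's rpush/lrange, mimicking LRANGE semantics."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        lst = self.lists.setdefault(key, [])
        lst.extend(values)
        return len(lst)

    def lrange(self, key, start, end):
        lst = self.lists.get(key, [])
        n = len(lst)
        # Negative indexes count from the tail, as in Redis.
        if start < 0:
            start = max(n + start, 0)
        if end < 0:
            end += n
            if end < 0:
                return []
        # LRANGE includes `end`; Python slices exclude it, hence the + 1.
        return lst[start:end + 1]


def page_bounds(page, page_size):
    """Inclusive (start, end) indexes of a zero-based page, ready for LRANGE."""
    start = page * page_size
    return start, start + page_size - 1


def get_page(r, key, page, page_size):
    start, end = page_bounds(page, page_size)
    return r.lrange(key, start, end)


r = InMemoryRedisList()
r.rpush("index", *["w%d" % i for i in range(10)])
print(get_page(r, "index", 0, 3))  # ['w0', 'w1', 'w2']
print(get_page(r, "index", 3, 3))  # ['w9']
```

With a real connection, r would be a redis-py client instead of the stand-in.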
Yes, Redis is still fairly alpha, so I wouldn't use it for critical systems. But I've had very few issues so far, especially compared to dealing with BerkeleyDB. I highly recommend it for your non-critical computational needs.




