Scalaris is a new distributed key-value datastore; its code was recently posted to Google Code.
It was announced and demoed at Erlang eXchange 2008. Joe Armstrong (father of Erlang) later wrote on his blog: "my gut feeling is that what Alexander Reinefeld showed us will be the first killer application in Erlang."
Armstrong's summary:
- They make a peer to peer system based on the chord algorithm
- They added a replication layer using the paxos algorithm
- They added a transaction layer
- They injected the wikipedia
- It went faster than the existing wikipedia
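To make the first bullet a bit more concrete, here is a minimal sketch of Chord-style key placement: nodes and keys are hashed onto the same ring, and a key belongs to the first node at or after its position. This is not Scalaris code. The `Ring` class, the `ring_id` helper, the 32-bit ring size and the node names are all made up for illustration, and the sketch leaves out the finger tables that let real Chord find the owner in a logarithmic number of hops.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumption: a small ring, just for illustration


def ring_id(name: str) -> int:
    """Map a node name or a key onto the ring (hypothetical helper)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    """Toy Chord-style ring: a key is owned by its successor node."""

    def __init__(self, nodes):
        # Keep nodes sorted by their position on the ring.
        self._nodes = sorted((ring_id(n), n) for n in nodes)
        self._ids = [node_id for node_id, _ in self._nodes]

    def successor(self, key: str) -> str:
        # First node whose id is >= the key's id, wrapping past the end.
        idx = bisect_left(self._ids, ring_id(key)) % len(self._ids)
        return self._nodes[idx][1]


ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.successor("Main_Page")  # one of the three node names
```

The nice property of this layout is that adding a node only moves the keys that now fall to the new node; everything else stays where it was.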
"Applied to Wikipedia, Scalaris serves 2,500 transactions per second with just 16 CPUs, which is better than the public Wikipedia."
One downside: it's presently a memory-only store, so it's quite useless for permanent data storage. (One full power-outage in a data center will obliterate all of your data. Doh!)
nmdb is yet another distributed key-value store, this one implemented in ~5000 lines of C and using QDBM or Berkeley DB as the back-end store. It looks simple and stable. Major limitations: it's distributed but not replicated, so it is more like a persistent memcache (like Tugela and memcachedb). There is also a hard 64kB size limit on key+value packets.
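Here is a rough sketch of what "distributed, not replicated" means in practice. It is not nmdb's API: the `ShardedStore` class is hypothetical, plain dicts stand in for the servers, picking a server by CRC32 of the key is my assumption, and so is applying the 64kB limit to the combined length of key and value (the real limit is on the packet).

```python
import zlib

MAX_PACKET = 64 * 1024  # assumption: limit checked against len(key) + len(value)


class ShardedStore:
    """Toy client that spreads keys over servers with no replication."""

    def __init__(self, n_servers: int):
        # Each dict stands in for one back-end server.
        self.servers = [dict() for _ in range(n_servers)]

    def _pick(self, key: bytes) -> dict:
        # Every key maps to exactly one server.
        return self.servers[zlib.crc32(key) % len(self.servers)]

    def put(self, key: bytes, value: bytes) -> None:
        if len(key) + len(value) > MAX_PACKET:
            raise ValueError("key+value exceeds the 64kB limit")
        self._pick(key)[key] = value

    def get(self, key: bytes):
        # Returns None when the key is missing.
        return self._pick(key).get(key)


store = ShardedStore(4)
store.put(b"user:42", b"some profile data")
```

Since each key lives on exactly one server, losing that server means losing the key until the machine comes back, which is the memcache-like behavior described above.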
As you might have guessed from my articles on this topic -- I am looking for a "Bigtable-like" datastore that I can recommend to clients. My criteria are:
- It must be reasonably mature: losing data isn't an option!
- It must be open-source, and the project must "have legs" (not abandoned)
- It must be fast enough to serve as the primary datastore behind a web service API
- It must store a few terabytes of data on < 5 machines (not in RAM!), and be able to grow capacity by just adding more machines
I still haven't found anything I'd recommend. Dang it guys, finish one of these projects! :) Maybe I'll have to build something custom on top of MogileFS from scratch after all?