Project Voldemort: A distributed database.
Voldemort is a distributed key-value storage system
- Data is automatically replicated over multiple servers.
- Data is automatically partitioned so each server contains only a subset of the total data.
- Provides tunable consistency (strict quorum or eventual consistency)
- Server failure is handled transparently
- Pluggable Storage Engines — BDB-JE, MySQL, Read-Only
- Pluggable serialization — Protocol Buffers, Thrift, Avro, and Java Serialization
- Data items are versioned to maximize data integrity in failure scenarios without compromising the availability of the system
- Each node is independent of other nodes with no central point of failure or coordination
- Good single node performance: you can expect 10-20k operations per second depending on the machines, the network, the disk system, and the data replication factor
- Support for pluggable data placement strategies to support things like distribution across data centers that are geographically far apart.
It is used at LinkedIn by numerous critical services powering a large portion of the site.
Comparison to relational databases
Voldemort is not a relational database; it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to map object reference graphs transparently. Nor does it introduce a new abstraction, such as document orientation. It is just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like active-record or hibernate, this will provide horizontal scalability and much higher availability but at a significant loss of convenience. For large applications under internet-type scalability pressure, a system may likely consist of several functionally partitioned services or APIs, which may manage storage resources across multiple data centers using storage systems that may be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible since all the data is unavailable in any single database. A typical pattern introduces a caching layer that will require hashtable semantics anyway. For these applications, Voldemort offers many advantages:
- Voldemort combines in-memory caching with the storage system so that a separate caching tier is not required (instead, the storage system itself is just fast)
- Unlike MySQL replication, both reads and writes scale horizontally
- Data portioning is transparent and allows for cluster expansion without rebalancing all data.
- A simple API decides data replication and placement to accommodate various application-specific strategies.
- The storage layer is completely mockable, so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster (or even a natural storage system) for simple testing.
The source code is available under the Apache 2.0 license. We are actively looking for contributors, so if you have ideas, code, bug reports, or fixes you would like to contribute, please do so.