Consistency in Modern DBMS
Consistency in modern NoSQL-style databases has long been a hot topic, and it is a critical factor when choosing the right DBMS for a given situation. I wrote a paper comparing the approaches of Dynamo, Amazon’s key-value store, and the fault-tolerant replicated database behind Chubby, Google’s lock service. Each system was examined in the context of the CAP theorem, conjectured by Eric Brewer, which states that a distributed DBMS cannot fully guarantee consistency (C), availability (A), and partition tolerance (P) at the same time. The conjecture was later formally proven by Seth Gilbert and Nancy Lynch at MIT in 2002. The two systems examined do not escape this limit. Instead, each offers a novel design that makes a deliberate trade-off among the three properties to meet the requirements of the application it was built for.
Dynamo takes an unusual approach to consistency. Because of the workloads behind Amazon’s applications, such as the shopping cart, Amazon needed a database that stays available roughly 99.9% of the time while being distributed across thousands of servers in data centers around the world. Under the CAP theorem, a system that insists on both availability and partition tolerance cannot also guarantee consistency. Amazon sidestepped the need for guaranteed consistency by allowing any node in a cluster of Dynamo servers to accept a write at any time without first checking for conflicts. Conflicts are instead detected at read time: if divergent versions of a value exist, the read returns all of them, and the application is responsible for reconciling them.
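The read-time reconciliation described above can be sketched in a few lines of Python. This is a minimal illustration, not Dynamo's implementation: the `Version` class, the helper names, and the dictionary layout of the vector clock are all assumptions made for the example, and the union-based merge is just one plausible application-level policy for a shopping cart.

```python
from dataclasses import dataclass


@dataclass
class Version:
    clock: dict  # vector clock: node name -> write counter (assumed layout)
    items: set   # shopping-cart contents


def descends(a, b):
    """True if clock a has seen every write recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def siblings(versions):
    """Drop versions that another version strictly supersedes; keep the rest."""
    return [
        v for v in versions
        if not any(
            o is not v and descends(o.clock, v.clock) and o.clock != v.clock
            for o in versions
        )
    ]


def merge_carts(versions):
    """Application-level resolution: union the carts and merge the clocks."""
    items = set().union(*(v.items for v in versions))
    clock = {}
    for v in versions:
        for node, count in v.clock.items():
            clock[node] = max(clock.get(node, 0), count)
    return Version(clock, items)


# Two nodes accepted writes independently, both starting from `base`.
base = Version({"A": 1}, {"book"})
left = Version({"A": 2}, {"book", "pen"})
right = Version({"A": 1, "B": 1}, {"book", "lamp"})

conflicting = siblings([base, left, right])  # base is superseded; left and right conflict
merged = merge_carts(conflicting)            # the application reconciles the two carts
```

Here the store only reports that `left` and `right` are concurrent; deciding that a cart should be merged by union is left to the application. One known cost of a union merge is that an item removed in one version can reappear after reconciliation.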
Chubby, on the other hand, must guarantee consistency and must be distributed, so availability is what suffers. Consistency is required because the kind of data Chubby handles does not tolerate conflicts: it is easy to merge two versions of a shopping cart, for example, but not so easy to merge two versions of a term paper. Chubby uses the Paxos algorithm to keep its replicas consistent in a partitioned environment, and it keeps operating even if a minority of the servers goes down.
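The majority rule that Paxos-style protocols rely on can be illustrated with a little arithmetic. The sketch below is not Paxos itself and is not taken from Chubby; the function names and the five-replica cell are assumptions chosen for illustration. It shows only why a majority quorum preserves consistency (any two majorities share at least one replica) and why the minority side of a partition has to stop accepting writes.

```python
from itertools import combinations


def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many replicas can be down while a majority is still reachable."""
    return n - majority(n)


def can_make_progress(n, reachable):
    """A group of replicas may accept writes only if it holds a majority."""
    return reachable >= majority(n)


def quorums_intersect(replicas):
    """Check that every pair of majority quorums shares at least one replica."""
    size = majority(len(replicas))
    quorums = [set(c) for c in combinations(replicas, size)]
    return all(a & b for a in quorums for b in quorums)


cell = ["r1", "r2", "r3", "r4", "r5"]  # hypothetical five-replica cell

# A network partition splits the cell into groups of 3 and 2 replicas.
big_side_ok = can_make_progress(len(cell), 3)    # holds a majority, keeps serving
small_side_ok = can_make_progress(len(cell), 2)  # no majority, must refuse writes
```

Because the two-replica side refuses writes rather than risk diverging, clients that can reach only that side see the service as unavailable, which is the trade-off described above.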
To read more about consistency in modern DBMSs, view the survey [PDF].