MongoDB and the CAP Theorem
The
CAP theorem by Brewer basically says that a distributed systems can only have two of the following three properties:
- Consistency i.e. each node has the same data
- Availability i.e. a node will always answer queries if possible
- Partition tolerance i.e. work despite a network failure so nodes cannot communicate with one another
In real life things are a little different: You cannot really sacrifice partition tolerance. The network will eventually fail. Or nodes might fail.
So here is a different approach to understanding the CAP theorem: Imagine a cluster of nodes. Each has a replicated set of data. When the network or a node fails there are two options to answer a query: First a node can give an answer to a query based on the data on that node. This information might be outdated. The other nodes might have received some update that are not propagated yet. So the node might therefore give an incorrect answer - so the system sacrifices consistency.
The other option in such a situation is to give no answer. Then the system rather won't answer queries than give a potentially incorrect answer. This sacrifices availability - the node does not answer the query even though it is still up.
So let's take the CAP theorem to better understand MongoDB. MongoDB uses a Master / Slave
replication scheme. Data is written to a master node and then replicated to the slaves. If a network failure occurs or the master is down a slave takes over as the new master. So how does MongoDB solve the issue concerning CAP? There are settings that influence Mongos behavior:
- Write Concerns let you choose when a write attempt is considered successful. The setting vary from "error ignored" up to settings that define how many nodes must have acknowledged the write operation.
- Read Preferences allow you to choose whether you want to read from the master or also from slaves.
So concerning CAP it leaves you with different options:
- Using the write concerns you can enforce different level of consistency in the cluster - you can choose how many nodes the data must be stored at. There is a trade off with availability: The write will fail if the number of nodes the data should be stored at is higher than the currently available number of nodes.
- The read preference can be used to choose which node data should be read from. If you decide to read data from the master only you will get the data even if it has not been propagated to all nodes.
-
Besides a trade off between availability and consistency these settings obviously also influence performance.
If you like you can read details about what happens if a MongoDB partitions in
Jepsen.
So bottom line: MongoDB allows you to fine tune the trade off between consistency and availability using write concerns and read preferences. Concerning partition tolerance there is really no choice - it will eventually happen. So it can be tuned to be AP or CP or something in between - depending on how you tune it. Final note: This is my take on CAP and where MongoDB stands. If you browse around on the web you might find different takes on it. I am happy to discuss the details - leave a comment!
Labels: CAP, CAP Theorem, MongoDB