Shahzad Bhatti Welcome to my ramblings and rants!

August 3, 2020

Summary of Data Consistency in Relational and NoSQL Databases

Filed under: Uncategorized — admin @ 8:34 pm

Relational databases generally provide transactional guarantees in terms of the ACID properties:

  • A – Atomicity – a transaction either completes in full or has no effect at all.
  • C – Consistency – a transaction moves the database from one valid state to another, preserving its constraints and invariants.
  • I – Isolation – concurrently running transactions do not interfere with one another.
  • D – Durability – once a transaction commits, its changes are stored persistently.
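
As a minimal sketch of atomicity and consistency, the following uses Python's built-in sqlite3 module with an in-memory database. The accounts table, its CHECK constraint, and the transfer helper are assumptions made up for illustration; they are not taken from any particular system.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        # Credit first, so a failing debit leaves a partial change to undo.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

transfer(conn, "alice", "bob", 30)  # succeeds and commits

try:
    transfer(conn, "alice", "bob", 500)  # debit would make alice negative
except sqlite3.IntegrityError:
    pass  # the credit to bob was rolled back together with the failed debit

print(balances(conn))
```

The second transfer credits bob before the debit fails, yet the final balances show only the first transfer. The failed transaction left no partial change behind (atomicity), and the constraint was never violated (consistency).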

The following read phenomena (anomalies) can occur when transactions run concurrently:

  • Dirty Read – a transaction reads data that another transaction has written but not yet committed.
  • Non Repeatable Read – a transaction sees different data when it reads the same row again, because another transaction changed that row in the meantime.
  • Phantom Read – a transaction sees a different set of rows when it runs the same query again, because another transaction added or removed matching rows.

The SQL standard defines the following isolation levels:

  • Read Uncommitted – a transaction may see uncommitted changes made by other transactions, thus allowing dirty reads.
  • Read Committed – a transaction only sees committed changes, thus preventing dirty reads.
  • Repeatable Read – prevents dirty reads and non-repeatable reads, but phantom reads may still occur.
  • Serializable – the highest isolation level, where concurrently executing transactions appear to execute serially; it prevents all three phenomena.
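
The relationship between the isolation levels and the read phenomena can be written down as a small lookup table. This is only a sketch of the SQL standard's definitions; the names PERMITTED, prevents, and weakest_level_preventing are invented for this example, and real database engines may be stricter than the standard requires at a given level.

```python
from enum import Enum

class Phenomenon(Enum):
    DIRTY_READ = "dirty read"
    NON_REPEATABLE_READ = "non-repeatable read"
    PHANTOM_READ = "phantom read"

# Phenomena each level still permits under the SQL standard,
# listed from the weakest level to the strongest.
PERMITTED = {
    "READ UNCOMMITTED": {
        Phenomenon.DIRTY_READ,
        Phenomenon.NON_REPEATABLE_READ,
        Phenomenon.PHANTOM_READ,
    },
    "READ COMMITTED": {
        Phenomenon.NON_REPEATABLE_READ,
        Phenomenon.PHANTOM_READ,
    },
    "REPEATABLE READ": {Phenomenon.PHANTOM_READ},
    "SERIALIZABLE": set(),
}

def prevents(level, phenomenon):
    """Return True if `level` rules out `phenomenon`."""
    return phenomenon not in PERMITTED[level]

def weakest_level_preventing(phenomena):
    """Return the weakest level that rules out every given phenomenon."""
    # Dicts preserve insertion order, so this scans weakest to strongest.
    for level, permitted in PERMITTED.items():
        if not (set(phenomena) & permitted):
            return level
    raise ValueError("no level prevents the given phenomena")

print(weakest_level_preventing({Phenomenon.NON_REPEATABLE_READ}))
```

Choosing the weakest level that rules out the anomalies an application cannot tolerate is a common way to trade isolation for concurrency.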

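Distributed systems, including NoSQL databases, define the following consistency models:

  • Strict consistency – the strongest consistency model, in which every read returns the most recent write.
  • Sequential consistency – a weaker model, defined by Lamport (1979), in which all operations appear to execute in a single total order that respects each process's program order.
  • Linearizability (atomic) – guarantees sequential consistency plus a real-time constraint: each operation appears to take effect at a single instant between its invocation and its completion.
  • Causal consistency – a weaker model than linearizability, which guarantees that write operations that are causally related are seen by every process in the same order.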
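
To make "causally related" concrete, here is a small sketch using vector clocks. Vector clocks are not mentioned above; they are one common technique for tracking causality and are used here purely as an illustrative assumption. The function names and the sample writes w1, w2, and w3 are hypothetical.

```python
def happened_before(a, b):
    """Return True if the event with vector clock `a` causally precedes `b`.

    A clock is a dict mapping node id to a counter; missing nodes count as 0.
    """
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less

def concurrent(a, b):
    """Return True if two distinct events are not causally ordered."""
    return not happened_before(a, b) and not happened_before(b, a)

# Node A writes w1. Node B observes w1 and then writes w2.
# Node C writes w3 without having seen either of them.
w1 = {"A": 1}
w2 = {"A": 1, "B": 1}
w3 = {"C": 1}

print(happened_before(w1, w2))  # w1 causally precedes w2
print(concurrent(w2, w3))       # w2 and w3 are unrelated
```

Under causal consistency every process must observe w1 before w2, while w2 and w3 are concurrent and may be observed in different orders by different processes.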

Most NoSQL databases lack ACID transaction guarantees and instead offer tradeoffs described by the CAP theorem and PACELC. The CAP theorem states that a distributed database can guarantee at most two of the following three properties at the same time:

  • Consistency – every node in the cluster responds with the most recent data, which may require blocking a request until all replicas are updated.
  • Availability – every node returns an immediate response, even if the response isn't the most recent data.
  • Partition Tolerance – the system continues to operate even if a node loses connectivity with other nodes.
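
The tradeoff between the first two properties during a partition can be sketched with a toy replica. This is a deliberately simplified model, not how any real database is implemented; the Replica and Unavailable classes and the simulate function are invented for this example.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk stale data."""

class Replica:
    def __init__(self, mode, value):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = value
        self.partitioned = False  # True while cut off from the other nodes

    def apply_update(self, value):
        """Receive a replicated write; it is lost while partitioned."""
        if not self.partitioned:
            self.value = value

    def read(self):
        """Serve a read according to the replica's mode."""
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse instead of returning stale data.
            raise Unavailable("cannot confirm the latest value")
        # Availability first: answer with whatever is stored locally.
        return self.value

def simulate(mode):
    """Partition a replica, miss one update, then try to read from it."""
    replica = Replica(mode, value="v1")
    replica.partitioned = True
    replica.apply_update("v2")  # never arrives
    try:
        return replica.read()
    except Unavailable:
        return "unavailable"

print(simulate("AP"))  # answers, but with the stale value
print(simulate("CP"))  # declines to answer
```

The AP replica stays available but returns the stale value v1, while the CP replica gives up availability so that it never returns data it cannot confirm is current.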

Consistency in CAP is different from consistency in ACID. In ACID, consistency means that a transaction won't corrupt the database: it preserves database correctness as transactions are applied in order. In CAP, it means maintaining linearizability, which guarantees that reads return the most up-to-date data. Serializability is the highest form of isolation between transactions in the ACID model; it covers multi-operation, multi-object transactions and permits an arbitrary total order. Linearizability, by contrast, is a single-operation, single-object guarantee with real-time ordering that applies to distributed systems.

Because network failures cannot be ruled out (their frequency and duration are usually described by MTBF, MTTF, and MTTR), you must choose Partition Tolerance from CAP, so the real choice is between AP and CP (availability vs. consistency). The PACELC theorem extends CAP: in the presence of a network partition (P) you choose between availability (A) and consistency (C), and otherwise (E) you choose between latency (L) and consistency (C). Most NoSQL databases choose availability and support Basically Available, Soft State, Eventual Consistency (BASE) instead of strict serializability or linearizability. Eventual consistency only guarantees liveness: updates will be observed eventually. Some modern NoSQL databases also support strong eventual consistency using conflict-free replicated data types (CRDTs).
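
As a rough illustration of a conflict-free replicated data type, here is a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are assumptions chosen for this sketch and do not correspond to the API of any specific database.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> increments seen from that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        """Total of all increments known to this replica."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state by taking per-node maximums."""
        # max is commutative, associative, and idempotent, so replicas
        # converge regardless of merge order or repeated merges.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas accept writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(3)
b.increment(2)

a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing

print(a.value(), b.value())
```

Both replicas end up with the same value without any coordination at write time, which is the property that strong eventual consistency asks for: replicas that have received the same set of updates are in the same state.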
