Choosing the Right Database

Master Slave Replication: If too much load

Advantage:

  1. Strict consistency (always get correct data).

    To achieve consistency we can implement master-slave replication: a single master handles all writes, which keeps the data consistent, and multiple read replicas serve reads behind a load balancer.

  2. Highly available (I need the data right now)

Disadvantage:

  1. Not partition tolerant

  2. Very hard to scale

  3. Single point of failure

  4. Writes are not scaled

  5. Latency because of replication
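
The read/write split described above can be sketched in a few lines. This is a toy illustration only, not a real replication protocol: the class name ReplicatedStore, its methods, and the use of plain dictionaries in place of database servers are all assumptions made for the example. It also shows where replication latency comes from: a read can hit a replica before the master's write has been copied over.

```python
import itertools


class ReplicatedStore:
    """Toy master-slave setup: one master dict and several replica dicts."""

    def __init__(self, replica_count=2):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Round-robin iterator standing in for a load balancer (assumption).
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the single master.
        self.master[key] = value

    def replicate(self):
        # Copy the master's state to every replica. Real systems do this
        # asynchronously, which is the source of replication lag.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.master)

    def read(self, key):
        # Reads are spread across the replicas in round-robin order.
        index = next(self._next_replica)
        return self.replicas[index].get(key)


store = ReplicatedStore(replica_count=2)
store.write("user:1", "Bikash")
stale = store.read("user:1")  # the replica has not caught up yet
store.replicate()
fresh = store.read("user:1")  # the replica now holds the value
```

Here stale is None and fresh is "Bikash". Only the master accepts writes, so adding replicas scales reads but not writes.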

Sharding: If too much data

Advantage:

  1. We can scale the write operations

Disadvantage:

  1. Complex queries

  2. May need to access multiple databases to get your data
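
A minimal sketch of hash-based sharding, assuming four shards modelled as in-memory dictionaries. The names SHARD_COUNT, shard_for, put, get and find_by_value are hypothetical helpers for this example. Real systems often use consistent hashing or range-based sharding instead of a simple modulo.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of database shards

# Each dict stands in for a separate database server.
shards = [{} for _ in range(SHARD_COUNT)]


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def put(key, value):
    # A write touches exactly one shard, so writes scale with shard count.
    shards[shard_for(key)][key] = value


def get(key):
    # A lookup by shard key also touches exactly one shard.
    return shards[shard_for(key)].get(key)


def find_by_value(value):
    # A query that does not use the shard key must ask every shard
    # (scatter-gather), which is why complex queries get expensive.
    return sorted(k for shard in shards for k, v in shard.items() if v == value)


put("user:1", "Bikash")
put("user:2", "Bimal")
put("user:3", "Bikash")
```

With this layout get("user:2") reads a single shard, while find_by_value("Bikash") has to visit all four.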

When to use what?

Master Slave if:

  1. too much load

  2. needs more reads than writes

Sharding if:

  1. too much data

Brewer's CAP Theorem

Relational Database: It is very hard to scale

Advantage:

  1. Strict consistency

  2. Highly available

Disadvantage:

  1. Scaling writes is very difficult and limited

  2. Vertical scaling is limited and expensive

  3. Horizontal scaling is limited and complex

  4. The schema is fixed

  5. Does not easily handle unstructured and semi-structured data

NoSQL Database: It is easy to scale (scaling is built-in)

Advantage:

  1. Not fixed schema

  2. Can scale (almost) without limit

  3. Key-value store

Disadvantage:

  1. Latency because of eventual consistency (replicas need time to converge, so a read may return stale data).

Key-value store (e.g. Redis => Remote Dictionary Server):

Advantage:

  1. Very fast because it is an in-memory database (often used for temporary data, for example caching)

  2. Scalable

    Use case: when we do not need to query the data and only need to get, put or delete by key. For example: storing session data, shopping cart data, profiles and preferences.

Disadvantage:

  1. No way to query based on the content of the value
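
The get/put/delete interface can be illustrated with a small in-memory class. This is a sketch of the access pattern only, not of Redis itself: KeyValueStore and the session key used below are hypothetical names for the example.

```python
class KeyValueStore:
    """Toy in-memory key-value store exposing only get, put and delete."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


sessions = KeyValueStore()
sessions.put("session:abc", {"user": "Bikash", "cart": ["book", "pen"]})

cart = sessions.get("session:abc")["cart"]
removed = sessions.delete("session:abc")
after_delete = sessions.get("session:abc")
```

The value is opaque to the store: there is no operation such as "find every session whose cart contains a pen". Answering that would require reading every key, which is the disadvantage listed above.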

Document Database (eg. MongoDB):

  • Stores data in BSON

  • Structure/organize the data according to your queries: Define Queries first and then create collections accordingly.

Advantage:

  1. Fast

  2. Can handle unstructured or semi-structured data (Schema Free)

Disadvantage:

  1. Data duplication (for example same name field in multiple documents)
    [
      {
        "professor_name": "John",
        "student_name": "Bikash"
      },
      {
        "professor_name": "John",
        "student_name": "Bimal"
      }
    ]
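
To make the cost of duplication concrete, here is a small sketch that treats the two documents above as a Python list of dictionaries. The helper names students_of and rename_professor are hypothetical. A real document database would run these as queries and updates against a collection, but the trade-off is the same.

```python
# The collection from the example above, as plain Python dictionaries.
advising = [
    {"professor_name": "John", "student_name": "Bikash"},
    {"professor_name": "John", "student_name": "Bimal"},
]


def students_of(collection, professor_name):
    """The query the collection was designed for: no join is needed."""
    return [
        doc["student_name"]
        for doc in collection
        if doc["professor_name"] == professor_name
    ]


def rename_professor(collection, old_name, new_name):
    """Duplicated fields must be updated in every document holding them."""
    updated = 0
    for doc in collection:
        if doc["professor_name"] == old_name:
            doc["professor_name"] = new_name
            updated += 1
    return updated


before = students_of(advising, "John")
touched = rename_professor(advising, "John", "Jon")
after = students_of(advising, "Jon")
```

Reading is a single pass over one collection, but one logical change (renaming the professor) touches two documents because the name is stored twice.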

Column Family Database (eg. Cassandra, DynamoDB):

  • Structure/organize the data according to your queries. Define Queries and then create tables accordingly.

Advantage:

  1. Schema Free (each row can have different columns)

  2. Key-value store

  3. Read operation is very fast

Disadvantage:

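  1. Writes can be costly at the application level: a single write is fast, but the same data must be written to every table that duplicates it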

  2. Data duplication in multiple tables as shown in the picture below

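    Partition Key: the part of the primary key that decides which node (partition) stores the row.

    Composite Key: a primary key made of more than one column, typically the partition key plus one or more clustering keys.

    Clustering Key: the part of the primary key that decides the sort order of rows inside a partition.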
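
A rough sketch of how partition and clustering keys organise data, assuming the usual Cassandra-style meaning of the terms: the partition key selects the partition that stores a row, and the clustering key orders rows inside that partition. The class WideColumnTable and the table contents are hypothetical. This models the data layout only, not storage or distribution.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy table: rows grouped by partition key, sorted by clustering key."""

    def __init__(self):
        # partition key -> list of (clustering key, columns) tuples
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, columns):
        rows = self._partitions[partition_key]
        rows.append((clustering_key, dict(columns)))
        # Keep rows ordered by clustering key inside the partition.
        rows.sort(key=lambda row: row[0])

    def select(self, partition_key):
        # Reading one partition returns its rows already in sorted order.
        return list(self._partitions.get(partition_key, []))


# Table designed for the query "list the students of a professor".
students_by_professor = WideColumnTable()
students_by_professor.insert("John", "Bimal", {"year": 2})
students_by_professor.insert("John", "Bikash", {"year": 1, "email": "b@example.com"})

rows = students_by_professor.select("John")
names = [clustering_key for clustering_key, _ in rows]
```

names comes back as ["Bikash", "Bimal"], sorted by the clustering key. The two rows carry different columns, which is what "schema free" means here. A second query pattern, such as "find the professor of a student", would need a second table keyed by student, hence the duplication.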

Graph (e.g. Neo4j, OrientDB, ArangoDB): Has only nodes and edges

Use Cases:

Advantage:

  1. No joins needed

  2. Very fast in analyzing data

  3. Not fixed schema so easy to evolve dataset as needed

When to use a graph database?

-> When you are interested in the relationships between entities (nodes)
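
A relationship query of this kind can be sketched as a breadth-first traversal over an adjacency list. The graph below, including the extra node "Sita", and the function reachable_within are made up for illustration. Graph databases offer dedicated query languages for this, which the sketch does not try to imitate.

```python
from collections import deque

# Nodes are people; an edge means "knows" (hypothetical data).
knows = {
    "John": ["Bikash", "Bimal"],
    "Bikash": ["Bimal"],
    "Bimal": ["Sita"],
    "Sita": [],
}


def reachable_within(graph, start, max_hops):
    """Return nodes reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found


direct = reachable_within(knows, "John", 1)
two_hops = reachable_within(knows, "John", 2)
```

direct is ["Bikash", "Bimal"] and two_hops is ["Bikash", "Bimal", "Sita"]. Each extra hop follows edges directly instead of adding another join, which is why this kind of query suits a graph model.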