T09 - NoSQL 1
Mainack Mondal
Sandip Chakraborty
CS 60203
Autumn 2024
Outline
• What is NoSQL?
● Geographic Distribution
○ What if network latency increases b/w geographically distributed nodes?
NoSQL vs SQL
SQL | NoSQL
Supports relationships and joins | Typically no support for joins and relationships
High maintenance cost | Low maintenance cost
Predefined schema | Dynamic schema
Vertically scalable | Horizontally scalable
Follows ACID properties | Typically relaxes ACID properties
Eg: PostgreSQL, MySQL etc. | Eg: Cassandra, Neo4j etc.
But … Why should you choose NoSQL?
Benefits of NoSQL
● Agility
○ SQL has a fixed data model and hence does not readily support agile development
○ A key principle of agile development is the ability to adapt to changing application
requirements
○ NoSQL supports dynamic schemas and therefore fits agile development
Source: Link
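A minimal sketch of the fixed-schema vs dynamic-schema point above. It uses Python's built-in sqlite3 for the relational side and a plain list of dicts as a hypothetical stand-in for a document store; the table and field names (users, nickname) are made up for illustration.

```python
import sqlite3

# Relational side: the schema is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")


def insert_with_new_field(connection):
    """Try to store a field that the declared schema does not contain."""
    try:
        connection.execute(
            "INSERT INTO users (id, name, nickname) VALUES (2, 'Ravi', 'rv')"
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement: the table has no such column.
        return False


sql_accepted = insert_with_new_field(conn)

# Document side (assumption: a list of dicts standing in for a document
# store). Each record carries its own fields, so no migration is needed.
users = []
users.append({"id": 1, "name": "Asha"})
users.append({"id": 2, "name": "Ravi", "nickname": "rv"})
```

On the relational side the new field would need an ALTER TABLE first; on the document side the application must instead cope with records that lack the field.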
Benefits of NoSQL
● Scalability
○ NoSQL supports Horizontal Scaling (add more commodity servers or cloud instances)
○ SQL databases do not scale horizontally as easily (Why?)
○ Vertical scaling requires significant additional engineering (like making joins faster)
○ Examples:
■ Games like Pokemon Go, Clash of Clans etc. store data of millions of users
■ IoT devices:
● More than a billion IoT devices are connected to the Internet
● This data is semi-structured and continuous
Benefits of NoSQL
● Auto-Sharding
○ NoSQL databases often come with built-in auto-sharding features
○ This is essential for horizontal scaling
● Polyglot Persistence
○ Means using multiple data storage technologies,
each chosen based on the way the data is used
○ Similar to Polyglot Programming
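The auto-sharding idea above can be sketched as hash-based key placement. This is a toy illustration, not how any particular database implements it; the function names and the shard count are assumptions.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (same key, same shard)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard they hash to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical workload: 1000 player records spread over 4 servers.
player_ids = [f"player-{n}" for n in range(1000)]
shards = distribute(player_ids, 4)
```

With plain modulo hashing, changing the number of shards moves most keys; production systems commonly use schemes such as consistent hashing to limit that movement.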
NoSQL Tradeoffs
Now, the question is: what are we losing?
● No relationships among data ⇒ No joins
● However, we are losing something more ⇒ consistency (What!!)
● CAP Theorem
○ Consistency: Once data is written, all future read requests will contain that data
○ Availability: The database is always available and responsive
○ Partition Tolerance: The system keeps operating even when the network between nodes drops or delays messages
● This theorem says that a distributed system can guarantee at most 2 out of C, A and P.
● Many NoSQL databases favour availability and partition tolerance
● In exchange, they provide only eventual consistency
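A small simulation of eventual consistency, assuming two replicas that exchange data through a periodic sync step. The Replica class, the version numbers and the sync function are illustrative assumptions, not the protocol of any specific database.

```python
class Replica:
    """A toy replica storing key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """Anti-entropy pass: push every replica's entries to all replicas."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart:42", ["book"], version=1)  # write accepted by replica a only

stale_read = b.read("cart:42")  # replica b has not seen the write yet
sync([a, b])
fresh_read = b.read("cart:42")  # after syncing, both replicas agree
```

Both replicas stay available for reads and writes throughout; the cost is the window in which replica b returns stale data.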
Outline
• What is NoSQL?
● Graph Based
○ Used to store information about networks of data, such as social connections
○ Examples: Neo4j, Giraph etc.
NoSQL Database Types (contd.)
● Key-Value Pairs
○ Similar to hash tables: each unique key points to a piece of data (usually a BLOB)
○ Use Case: maintaining session info, user profiles, preferences, shopping cart etc.
○ Examples: Redis, Amazon DynamoDB, Memcached etc.
○ Note: Avoid key-value stores if you need to query by value rather than by key
● Column based
○ Data is arranged as columns instead of rows, with keys pointing to multiple columns
○ Supports efficient representation of sparse data
○ Designed to store and process large amounts of data distributed over many machines
○ Examples: Apache Cassandra, HBase etc.
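The note about not querying key-value stores by data can be illustrated with a plain dict as a hypothetical stand-in for such a store. The session keys and fields below are made up, and real stores usually treat values as opaque blobs.

```python
# Toy key-value store: key -> value.
sessions = {
    "sess:1": {"user": "asha", "cart": ["book"]},
    "sess:2": {"user": "ravi", "cart": []},
    "sess:3": {"user": "asha", "cart": ["pen"]},
}


def get(store, key):
    """Lookup by key: a single hash-table access."""
    return store.get(key)


def find_by_user(store, user):
    """Lookup by value: every entry has to be scanned."""
    return sorted(k for k, v in store.items() if v["user"] == user)


one_session = get(sessions, "sess:2")
asha_sessions = find_by_user(sessions, "asha")
```

The first call touches one entry; the second walks the whole store, which is why workloads that filter on values tend to fit other database types better.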
How to choose between SQL and NoSQL?
Criteria | Use Case | SQL | NoSQL
ACID Compliance | Banking Systems, Inventory Management Systems | Suitable: SQL ensures ACID compliance | Usually not suitable (ACID guarantees often relaxed)
Complex Queries | Reporting, analytics, and data manipulation | Suitable: Supports complex queries with JOINs | Not Suitable: Best for simple queries and fast lookups
Scalability | Handling large amounts of data | Not Suitable (Vertical Scaling) | Suitable (Horizontal Scaling)
Data Relationships | E-commerce System: Managing products, categories, and customers | Suitable | Not Suitable
Data Variety | Structured (ERP), Unstructured (Big Data Applications) | Suitable for structured data | Suitable for unstructured data
Outline
• What is NoSQL?
● Multi-Tenant
● Flexible Schema
● Predictable Performance
● Highly Available
● Boundless Scale
source: Link
DynamoDB Architecture
DynamoDB Tables
● A table consists of items, each of which is a collection of attributes
● Items uniquely identified by primary key
● Schema of primary key specified at table creation
● The primary key can be either a simple partition key or a composite key
(a partition key combined with a sort key)
● Partition key determines the physical storage location of the item
● DynamoDB also supports secondary indexes to query data using alternate keys
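The table model above can be sketched with a small in-memory class. ToyTable and its methods are hypothetical and only mimic the ideas of items, a composite primary key and per-partition ordering; this is not the DynamoDB API, and the attribute names are made up.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory model of a table with a partition key and a sort key."""

    def __init__(self, partition_key, sort_key):
        self.partition_key = partition_key
        self.sort_key = sort_key
        # partition key value -> list of (sort key value, item), kept sorted
        self._partitions = defaultdict(list)

    def put_item(self, item):
        pk, sk = item[self.partition_key], item[self.sort_key]
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)  # same primary key: replace the item
        else:
            rows.insert(i, (sk, item))

    def get_item(self, pk, sk):
        for k, item in self._partitions.get(pk, []):
            if k == sk:
                return item
        return None

    def query(self, pk):
        """All items in one partition, ordered by sort key."""
        return [item for _, item in self._partitions.get(pk, [])]


orders = ToyTable("customer_id", "order_date")
orders.put_item({"customer_id": "c1", "order_date": "2024-09-02", "total": 30})
orders.put_item({"customer_id": "c1", "order_date": "2024-08-15", "total": 12})
orders.put_item({"customer_id": "c2", "order_date": "2024-09-01", "total": 7})
# Same primary key as the first item, with an extra attribute (flexible schema).
orders.put_item({"customer_id": "c1", "order_date": "2024-09-02",
                 "total": 35, "coupon": "X"})
```

Only the key attributes are fixed at table creation; the remaining attributes may differ from item to item, as the last put_item call shows.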
DynamoDB Architecture (contd.)
Interface
DynamoDB Architecture (contd.)
Partitioning and Replication
A DynamoDB table is divided into multiple partitions. This provides two benefits:
● Handling more throughput as requests increase
● Store more data as the table grows
But what about the availability guarantees of these partitions?
● Each partition has multiple replicas distributed across availability zones
● Together, these replicas form a replication group and improve the partition’s
availability and durability
What are these?
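A rough sketch of partitions and replication groups as described above, assuming three availability zones and one replica per zone. The zone names, replica labels and helper functions are all hypothetical; DynamoDB's actual placement and failover logic is more involved.

```python
import hashlib

# Assumption: three availability zones with made-up names.
ZONES = ["az-1", "az-2", "az-3"]


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Pick a partition from a stable hash of the partition key."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def build_replication_groups(num_partitions, zones):
    """One replica of every partition in each availability zone."""
    return {
        p: [f"p{p}-replica@{zone}" for zone in zones]
        for p in range(num_partitions)
    }


def live_replicas(group, failed_zone):
    """Replicas of a group that survive the loss of one zone."""
    return [r for r in group if not r.endswith("@" + failed_zone)]


groups = build_replication_groups(4, ZONES)
target = partition_for("customer-42", 4)
survivors = live_replicas(groups[target], "az-2")
```

Because the replicas of a partition sit in different zones, losing one zone still leaves two copies of every partition in this toy setup.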
DynamoDB Architecture (contd.)