T09 - NoSQL 1


NoSQL : Not only SQL

Mainack Mondal
Sandip Chakraborty

CS 60203
Autumn 2024
Outline
• What is NoSQL?

• How is it different from SQL?

• Why do we need NoSQL?

• NoSQL Database types

• How to choose between SQL and NoSQL?

• Case Study: Amazon DynamoDB

Introduction to NoSQL
NoSQL
● Stands for “Not Only SQL”
● Basically a non-relational, schema-less and largely distributed database
● Developed in late 2000s to deal with limitations of SQL databases

Umm … What is a Distributed Database ?

Distributed Database
● Ever wondered how companies like Amazon manage their DB?
● Basically, the database is logically divided and distributed across multiple computers
● All these computers are connected in a network
Need for Distributed Databases
● Scalability
○ What if your database size exceeds 100GB?
○ Is read/write speed still same?

● Fault Tolerance and High Availability

○ What if your database system fails? Can it recover by itself ?

● Geographic Distribution
○ What if network latency increases b/w geographically distributed nodes?
NoSQL vs SQL
SQL | NoSQL
Supports relationships and joins | Limited or no native support for joins and relationships
High maintenance cost | Low maintenance cost
Predefined schema | Dynamic schema
Typically vertically scalable | Horizontally scalable
Follows ACID properties | Usually relaxes ACID guarantees in favor of availability (some systems do offer ACID transactions)
E.g. PostgreSQL, MySQL etc. | E.g. Cassandra, Neo4j etc.
But … Why should you choose NoSQL?
Benefits of NoSQL
● Agility
○ SQL databases use a fixed data model, so every change to the shape of the data needs a schema migration, which slows agile development
○ A key principle of agile development is the ability to adapt to changing application requirements
○ NoSQL supports a dynamic schema, so the data model can evolve with the application

● Handling Unstructured Data
○ Because the schema is dynamic, NoSQL can store unstructured and semi-structured data
○ SQL needs explicit relationships between different pieces of data in order to perform ‘Joins’

Source: Link
Benefits of NoSQL
● Scalability
○ NoSQL supports horizontal scaling (add more commodity servers or cloud instances)
○ SQL databases are hard to scale horizontally (Why?)
○ Scaling a SQL database beyond one machine requires significant additional engineering (like keeping joins fast)
○ Examples:
■ Games like Pokemon Go, Clash of Clans etc. store data for millions of users
■ IoT devices:
● More than a billion IoT devices are connected to the Internet
● Their data is semi-structured and continuous
Benefits of NoSQL
● Auto-Sharding
○ NoSQL databases often come with built-in auto-sharding features
○ This is essential for horizontal scaling

● Polyglot Persistence
○ Means using multiple data storage technologies, each chosen based on the way the data is used
○ Similar to Polyglot Programming
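
To make the auto-sharding bullet concrete, here is a minimal Python sketch of hash-based sharding. The names `shard_for` and `ShardedStore` are hypothetical, and the modulo scheme is an assumption chosen for brevity, not how any particular database does it.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (md5 keeps the result stable across runs)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy store: each shard is a dict standing in for a separate server."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in range(100):
    store.put(f"user:{user_id}", {"score": user_id})
```

The application only calls `put` and `get`; the store decides where each key lives. With plain modulo hashing, changing the number of shards moves most keys, which is one reason real systems tend to use consistent hashing or range partitioning instead.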
NoSQL Tradeoffs
Now, the question is: what are we losing?
● No relationships among data ⇒ no joins
● However, we are losing something more ⇒ consistency (What !!)
● CAP Theorem
○ Consistency: Once data is written, every subsequent read returns that data
○ Availability: Every request receives a response, even when some nodes have failed
○ Partition Tolerance: The system keeps working even when the network between nodes drops or delays messages
● The theorem says that a distributed system can guarantee only 2 out of C, A and P at the same time.
● Many NoSQL databases choose availability and partition tolerance
● In return, they offer only eventual consistency: replicas converge to the same value once updates stop
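
Eventual consistency is easier to see in a toy model. The Python sketch below is an illustration under assumptions: the class `EventuallyConsistentStore` and its `sync` step are hypothetical stand-ins for asynchronous replication, not a real database API.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on replica 0 first and reach the others later."""

    def __init__(self, num_replicas: int = 3) -> None:
        self.replicas = [dict() for _ in range(num_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value) -> None:
        # The write is acknowledged as soon as one replica has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index: int):
        # Any replica may answer, so a read can return stale data.
        return self.replicas[replica_index].get(key)

    def sync(self) -> None:
        # Stand-in for background replication catching up.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart", ["book"])
stale = store.read("cart", replica_index=2)   # replication has not run yet
store.sync()
fresh = store.read("cart", replica_index=2)   # replicas have converged
```

Here `stale` is `None` while `fresh` holds the written value: the store stayed available for reads on every replica, at the cost of briefly returning old data.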
NoSQL Database Types
4 types of NoSQL DB:
● Document Based
○ Uses collections and documents rather than tables and rows
○ Usual formats: XML, JSON, BSON
○ Use cases: CMS, blogging platforms, real-time analytics, e-commerce applications
○ Examples: MongoDB, CouchDB, Amazon DocumentDB etc.

● Graph Based
○ Used to store information about networks of data, such as social connections
○ Examples: Neo4j, Giraph etc.
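
As a rough illustration of the document model, the Python sketch below keeps a collection as a list of JSON-like dicts whose fields differ from one document to the next. The sample data and the helpers `find` and `with_field` are hypothetical; real document stores expose their own query APIs.

```python
# A "collection" of documents; note that the two documents have different fields.
posts = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["db", "nosql"]},
    {"_id": 2, "title": "CAP in practice", "author": {"name": "A. Writer"}, "views": 120},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def with_field(collection, field):
    """Return documents that contain the field at all (no schema guarantees it)."""
    return [doc for doc in collection if field in doc]


by_title = find(posts, title="CAP in practice")
tagged = with_field(posts, "tags")
```

Because no schema forces every document to carry the same fields, queries have to tolerate missing attributes, which is why `find` uses `doc.get` rather than direct indexing.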
NoSQL Database Types (contd.)
● Key-Value Pairs
○ Similar to hash tables: a unique key with a pointer to a data item (usually a BLOB)
○ Use cases: maintaining session info, user profiles, preferences, shopping carts etc.
○ Examples: Redis, Amazon DynamoDB, Memcached etc.
○ Note: Avoid key-value stores if you need to query by value rather than by key

● Column based
○ Data is arranged as columns instead of rows, with keys pointing to multiple columns
○ Supports efficient representation of sparse data
○ Designed to store and process large amounts of data distributed over many machines
○ Examples: Apache Cassandra, HBase etc.
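
The note about not querying key-value stores by value can be sketched in a few lines of Python. A plain dict stands in for the store here, and the session data and function names are made up for illustration.

```python
# Keys are session ids; values are treated by the store as opaque blobs.
sessions = {
    "sess:1001": {"user": "alice", "cart": ["book"]},
    "sess:1002": {"user": "bob", "cart": []},
    "sess:1003": {"user": "alice", "cart": ["pen"]},
}


def get_session(store, key):
    """Lookup by key: a single hash lookup, independent of store size."""
    return store.get(key)


def find_sessions_by_user(store, user):
    """Lookup by value: the store has no index on values, so scan everything."""
    return sorted(key for key, value in store.items() if value["user"] == user)


one = get_session(sessions, "sess:1002")
alice_sessions = find_sessions_by_user(sessions, "alice")
```

`get_session` touches one entry, while `find_sessions_by_user` has to visit every entry. On a store with millions of keys spread over many machines, that full scan is what makes value-based queries a poor fit.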
How to choose between SQL and NoSQL?
Criteria | Use Case | SQL | NoSQL
ACID Compliance | Banking systems, inventory management systems | Suitable: SQL ensures ACID compliance | Not suitable (no ACID compliance)
Complex Queries | Reporting, analytics, and data manipulation | Suitable: supports complex queries with JOINs | Not suitable: best for simple queries and fast lookups
Scalability | Handling large amounts of data | Not suitable (vertical scaling) | Suitable (horizontal scaling)
Data Relationships | E-commerce system: managing products, categories, and customers | Suitable | Not suitable
Data Variety | Structured (ERP), unstructured (Big Data applications) | Suitable for structured data | Suitable for unstructured data
Amazon DynamoDB
Let’s begin with a story:

In 2021, there was a 66-hour Amazon Prime Day shopping event. The event generated some staggering stats:

● Trillions of API calls were made to the database by Amazon applications
● The peak load to the database reached 89 million requests per second
● The DB provided single-digit ms performance while maintaining high availability

All of this was made possible by DynamoDB

Amazon DynamoDB (contd.)
What is DynamoDB?
● Fully managed NoSQL database

● Multi-Tenant

● Flexible Schema

● Predictable Performance

● Highly Available

● Boundless Scale

source: Link
DynamoDB Architecture
DynamoDB Tables
● A table is a collection of items, and each item is a collection of attributes
● Each item is uniquely identified by its primary key
● The schema of the primary key is specified at table creation
● The primary key is either a simple partition key or a composite key made of a partition key and a sort key
● The partition key determines which partition physically stores the item (via a hash of its value)
● DynamoDB also supports secondary indexes to query data using alternate keys
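
The primary-key bullets above can be sketched as the parameters of a table definition. The field names (`TableName`, `AttributeDefinitions`, `KeySchema`, `BillingMode`, and the key types `HASH` and `RANGE`) follow DynamoDB's CreateTable API; the helper `table_definition` and the table and attribute names are hypothetical. The sketch only builds the parameters and never contacts AWS.

```python
def table_definition(name, partition_key, sort_key=None):
    """Build CreateTable-style parameters for a simple or composite primary key.

    partition_key and sort_key are (attribute_name, attribute_type) pairs,
    where the type is "S" (string), "N" (number) or "B" (binary).
    """
    attributes = [{"AttributeName": partition_key[0], "AttributeType": partition_key[1]}]
    key_schema = [{"AttributeName": partition_key[0], "KeyType": "HASH"}]  # partition key
    if sort_key is not None:
        attributes.append({"AttributeName": sort_key[0], "AttributeType": sort_key[1]})
        key_schema.append({"AttributeName": sort_key[0], "KeyType": "RANGE"})  # sort key
    return {
        "TableName": name,
        "AttributeDefinitions": attributes,
        "KeySchema": key_schema,
        "BillingMode": "PAY_PER_REQUEST",
    }


# Simple primary key: the partition key alone identifies an item.
users = table_definition("Users", ("user_id", "S"))
# Composite primary key: partition key plus sort key.
orders = table_definition("Orders", ("customer_id", "S"), ("order_date", "S"))
```

Only key attributes are declared up front; all other attributes of an item stay schema-free. With credentials configured, a dict like `orders` could be passed to a client call such as `boto3.client("dynamodb").create_table(**orders)`.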
DynamoDB Architecture (contd.)
Interface
DynamoDB Architecture (contd.)
Partitioning and Replication
A DynamoDB table is divided into multiple partitions. This provides two benefits:
● Handling more throughput as requests increase
● Storing more data as the table grows
But what about the availability guarantees of these partitions?
● Each partition has multiple replicas distributed across availability zones
● Together, these replicas form a replication group and improve the partition’s availability and durability
What are these replication groups?
DynamoDB Architecture (contd.)

More on Replication Groups

● A replication group consists of storage replicas, each containing:
○ Write-Ahead Logs (WALs)
○ A B-tree that stores the key-value data
● A group can also contain replicas that store only recent WAL entries
● These are known as log replicas
DynamoDB Architecture (contd.)
An issue in Partitioning and Replication
While replicating data across multiple nodes, reaching consensus becomes a big issue. What if each replica of a partition has a different value for a particular key?
⇒ DynamoDB solves it by using Multi-Paxos for leader election and consensus within a replication group
Key idea is as follows:
● The leader processes all write requests by generating a WAL record and sending
it to the replicas. A write is acknowledged to the application once a quorum of
replicas stores the log record to their local write-ahead logs.
● The leader also serves strongly consistent read requests. On the other hand, any
other replica can serve eventually consistent reads.
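
The write path described above can be sketched as a small simulation. It is a simplification under stated assumptions: the classes `Replica` and `Leader` are hypothetical, a quorum is assumed to be a simple majority of the replication group, and leader election (the Multi-Paxos part) is left out entirely.

```python
class Replica:
    """A follower that appends records to its local write-ahead log when it is up."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up
        self.wal = []

    def append(self, record) -> bool:
        if not self.up:
            return False
        self.wal.append(record)
        return True


class Leader:
    def __init__(self, followers) -> None:
        self.followers = followers
        self.wal = []
        self.next_seq = 1

    def write(self, key, value) -> bool:
        """Return True once a majority of the group has logged the record."""
        record = (self.next_seq, key, value)
        self.next_seq += 1
        self.wal.append(record)  # the leader's own log counts as one ack
        acks = 1 + sum(1 for follower in self.followers if follower.append(record))
        quorum = (len(self.followers) + 1) // 2 + 1  # assumed: simple majority
        return acks >= quorum


healthy = Leader([Replica("b"), Replica("c", up=False)])
acked = healthy.write("k", "v1")      # 2 of 3 replicas logged it

degraded = Leader([Replica("b", up=False), Replica("c", up=False)])
rejected = degraded.write("k", "v1")  # only the leader logged it
```

With three replicas, one failed follower does not block writes, which is where the availability benefit comes from. The sketch does not roll back the leader's log entry when a quorum is missed; a real implementation has to handle that case.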
DynamoDB Architecture (contd.)
DynamoDB Request Flow
● Requests arrive at the request router service. This service is responsible for
routing each request to the appropriate storage node
● The request router first checks whether the request is valid by calling the
authentication service (AWS IAM)
● Next, the request router fetches the routing information from the metadata
service. The metadata service stores routing information about the tables,
indexes, and replication groups for keys of a given table or index
● The request router also checks the global admission control to make sure that
the request doesn’t exceed the resource limit for the table
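
The request flow above can be sketched as one function that runs the three checks in order. Everything here is a hypothetical stand-in: the callables `is_authorized` and `has_capacity` play the roles of the authentication service and global admission control, and `routing_table` plays the metadata service.

```python
class RequestRejected(Exception):
    """Raised when the router refuses to forward a request."""


def route_request(request, is_authorized, routing_table, has_capacity):
    """Return the storage node for a request, or raise RequestRejected."""
    # 1. Authentication: is the caller allowed to make this request?
    if not is_authorized(request["caller"]):
        raise RequestRejected("authentication failed")
    # 2. Metadata lookup: which storage node holds this table partition?
    node = routing_table.get((request["table"], request["partition"]))
    if node is None:
        raise RequestRejected("no routing information")
    # 3. Admission control: is the table still within its resource limit?
    if not has_capacity(request["table"]):
        raise RequestRejected("table throughput limit exceeded")
    return node


routing = {("Orders", 0): "storage-node-a"}
request = {"caller": "app-1", "table": "Orders", "partition": 0}
target = route_request(request, lambda caller: True, routing, lambda table: True)
```

In this sketch a rejected request never reaches a storage node, which mirrors the point of doing authentication and admission control at the router.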
Hot Partitions and Throughput dilution
● In the initial release, DynamoDB let customers explicitly specify the throughput requirements for a table in terms of read capacity units (RCUs) and write capacity units (WCUs).
● As the demands on a table changed (in size and load), its partitions could be split further, with the table's throughput divided equally among them.
For example:
● Let’s say a partition has a maximum throughput of 1000 WCUs.
● Table capacity of 3200 WCUs ⇒ 4 partitions, each of 800 WCUs
● Now, if the table capacity increases to 6000 WCUs ⇒ 8 partitions, each of 750 WCUs
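
The arithmetic in this example can be checked with a few lines of Python. The function name is hypothetical, and the rule that every partition splits in two whenever its share would exceed the per-partition maximum is an assumption inferred from the numbers on the slide.

```python
def per_partition_throughput(table_wcu, partitions, max_per_partition=1000):
    """Split partitions until each one's equal share fits under the maximum.

    Returns (number_of_partitions, wcu_per_partition).
    """
    while table_wcu / partitions > max_per_partition:
        partitions *= 2  # assumed: every partition splits into two children
    return partitions, table_wcu / partitions


before = per_partition_throughput(3200, partitions=4)
after = per_partition_throughput(6000, partitions=4)
```

`before` is `(4, 800.0)` and `after` is `(8, 750.0)`: the table's total capacity almost doubled, yet each partition ended up with less throughput than it had before the split.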
Hot Partitions and Throughput dilution (contd.)
● All of this was controlled by the admission control system to make sure that storage nodes don’t become overloaded.
● However, this approach assumed that throughput was uniformly distributed across all partitions, which real workloads rarely are.
● Two direct consequences of this approach:
○ Hot Partitions: Traffic consistently went to a few items of the table rather than being evenly distributed, so their partitions were throttled while others sat idle
○ Throughput dilution: Splitting a partition reduces per-partition throughput, as it is equally divided among the child partitions (in the earlier example: 1000 WCU → 800 WCU → 750 WCU)
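
A small calculation shows why the uniform-distribution assumption hurts. In the Python sketch below, the traffic numbers are invented and `throttled_requests` is a hypothetical helper; it gives every partition an equal share of the table's capacity and reports how much of each partition's traffic exceeds that share.

```python
def throttled_requests(table_wcu, traffic_per_partition):
    """Return, per partition, the traffic that exceeds its equal share of capacity."""
    share = table_wcu / len(traffic_per_partition)
    return [max(0, traffic - share) for traffic in traffic_per_partition]


traffic = [1500, 300, 200, 200]            # skewed: one hot partition
total_traffic = sum(traffic)               # 2200, below the table's 3200 WCUs
throttled = throttled_requests(3200, traffic)
```

The table as a whole uses 2200 of its 3200 WCUs, yet the hot partition has 700 WCUs of traffic throttled because its share is fixed at 800. This is the situation that the fixes on the next slide target.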
Hot Partitions and Throughput dilution (contd.)
Well… then how did the Amazon engineers solve it?
They introduced 2 main ideas:
● Bursting:
○ The idea behind bursting was to let applications tap into capacity that a partition had left unused, for up to 300 seconds, to absorb short-lived spikes.
○ It’s the same as saving part of your salary in the bank each month and later spending those savings on a new car.
● Adaptive Capacity:
○ Monitors the provisioned and consumed capacity of all the tables
○ If a table experiences throttling while staying within its table-level throughput, it automatically boosts the allocated throughput of the table's partitions
○ If the table then consumes more than its provisioned capacity, the boost is reduced again
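
Bursting behaves much like a savings account for capacity, which the Python sketch below models one second at a time. The class `BurstBucket` is hypothetical and deliberately simplified: it tracks a single partition and ignores any limits imposed by the storage node that hosts it.

```python
class BurstBucket:
    """Toy model of one partition that banks unused capacity for later bursts."""

    def __init__(self, provisioned_per_sec: int, burst_seconds: int = 300) -> None:
        self.rate = provisioned_per_sec
        self.capacity = provisioned_per_sec * burst_seconds  # at most 300 s of savings
        self.saved = 0

    def tick(self, demand: int) -> int:
        """Process one second of demand and return how many units were served."""
        available = self.rate + self.saved
        served = min(demand, available)
        # Whatever was not used this second is banked, up to the cap.
        self.saved = min(available - served, self.capacity)
        return served


bucket = BurstBucket(provisioned_per_sec=100)
for _ in range(10):
    bucket.tick(0)              # ten idle seconds bank 1000 units
spike = bucket.tick(600)        # a spike well above the provisioned 100 is absorbed
```

After the idle period the partition serves a 600-unit spike in full even though only 100 units per second are provisioned. Once the savings are spent, it falls back to the provisioned rate.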
References
● Introduction to NoSQL: Link
● NoSQL Databases ~ Couchbase: Link
● DynamoDB paper: Link (Usenix ATC 2022), Link (annotated by Arpit Bhayani)
● A deep dive into DynamoDB ~ ByteByteGo: Link
● Amazon AWS DynamoDB page: Link
● Apache Cassandra: Link
● Memcached: Link
● ScyllaDB: Link
