Recent Trends - Nosql Database Management

The document discusses NoSQL databases and the CAP theorem. It provides an introduction to NoSQL and why it was created, examples of NoSQL databases, and an explanation of the CAP theorem. The CAP theorem states that a distributed system can only guarantee two of three properties: consistency, availability, and partition tolerance. NoSQL databases typically sacrifice consistency for availability and partition tolerance in order to be highly scalable and handle large, growing amounts of data.

Uploaded by

Vibhuti Srivastava

RECENT TRENDS - NOSQL DATABASE MANAGEMENT

• GROUP 19
• 18BCI0162 - SHUBHAM SAREEN
• 18BCE0754 - RAJ ADROJA
• 18BCI0111 - VIBHUTI SRIVASTAVA
INTRODUCTION

• Web and business applications generate enormous amounts of data every day. A large share of
this data is handled by relational database management systems (RDBMS). The relational model
originated with E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks",
which made data modeling and application programming much easier. Beyond its intended
benefits, the relational model is well suited to client-server programming, and today it is
the predominant technology for storing structured data in web and business applications.
WHAT IS NOSQL?

Stands for Not Only SQL

• Class of non-relational data storage systems
• Usually do not require a fixed table schema nor do they use the concept of joins
• All NoSQL offerings relax one or more of the ACID properties (see the CAP theorem section)
• NoSQL is a class of non-relational database management systems that differs from traditional relational
database management systems in some significant ways. It is designed for distributed data stores with very
large-scale storage needs (for example Google or Facebook, which collect terabits of data every day
for their users).
• This type of data store may not require a fixed schema, avoids join operations, and typically scales
horizontally
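
As a rough illustration of the last two points (no fixed schema, no joins), the sketch below models a document-style store as a plain Python dictionary. The `store` name, the keys, and the sample records are hypothetical and not tied to any particular NoSQL product.

```python
# Hypothetical in-memory "document store": keys map to free-form documents.
store = {}

# Records in the same collection may carry different fields (no fixed schema).
store["user:1"] = {"name": "Asha", "age": 29}
store["user:2"] = {
    "name": "Ravi",
    "email": "ravi@example.com",
    # Related data is embedded in the document instead of joined from another table.
    "orders": [{"item": "book", "qty": 2}, {"item": "pen", "qty": 10}],
}


def total_items(user_key):
    """Sum ordered quantities for one user with a single lookup and no join."""
    doc = store[user_key]
    return sum(order["qty"] for order in doc.get("orders", []))


print(total_items("user:1"))  # user without an "orders" field
print(total_items("user:2"))
```

Embedding the orders inside the user document trades storage duplication for a single-key lookup, which is one reason such stores are easy to spread across many machines.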
WHY NOSQL?

• Today, data is becoming easier to access and capture through third parties such as Facebook, Google+
and others. Personal user information, social graphs, geolocation data, user-generated content and machine
logging data are just a few examples where data has been increasing exponentially. Serving these applications
properly requires processing huge amounts of data, which SQL databases were never designed for. NoSQL
databases evolved to handle this volume of data.

• For data storage, an RDBMS cannot be the be-all/end-all

• Just as there are different programming languages, we need other data storage tools in
the toolbox
• A NoSQL solution is more acceptable to a client now than even a year ago
• Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago
WEB APPLICATIONS DRIVING DATA GROWTH
EXAMPLES

• Social-network graph:
• Each record: UserID1, UserID2
• Separate records: UserID, first_name,last_name, age, gender,...
• Task: Find all friends of friends of friends of ... friends of a given user.
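
The friends-of-friends task above can be sketched as a breadth-first traversal over the (UserID1, UserID2) records. This is a minimal sketch with made-up edge data; the function names `build_adjacency` and `within_hops` are hypothetical. In a relational store each extra hop would typically be another self-join on the friendship table, whereas a graph-shaped layout simply walks the adjacency map.

```python
from collections import defaultdict, deque

# Hypothetical friendship records, one (UserID1, UserID2) pair per record.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]


def build_adjacency(pairs):
    """Turn edge records into an undirected adjacency map."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def within_hops(adj, start, max_hops):
    """Breadth-first search: users reachable from start in 1..max_hops steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested distance
        for friend in adj[user]:
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, depth + 1))
    seen.discard(start)
    return seen


adj = build_adjacency(edges)
print(sorted(within_hops(adj, 1, 2)))  # friends and friends of friends of user 1
```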
BRIEF HISTORY OF NOSQL

• The term NoSQL was coined by Carlo Strozzi in 1998. He used it to name his open-source,
lightweight database, which did not have an SQL interface.
• In early 2009, when last.fm wanted to organize an event on open-source distributed
databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are
non-relational, distributed, and do not conform to atomicity, consistency, isolation, durability:
the four defining features of traditional relational database systems.
• In the same year, NoSQL was discussed and debated extensively at the "no:sql(east)" conference
held in Atlanta, USA.
• After that, discussion and practice of NoSQL gained momentum, and NoSQL saw
unprecedented growth.
RDBMS VS NOSQL

• RDBMS
• Structured and organized data
• Structured query language (SQL)
• Data and its relationships are stored in separate tables.
• Data Manipulation Language, Data Definition Language
• Tight Consistency
NO SQL

• Stands for Not Only SQL

• No declarative query language
• No predefined schema
• Key-Value pair storage, Column Store, Document Store, Graph databases
• Eventual consistency rather than ACID properties
• Unstructured and unpredictable data
• CAP Theorem
• Prioritizes high performance, high availability and scalability
• BASE Transaction
HOW DID WE GET HERE?

• Explosion of social media sites (Facebook, Twitter) with large data needs
• Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Just as developers moved to dynamically-typed languages (Ruby/Groovy), there was a shift to
dynamically-typed data with frequent schema changes
• Open-source community
DYNAMO AND BIGTABLE

• Three major papers were the seeds of the NoSQL movement

• BigTable (Google)
• Dynamo (Amazon)
  • Gossip protocol (discovery and error detection)
  • Distributed key-value data store
  • Eventual consistency
• CAP Theorem (discussed below)
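
To make the gossip and eventual-consistency bullets more concrete, here is a simplified simulation, not Dynamo's actual protocol. The slide associates gossip with discovery and error detection; the sketch applies the same random peer-exchange pattern to a single versioned value purely for illustration. The node count, the `(version, data)` representation, and the function names are assumptions.

```python
import random


def gossip_round(state, rng):
    """One push-pull round: every node syncs with one randomly chosen peer."""
    nodes = list(state)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        # Each value is a (version, data) pair; the higher version wins on both sides.
        newest = max(state[node], state[peer])
        state[node] = state[peer] = newest


def rounds_until_converged(state, rng, max_rounds=10_000):
    """Run gossip rounds until all nodes hold the same value; return the round count."""
    rounds = 0
    while len(set(state.values())) > 1 and rounds < max_rounds:
        gossip_round(state, rng)
        rounds += 1
    return rounds


# Eight hypothetical nodes; only node 0 has seen the newest write (version 2).
state = {i: (1, "old") for i in range(8)}
state[0] = (2, "new")

rounds = rounds_until_converged(state, random.Random(42))
print(rounds, set(state.values()))
```

Until the rounds finish, different nodes may return different values for the same key; once they finish, every node agrees. That window is what "eventual consistency" refers to.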
THE PERFECT STORM

• Large datasets, acceptance of alternatives, and dynamically-typed data have come together
in a perfect storm
• Not a backlash/rebellion against RDBMS
• SQL is a rich query language that cannot be rivaled by the current list of NoSQL
offerings
NO SQL PROS/CONS

• Advantages :
• High scalability
• Distributed Computing
• Lower cost
• Schema flexibility, semi-structured data
• No complicated Relationships
• Disadvantages
• No standardization
• Limited query capabilities (so far)
• Eventual consistency is not intuitive to program for
CAP THEOREM

• Three properties of a system: consistency, availability and partition tolerance

• You can have at most two of these three properties for any shared-data
system
• To scale out, you have to partition. That leaves either consistency or
availability to choose from
• In almost all cases, you would choose availability over consistency
AVAILABILITY

• Traditionally thought of as the server/process being available "five nines" (99.999%) of the time.

• However, in a system with a large number of nodes, at almost any point in time there is a good
chance that a node is down or that there is a network disruption among
the nodes.
• We want a system that is resilient in the face of network disruption
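
A quick calculation shows why per-node "five nines" does not carry over to a large cluster. The sketch assumes node failures are independent and that every node has the same availability, which real systems only approximate; the function name `prob_all_up` is hypothetical.

```python
def prob_all_up(per_node_availability, node_count):
    """Probability that every node is up at once, assuming independent failures."""
    return per_node_availability ** node_count


for n in (1, 100, 10_000, 100_000):
    p_any_down = 1 - prob_all_up(0.99999, n)
    print(f"{n:>7} nodes: P(at least one node down) = {p_any_down:.4f}")
```

Under these assumptions the chance that at least one node is down is a little under 10% at 10,000 nodes and over 60% at 100,000 nodes, so partial failure has to be treated as the normal case.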
NOSQL CATEGORIES

There are four general types (the most common categories) of NoSQL databases. Each of these
categories has its own specific attributes and limitations. No single solution is
better than all the others, but some databases are better suited to specific
problems. The most common categories are:
• Key-value stores
• Column-oriented
• Graph
• Document-oriented
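
The four categories differ mainly in the shape of the data they store. The sketch below mimics each shape with ordinary Python structures; all names and sample values are hypothetical, and real products add indexing, persistence, and distribution on top.

```python
# Key-value store: an opaque value looked up by key.
key_value = {"user:42": '{"name": "Asha", "city": "Pune"}'}

# Column-oriented (wide-column) store: row key -> column family -> column -> value.
wide_column = {"user:42": {"profile": {"name": "Asha"}, "location": {"city": "Pune"}}}

# Document store: nested, self-describing documents that can be queried by field.
documents = [{"_id": 42, "name": "Asha", "address": {"city": "Pune"}}]

# Graph database: nodes plus labelled edges between them.
graph_nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
graph_edges = [(42, "FRIEND_OF", 7)]

# Field queries suit the document model; relationship queries suit the graph model.
in_pune = [d["name"] for d in documents if d["address"]["city"] == "Pune"]
friends_of_42 = [graph_nodes[dst]["name"] for src, rel, dst in graph_edges
                 if src == 42 and rel == "FRIEND_OF"]
print(in_pune, friends_of_42)
```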
CAP THEOREM

The CAP theorem is a tool that makes system designers aware of the trade-offs involved in
designing networked shared-data systems. CAP has influenced the design of many
distributed data systems. Over the years, however, it has also been widely misunderstood
as a tool for categorizing databases. There is much misinformation floating around about
CAP; most blog posts on CAP are dated and possibly incorrect.
• The CAP theorem applies to distributed systems that store state. Eric Brewer, at the
2000 Symposium on Principles of Distributed Computing (PODC), conjectured that in
any networked shared-data system there is a fundamental trade-off between consistency,
availability, and partition tolerance.
• In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's
conjecture. The theorem states that networked shared-data systems can only
guarantee/strongly support two of the following three properties:
• Consistency: A guarantee that every node in a distributed cluster returns the same, most recent,
successful write. Consistency refers to every client having the same view of the data. There are various
types of consistency models. Consistency in CAP (as used to prove the theorem) refers to linearizability
(atomic consistency), a very strong form of consistency.
• Availability: Every non-failing node returns a response for all read and write requests in a reasonable
amount of time. The key word here is every. To be available, every node (on either side of a network
partition) must be able to respond in a reasonable amount of time.
• Partition Tolerance: The system continues to function and upholds its consistency guarantees in spite
of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition
tolerance can gracefully recover once the partition heals.
• The CAP theorem categorizes systems into three categories:
• CP (Consistent and Partition Tolerant): At first glance, the CP category is confusing, since it
seems to describe a system that is consistent and partition tolerant but never available. In fact,
CP refers to a category of systems where availability is sacrificed only in the case of a network partition.
• CA (Consistent and Available): CA systems are consistent and available in the
absence of any network partition. Single-node DB servers are often categorized as CA
systems, since they do not need to deal with partition tolerance.
The only hole in this theory is that single-node DB systems are not a
network of shared-data systems and thus do not fall under the purview of CAP. [^11]
• AP (Available and Partition Tolerant): Systems that remain available during a network
partition but cannot guarantee consistency while it lasts.
• CAP is often drawn as a Venn diagram whose three-way intersection is left empty, because it
is impossible to have all three properties in networked shared-data systems. Any such
visualization of CAP, whether a Venn diagram or a triangle, is misleading. The correct way
to think about CAP is that in case of a
network partition (a rare occurrence) one needs to choose between availability
and consistency.
• In any networked shared-data systems partition tolerance is a must. Network partitions
and dropped messages are a fact of life and must be handled appropriately. Consequently,
system designers must choose between consistency and availability.
• Simplistically speaking, a network partition forces designers to either choose perfect
consistency or perfect availability. Picking consistency means not being able to answer a
client's query as the system cannot guarantee to return the most recent write. This
sacrifices availability.
• When consistency is chosen, a network partition forces non-failing nodes to reject clients'
requests, as these nodes cannot guarantee consistent data. At the opposite end of the spectrum,
being available means being able to respond to a client's request without being able to guarantee
consistency, i.e., that the response reflects the most recent value written. Available systems
provide the best possible answer under the given circumstances.
• During normal operation (lack of network partition) the CAP theorem does not
impose constraints on availability or consistency.
• The CAP theorem is responsible for instigating the discussion about the various tradeoffs
in a distributed shared data system. It has played a pivotal role in increasing our
understanding of shared data systems. Nonetheless, the CAP theorem is criticized for
being too simplistic and often misleading. Over a decade after the release of the CAP
theorem, Brewer acknowledges that the CAP theorem oversimplified the choices
available in the event of a network partition.
• According to Brewer, the CAP theorem prohibits only a “tiny part of the design space:
perfect availability and consistency in the presence of partitions, which are rare.” System
designers have a broad range of options for dealing with and recovering from network
partitions. The goal of every system must be to “maximize combinations of consistency
and availability that make sense for the specific application.”
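
The choice a partition forces can be sketched with a toy two-replica model. This is a deliberately simplified illustration under stated assumptions, not how any specific database behaves: the `Replica` class, the `mode` flag, and the single stored value are all hypothetical, and replica `a` is assumed to sit on the side of the partition that still accepts writes.

```python
class Replica:
    """A hypothetical replica holding one value; not modelled on any real database."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = "v1"
        self.partitioned = False  # True when cut off from the other replica

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the most recent write, so refuse to answer.
            raise RuntimeError("unavailable during partition")
        # AP replicas (and any replica in normal operation) answer from local state.
        return self.value


def write_during_partition(mode):
    """Write v2 to replica a while replica b is partitioned, then read from b."""
    a, b = Replica(mode), Replica(mode)
    a.partitioned = b.partitioned = True
    a.value = "v2"  # the write reaches a only; replication to b is cut off
    try:
        return b.read()
    except RuntimeError as err:
        return f"error: {err}"


print("CP:", write_during_partition("CP"))
print("AP:", write_during_partition("AP"))
```

In CP mode the cut-off replica returns an error (availability is sacrificed), while in AP mode it answers with the stale value `v1` (consistency is sacrificed). Without a partition, both modes answer normally.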
