
Recent Trends - Nosql Database Management

The document discusses NoSQL databases and the CAP theorem. It provides an introduction to NoSQL and why it was created, examples of NoSQL databases, and an explanation of the CAP theorem. The CAP theorem states that a distributed system can only guarantee two of three properties: consistency, availability, and partition tolerance. NoSQL databases typically sacrifice consistency for availability and partition tolerance in order to be highly scalable and handle large, growing amounts of data.

RECENT TRENDS - NOSQL DATABASE MANAGEMENT

• GROUP 19
• 18BCI0162 - SHUBHAM SAREEN
• 18BCE0754 - RAJ ADROJA
• 18BCI0111 - VIBHUTI SRIVASTAVA
INTRODUCTION

• In computing systems (web and business applications), an enormous amount of data is
generated on the web every day. A large share of this data is handled by relational
database management systems (RDBMS). The idea of the relational model came from
E. F. Codd’s 1970 paper "A Relational Model of Data for Large Shared Data Banks", which
made data modeling and application programming much easier. Beyond its intended
benefits, the relational model is well-suited to client-server programming, and today it is the
predominant technology for storing structured data in web and business applications.
WHAT IS NOSQL?

Stands for Not Only SQL


• Class of non-relational data storage systems
• Usually do not require a fixed table schema nor do they use the concept of joins
• Most NoSQL offerings relax one or more of the ACID properties (see the CAP theorem, discussed later)
• NoSQL refers to non-relational database management systems, which differ from traditional relational database
management systems in some significant ways. They are designed for distributed data stores with very large-scale
data storage needs (for example, Google or Facebook, which collect terabits of data every day
about their users).
• Such data stores may not require a fixed schema, avoid join operations, and typically scale
horizontally
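The points "no fixed schema" and "no joins" can be made concrete with a minimal Python sketch. It models a document-style collection as a list of dictionaries. The collection, its field names, and the `find` helper are all hypothetical illustrations. They are not the API of any particular NoSQL product.

```python
# Hypothetical in-memory sketch of a schema-less "document collection".
# No real NoSQL driver is used; documents are plain Python dicts.
from typing import Any

users: list[dict[str, Any]] = [
    # Two documents in the same collection with different fields.
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"],
     # Related data is embedded in the document instead of joined from another table.
     "address": {"city": "Pune", "zip": "411001"}},
]

def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return documents whose top-level fields equal every given criterion."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

# Adding a document with a brand-new field needs no ALTER TABLE step.
users.append({"_id": 3, "name": "Chen", "twitter": "@chen"})

print(find(users, name="Ben")[0]["address"]["city"])  # Pune
print([d["_id"] for d in users if "email" in d])       # [1]
```

Each document carries its own structure. Queries must therefore tolerate missing fields, which is why `find` uses `doc.get` rather than indexing.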
WHY NOSQL?

• Today, data is becoming easier to access and capture through third parties such as Facebook, Google+
and others. Personal user information, social graphs, geolocation data, user-generated content and machine
logging data are just a few examples where the data has been increasing exponentially. Providing these services
properly requires processing huge amounts of data, which SQL databases were never designed to handle. NoSQL
databases evolved to handle such data properly.

• For data storage, an RDBMS cannot be the be-all/end-all


• Just as there are different programming languages, we need other data storage tools in
the toolbox
• A NoSQL solution is more acceptable to a client now than even a year ago
• Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago
WEB APPLICATIONS DRIVING DATA GROWTH
EXAMPLES

• Social-network graph:
• Each record: UserID1, UserID2
• Separate records: UserID, first_name,last_name, age, gender,...
• Task: Find all friends of friends of friends of ... friends of a given user.
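In a relational schema, each extra "friends of" level is another self-join on the friendship table, so the query grows with the depth. A graph-style traversal expresses the same task as a breadth-first search. The Python sketch below assumes the (UserID1, UserID2) records have already been loaded into memory. The sample data and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical friendship records, one (UserID1, UserID2) pair per record.
friendships = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]

def build_graph(edges):
    """Build an undirected adjacency list from (UserID1, UserID2) records."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def friends_within(graph, user, max_hops):
    """Breadth-first search: users reachable from `user` in 1..max_hops hops."""
    seen = {user}
    frontier = deque([(user, 0)])
    result = set()
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in graph[current]:
            if friend not in seen:
                seen.add(friend)
                result.add(friend)
                frontier.append((friend, depth + 1))
    return result

graph = build_graph(friendships)
print(sorted(friends_within(graph, 1, 2)))  # [2, 3, 6]
print(sorted(friends_within(graph, 1, 3)))  # [2, 3, 4, 6]
```

Each additional level costs one more step of the same loop rather than one more join. This is one reason graph databases are suited to this kind of query.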
BRIEF HISTORY OF NOSQL

• The term NoSQL was coined by Carlo Strozzi in 1998. He used it to name
his open-source, lightweight database, which did not have an SQL interface.
• In early 2009, when Last.fm wanted to organize an event on open-source distributed
databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are
non-relational, distributed, and do not conform to atomicity, consistency, isolation and durability,
the four defining features of traditional relational database systems.
• In the same year, NoSQL was discussed and debated a lot at the "no:sql(east)" conference
held in Atlanta, USA.
• After that, discussion and practice of NoSQL gained momentum, and NoSQL saw
unprecedented growth.
RDBMS VS NOSQL

• RDBMS
• Structured and organized data
• Structured query language (SQL)
• Data and its relationships are stored in separate tables.
• Data Manipulation Language, Data Definition Language
• Tight Consistency
NOSQL

• Stands for Not Only SQL


• No standard declarative query language
• No predefined schema
• Key-Value pair storage, Column Store, Document Store, Graph databases
• Eventual consistency rather than ACID properties
• Unstructured and unpredictable data
• CAP Theorem
• Prioritizes high performance, high availability and scalability
• BASE transactions (Basically Available, Soft state, Eventual consistency)
HOW DID WE GET HERE?

• Explosion of social media sites (Facebook, Twitter) with large data needs
• Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Just as developers moved to dynamically-typed languages (Ruby/Groovy), there is a shift to
dynamically-typed data with frequent schema changes
• Open-source community
DYNAMO AND BIGTABLE

• Three major papers were the seeds of the NoSQL movement


• BigTable (Google)
• Dynamo (Amazon)
• Gossip protocol (discovery and error detection)
• Distributed key-value data store
• Eventual consistency
• CAP Theorem (discussed below)
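Dynamo-style systems combine gossip with eventual consistency. Replicas periodically exchange state with randomly chosen peers until they all agree. The Python sketch below is a simplified illustration, not Dynamo's actual protocol. The `Replica` class, the single integer version per key, and the "higher version wins" merge rule are assumptions made for brevity.

```python
import random

class Replica:
    """One node holding a single key's value plus a version number (simplified)."""
    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0

    def write(self, value):
        # Local write: bump the version so newer data wins when states are merged.
        self.version += 1
        self.value = value

    def merge(self, other):
        # Anti-entropy step: both sides keep whichever copy has the higher version.
        if other.version > self.version:
            self.value, self.version = other.value, other.version
        elif self.version > other.version:
            other.value, other.version = self.value, self.version

def gossip_round(replicas, rng):
    """Each replica exchanges state with one randomly chosen peer."""
    for node in replicas:
        peer = rng.choice([r for r in replicas if r is not node])
        node.merge(peer)

def converged(replicas):
    """True when every replica holds the same (value, version) pair."""
    return len({(r.value, r.version) for r in replicas}) == 1

rng = random.Random(42)     # fixed seed so the run is repeatable
nodes = [Replica(f"n{i}") for i in range(5)]
nodes[0].write("v1")        # at first only n0 has the new value
print(converged(nodes))     # False: replicas are temporarily inconsistent

rounds = 0
while not converged(nodes):
    gossip_round(nodes, rng)
    rounds += 1
print(converged(nodes), all(n.value == "v1" for n in nodes))  # True True
```

Between the write and convergence, a read from a lagging replica returns stale data. That window is exactly what "eventual" consistency permits. This sketch does not reconcile two concurrent writes that carry the same version number. Dynamo uses vector clocks to detect that kind of conflict.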
THE PERFECT STORM

• Large datasets, acceptance of alternatives, and dynamically-typed data have come together
in a perfect storm
• Not a backlash/rebellion against RDBMS
• SQL is a rich query language that cannot be rivaled by the current list of NoSQL
offerings
NOSQL PROS/CONS

• Advantages :
• High scalability
• Distributed Computing
• Lower cost
• Schema flexibility, semi-structured data
• No complicated Relationships
• Disadvantages
• No standardization
• Limited query capabilities (so far)
• Eventual consistency is not intuitive to program for
CAP THEOREM

• Three properties of a system: consistency, availability and partition tolerance


• You can have at most two of these three properties for any shared-data
system
• To scale out, you have to partition. That leaves either consistency or
availability to choose from
• In most large-scale web applications, designers choose availability over consistency
AVAILABILITY

• Traditionally, availability was thought of as the server/process being available five 9’s (99.999%) of the time.


• However, in a system with a large number of nodes, at almost any point in time there is a good
chance that a node is either down or there is a network disruption among
the nodes.
• We want a system that is resilient in the face of network disruption
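The availability argument can be checked with a little arithmetic. Five nines allows roughly five minutes of downtime per year for one server. In a large cluster, however, the chance that *some* node is down grows quickly with the node count. The sketch below assumes node failures are independent. The 1,000-node, 99.9%-per-node figures are illustrative and do not come from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignoring leap years)

def downtime_minutes_per_year(availability):
    """Expected yearly downtime for a single server with the given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

def prob_some_node_down(node_availability, nodes):
    """Chance that at least one of `nodes` independently failing nodes is down.

    Assumes failures are independent, so P(all up) = availability ** nodes.
    """
    return 1 - node_availability ** nodes

print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26 (minutes per year)
print(round(prob_some_node_down(0.999, 1000), 3))    # 0.632
```

Even with each node at 99.9% availability, a 1,000-node cluster has about a 63% chance of having at least one node down at any given moment. This is why partitions and failures must be treated as normal events.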
NOSQL CATEGORIES

There are four general types (the most common categories) of NoSQL databases. Each of these
categories has its own specific attributes and limitations. There is no single solution that is
better than all the others; however, some databases are better suited to specific
problems. To clarify the NoSQL landscape, let's discuss the most common categories:
• Key-value stores
• Column-oriented
• Graph
• Document oriented
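To show how the four categories differ, the Python sketch below models the same small fact, "user 1 likes post 10", in each style using plain dictionaries and lists. These are simplified in-memory illustrations with hypothetical names and values. They do not reproduce the data model or API of any specific product.

```python
# Hypothetical sketch: the same "user 1 likes post 10" data in four NoSQL styles.

# 1. Key-value store: opaque values, looked up only by key.
kv_store = {"user:1": '{"name": "Asha"}', "post:10": '{"title": "Hello"}'}

# 2. Document store: values are structured documents whose fields can be queried.
documents = {"users": [{"_id": 1, "name": "Asha", "liked_posts": [10]}]}

# 3. Column-oriented (wide-column) store: row key -> column family -> column -> value.
column_families = {
    "user:1": {"profile": {"name": "Asha"}, "likes": {"post:10": True}},
}

# 4. Graph database: nodes and labelled edges are first-class data.
nodes = {1: {"type": "user", "name": "Asha"}, 10: {"type": "post", "title": "Hello"}}
edges = [(1, "LIKES", 10)]

print(kv_store["user:1"])                       # the store cannot look inside this string
print(documents["users"][0]["liked_posts"])     # [10]
print(column_families["user:1"]["likes"])       # {'post:10': True}
print([dst for src, label, dst in edges if src == 1 and label == "LIKES"])  # [10]
```

The main difference is what the database "understands". A key-value store sees only keys, while a document store can query fields. A column store groups related columns for efficient access, and a graph store makes relationships directly traversable.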
CAP THEOREM

The CAP theorem is a tool that makes system designers aware of the trade-offs involved in
designing networked shared-data systems. CAP has influenced the design of many
distributed data systems and made designers aware of a wide range of trade-offs to consider.
Over the years, however, the CAP theorem has also been widely misunderstood and misused to
categorize databases. There is much misinformation floating around about CAP, and most
blog posts on CAP are outdated and possibly incorrect.
• The CAP theorem applies to distributed systems that store state. Eric Brewer, at the
2000 Symposium on Principles of Distributed Computing (PODC), conjectured that in
any networked shared-data system there is a fundamental trade-off between consistency,
availability, and partition tolerance.
• In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's
conjecture. The theorem states that networked shared-data systems can guarantee (strongly
support) at most two of the following three properties:
• Consistency: A guarantee that every node in a distributed cluster returns the same, most recent,
successful write. Consistency refers to every client having the same view of the data. There are various
types of consistency models. Consistency in CAP (used to prove the theorem) refers to linearizability
(also called atomic consistency), a very strong form of consistency.
• Availability: Every non-failing node returns a response for all read and write requests in a reasonable
amount of time. The key word here is every. To be available, every node (on either side of a network
partition) must be able to respond in a reasonable amount of time.
• Partition Tolerance: The system continues to function and upholds its consistency guarantees in spite
of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition
tolerance can gracefully recover from partitions once the partition heals.
• The CAP theorem categorizes systems into three categories:
• CP (Consistent and Partition Tolerant): At first glance, the CP category is confusing, i.e., a
system that is consistent and partition tolerant but never available. CP refers to a
category of systems where availability is sacrificed only in the case of a network partition.
• AP (Available and Partition Tolerant): Systems that keep responding during a network partition,
at the cost of possibly returning stale data; consistency is sacrificed only in the case of a network partition.
• CA (Consistent and Available): CA systems are consistent and available in the
absence of any network partition. Single-node DB servers are often categorized as CA
systems because they do not need to deal with partition tolerance. The only hole in this theory
is that single-node DB systems are not networked shared-data systems and thus do not fall under
the purview of CAP. [^11]
• In the usual Venn-diagram depiction of CAP, the region where all three circles intersect is left
empty because it is impossible to have all three properties in a networked shared-data system.
However, any visualization of CAP as a triangle or a Venn diagram is misleading. The correct way
to think about CAP is that in case of a network partition (a rare occurrence), one needs to choose
between availability and consistency.
• In any networked shared-data system, partition tolerance is a must. Network partitions
and dropped messages are a fact of life and must be handled appropriately. Consequently,
system designers must choose between consistency and availability.
• Simplistically speaking, a network partition forces designers to choose either perfect
consistency or perfect availability. Picking consistency means sometimes being unable to answer a
client's query, because the system cannot guarantee that it will return the most recent write. This
sacrifices availability.
• In that case, a network partition forces non-failing nodes to reject clients' requests because these nodes
cannot guarantee consistent data. At the opposite end of the spectrum, being available
means being able to respond to a client's request, but the system cannot guarantee
consistency, i.e., that it returns the most recent value written. Available systems provide the best
possible answer under the given circumstances.
• During normal operation (lack of network partition) the CAP theorem does not
impose constraints on availability or consistency.
• The CAP theorem is responsible for instigating the discussion about the various trade-offs
in a distributed shared-data system. It has played a pivotal role in increasing our
understanding of shared-data systems. Nonetheless, the CAP theorem is criticized for
being too simplistic and often misleading. More than a decade after first stating the CAP
theorem, Brewer acknowledged that it oversimplified the choices
available in the event of a network partition.
• According to Brewer, the CAP theorem prohibits only a “tiny part of the design space:
perfect availability and consistency in the presence of partitions, which are rare." System
designers have a broad range of options for dealing with and recovering from network
partitions. The goal of every system must be to “maximize combinations of consistency
and availability that make sense for the specific application.”
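The consistency/availability choice during a partition can be illustrated with a toy two-replica store. During a partition, the CP mode rejects requests rather than risk returning stale data. The AP mode keeps answering but may return an out-of-date value. The `TwoNodeStore` class, its `mode` flag, and the error messages are hypothetical. Real systems use quorums, leases, or consensus protocols rather than this simplified rule.

```python
class Replica:
    """One copy of a single key's value."""
    def __init__(self, name: str, value: str = "v0"):
        self.name = name
        self.value = value


class TwoNodeStore:
    """Toy store with two replicas, A and B; mode is "CP" or "AP" (hypothetical)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a = Replica("A")
        self.b = Replica("B")
        self.partitioned = False  # True while the A-B link is down

    def write(self, replica: Replica, value: str) -> None:
        if not self.partitioned:
            # Normal operation: replicate synchronously so both copies agree.
            self.a.value = self.b.value = value
        elif self.mode == "CP":
            # Cannot reach the other replica, so refuse rather than diverge.
            raise RuntimeError("unavailable: cannot reach replica")
        else:
            # AP: accept the write locally; the replicas now disagree.
            replica.value = value

    def read(self, replica: Replica) -> str:
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this copy holds the most recent write.
            raise RuntimeError("unavailable: cannot confirm latest value")
        return replica.value  # in AP mode this may be a stale value


def demo(mode: str) -> str:
    store = TwoNodeStore(mode)
    store.write(store.a, "v1")      # replicated to both A and B
    store.partitioned = True        # the network link between A and B fails
    try:
        store.write(store.a, "v2")  # one client writes through A
        return store.read(store.b)  # another client reads from B
    except RuntimeError as err:
        return str(err)


print(demo("CP"))  # unavailable: cannot reach replica
print(demo("AP"))  # v1 (stale: A already holds v2)
```

Without the partition, both modes behave identically. This matches the point that CAP imposes no constraint during normal operation.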
