0% found this document useful (0 votes)
53 views17 pages

Fundamentals of Database Systems: (Nosql)

The document provides an overview of NoSQL databases, including: - The limitations of relational databases that led to the development of NoSQL databases. - The basics of NoSQL, including that it uses non-relational distributed databases for unstructured data with high availability. - The main types of NoSQL databases: document, graph, key-value, and column-family. - The advantages of NoSQL include elastic scaling, handling big data, low administration needs, and flexible data models. Challenges include lack of ACID guarantees and data security issues.

Uploaded by

thecoolguy96
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
53 views17 pages

Fundamentals of Database Systems: (Nosql)

The document provides an overview of NoSQL databases, including: - The limitations of relational databases that led to the development of NoSQL databases. - The basics of NoSQL, including that it uses non-relational distributed databases for unstructured data with high availability. - The main types of NoSQL databases: document, graph, key-value, and column-family. - The advantages of NoSQL include elastic scaling, handling big data, low administration needs, and flexible data models. Challenges include lack of ACID guarantees and data security issues.

Uploaded by

thecoolguy96
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Outline Basics Advantages and Challenges Some Concepts Implementations

Fundamentals of Database Systems


[NoSQL]

Malay Bhattacharyya

Assistant Professor

Machine Intelligence Unit


Indian Statistical Institute, Kolkata
January, 2021
Outline Basics Advantages and Challenges Some Concepts Implementations

1 Basics

2 Advantages and Challenges

3 Some Concepts

4 Implementations
Outline Basics Advantages and Challenges Some Concepts Implementations

Limitations of relational databases

Relational databases are entirely dependent on relations

Relational databases cannot deal with data that are infeasible


of fitting to relations.
They cannot handle unpredictable, unstructured or messy
data.
They may require vertical and horizontal expansion of servers,
as the data or processing requirements grow.
Outline Basics Advantages and Challenges Some Concepts Implementations

What are beyond relational databases?

Non-relational distributable databases depending on unstructured


data. It’s high performance with high availability, and offers rich
query language and easy scalability.
CAP Theorem: Consistency, availability and partition tolerance.

NoSQL
Examples include - MongoDB, Hadoop, etc.

Note: Non-relational databases may not provide full ACID


(atomicity, consistency, isolation, durability) guarantees, but still
has a distributed and fault tolerant architecture.
Outline Basics Advantages and Challenges Some Concepts Implementations

Basics of NoSQL

NoSQL (often referred to as Not only SQL) is a broad class of


data management systems where the data is partitioned across a
set of servers, where no server plays a privileged role.

– NoSQL database management systems are useful when working


with a huge quantity of data (especially big data), when the data’s
nature does not require a relational model
– The data can be structured, but NoSQL is used when what really
matters is the ability to store and retrieve great quantities of data
(e.g., millions of keyvalue in one or a few associative arrays)
– This is particularly useful for statistical or real-time analysis of
growing lists of elements (e.g., huge number of Twitter posts,
Internet server logs from a large group of users)
– It provides flexibility of the data model (e.g., ERP and BPM)
Outline Basics Advantages and Challenges Some Concepts Implementations

History

“It’s not about BASE vs. ACID” – Emin Gün Sirer.

1998: Carlo Strozzi used the term NoSQL in 1998 to name his
lightweight, open-source relational database that did not expose
the standard SQL interface.
2009: Eric Evans, a Rackspace employee, reintroduced the term
NoSQL to label the emergence of a growing number of
non-relational, distributed data stores.
2011: Work began on UnQL (Unstructured Query Language), a
specification for a query language for NoSQL databases, designed
to query collections (versus tables) of documents (versus rows)
with loosely defined fields (versus columns).

Note: BASE refers to basically available, soft-state and eventual


consistency.
Outline Basics Advantages and Challenges Some Concepts Implementations

Types of NoSQL databases

NoSQL databases are mainly of the following types:


Document based
Graph based
Key-Value pair based
Wide-column / Column-family based
Outline Basics Advantages and Challenges Some Concepts Implementations

Taxonomy of NoSQL

Document: Clusterpoint, Couchbase, MarkLogic, MongoDB,


etc.
Graph: Allegro, Neo4j, OrientDB, Virtuoso, etc.
Key-value: Dynamo, MemcacheDB, Project Voldemort,
Redis, Riak, etc.
Column: Accumulo, Cassandra, HBase, etc.
Outline Basics Advantages and Challenges Some Concepts Implementations

NoSQL – Advantages

Elastic scaling – database can be distributed across multiple


hosts as load increases (scale out)
Handling big data – massive volume of data and transactions
can be managed
No DBA – the mechanism of automatic repair, data
distribution, and simpler data models lead to lower
administration and tuning requirements
Low cost – clusters of cheap commodity servers are typically
used to manage the exploding data and transaction volumes
Flexible data models – schemaless relaxed (or even
nonexistent) data model restrictions are used
Caching layer – employs a useful cache layer
Real time analysis and streaming model – distributed
processing makes it faster
Outline Basics Advantages and Challenges Some Concepts Implementations

NoSQL – Challenges

No normalization, referential integrity, ACID – basic criteria of


relational databases are not satisfied
Poor data security – distributed data becomes less secure and
might get lost easily
Immaturity – most NoSQL alternatives are in pre-production
versions with many key features yet to be implemented
Poor support – NoSQL systems are mostly open source
projects without the global reach, support resources, or
credibility
Weakness in analytics and business intelligence – commonly
used BI tools do not provide connectivity to NoSQL
Targeting no administration – the goal of providing a
zero-admin solution still has not been fulfilled
Limited expertise – almost every NoSQL developer is in a
learning mode
Outline Basics Advantages and Challenges Some Concepts Implementations

JSON format

JSON (stands for JavaScript Object Notation) is a binary and


typed data model that supports the data types list, map, date,
Boolean, and numbers (with different precisions). It can be
serialized to and deserialized from bytes as well as strings.
Outline Basics Advantages and Challenges Some Concepts Implementations

Versioning of datasets

To process “concurrent” modifications and versions of datasets in


a distributed environment, different schemes can be employed.
Some of these are listed below.

Timestamps: It helps to develop a chronological order but rely on


synchronized clocks and don’t capture causality.
Optimistic Locking: It implies that a unique counter or clock value is
saved for each piece of data. When a client tries to update a dataset it
has to provide the counter/clock-value of the revision it likes to update.
It does not work well in a distributed and dynamic scenario where servers
show up and go away often and without prior notice.
Multiversion Storage: It means storing a timestamp for each table cell,
thus providing optimistic concurrency control with compare-and-swap on
timestamps.
Vector Clocks: It is an alternative approach to capture order and allow
reasoning between updates in a distributed system.
Outline Basics Advantages and Challenges Some Concepts Implementations

Vector Clocks
A vector clock is defined as a tuple Vi0 , Vi1 , . . . , Vin of clock values
(may be real timestamps, version/revision numbers or some other
ordinal values) collected from each node i = 1, . . . , n, which
represent the state of itself and the other (replica) nodes’ state.

An overview of a vector clock managing six nodes


Outline Basics Advantages and Challenges Some Concepts Implementations

Vector Clocks

Vector clocks can be utilized “to resolve consistency between


writes on multiple replicas”.
Vector clocks are updated in a way defined by the following rules:
1 If an internal operation happens at node i, this node will increment its
clock Vii (immediate execution).
2 If node i sends a message to node k, it first advances its own clock value
Vii and attaches the vector clock Vi (about his internal state and his view
of the other nodes) to the message sent to node k.
3 If node i receives a message from node j, it first advances its vector clock
Vii and then merges its own vector clock with the vector clock Vmessage
attached to the message from node j so that Vi = max(Vi , Vmessage ). To
compare two vector clocks Vi and Vj in order to derive a partial ordering,
the following rule is applied Vi > Vj , if ∀kVik > Vjk , If neither Vi > Vj
nor Vi < Vj applies, a conflict caused by concurrent updates has occurred
and needs to be resolved (e.g., by a client application).
Outline Basics Advantages and Challenges Some Concepts Implementations

Consistent hashing

Which object is mapped to which node is determined by moving


clockwise around the ring. Therefore, if N3 leaves and N5 enters
the system, the object 45 will get mapped to node N5.
Outline Basics Advantages and Challenges Some Concepts Implementations

Oracle NoSQL Database

Oracle NoSQL database seamlessly integrates into an


enterprise application architecture

Oracle-provided adapters allow the Oracle NoSQL database to


integrate with a Hadoop MapReduce framework or with the Oracle
database in-database MapReduce, data mining, R-based analytics,
or whatever business needs demand.
Outline Basics Advantages and Challenges Some Concepts Implementations

Final comments

None of the SQL and NoSQL supersedes each other

NoSQL is promising but it still has limitations

You might also like