Unit 5_230601_174540-1


NoSQL Databases

• NoSQL is a type of database management system (DBMS) that is designed to handle
and store large volumes of unstructured and semi-structured data.
• Unlike traditional relational databases that use tables with pre-defined schemas to
store data, NoSQL databases use flexible data models that can adapt to changes in
data structures and are capable of scaling horizontally to handle growing amounts of
data.
• The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but
the term has since evolved to mean “not only SQL.”

Benefits of NoSQL over RDBMS:

Schema-less − NoSQL databases are schema-less: they do not require a strict, predefined data
structure.
Dynamic and Agile − NoSQL databases can evolve dynamically with changing requirements.
They can handle structured, semi-structured and unstructured data.
Scales Horizontally − In contrast to SQL databases, which typically scale vertically (by adding
resources to a single server), NoSQL databases scale horizontally by adding more servers,
using sharding and replication. This behavior fits cloud computing services such as Amazon
Web Services (AWS), which let you add virtual servers on demand.
Better Performance − NoSQL databases generally claim to deliver better and faster performance
than traditional RDBMS implementations, although the actual gain depends on the workload.
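To make horizontal scaling concrete, here is a minimal JavaScript sketch of hash-based sharding, where each key is routed to one of several servers. The server names and the djb2-style hash function are assumptions chosen for illustration; the sketch covers only data placement, not replication.

```javascript
// Hypothetical hash-based sharding: every key is routed to exactly one server.
// hashKey is a small djb2-style string hash, used here for illustration only.
function hashKey(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = ((h * 33) ^ key.charCodeAt(i)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h;
}

// The same key always maps to the same server, so reads find what writes stored.
function shardFor(key, servers) {
  return servers[hashKey(key) % servers.length];
}

const servers = ["server-A", "server-B", "server-C"]; // hypothetical server names
const placement = {};
for (const key of ["user:1", "user:2", "user:3", "user:4", "user:5", "user:6"]) {
  const server = shardFor(key, servers);
  if (!placement[server]) placement[server] = [];
  placement[server].push(key);
}

console.log(placement); // which server holds which keys
```

With plain modulo hashing, adding a fourth server changes `hashKey(key) % servers.length` for most keys, so most data would have to move; real systems therefore tend to split data into ranges or chunks that can be migrated between servers individually.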

Popular relational databases and RDBMSs


The following list describes popular SQL and RDBMS databases:
• Oracle: An object-relational database management system (DBMS) written in the
C++ language.
• IBM DB2: A family of database server products from IBM®.
• SAP ASE: A business relational database server product, primarily for Unix®
operating systems.
• Microsoft SQL Server: An RDBMS for enterprise-level databases that supports both
SQL and NoSQL architectures.
• MariaDB: An enhanced, drop-in replacement for MySQL®.
• PostgreSQL: An enterprise-level, object-relational DBMS that supports procedural
languages, such as Perl and Python, in addition to SQL-level code.

Popular NoSQL databases

• MongoDB: The most popular open-source NoSQL system. MongoDB is a document-oriented
database that stores JSON-like documents in dynamic schemas.
• Apache CouchDB: An open-source, web-oriented database developed by Apache®.
CouchDB uses the JSON data exchange format to store its documents; JavaScript for
indexing, combining, and transforming documents; and HTTP for its API.
• Apache HBase: An open-source Apache project developed as a part of Hadoop®.
HBase is a column store database written in Java with capabilities similar to those that
Google BigTable® provides.
• Oracle NoSQL Database: A proprietary database that supports JSON table and key-
value datatypes running on-premise or as a cloud service.
• Apache Cassandra DB: A distributed database that excels at handling extremely
large amounts of structured data. Cassandra DB is also highly scalable. Facebook®
created Cassandra DB.

Types of NoSQL Databases


• Key-Value Stores
• Column Oriented Databases
• Graph-Based databases
• Document Oriented Databases

Key-Value Stores
• A key-value store is a nonrelational database and the simplest form of NoSQL
database.
• Every data element in the database is stored as a key-value pair.
• The data can be retrieved by using the unique key allotted to each element in the
database.
• The values can be simple data types like strings and numbers or complex objects.
• A key-value store is like a relational database with only two columns: the key and
the value.

Example:
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON document, a BLOB (Binary Large Object), a string, etc.
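The hash-table idea can be sketched in a few lines of JavaScript, using the built-in Map as the hash table. The class name KeyValueStore and its put/get/delete methods are hypothetical, chosen for illustration; real key-value databases add persistence, replication and networking on top of this basic contract.

```javascript
// Hypothetical in-memory key-value store: a Map acts as the hash table.
class KeyValueStore {
  constructor() {
    this.table = new Map(); // each key is unique; values may be of any type
  }

  put(key, value) {
    this.table.set(key, value); // writing an existing key overwrites its value
  }

  get(key) {
    return this.table.get(key); // undefined if the key is absent
  }

  delete(key) {
    return this.table.delete(key); // true if a pair was removed
  }
}

const store = new KeyValueStore();
store.put("user:1:name", "Tom"); // simple string value
store.put("user:1:profile", { email: "tom@example.com", likes: ["spas"] }); // complex object
store.put("user:1:name", "Thomas"); // overwrite by key

console.log(store.get("user:1:name")); // Thomas
console.log(store.get("user:1:profile").email); // tom@example.com
```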

Key features of the key-value store:


• Simplicity.
• Scalability.
• Speed.

Column Oriented Databases


• A column-oriented database is a non-relational database that stores the data in columns
instead of rows.
• Column-oriented storage allows data to be stored efficiently. It avoids consuming
space when storing nulls by simply not storing a column when a value doesn’t exist for
that column.
• Each unit of data can be thought of as a set of key/value pairs, where the unit itself is
identified with the help of a primary identifier, often referred to as the primary key.
• The columns are arranged by column family.
• They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN,
etc. as the data is readily available in a column.
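The following JavaScript sketch pivots a few rows into columns to show the two points above: a missing value is simply not stored, and an aggregation touches only one column. The row data is hypothetical, and column families are left out to keep it short.

```javascript
// Hypothetical rows, each identified by a primary key (the row key "id").
const rows = [
  { id: "r1", name: "Ann", city: "Pune", salary: 50000 },
  { id: "r2", name: "Raj", salary: 42000 }, // no city value
  { id: "r3", name: "Mia", city: "Delhi", salary: 61000 },
];

// Pivot rows into columns: each column maps row key -> value.
// A missing value is simply not stored, so no space is spent on nulls.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (field === "id") continue;
      if (!columns[field]) columns[field] = new Map();
      columns[field].set(record.id, value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// Aggregations read a single column instead of every full row.
const salaries = [...columns.salary.values()];
const total = salaries.reduce((sum, s) => sum + s, 0);
const avg = total / salaries.length;

console.log(columns.city.size); // 2 (r2 has no city entry)
console.log(total, avg); // 153000 51000
```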

Graph-Based databases
A graph database (GDB) is a database that uses graph structures for storing data. It uses
nodes, edges, and properties instead of tables or documents to represent and store data. The
edges represent relationships between the nodes. This makes related data easier to retrieve,
in many cases with a single operation. Graph databases are commonly classified as NoSQL
databases.
Representation:
The graph database is based on graph theory. The data is stored in the nodes of the graph,
and the relationships between the data are represented by the edges between the nodes.
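A minimal JavaScript sketch of a property graph: nodes and edges both carry properties, and an adjacency index lets a query follow relationships directly instead of joining tables. The node ids, edge types (FRIEND, LIKES) and the traverse helper are hypothetical names for illustration, not the API of any particular graph database.

```javascript
// Hypothetical property graph: nodes and edges both carry properties.
const nodes = new Map([
  ["alice", { label: "Person", name: "Alice" }],
  ["bob", { label: "Person", name: "Bob" }],
  ["carol", { label: "Person", name: "Carol" }],
  ["book1", { label: "Book", title: "Graphs 101" }],
]);

const edges = [
  { from: "alice", to: "bob", type: "FRIEND", since: 2019 },
  { from: "bob", to: "carol", type: "FRIEND", since: 2021 },
  { from: "carol", to: "book1", type: "LIKES" },
];

// Adjacency index: node id -> outgoing edges.
const outgoing = new Map();
for (const edge of edges) {
  if (!outgoing.has(edge.from)) outgoing.set(edge.from, []);
  outgoing.get(edge.from).push(edge);
}

// Return the node ids reached from `start` by following edges of `type` exactly `hops` times.
function traverse(start, type, hops) {
  let frontier = [start];
  for (let i = 0; i < hops; i++) {
    frontier = frontier.flatMap((id) =>
      (outgoing.get(id) || []).filter((e) => e.type === type).map((e) => e.to)
    );
  }
  return frontier;
}

// Friends of Alice's friends, resolved to their names.
console.log(traverse("alice", "FRIEND", 2).map((id) => nodes.get(id).name)); // [ 'Carol' ]
```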

Need for Graph Database:
1. Solving many-to-many relationship problems.
2. Applications where the relationships between data elements matter more than the elements
themselves.
3. Low-latency queries over large-scale data.
Advantages: The graph model handles frequent schema changes, large volumes of data, real-time
query response times, and more intelligent data activation requirements.
Disadvantages: Graph databases aren’t always the best solution for an application. Assess the
needs of the application before deciding on the architecture.
Limitations of Graph Databases:
• Graph databases may not be a better choice than the other NoSQL variants for every workload.
• If the application needs to scale horizontally, performance may suffer.
• They are not very efficient when all nodes with a given parameter must be updated.
Document Oriented Databases

Document databases are considered to be non-relational (or NoSQL) databases. Instead of
storing data in fixed rows and columns, document databases use flexible documents. Document
databases are the most popular alternative to tabular, relational databases.

Documents:
A document is a record in a document database. A document typically stores information about
one object and any of its related metadata.
Documents store data in field-value pairs. The values can be a variety of types and structures,
including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like
JSON, BSON, and XML.
Below is a JSON document that stores information about a user named Tom.

{
  "_id": 1,
  "first_name": "Tom",
  "email": "tom@example.com",
  "cell": "765-555-5555",
  "likes": [
    "fashion",
    "spas",
    "shopping"
  ],
  "businesses": [
    {
      "name": "Entertainment 1080",
      "partner": "Jean",
      "status": "Bankrupt",
      "date_founded": {
        "$date": "2012-05-19T04:00:00Z"
      }
    },
    {
      "name": "Swag for Tweens",
      "date_founded": {
        "$date": "2012-11-01T04:00:00Z"
      }
    }
  ]
}

Collections
A collection is a group of documents. Collections typically store documents that have similar
contents.
Not all documents in a collection are required to have the same fields, because document
databases have a flexible schema. Note that some document databases provide schema
validation, so the schema can optionally be locked down when needed.
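A small JavaScript sketch of a flexible schema: two documents in the same collection share some fields but not others. The validate function is a hypothetical stand-in for optional schema validation (MongoDB, for instance, expresses such rules with a $jsonSchema validator); it only imitates the idea of locking down required fields and types.

```javascript
// A collection modeled as a plain array of documents.
// The two documents share some fields but not all: there is no fixed schema.
const users = [
  { _id: 1, first_name: "Tom", email: "tom@example.com", likes: ["fashion", "spas"] },
  { _id: 2, first_name: "Jean", cell: "765-555-1234" }, // no email, extra cell field
];

// Hypothetical optional validator: every listed field must exist with the given type.
function validate(doc, rules) {
  return Object.entries(rules).every(([field, type]) => typeof doc[field] === type);
}

const rules = { first_name: "string", email: "string" };

console.log(users.map((doc) => validate(doc, rules))); // [ true, false ]
```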
Examples:
• Newspapers or magazines, for example, contain articles. To store these in a relational
database, you need to chop them up first: the article text goes in one table, the author
and all the information about the author in another, and comments on the article when
published on a website go in yet another.
• A newspaper article can also be stored as a single entity, with its author information
and comments kept inside the same document; this lowers the cognitive burden of working
with the data for those used to seeing articles all the time.
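The contrast between the two storage styles can be sketched in JavaScript. All article, author and comment data here is hypothetical; arrays stand in for relational tables, and a single nested object stands in for the document.

```javascript
// Relational style: the article is chopped into three tables linked by keys.
const articles = [{ article_id: 7, author_id: 3, text: "The city council approved the budget." }];
const authors = [{ author_id: 3, name: "R. Mehta", desk: "Politics" }];
const comments = [
  { comment_id: 1, article_id: 7, user: "sam", body: "Good news" },
  { comment_id: 2, article_id: 7, user: "lee", body: "About time" },
];

// Reassembling one article requires lookups across all three tables.
function assemble(articleId) {
  const art = articles.find((a) => a.article_id === articleId);
  return {
    text: art.text,
    author: authors.find((a) => a.author_id === art.author_id),
    comments: comments.filter((c) => c.article_id === articleId),
  };
}

// Document style: the same article stored as one self-contained entity.
const articleDoc = {
  _id: 7,
  text: "The city council approved the budget.",
  author: { name: "R. Mehta", desk: "Politics" },
  comments: [
    { user: "sam", body: "Good news" },
    { user: "lee", body: "About time" },
  ],
};

console.log(assemble(7).comments.length, articleDoc.comments.length); // 2 2
```

The document version is read in one step, while the relational version must be joined back together; the trade-off is that data embedded in many documents is harder to update consistently.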
CRUD operations
Document databases typically have an API or query language that allows developers to
execute the CRUD (create, read, update, and delete) operations.
• Create: Documents can be created in the database. Each document has a unique
identifier.
• Read: Documents can be read from the database. The API or query language allows
developers to query for documents using their unique identifiers or field values.
Indexes can be added to the database in order to increase read performance.
• Update: Existing documents can be updated, either in whole or in part.
• Delete: Documents can be deleted from the database.
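The four operations can be sketched as a small in-memory collection in JavaScript. The Collection class and its method names are hypothetical (they are not the MongoDB driver API); find scans every document, which is exactly the cost that an index on the queried field would avoid.

```javascript
// Hypothetical in-memory collection exposing the four CRUD operations.
class Collection {
  constructor() {
    this.docs = new Map(); // _id (the unique identifier) -> document
    this.nextId = 1;
  }

  // Create: assign an _id if none is given; reject duplicates.
  create(doc) {
    const _id = doc._id !== undefined ? doc._id : this.nextId++;
    if (this.docs.has(_id)) throw new Error(`duplicate _id ${_id}`);
    this.docs.set(_id, { ...doc, _id });
    return _id;
  }

  // Read by unique identifier, or by matching field values (full scan, no index).
  findById(_id) {
    return this.docs.get(_id);
  }
  find(filter) {
    return [...this.docs.values()].filter((d) =>
      Object.entries(filter).every(([k, v]) => d[k] === v)
    );
  }

  // Update in part: only the given fields change; _id stays the same.
  update(_id, changes) {
    const doc = this.docs.get(_id);
    if (!doc) return false;
    this.docs.set(_id, { ...doc, ...changes, _id });
    return true;
  }

  // Delete by identifier.
  delete(_id) {
    return this.docs.delete(_id);
  }
}

const users = new Collection();
const id = users.create({ name: "Tom", likes: 100 });
users.update(id, { likes: 101 }); // partial update
console.log(users.find({ name: "Tom" })[0].likes); // 101
users.delete(id);
console.log(users.findById(id)); // undefined
```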

Advantages:
• Schema-less: Document databases are very good at retaining existing data at massive
volumes because they impose few restrictions on the format and structure of data
storage.
• Faster document creation and maintenance: Documents are very simple to create, and
they require very little maintenance.
• Open formats: Documents are built from simple, open formats such as XML and JSON.
• Built-in versioning: As documents grow in size, they can also grow in complexity.
Built-in versioning, offered by some document databases, helps reduce conflicts.
Disadvantages:
• Weak Atomicity: Many document databases lack support for multi-document ACID
transactions. A change to the data model involving two collections then requires two
separate operations, one for each collection, which breaks atomicity. (MongoDB added
multi-document transactions in version 4.0.)
• Consistency Check Limitations: There is no built-in referential integrity, so a document
can, for example, refer to an author that does not exist in the author collection. Finding
such unconnected documents means searching across collections, which can hurt database
performance.
• Security: Many web applications lack adequate security, which can result in the
leakage of sensitive data. Security is therefore a point of concern, and web app
vulnerabilities need careful attention.
Limitations of NoSQL

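1. Lack of standardization (no common query language comparable to SQL)
2. Lack of full ACID compliance in many systems
3. Narrow focus (each type is optimized for particular workloads)
4. Open-source (enterprise support can be limited)
5. Lack of support for complex queries
6. Lack of maturity
7. Management challenges
8. Limited GUI tools
9. Backup (weaker backup support in some products)
10. Large document size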

CAP Theorem
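The three letters in CAP refer to three desirable properties of distributed systems with
replicated data: consistency (among replicated copies), availability (of the system for read
and write operations) and partition tolerance (in the face of the nodes in the system being
partitioned by a network fault). The CAP theorem states that such a system cannot guarantee
all three properties at the same time: when a network partition occurs, it must choose
between consistency and availability.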

Consistency:
Consistency means that all nodes have the same copies of a replicated data item visible
to various transactions. Consistency refers to every client having the same view of
the data. There are various types of consistency models; consistency in CAP refers to
linearizability (atomic consistency), a very strong form of consistency in which the system
behaves as if there were a single, up-to-date copy of the data.

Availability:
Availability means that each read or write request for a data item will either be processed
successfully or will receive a message that the operation cannot be completed. Every non-
failing node returns a response for all the read and write requests in a reasonable amount of
time. The key word here is “every”. In simple terms, every node (on either side of a network
partition) must be able to respond in a reasonable amount of time.
Partition Tolerance:
Partition tolerance means that the system can continue operating even if the network
connecting the nodes has a fault that results in two or more partitions, where the nodes in
each partition can only communicate among each other. That means the system continues to
function in spite of network partitions. Network partitions are a fact of life. Distributed
systems guaranteeing partition tolerance can gracefully recover from partitions once the
partition heals.
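The choice the CAP theorem forces can be illustrated with a deliberately simplified JavaScript model: one data item x copied on two replicas, a flag for the network fault, and a mode that decides what happens to writes during the partition. The mode names "CP" and "AP" and all helper functions are hypothetical; real systems use quorums, leader election and reconciliation that this sketch leaves out.

```javascript
// Simplified model: one data item x, copied on two replicas.
function makeCluster(mode) {
  return { mode, partitioned: false, replicas: [{ x: 0 }, { x: 0 }] };
}

// Write value v through replica i.
function write(cluster, i, v) {
  if (!cluster.partitioned) {
    // Network healthy: the write reaches every copy, so all reads agree.
    for (const replica of cluster.replicas) replica.x = v;
    return "ok";
  }
  if (cluster.mode === "CP") {
    // Choose consistency: refuse a write that cannot reach every replica.
    return "rejected";
  }
  // Choose availability: accept the write locally; the copies now disagree.
  cluster.replicas[i].x = v;
  return "ok";
}

function read(cluster, i) {
  return cluster.replicas[i].x;
}

const outcomes = {};
for (const mode of ["CP", "AP"]) {
  const cluster = makeCluster(mode);
  write(cluster, 0, 1); // before the fault: both replicas hold x = 1
  cluster.partitioned = true; // a network fault separates the two replicas
  const status = write(cluster, 0, 2);
  outcomes[mode] = [status, read(cluster, 0), read(cluster, 1)];
}

console.log(outcomes); // { CP: [ 'rejected', 1, 1 ], AP: [ 'ok', 2, 1 ] }
```

In the CP run the replicas stay identical but the client's write fails, so availability is lost; in the AP run every write succeeds but the two replicas return different values for x, so consistency is lost until the partition heals and the copies are reconciled.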

MongoDB

MongoDB is an open-source document database and leading NoSQL database. MongoDB is
written in C++.

The following table shows how RDBMS terminology maps to MongoDB:

RDBMS          MongoDB
Database       Database
Table          Collection
Tuple/Row      Document
Column         Field
Table Join     Embedded Documents
Primary Key    Primary Key (default key _id provided by MongoDB itself)

MongoDB - Document Modelling

In MongoDB, a database contains collections of documents. One can create multiple
databases on a MongoDB server.

A database contains collections, a collection contains documents, and the documents
contain the data; these levels are related to each other.
Creating a Database
The use Command
• The MongoDB command use DATABASE_NAME is used to create a database. If the database
doesn't exist, the command switches to a new one, which MongoDB actually creates when data
is first stored in it; otherwise it switches to the existing database.
• Syntax
Basic syntax of use DATABASE statement is as follows −
use DATABASE_NAME
• Example
>use mydb
switched to db mydb

Collection
Collections are just like tables in relational databases: they also store data, but in the form
of documents. A single database can store multiple collections.
Naming Restrictions for Collection:
Before creating a collection, you should first learn about the naming restrictions for
collections:
• A collection name must start with an underscore or a letter.
• A collection name cannot contain $, cannot be an empty string, cannot contain the null
character, and cannot begin with the system. prefix.
• The maximum length of the collection name is 120 bytes (including the database
name, dot separator, and the collection name). MongoDB 4.4 and later raise this
namespace limit (to 255 bytes for unsharded collections).
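The rules above can be collected into a small JavaScript checker. The function name checkCollectionName is hypothetical, and the sketch encodes only the rules listed in these notes (it treats "letter" as an ASCII letter); the byte limit is a parameter defaulting to the 120 bytes stated above, since newer MongoDB versions allow longer namespaces.

```javascript
// Hypothetical checker for the collection naming rules listed above.
// Returns a list of problems; an empty list means the name passes these checks.
function checkCollectionName(dbName, collName, maxBytes = 120) {
  const problems = [];
  if (collName === "") problems.push("name is empty");
  if (!/^[A-Za-z_]/.test(collName)) problems.push("must start with a letter or underscore");
  if (collName.includes("$")) problems.push("contains $");
  if (collName.includes("\0")) problems.push("contains a null character");
  if (collName.startsWith("system.")) problems.push("uses the reserved system. prefix");
  // The limit applies to the full namespace "<db>.<collection>", measured in UTF-8 bytes.
  const bytes = new TextEncoder().encode(`${dbName}.${collName}`).length;
  if (bytes > maxBytes) problems.push(`namespace is ${bytes} bytes (max ${maxBytes})`);
  return problems;
}

console.log(checkCollectionName("mydb", "users")); // []
console.log(checkCollectionName("mydb", "system.users")); // [ 'uses the reserved system. prefix' ]
```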
Creating collection:
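After creating the database, we create a collection to store documents. A collection is
created automatically the first time a document is inserted into it, using the following syntax:
db.collection_name.insertOne({..})
A collection can also be created explicitly with db.createCollection("collection_name").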

Document
In MongoDB, the data records are stored as BSON documents. Here, BSON stands for a
binary representation of JSON documents, and BSON supports more data types than
JSON. The document is created using field-value pairs or key-value pairs, and
the value of a field can be of any BSON type.
Syntax:
{
  field1: value1,
  field2: value2,
  ...
  fieldN: valueN
}

Naming restriction of fields:


Before moving further, you should learn about the naming restrictions for fields:
• Field names are strings.
• The _id field name is reserved for use as the primary key. Its value must be unique within
the collection, is immutable, and can be of any type other than an array.
• The field name cannot contain null characters.
• The top-level field names should not start with a dollar sign ($).
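The per-document rules above can be checked with a short JavaScript function. The name checkFields is hypothetical; it looks only at top-level field names, as the rule states, and it cannot verify that _id is unique or immutable, because those properties depend on the whole collection and on later updates rather than on one document.

```javascript
// Hypothetical checker for the field-name rules listed above (top-level fields only).
function checkFields(doc) {
  const problems = [];
  for (const field of Object.keys(doc)) { // Object.keys always yields strings
    if (field.includes("\0")) problems.push(`${JSON.stringify(field)} contains a null character`);
    if (field.startsWith("$")) problems.push(`top-level field ${field} starts with $`);
  }
  if ("_id" in doc && Array.isArray(doc._id)) problems.push("_id must not be an array");
  return problems;
}

console.log(checkFields({ _id: 1, name: "Tom" })); // []
console.log(checkFields({ _id: [1, 2], $price: 5 }));
```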
Document Size: The maximum size of a BSON document is 16 MB. This ensures that a
single document does not use an excessive amount of RAM or bandwidth (during
transmission). To store data larger than this limit, MongoDB provides the GridFS API.

Drop Database
The db.dropDatabase() command is used to drop an existing database; it drops the database currently selected with use.
Syntax
Basic syntax of dropDatabase() command is as follows −
• db.dropDatabase()

Insert Document:

To insert data into a MongoDB collection, you can use MongoDB's insert() or save() method.
In current MongoDB versions these are deprecated in favor of insertOne() and insertMany().
Syntax
• The basic syntax of insert() command is as follows −
>db.COLLECTION_NAME.insert(document)

Example
>db.users.insert({
   _id: ObjectId("507f191e810c19729de860ea"),
   title: "MongoDB Overview",
   description: "MongoDB is a NoSQL database",
   by: "tutorials point",
   url: "http://www.tutorialspoint.com",
   tags: ["mongodb", "database", "NoSQL"],
   likes: 100
})
WriteResult({ "nInserted" : 1 })
