Unit 5_230601_174540-1
Schema-less − NoSQL databases are schema-less: they do not enforce a strict, predefined data structure.
Dynamic and Agile − NoSQL databases adapt readily to changing requirements. They can
handle structured, semi-structured and unstructured data.
Scales Horizontally − In contrast to SQL databases, which typically scale vertically, NoSQL
databases scale horizontally by adding more servers and using sharding and replication. This
behavior fits cloud computing services such as Amazon Web Services (AWS), which let you
add virtual servers on demand.
Better Performance − Many NoSQL databases claim to deliver faster performance than
traditional RDBMS implementations for the workloads they are designed for.
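The horizontal scaling point above can be illustrated with a small sketch. It is not taken from any real database: the names hashKey and ShardedStore are hypothetical, each "shard" is just an in-memory Map standing in for a server, and replication is left out. It only shows the idea of routing each key to one of several shards by hashing.

```javascript
// Hypothetical sketch: route keys to shards by hashing, so that adding
// servers spreads the data out. This is not a real database driver.

// Simple deterministic string hash (djb2 variant); any stable hash would do.
function hashKey(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

class ShardedStore {
  constructor(shardCount) {
    // Each "shard" stands in for one server; here it is just a Map.
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }
  // Pick the shard index for a key.
  shardFor(key) {
    return hashKey(key) % this.shards.length;
  }
  put(key, value) {
    this.shards[this.shardFor(key)].set(key, value);
  }
  get(key) {
    return this.shards[this.shardFor(key)].get(key);
  }
}

const store = new ShardedStore(3);
for (const id of ["user:1", "user:2", "user:3", "user:4"]) {
  store.put(id, { id });
}
console.log(store.shards.map((s) => s.size)); // number of keys held by each shard
```

A real system would also have to move data when the number of shards changes; this sketch ignores that.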
Key-Values Stores
• A key-value store is a nonrelational database. The simplest form of a NoSQL database
is a key-value store.
• Every data element in the database is stored in key-value pairs.
• The data can be retrieved by using a unique key allotted to each element in the
database.
• The values can be simple data types like strings and numbers or complex objects.
• A key-value store is like a relational table with only two columns: the key
and the value.
Example:
Key-value databases store data as a hash table in which each key is unique and the
value can be a JSON document, a BLOB (Binary Large Object), a string, etc.
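As a rough illustration of the hash-table idea, the sketch below builds a minimal in-memory key-value store on JavaScript's built-in Map. The class name KeyValueStore and the sample keys are assumptions made for the example, not part of any particular product.

```javascript
// Minimal in-memory key-value store sketch built on the built-in Map.
class KeyValueStore {
  constructor() {
    this.table = new Map(); // hash table: each key appears at most once
  }
  put(key, value) {
    this.table.set(key, value); // writing an existing key overwrites its value
  }
  get(key) {
    return this.table.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.table.delete(key); // true if the key existed
  }
}

const kv = new KeyValueStore();
kv.put("session:42", "active");                              // string value
kv.put("user:1", JSON.stringify({ name: "Tom", likes: 3 })); // JSON value
kv.put("avatar:1", new Uint8Array([137, 80, 78, 71]));       // binary (BLOB-like) value

console.log(kv.get("session:42"));
console.log(JSON.parse(kv.get("user:1")).name);
```

Every lookup goes through the key; the store itself does not interpret the value.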
Graph-Based databases
A graph database (GDB) is a database that uses graph structures for storing data. It uses
nodes, edges, and properties instead of tables or documents to represent and store data. The
edges represent relationships between the nodes. This helps in retrieving data more easily
and, in many cases, with one operation. Graph databases are commonly classified as
NoSQL databases.
Representation:
The graph database is based on graph theory. The data is stored in the nodes of the graph,
and the relationships between the data are represented by the edges between the nodes.
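The node, edge and property model can be sketched in a few lines. The Graph class, the node names and the edge types (FRIEND_OF, WORKS_AT) below are hypothetical and chosen only for illustration; real graph databases use their own storage engines and query languages.

```javascript
// Tiny in-memory property graph: nodes and edges both carry properties.
class Graph {
  constructor() {
    this.nodes = new Map(); // node id -> properties
    this.edges = [];        // { from, to, type, props }
  }
  addNode(id, props = {}) {
    this.nodes.set(id, props);
  }
  addEdge(from, to, type, props = {}) {
    this.edges.push({ from, to, type, props });
  }
  // Follow outgoing edges of one type from a node: a single traversal step.
  neighbors(id, type) {
    return this.edges
      .filter((e) => e.from === id && e.type === type)
      .map((e) => e.to);
  }
}

const g = new Graph();
g.addNode("alice", { label: "Person", age: 30 });
g.addNode("bob", { label: "Person", age: 25 });
g.addNode("acme", { label: "Company" });
g.addEdge("alice", "bob", "FRIEND_OF", { since: 2020 });
g.addEdge("alice", "acme", "WORKS_AT");

console.log(g.neighbors("alice", "FRIEND_OF"));
```

The relationship lookup is one call on the edges, with no join between separate tables.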
Documents:
A document is a record in a document database. A document typically stores information about
one object and any of its related metadata.
Documents store data in field-value pairs. The values can be a variety of types and structures,
including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like
JSON, BSON, and XML.
Below is a JSON document that stores information about a user named Tom.
{
    "_id": 1,
    "first_name": "Tom",
    "email": "tom@example.com",
    "cell": "765-555-5555",
    "likes": [
        "fashion",
        "spas",
        "shopping"
    ],
    "businesses": [
        {
            "name": "Entertainment 1080",
            "partner": "Jean",
            "status": "Bankrupt",
            "date_founded": {
                "$date": "2012-05-19T04:00:00Z"
            }
        },
        {
            "name": "Swag for Tweens",
            "date_founded": {
                "$date": "2012-11-01T04:00:00Z"
            }
        }
    ]
}
Collections
A collection is a group of documents. Collections typically store documents that have similar
contents.
Not all documents in a collection are required to have the same fields, because document
databases have a flexible schema. Note that some document databases provide schema
validation, so the schema can optionally be locked down when needed.
Examples:
• Newspapers or magazines, for example, contain articles. To store these in a relational
database, you need to chop them up first: the article text goes in one table, the author
and all the information about the author in another, and comments on the article when
published on a website go in yet another.
• In a document database, a newspaper article can instead be stored as a single entity;
this lowers the cognitive burden of working with the data for those used to seeing
articles all the time.
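The newspaper example can be made concrete with a short sketch. All names and sample values below (articles, authors, comments, loadArticle, articleDoc) are invented for illustration. The first half mimics the relational layout with three separate "tables"; the second half keeps the same article as one document.

```javascript
// Relational-style storage: the article is split across three "tables".
const articles = [{ id: 1, authorId: 7, title: "NoSQL basics", text: "Full article text here" }];
const authors = [{ id: 7, name: "A. Writer" }];
const comments = [
  { articleId: 1, user: "reader1", text: "Nice overview" },
  { articleId: 1, user: "reader2", text: "Thanks" },
];

// Reassembling one article means looking in every table (a join).
function loadArticle(id) {
  const article = articles.find((a) => a.id === id);
  return {
    title: article.title,
    text: article.text,
    author: authors.find((w) => w.id === article.authorId),
    comments: comments.filter((c) => c.articleId === id),
  };
}

// Document-style storage: the same article kept as one self-contained entity.
const articleDoc = {
  _id: 1,
  title: "NoSQL basics",
  text: "Full article text here",
  author: { name: "A. Writer" },
  comments: [
    { user: "reader1", text: "Nice overview" },
    { user: "reader2", text: "Thanks" },
  ],
};

console.log(loadArticle(1).comments.length === articleDoc.comments.length);
```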
CRUD operations
Document databases typically have an API or query language that allows developers to
execute the CRUD (create, read, update, and delete) operations.
• Create: Documents can be created in the database. Each document has a unique
identifier.
• Read: Documents can be read from the database. The API or query language allows
developers to query for documents using their unique identifiers or field values.
Indexes can be added to the database in order to increase read performance.
• Update: Existing documents can be updated — either in whole or in part.
• Delete: Documents can be deleted from the database.
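The four operations above can be sketched against an in-memory stand-in for a collection. The Collection class and its method names (create, readById, readWhere, update, remove) are hypothetical and are not the API of any real document database; the sample data reuses the user document shown earlier.

```javascript
// In-memory stand-in for one collection; not a real database API.
class Collection {
  constructor() {
    this.docs = new Map(); // _id -> document
    this.nextId = 1;
  }
  // Create: store a copy of the document under a unique identifier.
  create(doc) {
    const _id = this.nextId++;
    this.docs.set(_id, { ...doc, _id });
    return _id;
  }
  // Read: by unique identifier, or by matching field values.
  readById(id) {
    return this.docs.get(id);
  }
  readWhere(filter) {
    return [...this.docs.values()].filter((d) =>
      Object.entries(filter).every(([field, value]) => d[field] === value)
    );
  }
  // Update: merge only the given fields (a partial update).
  update(id, changes) {
    const doc = this.docs.get(id);
    if (!doc) return false;
    this.docs.set(id, { ...doc, ...changes, _id: id });
    return true;
  }
  // Delete: remove the document; returns true if it existed.
  remove(id) {
    return this.docs.delete(id);
  }
}

const users = new Collection();
const tomId = users.create({ first_name: "Tom", email: "tom@example.com" });
users.create({ first_name: "Ann", email: "ann@example.com" });
users.update(tomId, { cell: "765-555-5555" });
console.log(users.readWhere({ first_name: "Tom" }));
users.remove(tomId);
console.log(users.readById(tomId)); // undefined after deletion
```

Note that readWhere scans every document; an index, as mentioned above, is what a real database adds to avoid such scans.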
Advantages:
• Schema-less: Document databases cope well with massive volumes of existing
data because they place few restrictions on the format and structure of the data
they store.
• Faster creation and maintenance of documents: It is very simple to create a
document, and documents need very little maintenance afterwards.
• Open formats: Documents are built using simple open formats such as XML, JSON,
and their variants.
• Built-in versioning: As documents grow in size they can also grow in
complexity. Built-in versioning helps reduce the conflicts this can cause.
Disadvantages:
• Weak Atomicity: Document databases have traditionally lacked support for
multi-document ACID transactions. A change to the data model that involves two
collections requires two separate queries, i.e. one for each collection, and this
is where atomicity requirements break.
• Consistency Check Limitations: One can search collections and
documents that are not connected to an author collection, but doing so might
degrade database performance.
• Security: Many web applications lack adequate security, which can result
in the leakage of sensitive data. This is a point of concern, so one must pay
attention to web app vulnerabilities.
Limitations of NoSQL
1. Lack of standardization
2. Lack of ACID compliance
3. Narrow focus
4. Open-source
5. Lack of support for complex queries
6. Lack of maturity
7. Management challenge
8. GUI is not available
9. Backup
10. Large document size
CAP Theorem
The three letters in CAP refer to three desirable properties of distributed systems with
replicated data: consistency (among replicated copies), availability (of the system for read
and write operations) and partition tolerance (in the face of the nodes in the system being
partitioned by a network fault).
Consistency:
Consistency means that the nodes will have the same copies of a replicated data item
visible for various transactions. Consistency refers to every client having the same view of
the data. There are various types of consistency models. Consistency in CAP refers to
linearizability (atomic consistency), a very strong form of consistency.
Availability:
Availability means that each read or write request for a data item will either be processed
successfully or will receive a message that the operation cannot be completed. Every
non-failing node returns a response for all the read and write requests in a reasonable amount of
time. The key word here is “every”. In simple terms, every node (on either side of a network
partition) must be able to respond in a reasonable amount of time.
Partition Tolerance:
Partition tolerance means that the system can continue operating even if the network
connecting the nodes has a fault that results in two or more partitions, where the nodes in
each partition can only communicate with each other. That is, the system continues to
function and upholds its consistency guarantees in spite of network partitions. Network
partitions are a fact of life. Distributed systems guaranteeing partition tolerance can
gracefully recover from partitions once the partition heals.
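The trade-off between the three properties can be shown with a toy model. This is only a sketch under strong simplifying assumptions: the Replica and Cluster classes and the "CP" / "AP" mode labels are invented for the example, there are just two replicas, and the network fault is a boolean flag. During a partition, the "CP" cluster refuses writes to stay consistent, while the "AP" cluster stays available and lets its replicas diverge.

```javascript
// Two replicas of one value; `partitioned` simulates a network fault between them.
class Replica {
  constructor() {
    this.value = null;
  }
}

class Cluster {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP" (labels assumed for this sketch)
    this.a = new Replica();
    this.b = new Replica();
    this.partitioned = false;
  }
  // A write arriving at replica `a`.
  write(value) {
    if (!this.partitioned) {
      this.a.value = value;
      this.b.value = value; // replicate to the other node
      return "ok";
    }
    if (this.mode === "CP") {
      return "rejected"; // stay consistent, give up availability
    }
    this.a.value = value; // "AP": accept locally, replicas now disagree
    return "ok";
  }
}

const cp = new Cluster("CP");
cp.write("v1");
cp.partitioned = true;
console.log(cp.write("v2"), cp.a.value, cp.b.value); // rejected v1 v1

const ap = new Cluster("AP");
ap.write("v1");
ap.partitioned = true;
console.log(ap.write("v2"), ap.a.value, ap.b.value); // ok v2 v1
```

A real system would also need a way to reconcile the diverged replicas once the partition heals, which this sketch does not attempt.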
MongoDB
The following table shows the relationship of RDBMS terminology with MongoDB:
RDBMS          MongoDB
Database       Database
Table          Collection
Tuple/Row      Document
Column         Field
Primary Key    Primary Key (default key _id provided by MongoDB itself)
In MongoDB, a database contains collections of documents. One can create multiple
databases on the MongoDB server.
A database contains collections, a collection contains documents, and the documents
contain the data, so all three are related to each other.
Creating a Database
The use Command
• In MongoDB, use DATABASE_NAME is used to create a database. The command switches
to the named database; if it does not exist yet, MongoDB creates it when data is first
stored in it.
• Syntax
The basic syntax of the use DATABASE statement is as follows −
use DATABASE_NAME
• Example
>use mydb
switched to db mydb
Collection
Collections are just like tables in relational databases: they also store data, but in the form
of documents. A single database can store multiple collections.
Naming Restrictions for Collection:
Before creating a collection, you should first learn the naming restrictions for
collections:
• A collection name must start with an underscore or a letter.
• A collection name must not contain $ or the null character, must not be an empty
string, and must not begin with the system. prefix.
• The maximum length of the full collection namespace is 120 bytes (including the database
name, dot separator, and the collection name) in MongoDB 4.2 and earlier; later versions raise this limit.
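The rules above can be written down as a small check. This is a sketch of the restrictions as listed in these notes, not MongoDB's own validation code: the function name isValidCollectionName is hypothetical, "letter" is assumed to mean an ASCII letter, and the 120-byte limit is the figure quoted above.

```javascript
// Sketch of the naming rules listed above; the 120-byte limit is taken from these notes.
const MAX_NAMESPACE_BYTES = 120;

function isValidCollectionName(dbName, collName) {
  // Rule: must not be an empty string.
  if (typeof collName !== "string" || collName.length === 0) return false;
  // Rule: must start with an underscore or a letter (ASCII letters assumed here).
  if (!/^[A-Za-z_]/.test(collName)) return false;
  // Rule: must not contain $ or the null character.
  if (collName.includes("$") || collName.includes("\0")) return false;
  // Rule: must not begin with the reserved "system." prefix.
  if (collName.startsWith("system.")) return false;
  // Rule: database name + dot + collection name must fit in the byte limit.
  const namespace = `${dbName}.${collName}`;
  return new TextEncoder().encode(namespace).length <= MAX_NAMESPACE_BYTES;
}

console.log(isValidCollectionName("mydb", "users"));
console.log(isValidCollectionName("mydb", "system.users"));
```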
Creating collection:
After creating the database, we create a collection to store documents. MongoDB creates
the collection implicitly the first time a document is inserted into it, using the following syntax:
db.collection_name.insertOne({..})
Document
In MongoDB, data records are stored as BSON documents. BSON is a
binary representation of JSON documents, although BSON supports more data types than
JSON. A document is made up of field-value (key-value) pairs, and
the value of a field can be of any BSON type.
Syntax:
{
   field1: value1,
   field2: value2,
   ....
   fieldN: valueN
}
Drop Database
The db.dropDatabase() command is used to drop an existing database.
Syntax
The basic syntax of the dropDatabase() command is as follows −
• db.dropDatabase()
Insert Document:
Example
> db.users.insert({
... _id : ObjectId("507f191e810c19729de860ea"),
... title: "MongoDB Overview",
... description: "MongoDB is no sql database",
... by: "tutorials point",
... url: "http://www.tutorialspoint.com",
... tags: ['mongodb', 'database', 'NoSQL'],
... likes: 100
... })
WriteResult({ "nInserted" : 1 })
>