NOSQL Databases

Uploaded by

Kiran M k

NOSQL Databases: Introduction to NOSQL Systems, CAP Theorem, Document-Based NOSQL
Systems and MongoDB, NOSQL Key-Value Stores, Column-Based or Wide Column NOSQL
Systems, NOSQL Graph Databases and Neo4j

Need for NoSQL Databases:

Traditional relational databases (RDBMS) are powerful but not always ideal for today’s data
needs. The need for NoSQL arises due to the following reasons:
1. Handling Big Data
• NoSQL databases are designed to handle massive volumes of structured, semi-
structured, and unstructured data.
• A single-server RDBMS with a fixed schema often struggles to scale and adapt in such cases.
2. Horizontal Scalability
• NoSQL systems support horizontal scaling by distributing data across multiple
servers.
• RDBMS typically supports vertical scaling (adding more power to a single machine),
which is costlier and limited.
3. Schema Flexibility
• NoSQL databases are schema-less or schema-flexible, allowing developers to evolve
the structure over time.
• Useful in agile development and when dealing with dynamic or unknown data models.
4. High Throughput and Low Latency
• Optimized for fast read/write operations, even with high traffic and massive
workloads.
• Often used in real-time web applications, analytics, and IoT.
5. Cloud-native and Distributed Systems
• Built with distributed architecture in mind.
• Suitable for cloud computing, where distributed storage and compute are common.
6. Variety of Data Models
• Supports diverse models like document, key-value, column, and graph to match
various use cases.

Introduction to NoSQL Systems

NoSQL, or "Not Only SQL," refers to a class of database management systems (DBMS) designed
to handle large volumes of unstructured and semi-structured data. Unlike traditional relational
databases that use tables and pre-defined schemas, NoSQL databases provide flexible data models
and support horizontal scalability, which suits modern applications that require real-time
data processing.
Features of NoSQL Databases:
Unlike relational databases, which use Structured Query Language (SQL), NoSQL databases don't
have a universal query language. Instead, each NoSQL database typically has its own query
language or API. Traditional relational databases follow ACID (Atomicity, Consistency,
Isolation, Durability) principles, ensuring strong consistency and structured relationships
between data.

However, as applications evolved to handle big data, real-time analytics, and distributed
environments, NoSQL emerged as a solution with:
• Schema-less: Flexible data models (no fixed schema).
• Horizontal Scalability: Easy to scale across multiple servers.
• High Performance: Optimized for high-speed reads/writes.
• Distributed Architecture: Built to support large-scale, distributed systems.

Types of NoSQL Databases:

1. Document-Based
2. Key-Value Stores
3. Column-Based (Wide Column Stores)
4. Graph Databases

1. Document-oriented databases:
• A document-oriented database stores data in documents similar to JSON (JavaScript
Object Notation) objects; MongoDB, for example, uses the binary BSON format.
• Each document contains pairs of fields and values.
• The values can be of a variety of types, including strings, numbers, booleans, arrays,
or even other objects.
• A document database offers a flexible data model, well suited to semi-structured and
unstructured data sets.
• Documents support nested structures, making it easy to represent complex relationships
or hierarchical data.
• Each document is a self-contained unit.
Advantages:
• Schema flexibility
• Good for semi-structured data
• Easy to map to programming language objects

Examples for Document-oriented databases:

MongoDB, CouchDB
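
As a rough illustration of the points above, a document can be pictured as a nested object. The sketch below uses plain JavaScript rather than any particular database, and the field names (address, courses, hobbies) are made up for the example.

```javascript
// A hypothetical student "document": fields map to values of mixed types,
// including a nested object and an array of nested objects.
const student = {
  _id: 1,
  name: "Alice",
  age: 20,
  enrolled: true,
  address: { city: "Pune", pin: "411001" },
  courses: [
    { code: "CS101", grade: "A" },
    { code: "MA102", grade: "B" },
  ],
};

// A second document in the same collection may carry different fields,
// which is what schema flexibility means in practice.
const anotherStudent = { _id: 2, name: "Bob", hobbies: ["chess"] };

// Documents serialize naturally to JSON and map back to language objects.
const roundTripped = JSON.parse(JSON.stringify(student));
const courseCodes = roundTripped.courses.map((c) => c.code);
console.log(courseCodes);
```

The two documents share only _id and name, yet both could live in the same collection.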

2. Key-value stores
A key-value store is a simpler type of database in which each item consists of a key and a
value. Each key is unique and associated with a single value.
• Data is stored as key-value pairs and a value is looked up directly by its key, making
retrieval extremely fast.
• They are commonly used for caching and session management and provide high performance in
reads and writes, in part because many of them keep data in memory.
• Examples: Redis, Memcached, Amazon DynamoDB
EXAMPLE:
Key: user:12345
Value: {"name": "foo bar", "email": "foo@bar.com", "designation": "software developer"}
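
The key-value idea can be sketched in a few lines of JavaScript. This is an in-memory toy built on Map, not the interface of Redis or any other product; the class name KeyValueStore and its put/get/delete methods are assumptions chosen for illustration.

```javascript
// Minimal in-memory key-value store built on Map (illustrative only).
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // Store a value under a unique key, overwriting any previous value.
  put(key, value) {
    this.data.set(key, JSON.stringify(value));
  }
  // Return the value for a key, or null when the key is absent.
  get(key) {
    const raw = this.data.get(key);
    return raw === undefined ? null : JSON.parse(raw);
  }
  // Remove a key; returns true if it existed.
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
store.put("user:12345", { name: "foo bar", email: "foo@bar.com" });
const user = store.get("user:12345");
const missing = store.get("user:99999");
```

The store treats the value as an opaque blob: lookups are possible only by key, which is what keeps the model simple and fast.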

3. Column-Based (Wide Column) NoSQL Systems

• Wide-column stores organize data into tables, rows, and dynamic columns.
• Unlike traditional SQL databases, wide-column stores are flexible: different rows in the
same table can have different sets of columns.
• These databases can employ column compression techniques to reduce the storage space and
enhance performance.
• The wide rows and columns enable efficient retrieval of sparse and wide data.
• Great for time-series data, IoT applications, and big data analytics.
Some examples of wide-column stores are:
Google BigTable:
• Proprietary system used in Gmail and other Google services.
• Uses Google File System (GFS) for distributed storage.
Apache HBase (open-source):
• Inspired by BigTable.
• Uses Hadoop Distributed File System (HDFS) or Amazon S3 for storage.
Cassandra:
• Shares characteristics of both column-store and key-value systems.

Key Characteristics:
• Keys in column-based systems are multi-dimensional:
Typically include: Table name, Row key, Column (family + qualifier), and Timestamp.
• Columns are grouped into column families, and each family contains column qualifiers.
• Data is stored in rows, but only values for defined columns are stored (supports sparsity).
• Each cell can hold multiple versions of data (tracked using timestamps).

EXAMPLE:
UserID   Name       Email
101      John Doe   john@example.com
102      Jane       jane@example.com
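
The multi-dimensional key described under Key Characteristics (row key, column family, qualifier, timestamp) can be sketched as nested maps. The code below is a simplified model in plain JavaScript, not how Bigtable, HBase, or Cassandra are implemented; the family names info and contact and the integer timestamps are assumptions.

```javascript
// Sketch of a wide-column table: rowKey -> "family:qualifier" -> versions.
// Each version is { ts, value }, kept newest first.
class WideColumnTable {
  constructor() {
    this.rows = new Map();
  }
  put(rowKey, family, qualifier, value, ts) {
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    const row = this.rows.get(rowKey);
    const col = `${family}:${qualifier}`;
    if (!row.has(col)) row.set(col, []);
    const versions = row.get(col);
    versions.push({ ts, value });
    versions.sort((a, b) => b.ts - a.ts); // keep newest version first
  }
  // Latest value at or before `ts` (defaults to the newest version).
  get(rowKey, family, qualifier, ts = Infinity) {
    const versions = this.rows.get(rowKey)?.get(`${family}:${qualifier}`) ?? [];
    const hit = versions.find((v) => v.ts <= ts);
    return hit ? hit.value : undefined;
  }
  // Column names actually stored for a row (absent cells take no space).
  columns(rowKey) {
    return [...(this.rows.get(rowKey)?.keys() ?? [])];
  }
}

const users = new WideColumnTable();
users.put("101", "info", "name", "John Doe", 1);
users.put("101", "info", "email", "john@example.com", 1);
users.put("102", "info", "name", "Jane", 1);
users.put("102", "info", "email", "jane@example.com", 1);
users.put("102", "info", "email", "jane@work.example.com", 2); // newer version
users.put("102", "contact", "phone", "555-0100", 2); // column only row 102 has
```

Row 102 ends up with three columns and two versions of its email, while row 101 stores only two cells, which shows both sparsity and versioning.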

4. NoSQL Graph Databases and Neo4j

• Data is stored as nodes and edges, enabling complex relationship management.
• Nodes typically store information about people, places, and things (like nouns), while
edges store information about the relationships between the nodes.
• They work well for highly connected data, where the relationships or patterns may not be
very obvious initially.
• Useful for applications requiring relationship-based queries such as fraud detection and
social network analysis.
Examples of graph databases are Neo4j and Amazon Neptune. MongoDB also provides graph
traversal capabilities using the $graphLookup stage of the aggregation pipeline.
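
To make the node-and-edge model concrete, here is a small sketch of a relationship-based query in plain JavaScript: a breadth-first traversal that finds everyone reachable within a given number of hops. The people, the FOLLOWS edge type, and the function name reachable are invented for the example; a graph database would run such a traversal natively rather than scanning an edge list.

```javascript
// Nodes hold entity data; edges hold the relationships between them.
const nodes = new Map([
  ["ann", { name: "Ann" }],
  ["ben", { name: "Ben" }],
  ["cara", { name: "Cara" }],
  ["dev", { name: "Dev" }],
]);
const edges = [
  { from: "ann", to: "ben", type: "FOLLOWS" },
  { from: "ben", to: "cara", type: "FOLLOWS" },
  { from: "cara", to: "dev", type: "FOLLOWS" },
];

// Breadth-first traversal: everyone reachable from `start` within maxHops.
function reachable(start, maxHops) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const id of frontier) {
      for (const e of edges) {
        if (e.from === id && !seen.has(e.to)) {
          seen.add(e.to);
          next.push(e.to);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen].map((id) => nodes.get(id).name);
}

const withinTwoHops = reachable("ann", 2);
console.log(withinTwoHops);
```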

CRUD Operations of MongoDB:

MongoDB stores data in collections as documents using BSON (Binary JSON) format.
CRUD Operations are:
• Create Operation
• Read Operation
• Update Operation
• Delete Operation

CREATE Operation:
Used to insert new documents into a collection.
Syntax for single document insertion:
db.collectionName.insertOne({ key1: value1, key2: value2 })
Syntax for multiple document insertion (the documents are passed as an array):
db.collectionName.insertMany([
{ key1: value1, key2: value2 },
{ key1: value1, key2: value2 }
])

Example:
To create one new document with fields name, age, and course in the students collection:
db.students.insertOne({
name: "Alice",
age: 20,
course: "Computer Science"
})
To add two documents at once to the students collection, each with fields name, age, and
course:
db.students.insertMany([
{ name: "Bob", age: 21, course: "Electronics" },
{ name: "Charlie", age: 22, course: "Mechanical" }])

Read Operation:
Used to retrieve data from a MongoDB collection using various queries.
i. Find All Documents (find())
ii. Find with Condition (find({ key: value }))
iii. Find with Projection (select specific fields)
iv. Find One Document (findOne())
v. Using Comparison Operators

find()
• Returns a cursor to all matching documents in the collection.
1. Example (All records):
db.students.find()
With no filter, the cursor covers every document in the collection.

2. Example (With condition):

db.students.find({ age: 21 })
Fetches documents where age is exactly 21.

Projection
Selects specific fields to be returned.
db.students.find({ age: 21 }, { name: 1, _id: 0 })
Returns only the name field (not _id) for students aged 21.

Comparison Operators
Operator   Meaning        Example

$gt        Greater than   { age: { $gt: 20 } }
$lt        Less than      { age: { $lt: 25 } }
$eq        Equal          { course: { $eq: "CS" } }
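
The meaning of these operators can be sketched with a tiny filter matcher in plain JavaScript. This is a simplified stand-in written for illustration, not MongoDB's query engine: the matches function and operators table are hypothetical, and only $gt, $lt, and $eq on top-level fields are handled.

```javascript
// Simplified filter matcher supporting only $gt, $lt, and $eq.
const operators = {
  $gt: (actual, expected) => actual > expected,
  $lt: (actual, expected) => actual < expected,
  $eq: (actual, expected) => actual === expected,
};

function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const actual = doc[field];
    // A plain value such as { age: 21 } is shorthand for equality.
    if (typeof condition !== "object" || condition === null) {
      return actual === condition;
    }
    // Every operator in the condition object must hold.
    return Object.entries(condition).every(([op, expected]) => {
      if (!(op in operators)) throw new Error(`unsupported operator ${op}`);
      return operators[op](actual, expected);
    });
  });
}

const students = [
  { name: "Alice", age: 20, course: "CS" },
  { name: "Bob", age: 21, course: "Electronics" },
  { name: "Charlie", age: 22, course: "Mechanical" },
];

const olderThan20 = students.filter((s) => matches(s, { age: { $gt: 20 } }));
const between = students.filter((s) => matches(s, { age: { $gt: 20, $lt: 22 } }));
```

Placing several operators in one condition object, as in the last line, requires all of them to hold, so only the student aged 21 matches.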

findOne()
Returns the first matching document.
db.students.findOne({ name: "Alice" })
Useful when only one document is expected or needed.
Update Operation
To modify existing documents in a collection.

updateOne()
Updates the first document that matches the filter.
db.students.updateOne(
{ name: "Alice" },
{ $set: { age: 21 } }
)
Finds the document with name: "Alice" and sets her age to 21.

updateMany()
Updates all documents matching the filter.
db.students.updateMany(
{ course: "ECE" },
{ $set: { course: "Electronics" } }
)
All students with course: "ECE" will have their course changed to "Electronics".

replaceOne()
Replaces an entire document.
db.students.replaceOne(
{ name: "Alice" },
{ name: "Alice", age: 21, course: "AI" }
)
Completely replaces the old document with a new one (fields not mentioned will be removed).
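
The difference between a $set update and a full replacement can be sketched on plain objects. The helpers applySet and applyReplace below are hypothetical stand-ins for what the server does to the stored document, and the email field is added only so that there is a field for the replacement to drop.

```javascript
// $set-style update: merge the given fields into a copy of the document.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Replacement: keep only _id from the old document and take everything
// else from the replacement document.
function applyReplace(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

const original = {
  _id: 1,
  name: "Alice",
  age: 20,
  course: "Computer Science",
  email: "alice@example.com",
};

const afterSet = applySet(original, { age: 21 });
const afterReplace = applyReplace(original, { name: "Alice", age: 21, course: "AI" });

console.log(afterSet.email); // still present after $set
console.log("email" in afterReplace); // false: dropped by the replacement
```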

Delete Operation
To remove documents from a collection.

deleteOne()
Deletes the first matching document.
db.students.deleteOne({ name: "Alice" })
Only the first document with name: "Alice" is deleted, even if more exist.

deleteMany()
Deletes all documents that match the condition.
db.students.deleteMany({ course: "Mechanical" })
Removes all students who are in the Mechanical course.
Delete All Documents
db.students.deleteMany({})
Deletes every document in the students collection — use with caution!

MongoDB Distributed Systems Characteristics:

1. Transactions in MongoDB
2. Replication in MongoDB
3. Sharding in MongoDB
4. Replication vs Sharding
Transactions in MongoDB
• Atomicity: Operations on a single document are atomic by default in MongoDB.
• Multi-document Transactions: Supported (since MongoDB 4.0 on replica sets and 4.2 on
sharded clusters), using a two-phase commit protocol when a transaction spans multiple
shards, to ensure atomicity and consistency across multiple documents.
• Useful in distributed environments where multiple documents/nodes are involved.
Replication in MongoDB
• Replica Set: A group of MongoDB servers maintaining identical data copies to ensure
high availability.
• Primary Node (N1): Handles all write operations and, by default, read operations.
• Secondary Nodes (N2, N3, ...): Hold replicas of the data; updated asynchronously
from the primary.
• Arbiter: Participates in elections to choose a new primary but does not store data.
• A replica set should have an odd number of voting members, so that an election cannot
end in a tie.
Read Preferences:
• Default: Reads from the primary only.
• Optional: Reads can be allowed from secondaries (not guaranteed to be most recent).
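
The odd-member guideline above follows from simple majority arithmetic, which the following sketch works through. The function names majority and faultTolerance are chosen for this example and are not part of any MongoDB API.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// Members that can be lost while a majority can still be formed.
function faultTolerance(members) {
  return members - majority(members);
}

for (const n of [3, 4, 5, 6]) {
  console.log(
    `${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

Going from 3 to 4 members raises the required majority from 2 to 3 but still tolerates only one failure, so the even-sized set adds cost without adding resilience.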
Sharding in MongoDB
• Purpose: Distribute a large dataset across multiple machines for performance and
scalability.
• Sharding = Horizontal Partitioning of a collection into disjoint sets (called shards).
• Shard Key: The field used for partitioning. It must:
o Exist in every document.
o Have an index.
• Partitioning Methods:
o Range Partitioning: Shard key values split into continuous ranges.
o Hash Partitioning: Applies a hash function to the shard key so that documents are
spread evenly (pseudo-randomly) across shards.
• Query Router:
o Forwards CRUD operations to relevant shards.
o If the target shard can't be identified, the query is broadcast to all shards.
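
The two partitioning methods and the router's targeting rule can be sketched as follows. This is a toy model in plain JavaScript under stated assumptions: three shards, a made-up string hash (toyHash), hand-picked numeric ranges, and a hypothetical studentId shard key. Real systems use their own hash functions and manage ranges automatically.

```javascript
const SHARD_COUNT = 3;

// Toy string hash (not what any real system uses); returns a non-negative int.
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// Hash partitioning: shard index derived from the hashed shard key.
function hashShard(key) {
  return toyHash(key) % SHARD_COUNT;
}

// Range partitioning: each shard owns a contiguous range [min, max).
const ranges = [
  { shard: 0, min: -Infinity, max: 1000 },
  { shard: 1, min: 1000, max: 2000 },
  { shard: 2, min: 2000, max: Infinity },
];
function rangeShard(key) {
  return ranges.find((r) => key >= r.min && key < r.max).shard;
}

// Router: target one shard when the filter includes the shard key,
// otherwise broadcast to every shard.
function route(filter, shardKeyField) {
  if (shardKeyField in filter) return [rangeShard(filter[shardKeyField])];
  return ranges.map((r) => r.shard);
}

const targeted = route({ studentId: 1500 }, "studentId");
const broadcast = route({ course: "Electronics" }, "studentId");
console.log(targeted, broadcast);
```

A query that filters on the shard key touches a single shard, while a query on any other field has to be sent to all of them, which is why the choice of shard key matters for performance.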
Replication vs Sharding
Aspect              Replication                          Sharding
Purpose             High availability, failover          Scalability, load balancing
Data Copies         Multiple copies of same data         Partitions of data across nodes
Failure Tolerance   Yes, via secondaries and elections   Not the main focus
Write Target        Always to primary                    Based on shard key and query router

Examples of Other Key-Value Stores:

• Oracle key-value store. Oracle has one of the well-known SQL relational database
systems, and Oracle also offers a system based on the key-value store concept; this
system is called the Oracle NoSQL Database.
• Redis key-value cache and store. Redis differs from the other systems discussed here
because it caches its data in main memory to further improve performance. It offers
master-slave replication and high availability, and it also offers persistence by backing
up the cache to disk.
• Apache Cassandra. Cassandra is a NOSQL system that is not easily categorized into
one category; it is sometimes listed in the column-based NOSQL category (described above
under Column-Based systems) or in the key-value category. It offers features from several
NOSQL categories and is used by Facebook as well as many other customers.

Neo4j Data Model:

Neo4j is a graph database in the NoSQL family. Unlike traditional relational databases (which
use tables), Neo4j represents data as a graph structure of nodes, relationships, and
properties.
Core Components:
Nodes
Relationships
Properties
Labels

1. Nodes
• Represent entities or objects (e.g., people, products, cities).
• Analogous to rows in a table in relational databases.
• Can have one or more labels that define their role or type.
• Can store properties as key-value pairs
2. Relationships
• Represent connections between nodes.
• Are directed (have a start and end node).
• Have a type and can also have properties.
• Relationships are first-class citizens in Neo4j.
3. Properties
• Key-value pairs attached to nodes and relationships.
• Can store data like strings, numbers, arrays, etc.
4. Labels
• Labels are used to group nodes into sets (e.g., :Person, :Movie).
• A node can have multiple labels.
5. Relationship Types
• Describe the semantic meaning of the connection.
• Example: :FRIENDS_WITH, :ACTED_IN, :WORKS_FOR.

Advantages of Neo4j’s Data Model:

• Intuitive: Naturally represents complex, highly connected domains.
• Flexible: Schema-optional; nodes can have different sets of properties.
• Efficient Traversals: Relationship pointers make graph traversals very fast.
• Powerful Queries: Cypher makes pattern-matching easy and expressive.
Example: Movie Database
Let's build a mini movie graph:
Entities:
• Persons: Alice, Bob
• Movies: Matrix, Inception
Relationships:
• Alice and Bob are actors.
• Alice acted in Matrix.
• Bob acted in Inception.
Cypher Code (Neo4j Query Language):
CREATE
(alice:Person {name: "Alice", born: 1985}),
(bob:Person {name: "Bob", born: 1990}),
(matrix:Movie {title: "The Matrix", released: 1999}),
(inception:Movie {title: "Inception", released: 2010}),
(alice)-[:ACTED_IN {role: "Neo"}]->(matrix),
(bob)-[:ACTED_IN {role: "Cobb"}]->(inception)

Visual Representation (Property Graph):

(:Person {name: "Alice", born: 1985})
└──[:ACTED_IN {role: "Neo"}]──> (:Movie {title: "The Matrix", released: 1999})

(:Person {name: "Bob", born: 1990})
└──[:ACTED_IN {role: "Cobb"}]──> (:Movie {title: "Inception", released: 2010})
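
The same property graph can be modelled with plain JavaScript objects to show what a pattern such as (p:Person)-[r:ACTED_IN]->(m:Movie) asks for. This is only a sketch of the data model: the array and field names (graphNodes, relationships, labels, props) and the actedIn function are assumptions, and Neo4j itself stores and traverses the graph quite differently.

```javascript
// Property graph: nodes have labels and properties; relationships are
// directed, typed, and may carry properties of their own.
const graphNodes = [
  { id: "alice", labels: ["Person"], props: { name: "Alice", born: 1985 } },
  { id: "bob", labels: ["Person"], props: { name: "Bob", born: 1990 } },
  { id: "matrix", labels: ["Movie"], props: { title: "The Matrix", released: 1999 } },
  { id: "inception", labels: ["Movie"], props: { title: "Inception", released: 2010 } },
];
const relationships = [
  { start: "alice", end: "matrix", type: "ACTED_IN", props: { role: "Neo" } },
  { start: "bob", end: "inception", type: "ACTED_IN", props: { role: "Cobb" } },
];

const byId = new Map(graphNodes.map((n) => [n.id, n]));

// Every ACTED_IN relationship that runs from a Person node to a Movie node.
function actedIn() {
  return relationships
    .filter((r) => r.type === "ACTED_IN")
    .map((r) => ({ person: byId.get(r.start), rel: r, movie: byId.get(r.end) }))
    .filter((m) => m.person.labels.includes("Person") && m.movie.labels.includes("Movie"))
    .map((m) => `${m.person.props.name} played ${m.rel.props.role} in ${m.movie.props.title}`);
}

const credits = actedIn();
console.log(credits);
```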

Use Case                 Example

Social Networks          Users, friendships, likes
Recommendation Systems   Products, ratings, purchases
Fraud Detection          Accounts, transactions, devices
Knowledge Graphs         Concepts, definitions, relationships
Network/IT Management    Devices, connections, dependencies

Neo4j Interfaces and Distributed System Characteristics:

Neo4j has other interfaces that can be used to create, retrieve, and update nodes and
relationships in a graph database. It also has two main versions: the enterprise edition, which
comes with additional capabilities, and the community edition.
■ Enterprise edition vs. community edition. Both editions support the Neo4j graph data
model and storage system, as well as the Cypher graph query language, and several other
interfaces, including a high-performance native API, language drivers for several popular
programming languages, such as Java, Python, PHP, and the REST (Representational State
Transfer) API. In addition, both editions support ACID properties. The enterprise edition
supports additional features for enhancing performance, such as caching and clustering of data
and locking.
■ Graph visualization interface. Neo4j has a graph visualization interface, so that a subset
of the nodes and edges in a database graph can be displayed as a graph. This tool can be used
to visualize query results in a graph representation.
■ Master-slave replication. Neo4j can be configured on a cluster of distributed system nodes
(computers), where one node is designated the master node. The data and indexes are fully
replicated on each node in the cluster. Various ways of synchronizing the data between master
and slave nodes can be configured in the distributed cluster.
■ Caching. A main memory cache can be configured to store the graph data for improved
performance.
■ Logical logs. Logs can be maintained to recover from failures.
