Course code : CSE3009
Course title : No SQL Data Bases
Module :1
Topic :1
Introduction to NoSQL Concepts
Dr. Karthika Natarajan 5/23/2022 1
Objectives
This session covers:
• Database revolutions
• First generation Database
• Second generation Database
• Third generation Database
• What is NoSQL?
• Comparison between SQL and NoSQL
History of Database
• Databases are a foundational element of the modern world. We interact
with them even without knowing it: any time we buy something online,
log in to a service, or access our bank accounts.
• The concept of a database existed long before computers. In those times,
data was stored in journals, in libraries, and in hundreds of filing cabinets.
Everything was recorded on paper, which meant it took up space, was
hard to find, and was difficult to back up.
• Then computers became available, and with them, the opportunity for
better data management.
What is Database?
A database is a collection of data, typically describing the activities of one or more
related entities and attributes.
A database is a collection of information that is organized so that it can be easily
accessed, managed and updated.
A database management system, or DBMS, is software designed to assist in
maintaining and utilizing large collections of data, and the need for such systems,
as well as their use, is growing rapidly.
Evolution of Database
First Database Revolution
• The emergence of electronic computers following the Second World War
represented the first revolution in databases.
• Early “databases” used paper tape initially and eventually magnetic tape to
store data sequentially.
• 1955: the spinning magnetic disk. Data on a magnetic disk can be modified
or deleted easily, and the disk allows random access to data, i.e., to
individual records.
• 1961: ISAM (Indexed Sequential Access Method) made fast record-oriented access
feasible and consequently led to OLTP (On-line Transaction Processing)
computer systems.
ISAM
• ISAM is an advanced sequential file organization method. The records are sorted by
primary key.
• For each primary key, an index value is generated and mapped to the record. This index is
simply the address of the record in the file.
• To retrieve a record by its index value, the address of its data block is
fetched from the index and the record is read from that block.
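The lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how a real ISAM file is laid out: the `records` list stands in for the data file, a list position stands in for a disk address, and the names `build_index` and `lookup` are made up for this example.

```python
from bisect import bisect_left

# Hypothetical student records; the list stands in for the data file on disk.
records = [
    (103, "JAMES"),
    (101, "ANITA"),
    (104, "RAVI"),
    (102, "JAYA"),
]

def build_index(rows):
    """Sort rows by primary key and return (sorted_rows, keys).

    keys[i] is the primary key of the record stored at position i, so the
    position plays the role of the record's address in the file.
    """
    sorted_rows = sorted(rows, key=lambda row: row[0])
    keys = [row[0] for row in sorted_rows]
    return sorted_rows, keys

def lookup(sorted_rows, keys, key):
    """Binary-search the index, then fetch the record at that address."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return sorted_rows[pos]
    return None  # key is not in the index

data_file, index = build_index(records)
print(lookup(data_file, index, 102))  # (102, 'JAYA')
print(lookup(data_file, index, 999))  # None
```

The binary search touches only a handful of index entries, which is why an index makes single-record access fast even when the file is large.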
ISAM-Pros and Cons
Pros of ISAM:
•Because each index entry holds the address of its record's data block, searching
for a record in a huge database is quick and easy.
•This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for the given range of value.
In the same way, the partial value can also be easily searched, i.e., the student name
starting with 'JA' can be easily searched.
Cons of ISAM:
•This method requires extra disk space to store the index values.
•When new records are inserted, the files must be reorganized to
maintain the sequence.
•When a record is deleted, the space it used must be released. Otherwise,
the performance of the database will degrade.
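Range retrieval, partial (prefix) retrieval and the cost of inserts can all be seen on a small sorted index. The sketch below assumes the index is an in-memory sorted list of names; the helper names `range_search` and `prefix_search` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left, bisect_right, insort

# Hypothetical sorted index of student names (the indexed key).
names = ["ANITA", "JAMES", "JAYA", "RAVI", "SUNIL"]

def range_search(keys, low, high):
    """Return all keys k with low <= k <= high using two binary searches."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]

def prefix_search(keys, prefix):
    """Return all keys that start with prefix.

    In a sorted index the matches are contiguous, so one binary search
    finds the first candidate and a short scan collects the rest.
    """
    start = bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]

print(range_search(names, "J", "S"))  # ['JAMES', 'JAYA', 'RAVI']
print(prefix_search(names, "JA"))     # ['JAMES', 'JAYA']

# Inserting keeps the order, but every later entry has to shift by one
# position; this is the maintenance cost listed under the cons.
insort(names, "JANE")
print(names)
```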
First Database Revolution
By the early 1970s, two major models of DBMS were competing for
dominance.
• The network model was formalized by the CODASYL (Conference
on Data Systems Languages) standard and implemented in
databases such as IDMS (Integrated Database Management
System).
• The hierarchical model provided a somewhat simpler approach found
in IBM’s IMS (Information Management System).
Hierarchical Model
A hierarchical database model is a data model in which the data are organized into a tree-like structure. The
data are stored as records that are connected to one another through links.
To retrieve data from a hierarchical database, the tree must be traversed starting from the
root node.
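Root-first traversal can be illustrated with a small tree. The sketch below is a minimal model, not the storage format of IMS or any other product; the `Node` class, the `find_path` function and the gadget catalogue are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One record in the hierarchy; each node has exactly one parent."""
    name: str
    children: List["Node"] = field(default_factory=list)

def find_path(root: Node, target: str) -> Optional[List[str]]:
    """Depth-first search from the root; returns the path to target, if any."""
    if root.name == target:
        return [root.name]
    for child in root.children:
        path = find_path(child, target)
        if path is not None:
            return [root.name] + path
    return None

# Hypothetical catalogue of electronic gadgets.
catalogue = Node("Electronics", [
    Node("Televisions", [Node("LED"), Node("LCD")]),
    Node("Portable", [Node("Laptops"), Node("Phones")]),
])

print(find_path(catalogue, "Phones"))  # ['Electronics', 'Portable', 'Phones']
```

Every lookup starts at `catalogue`, so a record deep in the tree can only be reached through its ancestors.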
Hierarchical model for electronic gadgets
Network Model
• It allows a record to have more than
one parent record as well as multiple child records.
• This model is capable of handling
multiple types of relationships, which
helps in modeling real-life
applications, for example, 1:1, 1:M and
M:N relationships.
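An M:N relationship, where a record has more than one parent, can be sketched with two lookup tables. The student and course data and the names `courses_of` and `students_in` are assumptions made for this example; CODASYL systems expressed the same idea with owner and member record sets.

```python
from collections import defaultdict

# Hypothetical M:N relationship: students enrol in courses.
enrolments = [
    ("Asha", "Databases"),
    ("Asha", "Networks"),
    ("Ravi", "Databases"),
]

courses_of = defaultdict(set)   # student -> courses (one parent, many children)
students_in = defaultdict(set)  # course -> students (one child, many parents)

for student, course in enrolments:
    courses_of[student].add(course)
    students_in[course].add(student)

print(sorted(courses_of["Asha"]))        # ['Databases', 'Networks']
print(sorted(students_in["Databases"]))  # ['Asha', 'Ravi']
```

The "Databases" record is linked to two students, something a strict tree with one parent per record cannot represent directly.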
Hierarchical vs Network model
Second Database Revolution
In the late 1960s, Edgar F. Codd, who was working at an IBM laboratory, identified the
following drawbacks in first-generation DBMSs:
• Existing databases were too hard to use.
• Existing databases lacked a theoretical foundation.
• Existing databases mixed logical and physical implementations.
To overcome these, in 1970 he published the core ideas that defined the relational
database model, which became the most significant model for database
systems for a generation.
New in Second generation
Key concepts of the relational model include:
1. Attribute: Each column in a table. Attributes are the properties that define a
relation, e.g., Student_Rollno, NAME.
2. Table: In the relational model, relations are saved in table format, together with
their entities. A table has rows and columns: rows represent records and columns
represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Degree: The total number of attributes in the relation.
5. Cardinality: The total number of rows present in the table.
6. Column: The set of values for a specific attribute.
7. Relation instance: A finite set of tuples in the RDBMS. Relation instances never
contain duplicate tuples.
8. Relation key: One or more attributes that uniquely identify each row.
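These terms can be made concrete with Python's built-in `sqlite3` module. The `student` table, its columns and its rows are invented for this example; degree is computed as the number of attributes and cardinality as the number of tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_rollno INTEGER PRIMARY KEY,"  # the relation key
    " name TEXT NOT NULL,"
    " branch TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "CSE"), (2, "Ravi", "ECE"), (3, "Jaya", "CSE")],
)

cursor = conn.execute("SELECT * FROM student")
attributes = [col[0] for col in cursor.description]  # column names
tuples = cursor.fetchall()                           # one tuple per row

degree = len(attributes)   # number of attributes
cardinality = len(tuples)  # number of rows
print(attributes)          # ['student_rollno', 'name', 'branch']
print(degree, cardinality) # 3 3

# The key must be unique, so a second row with rollno 1 is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 'ME')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```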
Key concepts in relational model
Relational Model: Advantages & Disadvantages
Advantages:
•Simplicity: The relational data model is simpler than the hierarchical and network
models.
•Structural independence: The relational database is concerned only with data and not with
a structure. This can improve the performance of the model.
•Easy to use: The relational model is easy to use, as tables consisting of rows and columns
are quite natural and simple to understand.
•Query capability: It makes it possible for a high-level query language like SQL to avoid
complex database navigation.
•Scalable: A database can be enlarged, in both the number of records (rows) and the
number of fields, to enhance its usability.
Disadvantages:
•Some relational databases have limits on field lengths that cannot be exceeded.
Benefits in Second generation
Database normalization is the process of restructuring a
complex database into simpler tables so that each fact is stored only once.
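As a rough illustration of what normalization does, the sketch below splits a flat list of rows, in which the department name repeats for every student, into two simpler tables and then joins them back. The data and field names are hypothetical, and plain Python dictionaries stand in for tables.

```python
# Hypothetical unnormalized rows: dept_name is repeated for every student.
flat = [
    {"rollno": 1, "name": "Asha", "dept_id": 10, "dept_name": "CSE"},
    {"rollno": 2, "name": "Ravi", "dept_id": 20, "dept_name": "ECE"},
    {"rollno": 3, "name": "Jaya", "dept_id": 10, "dept_name": "CSE"},
]

# Table 1: each department is stored exactly once, keyed by dept_id.
departments = {row["dept_id"]: row["dept_name"] for row in flat}

# Table 2: students keep only a reference (dept_id) to their department.
students = [
    {"rollno": row["rollno"], "name": row["name"], "dept_id": row["dept_id"]}
    for row in flat
]

# Joining the two tables reproduces the original rows, so nothing was lost.
rebuilt = [dict(s, dept_name=departments[s["dept_id"]]) for s in students]

print(departments)      # {10: 'CSE', 20: 'ECE'}
print(rebuilt == flat)  # True
```

Renaming a department now means changing one entry in `departments` rather than every student row.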
New in Second generation
Other important key concepts of the relational model include:
• Constraints
• Operations
• Normal forms
Popular relational database management systems:
• DB2 and Informix Dynamic Server - IBM
• Oracle and RDB - Oracle
• SQL Server and Access - Microsoft
Transaction Models
Jim Gray defined the most widely accepted transaction model in the late 1970s. It was soon
popularized as ACID transactions:
• Atomic: The transaction is indivisible - either all the statements in the transaction are
applied to the database or none are.
• Consistent: The database remains in a consistent state before and after transaction
execution.
• Isolated: While multiple transactions can be executed by one or more users
simultaneously, one transaction should not see the effects of other in-progress
transactions.
• Durable: Once a transaction is committed to the database, its changes are expected to
persist even if there is a failure of the operating system or hardware.
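Atomicity is the easiest of the four properties to demonstrate. The sketch below uses Python's built-in `sqlite3` module; the `account` table, the balances and the `transfer` helper are assumptions made for this example. A `CHECK` constraint forbids negative balances, so an oversized transfer fails halfway and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict, e.g. {'A': 300, 'B': 100}."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer(conn, "A", "B", 50), balances(conn))    # True {'A': 250, 'B': 150}
print(transfer(conn, "A", "B", 1000), balances(conn))  # False {'A': 250, 'B': 150}
```

In the failed transfer the credit to B had already been applied when the debit from A violated the constraint; the rollback removes the credit too, so no partial result is visible.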
Atomicity
Consistent
If the value read by both B and C is $300, the data is inconsistent: once the debit
operation executes, the value that was read no longer matches the stored balance.
Isolation
Account A is making transactions T1 and T2 to accounts B and C, but both execute independently without
affecting each other. This is known as isolation.
2000s: NoSQL
• In 1998, the term NoSQL (not only structured query language) was coined.
• It refers to databases that use a query language other than SQL to store and
retrieve data.
• NoSQL databases are useful for unstructured data.
• NoSQL allows faster processing of larger, more varied datasets.
• NoSQL databases are more flexible than the traditional relational databases.
Third Database Revolution
By 2005, Google was by far the biggest website in the world.
When Google began, the relational database was already well established, but
it was inadequate to deal with the volumes and velocity of the data confronting
Google.
• In 2003, Google revealed details of its distributed file system,
GFS (Google File System).
• In 2004, it revealed details of the distributed parallel processing
algorithm “MapReduce”.
• In 2006, Google revealed details of its BigTable distributed structured
database.
• In 2007, the Hadoop project was developed.
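The MapReduce idea can be shown with the classic word-count example. The sketch below runs in a single Python process and only mimics the three phases; in a real deployment the map and reduce tasks would be spread over many machines. The function names and the sample documents are made up for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["NoSQL scales out", "SQL scales up", "NoSQL and SQL"]

# Each document could be mapped on a different machine, independently.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))

print(counts["nosql"], counts["sql"], counts["scales"])  # 2 2 2
```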
Drawbacks in Second Database Revolution
• Even the most expensive commercial RDBMS such as Oracle could not
provide sufficient scalability to meet the demands of large web sites.
• To overcome this major issue, distributed databases were introduced.
• “Sharding” involves partitioning the data across multiple databases based
on a key attribute, such as the customer identifier.
• Sharding at sites like Facebook has allowed a MySQL-based system to
scale up to massive levels, but the downsides of doing this are immense.
Many relational operations and database-level ACID transactions are lost.
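Sharding on a key attribute can be sketched as follows. This is a simplified, single-process model: each shard is a Python dictionary standing in for a separate database server, and `NUM_SHARDS`, `shard_for`, `put` and `get` are hypothetical names. Hash-based placement is one common choice; range-based placement is another.

```python
import hashlib

NUM_SHARDS = 4
# Each dict stands in for one independent database server.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(customer_id: str) -> int:
    """Map a customer identifier to a shard number with a stable hash."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def put(customer_id, record):
    """Store the record on the one shard that owns this customer."""
    shards[shard_for(customer_id)][customer_id] = record

def get(customer_id):
    """Read from the owning shard only; no other shard is contacted."""
    return shards[shard_for(customer_id)].get(customer_id)

for i in range(100):
    put(f"cust-{i}", {"orders": i})

print(get("cust-42"))                    # {'orders': 42}
print(sum(len(s) for s in shards))       # 100
```

A lookup by customer identifier touches one shard, but a query that spans customers, or a transaction that updates two customers on different shards, has to coordinate several servers. That is where joins and database-level ACID guarantees are lost.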
Cloud Computing
• Between 2006 and 2008, Amazon rolled out Elastic Compute Cloud (EC2).
• EC2 made available virtual machine images hosted on Amazon’s hardware
infrastructure and accessible via the Internet.
• Amazon added other services such as storage (S3, EBS), Virtual Private Cloud
(VPC), a MapReduce service (EMR), and so on.
• The entire platform was known as Amazon Web Services (AWS) and was the first
practical implementation of an Infrastructure as a Service (IaaS) cloud.
• AWS became the inspiration for cloud computing offerings from Google, Microsoft, and
others.
Document Databases
• The impedance mismatch between object-oriented and relational models led to
object-relational mapping systems.
• Richer interactive websites were enabled by the programming style known as AJAX (Asynchronous JavaScript
and XML), in which JavaScript within the browser communicates directly with a backend
by transferring XML messages.
• XML was soon superseded by JavaScript Object Notation (JSON), which is a self-describing
format similar to XML but is more compact and tightly integrated into the JavaScript
language.
• Databases that support JSON let applications create and access data directly,
eliminating the role of the relational middleman. These later became known as “document databases”.
• Couchbase and MongoDB are two popular JSON-oriented databases.
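The appeal of a JSON document can be seen by storing an order as one nested structure instead of rows spread over several tables. The sketch below uses only Python's standard `json` module; the order document and its field names are invented for this example and no particular document database is implied.

```python
import json

# Hypothetical order: customer and line items are nested inside one document.
order = {
    "order_id": 1001,
    "customer": {"name": "Asha", "city": "Vellore"},
    "items": [
        {"sku": "TV-42", "qty": 1},
        {"sku": "HDMI-2M", "qty": 2},
    ],
}

text = json.dumps(order)     # what is sent over the wire or stored
restored = json.loads(text)  # the application gets the same object back

total_qty = sum(item["qty"] for item in restored["items"])
print(restored == order)  # True
print(total_qty)          # 3
```

In a relational design the same data would typically be split across order, customer and order-item tables and reassembled with joins; here the application object and the stored document have the same shape.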
NewSQL
In 2007, Michael Stonebraker and his team proposed a number of variants on the
existing RDBMS design.
• H-Store described a pure in-memory distributed database
• C-Store specified a design for a columnar database.
Both of these designs were extremely influential in the years to come and are the first
examples of what came to be known as NewSQL database systems.
NewSQL databases retain key characteristics of the RDBMS but diverge from
the common architecture exhibited by traditional systems such as Oracle and SQL
Server.
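The difference between a row-oriented and a columnar layout can be sketched with plain Python structures. This is only a conceptual illustration with invented sales data, not the actual storage format of C-Store or any other system.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"id": 1, "region": "South", "sales": 120},
    {"id": 2, "region": "North", "sales": 80},
    {"id": 3, "region": "South", "sales": 50},
]

# Columnar layout: all values of one attribute are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one attribute reads a single column and ignores the rest.
total_sales = sum(columns["sales"])

print(columns["region"])  # ['South', 'North', 'South']
print(total_sales)        # 250
```

Analytic queries that scan one or two attributes of many records benefit from the columnar form, while fetching a whole record is simpler in the row form.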
The Nonrelational Explosion
By the end of this period, dozens of new database systems such as MongoDB,
Cassandra, and HBase had emerged in response to the drawbacks of relational databases.
These new breeds of database systems lacked a common name; a label such as “Distributed
Non-Relational Database Management System” (DNRDBMS) was too unwieldy to catch on.
However, in late 2009, the term NoSQL quickly caught on as shorthand for any
database system that broke with the traditional SQL database.
The Database technologies
What is NoSQL?
• NoSQL database, also called Not Only SQL, is an approach to data
management and database design that's useful for very large sets of
distributed data.
• NoSQL is not a relational database.
• A relational database model may not be the best solution for all situations.
• The easiest way to understand NoSQL is as a database that does
not adhere to the traditional relational database management system
(RDBMS) structure.
What is NoSQL?
• One of the most popular NoSQL databases is Apache Cassandra.
• Cassandra, which was once Facebook’s proprietary database, was
released as open source in 2008.
• Other NoSQL implementations include SimpleDB, Google BigTable,
Apache Hadoop, MapReduce, MemcacheDB, and Voldemort.
• Companies that use NoSQL include Netflix, LinkedIn and Twitter.
Why should we use NoSQL?
There are several reasons why people consider using a NoSQL database.
• Application development productivity.
• Large data.
• Analytics.
• Scalability.
• Massive write performance.
• Fast key-value access.
• Flexible data model and flexible datatypes.
• Schema migration.
• Write availability.
• Easier maintainability, administration and operations.
• Generally available parallel computing.
• Programmer ease of use.
• Distributed systems and cloud computing support.
SQL vs NoSQL
SQL | NoSQL
Relational databases (RDBMS) | Non-relational or distributed databases
Table-based databases | Document-based, key-value pairs, graph databases or wide-column stores
Have a predefined schema | Have a dynamic schema for unstructured data
Vertically scalable | Horizontally scalable
Scalability is managed by increasing the CPU, RAM, SSD, etc. | Scalability is managed by adding more servers to the NoSQL database
Uses SQL (Structured Query Language) | Uses UnQL (Unstructured Query Language). The syntax of UnQL varies from database to database
SQL vs NoSQL
SQL | NoSQL
MySQL, Oracle, SQLite, Postgres and MS-SQL | MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j and CouchDB
Good fit for complex queries | Not a good fit for complex queries (NoSQL databases don’t have standard interfaces)
Not the best fit for hierarchical data storage | Fits better for hierarchical data storage
Best fit for heavy-duty transactional applications | Not a fit for heavy transactional applications
SQL vs NoSQL
SQL | NoSQL
Excellent support is available for all SQL databases | Still have to rely on community support
Emphasizes ACID properties (Atomicity, Consistency, Isolation and Durability) | Follows Brewer’s CAP theorem (Consistency, Availability and Partition tolerance)
Classified as either open-source or closed-source | Classified by the way data is stored: graph databases, key-value store databases, document store databases, column store databases and XML databases
Summary
This session covered:
• Database revolutions
• First generation Database
• Second generation Database
• Third generation Database
• What is NoSQL?
• Comparison between SQL and NoSQL