21 Mca 2326 Researchpaper

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 14

Benchmarking SQL vs NOSQL

MANMOHIT
(21MCA2326)

Student, University Institute of computing,


Chandigarh University

Abstract:

NoSQL databases are becoming increasingly popular due to the recent


emphasis on "Big Data". Databases based on these technologies are
said to perform better than SQL databases. The objective of this paper
is to independently analyze the performance of some NoSQL and
SQL databases under the influence of key-value stores. On key-value
stores based on NoSQL and SQL, we compare read, write, delete, and
instantiate operations. Furthermore, we examine one more operation
i.e, iterating through all keys. In each of the databases tested, we
developed and implemented an abstract key-value pair framework
supporting these essential operations. Our results demonstrate how the
databases measure up with each other and compare the timing of these
operations. It turns out that not every NoSQL database outperforms
the SQL databases. There are some that are much worse. The
performance varies according to each operation for each database. In
some cases, instantiation is slow, but reading, writing, and deleting
are fast.
Keywords- SQL and NoSQL databases, database performance

I. Introduction
For storing data, the standard database systems are based on
on the relational model. Databases of this type are called SQL
databases after the language they were accessed, (i) used to query
them. Non-relational databases, however, have experienced an
explosion in popularity in recent years. The NoSQL databases
differentiate themselves clearly from traditional SQL databases by
being known as NoSQL databases. Almost all of these rely on the
concept of storing simple key-value pairs in order to operate more
quickly.

Huge volumes of structured, semistructured, and unstructured data


being recorded and stored for a variety of purposes as the Internet
becomes more accessible and storage becomes more affordable. Big
Data [2] is a term used to describe this type of data. Speed, flexible
schemas, and distributed (i.e. noncentralized) databases are required
to process such large amounts of data. These needs are said to be met
by NoSQL databases, which have become the preferred currency for
handling Big Data. This resulted in an increase in the number of
NoSQL database options. There are various commercial and open-
source NoSQL database implementations (for example, BigTable [3]
and HBase [4]).

The enormous number of NoSQL options raises issues about the


distinctions among them and their applicability for certain
applications. As a result, a number of survey articles have been
published to address some of these problems (e.g., Tudorica and
Bucur [5] or Han et al. [6]). There are also a number of internet tools
and blogs that cover similar issues.

The purpose of this research is to compare the implementations of


key-value stores on NoSQL and SQL databases. SQL databases are
not built for efficient keyvalue storage, but NoSQL databases are.
However, our findings indicate that not all NoSQL databases
outperform SQL databases. On the key-value store, we compare read,
write, delete, and instantiation operations. Even inside NoSQL
databases, we notice that the performance of these operations varies
greatly. We also see no association between database performance
and the data model it employs.
The remainder of this work is arranged in the following manner. The
second section introduces a number of NoSQL databases. Section III
examines related research. It looks at various polls that compare
NoSQL products in particular. Our experimental setting and what we
assess are discussed in Section IV. Our experimental findings are
presented and discussed in Section V. A summary closes the last
section.

II. NoSQL Databases

Google's BigTable [3] is largely regarded as the first of the NoSQL


databases. BigTable is built on three keys: the first is the row key, the
second is the column key, and the third is the timestamp. This is a
multidimensional map in effect. Groups of column keys can be
identified. A group can only be accessed as a whole.

The success of proprietary non-relational databases likeBigTable and


Amazon's Dynamo [7] sparked the development of a slew of new
open-source and closed-source non-relational databases. Because of
their simplicity of use, speed, and scalability, NoSQL databases have
become more popular.
The majority of NoSQL databases work by storing keyvalue pairs. It's
possible that the values are a collection of secondary keys, each of
which has its own set of values.
A column family database is a form of key-value pair database. This
is made of of columns and super columns that may be accessed via
keys. A super column is a collection of linked columns that may be
accessed as one.
A document-oriented database is another sort of key-value pair
database. Document-oriented databases are capable of storing objects
in addition to plain values. The objects are serialised in some way,
such as XML, JSON, or BSON (binary encoded JSON).
In this study, we examine a number of NoSQL databases in order to
compare them to one another and to Microsoft SQL. The rest of this
section provides a quick overview of the NoSQL databases we've
chosen for comparison.
MongoDB1 is a C++-based document-oriented database [8]. The
items are serialised in BSON format. The objects don't have to have
the same structure or fields, and the shared fields don't have to have
the same type, providing for schema storage flexibility. MongoDB
enables auto-sharding, which divides data collections into partitions
and distributes them among available servers. As a consequence, the
load is dynamically balanced.
Google's BigTable [3] provides the foundation for Hypertable2, an
open-source NoSQL database. It's a C++ database with a column-
family structure. Hypertable, like MongoDB, allows auto-sharding.
Apache CouchDB is an Erlang-based document-oriented database [9].
The objects are serialised and saved as JSON. A RESTful HTTP API
is used to access the database. As a result, CouchDB is marketed as a
"fully web-centric database." 3.
Apache Cassandra4 is a column-family database that was created by
Facebook [10]. Decentralization, which reduces failures, and data
replication, which increases fault tolerance, are two of its important
aspects.
We also view RavenDB5 and Couchbase6 as document-oriented
databases [11]. These, like MongoDB and CouchDB, provide flexible
schema storage and employ JSON as the serialisation standard. Data
replication and sharding are also available.

III. RELATED WORK


While NoSQL databases offer a performance and scalability
advantage over traditional relational databases, they have a lot of
disadvantages. These difficulties are listed by Leavitt [12].
He points out that, while NoSQL databases are quick for simple
activities, they take a long time to do sophisticated processes. Aside
from that, sophisticated operations questions might be difficult to
formulate. Another disadvantage is the absence of native consistency
support. Many firms have yet to understand NoSQL, according to
Leavitt, and there is a dearth of support and management tools to
assist.
1www.mongodb.org 2www.hypertable.com 3couchdb.apache.org
4cassandra.apache.org 5ravendb.net 6www.couchbase.com

Tiwari [15] offers a thorough overview of NoSQL databases. He


discusses the implementations' history, logic, programmability,
storage design, and performance tweaking. As a foundation for
comparing implementations, he also mentions the following.
 Scalability
 Consistency
 Support for data models
 Support for queries, and
 Management tools
Indrawan-Santiago compares data models, transaction models,
support for ad-hoc searches, indexing, shrading, and licencing type
[16]. On this foundation, she examines eleven NoSQL
implementations, including Cassandra, HBase, Dynamo, MongoDB,
and CouchDB. She also analyses relational and NoSQL databases on
a qualitative level, concluding that NoSQL databases are more likely
to complement relational databases and improve an organization's
database management skills.
Hecht and Jablonski [17] present an overview of NoSQL databases
based on use cases. They recognise the challenges of selecting a
NoSQL database that is appropriate for a certain use case and devote
their work to addressing this issue. The data model, query support,
partitioning, replication, and concurrency controls serve as the
foundation for their comparison. They examine fourteen NoSQL
databases, including MongoDB, CouchDB, Cassandra, and HBase, in
this light.
A NoSQL database is compared to a SQL database by Boicea et al.
They chose Oracle for SQL and MongoDB for NoSQL
implementations [18]. They claim that when dealing with a high
number of records, insertion time is a bigger concern in Oracle, as are
update and deletion times.
For comparing key-value stores, Yahoo! Cloud Serving Benchmark is
an open-source workload generation tool [19].
Our contribution to this research is a practical approach to database
comparison. This is based on some of the most basic database
procedures we perform. In the next part, we'll go through these
procedures in detail.
Architectural aspects such as supported data models, consistency,
fault-tolerance, and so on are not compared in other reviews [15],
[16], [17].

IV. EXPERIMENTAL FRAMEWORK


Four basic procedures are involved in our database comparison:
1) Create a new instance. This creates a bucket for the keyvalue
pairs to be stored in.
2) Read aloud. This retrieves the value associated with a certain
key from the key-value pair storage. The Read operation of the
CRUD (Create, Read, Update, Delete) paradigm, which is often
used to represent database key database activities, is the same as
this.
3) Compose. If a specified key-value pair is not discovered in
the storage, it is created. Otherwise, it modifies the value in the
storage for the provided key.As a result, this operation combines
the CRUD model's Create and Update activities.
4) Remove the item. This clears the key-value pair storage of the
record (i.e. key-value pair) corresponding to a specific key. This
is the same as the CRUD model's Delete operation.
There are two supplemental operations that are typically employed in
addition to the above core operations: iterating through all keys and
iterating through all values. We time one more procedure to enable
testing of these: GetAllkeys is a function that retrieves all of the keys
from the strorage. In fact, when there are a high number of key-value
pairs in the storage, this method may only return a predetermined
amount of keys (rather than all).

abstract public class KeyValueStoreBase


{
abstract public bool Instantiate(string bucketName);
abstract public string Read(string key);
abstract public bool Write(string key, string value);
abstract public void Delete(string key);
abstract public List<string> GetAllKeys();
}
}

Fig. 1. Base operations on key-value stores


The performance of these five procedures for the selected databases is
measured in our trials. As an abstract class, Figure 1 outlines the
operations. This abstraction is one of our specialt for each database
that is being tested
It's worth noting that databases can do complex
operations. Depending on their type (for example, object or column),
they may receive assistance manipulation). In this study, we do not
put these to the test.

Database Version

MongoDB 1.8.5
RavenDB 960
CouchDB 1.2.0
Cassandra 1.1.2
TABLE I

TABLE I
VERSION DETAIL OF DATABASE IMPLEMENTATIONS
Table I shows the database implementations that were tested in this
work, along with their version numbers. We conduct the specified
operation (such as Read or Write) five times for each of these
databases and average the results.
The experiment's data set consists of auto-generated key-value pairs
of the type (kN, vN), where N is a sequence number.

V. RESULTS AND EVALUATION


The time it takes to create a database bucket is the subject of our first
investigation. See Figure 2 for a summary of the experiment's
findings.
Fig. 2. Time for instantiating database bucket (ms)
The timings have been averaged across five runs. Absolute time
values are unimportant; what matters are time values in relation to one
another.
RavenDB, Hyertable, and MongoDB appear to be the fastest in
creating database buckets. Bucket creation takes the longest with
CouchDB, Couchbase, and SQL Express.
The time it takes to read data from the bucket matching to provided
keys is the subject of our second experiment. The results are
summarised in Table II.
The number of operations in the tables refers to the number of times a
certain operation (such as read) is performed in the test. The amount
of key-value pairs (i.e. records) in the store is also equal to this.
TABLE II
TIME FOR READING (MS)
The following databases are listed in order of read performance:
Couchbase, MongoDB, SQL Express, Hypertable, CouchDB,
Cassandra, and RavenDB. Column-family databases include
Cassandra and Hypertable, whereas document-oriented databases
include Couchbase, MongoDB, CouchDB, and RavenDB.
The data model and performance have no discernible relationship. We
can also observe that SQL Express outperforms certain, but not all,
NoSQL databases in terms of read speed.
The time it takes to write keyvalue pairs to the bucket is measured in
our third experiment. This corresponds to changing the existing value
if the key-value pair already exists in the bucket. Otherwise, it's just
another key-value pair added to the bucket. The results are
summarised in Table III.

TABLE III
TIME FOR WRITING (MS)
The following databases are listed in order of write performance:
Couchbase, MongoDB, Cassandra, Hypertable, SQL Express,
RavenDB, and CouchDB. We can observe that RavenDB and
CouchDB have inferior write performance than SQL Express.
Other NoSQL databases, on the other hand, outperform SQL
express.The time it takes to remove key-value pairs from the bucket is
measured in our fourth experiment. The results are summarised in
FIG IV.

TABLE IV
TIME FOR DELETING (MS)

The following databases are listed in order of deletion performance:


Couchbase, MongoDB, SQL Express, Cassandra, Hypertable,
CouchDB, and RavenDB. We can observe that SQL Express
outperforms all NoSQL databases except Couchbase and MongoDB
in terms of delete performance.
According to Boicea et al., insertion time in Oracle is a factor faster
than in MongoDB for 100000 entries, whereas update and delete
times are many factors faster in Oracle [18].
In our tests, we didn't see any significant performance differences
between MongoDB7 and SQL Express.The time it takes to retrieve all
of the keys in the bucket is the subject of our final experiment. The
results are summarised in Table V.All databases, with the exception
of CouchDB, are quick to get the keys. SQL Express was the quickest
of the bunch. Because Couchbase doesn't have an API for retrieving
all of the keys, it was left out of the experiment.
It's worth noting that obtaining all the keys is much faster than
reading the values one by one. The database connection cost is clearly
the cause of this disparity.
7Boicea et al -We don't know if our version of MongoDB differs from theirs because Boicea
et al.. did not mention the versions of the databases they examined.

TABLE V
TIME FOR FETCHING ALL KEYS (MS)
So long as the values are small in size (i.e. comparable to the size of
the keys), getting all values takes about the same amount of time as
retrieving all keys.

VI. SUMMARY AND CONCLUSION


The implementations of key-value stores on NoSQL and SQL
databases are compared in this study. SQL databases are not
optimised for key-value storage, but NoSQL databases are. However,
we discovered that not all NoSQL databases outperform the SQL
database we analysed. Even inside NoSQL databases, we see a broad
range of performance depending on the kind of operation (such as
read and write). We also found no link between performance and the
data model used by any database.
RavenDB and CouchDB are the two NoSQL databases that perform
poorly in read, write, and delete operations. Casandra is sluggish
when it comes to read operations, but it performs admirably when it
comes to write and delete operations. In terms of read, write, and
delete operations, Couchbase and MongoDB are the quickest.
However, Couchbase does not enable retrieving all of the keys (or
values). Couchbase is a suitable choice if an application does not need
iterating over keys and values. Otherwise, MongoDB comes in second
place to Couchbase in terms of read, write, and delete operations.
It's worth noting that we didn't put the databases through their paces
with more sophisticated procedures. When it comes to complicated
procedures, the database rankings we mentioned may not hold true.
NoSQL implementations, like any other application software, go
through modifications, and as a result, performance gains and
degradations are likely to occur as a result of these changes.
As a result, it will be necessary to evaluate databases not only during
the application design phase, but also at regular intervals in order to
switch to the best database implementation. Tiered software
development procedures would aid in the isolation of the database
backend, allowing for easier replacement as necessary.

REFERENCES
[1] K. Kline, SQL in a nutshell, 3rd ed. O’Reilly Media, November 2008.
[2] P. Warden, Big Data Glossary. O’Reilly Media, September 2011.
[3] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows,
T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: a distributed
storage system for structured data,” in Proceedings of the 7th USENIX
Symposium on Operating Systems Design and Implementation - Volume
7, ser. OSDI ’06. Berkeley, CA, USA: USENIX Association, 2006,
pp. 15–15.
[4] L. George, HBase: The Definitive Guide. O’Reilly Media, August 2011
[5] B. Tudorica and C. Bucur, “A comparison between several NoSQL
databases with comments and notes,” in Roedunet International Conference
(RoEduNet), 2011 10th, june 2011, pp. 1 –5.
[6] J. Han, E. Haihong, G. Le, and J. Du, “Survey on NoSQL database,” in
Pervasive Computing and Applications (ICPCA), 2011 6th International
Conference on, oct. 2011, pp. 363 –366.
[7] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman,
A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo:
Amazon’s highly available key-value store,” SIGOPS Oper. Syst. Rev.,
vol. 41, no. 6, pp. 205–220, Oct. 2007.
[8] K. Chodorow and M. Dirolf, MongoDB: The Definitive Guide. O’Reilly
Media, September 2010.
[9] J. C. Anderson, J. Lehnardt, and N. Slater, CouchDB: The Definitive
Guide. O’Reilly Media, January 2010.
[10] E. Hewitt, Cassandra: The Definitive Guide. O’Reilly Media, November
2010.
[11] M. Brown, Getting Started with Couchbase Server. O’Reilly Media,
June 2012.
[12] N. Leavitt, “Will NoSQL databases live up to their promise?” Computer,
vol. 43, no. 2, pp. 12 –14, feb. 2010.
[13] D. Bartholomew, “SQL vs. NoSQL,” Linux Journal, no. 195, July 2010.
[14] S. Sakr, A. Liu, D. Batista, and M. Alomari, “A survey of large scale
data management approaches in cloud environments,” Communications
Surveys Tutorials, IEEE, vol. 13, no. 3, pp. 311–336, 2011.
[15] S. Tiwari, Professional NoSQL. Wiley/Wrox, August 2011.
[16] M. Indrawan-Santiago, “Database research: Are we at a crossroad?
Reflection on NoSQL,” in Network-Based Information Systems (NBiS),
2012 15th International Conference on, sept. 2012, pp. 45 –51.
[17] R. Hecht and S. Jablonski, “NoSQL evaluation: A use case oriented
survey,” in Cloud and Service Computing (CSC), 2011 International
Conference on, dec. 2011, pp. 336 –341.
[18] A. Boicea, F. Radulescu, and L. I. Agapin, “MongoDB vs Oracle –
database comparison,” in Emerging Intelligent Data and Web Technologies
(EIDWT), 2012 Third International Conference on, sept. 2012, pp.
330 –335.
[19] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears,
“Benchmarking cloud serving systems with ycsb,” in Proceedings of the
1st ACM symposium on Cloud computing, ser. SoCC ’10. ACM, 2010,
pp. 143–154.
19

You might also like