
UNIT II NOSQL DATABASES

NoSQL – CAP Theorem – Sharding - Document based – MongoDB Operation: Insert,


Update, Delete, Query, Indexing, Application, Replication, Sharding, Deployment – Using
MongoDB with PHP / JAVA – Advanced MongoDB Features – Cassandra: Data Model, Key
Space, Table Operations, CRUD Operations, CQL Types – HIVE: Data types, Database
Operations, Partitioning – HiveQL – OrientDB Graph database – OrientDB Features

What is NoSQL?
A NoSQL database is a non-relational data management system that does
not require a fixed schema. It avoids joins and is easy to scale. The major
purpose of using a NoSQL database is for distributed data stores with
humongous data storage needs. NoSQL is used for big data and real-time
web apps. For example, companies like Twitter, Facebook and Google
collect terabytes of user data every single day.
NoSQL stands for “Not Only SQL” or “Not SQL.” Though a better
term would be “NoREL”, NoSQL caught on. Carlo Strozzi introduced the
NoSQL concept in 1998.

Traditional RDBMS uses SQL syntax to store and retrieve data for further
insights. A NoSQL database system, in contrast, encompasses a wide range of
database technologies that can store structured, semi-structured,
unstructured and polymorphic data.


Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc. who deal with huge volumes of data. The
system response time becomes slow when you use RDBMS for massive
volumes of data.

To resolve this problem, we could “scale up” our systems by upgrading the
existing hardware. This process is expensive.

The alternative is to distribute the database load across multiple hosts
as the load increases. This method is known as “scaling out.”

A NoSQL database is non-relational, so it scales out better than relational
databases, as it is designed with web applications in mind.

Brief History of NoSQL Databases


 1998 - Carlo Strozzi used the term NoSQL for his lightweight, open-
source relational database
 2000 - Graph database Neo4j is launched
 2004 - Google BigTable is launched
 2005 - CouchDB is launched
 2007 - The research paper on Amazon Dynamo is released
 2008 - Facebook open-sources the Cassandra project
 2009 - The term NoSQL is reintroduced

Features of NoSQL
Non-relational
 NoSQL databases never follow the relational model
 Never provide tables with flat fixed-column records
 Work with self-contained aggregates or BLOBs
 Don’t require object-relational mapping or data normalization
 No complex features like query languages, query planners, referential
integrity joins, or ACID

Schema-free

 NoSQL databases are either schema-free or have relaxed schemas
 Do not require any definition of the schema of the data
 Offer heterogeneous structures of data in the same domain


Simple API

 Offers easy-to-use interfaces for storing and querying data
 APIs allow low-level data manipulation and selection methods
 Text-based protocols, mostly HTTP REST with JSON
 Mostly no standards-based NoSQL query language
 Web-enabled databases running as internet-facing services

Distributed

 Multiple NoSQL databases can be executed in a distributed fashion
 Offer auto-scaling and fail-over capabilities
 Often the ACID concept is sacrificed for scalability and throughput
 Mostly no synchronous replication between distributed nodes; instead,
asynchronous multi-master replication, peer-to-peer, or HDFS
replication
 Only provide eventual consistency
 Shared-nothing architecture, which enables less coordination and
higher distribution


Types of NoSQL Databases


NoSQL databases are mainly categorized into four types: key-value pair,
column-oriented, graph-based and document-oriented. Every category
has its unique attributes and limitations. None of these database types is
better at solving all problems; users should select a database based on
their product needs.
Types of NoSQL Databases:

 Key-value pair based
 Column-oriented
 Graph-based
 Document-oriented

Key Value Pair Based


Data is stored in key/value pairs. It is designed in such a way to handle lots
of data and heavy load.
Key-value pair storage databases store data as a hash table where each
key is unique, and the value can be a JSON, BLOB(Binary Large Objects),
string, etc.

For example, a key-value pair may contain a key like “Website” associated
with a value like “Guru99”.

It is one of the most basic types of NoSQL database. This kind of NoSQL
database is used for collections, dictionaries, associative arrays, etc. Key-
value stores help the developer to store schema-less data. They work best
for shopping cart contents.

Redis, Dynamo and Riak are some examples of key-value store
databases; Dynamo and Riak are based on Amazon’s Dynamo paper.
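The key/value model described above can be sketched with an ordinary hash map; the store below is a minimal illustration of the idea, not any particular product's API, and the keys and values are made up:

```javascript
// Minimal key-value store sketch: a hash table mapping unique keys
// to arbitrary values (strings, JSON-like objects, blobs).
const store = new Map();

// put/get/delete are essentially the only operations a pure key-value
// store needs; the value's structure is opaque to the store itself.
store.set("Website", "Guru99");                   // string value
store.set("cart:42", { items: ["pen", "book"] }); // JSON-like value

console.log(store.get("Website"));   // prints Guru99
store.delete("cart:42");
console.log(store.has("cart:42"));   // prints false
```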

Column-based
Column-oriented databases work on columns and are based on the
BigTable paper by Google. Every column is treated separately. Values of a
single column are stored contiguously.

They deliver high performance on aggregation queries like SUM, COUNT,
AVG, MIN, etc., as the data is readily available in a column.

Column-based NoSQL databases are widely used for data
warehouses, business intelligence, CRM and library card catalogs.

HBase, Cassandra and Hypertable are examples of
column-based databases.

Document-Oriented:
Document-oriented NoSQL databases store and retrieve data as key-value
pairs, but the value part is stored as a document. The document is stored in
JSON or XML format. The value is understood by the DB and can be
queried.

Relational vs. Document

In a relational database, data lives in rows and columns, and you have to
know in advance what columns you have. In a document database, data is
stored as JSON-like objects; you do not need to define the columns up
front, which makes it flexible.
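To make the contrast concrete, here is a hypothetical customer document (all field names and values are illustrative assumptions): related data is nested inside one self-describing JSON-like object instead of being spread across joined tables:

```javascript
// One document holds what a relational model would split across a
// customers table and an orders table linked by a foreign key.
const customer = {
  _id: 101,
  name: "John",
  city: "Detroit",
  orders: [                      // nested array: no join needed
    { item: "pen",  qty: 2 },
    { item: "book", qty: 1 }
  ]
};

// Fields can differ from document to document; no schema change needed.
console.log(customer.orders.length); // prints 2
```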

The document type is mostly used for CMS systems, blogging platforms,
real-time analytics and e-commerce applications. It should not be used for
complex transactions that require multiple operations, or for queries
against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes
are popular document-oriented DBMS systems.

Graph-Based
A graph-type database stores entities as well as the relations amongst
those entities. The entity is stored as a node and the relationship as an
edge. An edge gives a relationship between nodes. Every node and edge
has a unique identifier.

Compared to a relational database, where tables are loosely connected, a
graph database is multi-relational in nature. Traversing relationships is
fast, as they are already captured in the DB and there is no need to
calculate them.
Graph databases are mostly used for social networks, logistics and spatial
data.

Neo4j, Infinite Graph, OrientDB and FlockDB are some popular graph-based
databases.

Query Mechanism tools for NoSQL


The most common data retrieval mechanism is the REST-based retrieval of
a value based on its key/ID with a GET resource.

Document store databases offer more complex queries, as they understand
the value in a key-value pair. For example, CouchDB allows defining views
with MapReduce.

What is the CAP Theorem?


The CAP theorem is also called Brewer’s theorem. It states that it is impossible
for a distributed data store to simultaneously offer more than two of the
following three guarantees:

1. Consistency
2. Availability
3. Partition Tolerance

Consistency:

The data should remain consistent even after the execution of an
operation. This means that once data is written, any future read request
should return that data. For example, after updating the order status, all
clients should be able to see the same data.

Availability:

The database should always be available and responsive. It should not


have any downtime.

Partition Tolerance:

Partition tolerance means that the system should continue to function even
if the communication among the servers is not stable. For example, the
servers can be partitioned into multiple groups which may not communicate
with each other. Here, if part of the database is unavailable, the other parts
remain unaffected.
Eventual Consistency
“Eventual consistency” arises when copies of data are kept on multiple
machines to get high availability and scalability. Changes made to
any data item on one machine have to be propagated to the other replicas.

Data replication may not be instantaneous: some copies are updated
immediately, while others are updated in due course of time. The copies may
be mutually inconsistent for a while, but in due course of time they become
consistent. Hence the name eventual consistency.
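The propagation just described can be sketched as a toy simulation (assumed names, not a real replication protocol): a write lands on one replica first, and the copies disagree until the change reaches them all:

```javascript
// Three replicas of the same data item.
const replicas = [
  { status: "pending" },
  { status: "pending" },
  { status: "pending" }
];

// A write is applied to one replica immediately...
replicas[0].status = "shipped";
const consistent = () => replicas.every(r => r.status === replicas[0].status);
const duringPropagation = consistent();   // false: copies temporarily disagree

// ...and propagated to the remaining replicas in due course.
for (const r of replicas) r.status = "shipped";
const afterPropagation = consistent();    // true: eventually consistent

console.log(duringPropagation, afterPropagation); // prints false true
```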

BASE: Basically Available, Soft state, Eventual consistency

 Basically available means the DB is available all the time, as per the CAP
theorem
 Soft state means that even without an input, the system state may
change
 Eventual consistency means that the system will become consistent
over time

Advantages of NoSQL
 Can be used as a primary or analytic data source
 Big data capability: handles data velocity, variety, volume and
complexity
 No single point of failure
 Easy replication
 No need for a separate caching layer to store data
 Provides fast performance and horizontal scalability
 Can handle structured, semi-structured, and unstructured data with
equal effect
 Object-oriented programming which is easy to use and flexible
 Doesn’t need a dedicated high-performance server
 Supports key developer languages and platforms
 Simpler to implement than an RDBMS
 Excels at distributed database and multi-data-center operations
 Offers a flexible schema design which can easily be altered without
downtime or service disruption

Disadvantages of NoSQL
 No standardization rules
 Limited query capabilities
 RDBMS databases and tools are comparatively mature
 Does not offer traditional database capabilities, like consistency
when multiple transactions are performed simultaneously
 When the volume of data increases, it becomes difficult to maintain
unique values as keys
 Doesn’t work as well with relational data
 The learning curve is steep for new developers
 Open-source options are not so popular with enterprises

SHARDING
 Sharding is a partitioning pattern for the NoSQL age.
 It's a partitioning pattern that places each partition on a potentially
separate server, potentially all over the world.
 This scale-out works well for supporting people all over the world
accessing different parts of the data set with good performance.

Need for Sharding:


Consider a very large database that has not been sharded.
For example, take the database of a college in which all the
student records (present and past) of the whole college are
maintained in a single database. It would contain a very large
number of records, say 100,000.
When we need to find a student in this database, up to
100,000 records have to be examined each time,
which is very costly.
Now consider the same student records divided into
smaller data shards based on year. Each data shard will then have
only around 1,000-5,000 student records. Not only does the database
become much more manageable, but the cost of each lookup is
also reduced by a huge factor. This is what sharding achieves,
and this is why sharding is needed.
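The college example above can be sketched in a few lines (the records and counts are hypothetical): partitioning the records by year means a lookup that knows the year touches only one small shard instead of scanning everything:

```javascript
// Unsharded: one big list; a lookup may have to scan every record.
const students = [
  { roll: 1, name: "Asha",  year: 2021 },
  { roll: 2, name: "Ravi",  year: 2022 },
  { roll: 3, name: "Meena", year: 2021 }
];

// Sharded: partition the records by year (the "shard key").
const shards = new Map();
for (const s of students) {
  if (!shards.has(s.year)) shards.set(s.year, []);
  shards.get(s.year).push(s);
}

// A query that knows the shard key searches one shard only.
const found = shards.get(2022).find(s => s.name === "Ravi");
console.log(found.roll); // prints 2
```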
Features of Sharding:
 Sharding makes the Database smaller
 Sharding makes the Database faster
 Sharding makes the Database much more easily manageable
 Sharding can be a complex operation sometimes
 Sharding reduces the transaction cost of the Database

DOCUMENT BASED

 A document-oriented database or a NoSQL document store is a


modern way to store data in JSON format rather than simple
rows and columns.
 It allows you to express data in its natural form the way it's meant to
be.
 For the past 40 years, relational databases have dominated the
database industry.
 Data-intensive and fast-moving organizations faced problems with
them; new technology solutions were demanded, and the
answer is NoSQL document databases.
 In contrast to rows and columns, NoSQL databases keep data in
documents. These documents follow a minimum of standard format
rules (so that the database can understand them for post-processing).
 The format used could be JSON, XML, YAML etc.
 The JSON format is the format of choice for NoSQL databases, and
for good reason.
 A JSON document is simply more compact and more readable.
 JSON (JavaScript Object Notation) is a data representation format
that originated with the JavaScript programming language.
 But because of its textual nature, it can also be used from all other
programming languages.

What is MongoDB
MongoDB is an open-source document database that provides high
performance, high availability, and automatic scaling.

In simple words, you can say that MongoDB is a document-oriented
database. It is an open-source product, developed and supported by a
company named 10gen.

MongoDB is available under General Public license for free, and it is also
available under Commercial license from the manufacturer.

The manufacturing company 10gen has defined MongoDB as:

"MongoDB is a scalable, open source, high performance, document-


oriented database." - 10gen

MongoDB was designed to work with commodity servers. Now it is used by
companies of all sizes, across all industries.

History of MongoDB
The initial development of MongoDB began in 2007 when the company
was building a platform as a service similar to Windows Azure.

Windows Azure is a cloud computing platform and infrastructure, created by Microsoft, to build,
deploy and manage applications and services through a global network.

MongoDB was developed by a New York-based organization named 10gen,
which is now known as MongoDB Inc. It was initially developed as a PaaS
(Platform as a Service). Later, in 2009, it was introduced in the market as an
open-source database server maintained and supported by
MongoDB Inc.

The first production-ready version of MongoDB is considered to be version
1.4, which was released in March 2010.

MongoDB 2.4.9, released on January 10, 2014, was the latest stable
version at the time this material was written.

Purpose of building MongoDB


A very genuine question may be: "What was the need for MongoDB
when there were already many databases in action?"
There is a simple answer:

All modern applications require big data, fast feature development and
flexible deployment, and the older database systems were not competent
enough, so MongoDB was needed.

The primary purposes of building MongoDB are:

o Scalability
o Performance
o High availability
o Scaling from single-server deployments to large, complex multi-site
architectures

Key points of MongoDB:

o Develop faster
o Deploy easier
o Scale bigger

First of all, we should know what a document-oriented database is.

Example of a document-oriented database

MongoDB is a document-oriented database; this is a key feature of
MongoDB. It offers document-oriented storage, which is very simple to
program.

MongoDB stores data as documents, so it is known as document-oriented


database.

FirstName = "John",
Address = "Detroit",
Spouse = [{Name: "Angela"}]

FirstName = "John",
Address = "Wick"

These are two different documents.

Storing data in this manner is what makes a database document-oriented.


MongoDB falls into a class of databases called document-oriented
databases, which is part of a broader category of databases known as
NoSQL databases.

Features of MongoDB
These are some important features of MongoDB:

1. Support ad hoc queries

In MongoDB, you can search by field, range query and it also supports
regular expression searches.

2. Indexing

You can index any field in a document.

3. Replication

MongoDB supports Master Slave replication.

A master can perform reads and writes, while a slave copies data from the
master and can only be used for reads or backup (not writes).

4. Duplication of data

MongoDB can run over multiple servers. The data is duplicated to keep
the system up and running in case of hardware failure.

5. Load balancing

It has an automatic load balancing configuration because data is placed in
shards.

6. Supports map reduce and aggregation tools.

7. Uses JavaScript instead of stored procedures.

8. It is a schema-less database written in C++.

9. Provides high performance.

10. Stores files of any size easily without complicating your stack.
SHARDING IN MONGODB
Sharding is a methodology for distributing data across multiple machines.
Sharding is basically used for deployments with large datasets and high-
throughput operations. A single server cannot handle a database with
large datasets, as it requires larger storage, and bulk query operations can
use most of the CPU cycles, which slows down processing. For such
scenarios, we need more powerful systems.

One approach is to add more capacity to a single server, such as adding
more memory and processing units; this is called vertical scaling. Another
approach is to divide a large dataset across multiple systems and serve the
application's queries from multiple servers. This approach is called
horizontal scaling. MongoDB handles horizontal scaling through sharding.

Sharded clusters
MongoDB’s sharding consists of the following components:

 Shard: Each shard stores a subset of the sharded data. Each shard
can also be deployed as a replica set.
 Mongos: Mongos provides an interface between a client application
and the sharded cluster, routing queries.
 Config server: The configuration server stores the metadata and
configuration settings for the cluster. MongoDB data is sharded
at the collection level and distributed across the sharded cluster.
 Shard keys: To distribute documents in collections, MongoDB
partitions the collection using the shard key. MongoDB shards data
into chunks, and these chunks are distributed across the shards in the
sharded cluster.
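For illustration, declaring a shard key looks like this in a mongosh session connected to a running sharded cluster through mongos (the college database, students collection and year key are hypothetical names, not from the source):

```
> sh.enableSharding("college")
> sh.shardCollection("college.students", { year: 1 })
> sh.status()
```

sh.shardCollection() tells MongoDB to partition the named collection into chunks by the given shard key; sh.status() reports how the chunks are distributed across the shards.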

Advantages of sharding
Here are some of the advantages of sharding:

 When we use sharding, the load of the read/write operations gets
distributed across the sharded cluster.
 As sharding is used to distribute data across a shard cluster, we can
increase the storage capacity by adding shards horizontally.
 MongoDB allows read/write operations to continue even if one of
the shards is unavailable. In a production environment, shards
should be deployed with a replication mechanism to maintain high
availability and add fault tolerance to the system.
MONGODB

MongoDB CRUD operations


We can use MongoDB for various things, like building an application
(including web and mobile), analysing data, or administering a
MongoDB database. In all these cases we need to interact with the
MongoDB server to perform certain operations, like entering new data
into the application, updating data, deleting data, and reading data.
MongoDB provides a set of basic but essential operations that help you
easily interact with the MongoDB server; these operations are known as
CRUD operations.

Create Operations –
The create or insert operations are used to insert or add new
documents to the collection. If a collection does not exist, a new
collection will be created in the database. You can perform create
operations using the following methods provided by MongoDB:
Method Description
db.collection.insertOne() It is used to insert a single document in the collection.

db.collection.insertMany() It is used to insert multiple documents in the collection.

Example 1:
In this example, we are inserting the details of a single student, in the
form of a document, into the student collection
using the db.collection.insertOne() method.
Example 2:
In this example, we are inserting the details of multiple students, in
the form of documents, into the student collection
using the db.collection.insertMany() method.
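The two insert examples above can be sketched in the mongo shell; the student fields shown here (name, age, course) are illustrative assumptions, not taken from the source:

```
> db.student.insertOne(
   { name: "Sumit", age: 20, course: "BTech" }
)
> db.student.insertMany([
   { name: "Akshay", age: 18, course: "BTech" },
   { name: "Riya", age: 19, course: "BSc" }
])
```

If the student collection does not exist yet, the first insert creates it automatically.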

Read Operations –
The read operations are used to retrieve documents from the
collection or, in other words, to query a collection for documents.
You can perform read operations using the
following method provided by MongoDB:

Method Description

db.collection.find() It is used to retrieve documents from the collection.

Example:
In this example, we are retrieving the details of students from the
student collection using the db.collection.find() method.
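A sketch of the read example in the mongo shell, continuing with the same hypothetical student collection (the course filter is an assumption for illustration):

```
> db.student.find()
> db.student.find({ course: "BTech" }).pretty()
```

find() with no argument returns every document; passing a query document filters the results, and pretty() formats the output.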
Update Operations –
The update operations are used to update or modify existing
documents in the collection. You can perform update operations
using the following methods provided by MongoDB:
Method Description

db.collection.updateOne() It is used to update a single document in the collection that satisfies the given criteria.

db.collection.updateMany() It is used to update multiple documents in the collection that satisfy the given criteria.

db.collection.replaceOne() It is used to replace a single document in the collection that satisfies the given criteria.

Example 1:
In this example, we are updating the age of Sumit in the student
collection using the db.collection.updateOne() method.

Example 2:
In this example, we are updating the year of the course in all the
documents in the student collection
using the db.collection.updateMany() method.
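The two update examples can be sketched in the mongo shell, matching the hypothetical student documents used earlier (the field names and new values are assumptions):

```
> db.student.updateOne(
   { name: "Sumit" },
   { $set: { age: 21 } }
)
> db.student.updateMany(
   { course: "BTech" },
   { $set: { year: 2 } }
)
```

The first argument is the filter and the second is the update document; $set changes only the listed fields without replacing the whole document.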

Delete Operations –
The delete operations are used to delete or remove documents
from a collection. You can perform delete operations using the
following methods provided by MongoDB:
Method Description

db.collection.deleteOne() It is used to delete a single document from the collection that satisfies the given criteria.

db.collection.deleteMany() It is used to delete multiple documents from the collection that satisfy the given criteria.

Example 1:
In this example, we are deleting a document from the student
collection using the db.collection.deleteOne() method.

Example 2:
In this example, we are deleting all the documents from the student
collection using the db.collection.deleteMany() method.
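A sketch of the two delete examples in the mongo shell (same hypothetical student collection): deleteMany with an empty filter {} matches, and therefore removes, every document in the collection:

```
> db.student.deleteOne({ name: "Sumit" })
> db.student.deleteMany({})
```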

MongoDB query
The find() Method
To query data from a MongoDB collection, you need to use
MongoDB's find() method.

Syntax
The basic syntax of the find() method is as follows −
>db.COLLECTION_NAME.find()
The find() method displays all the documents in a non-structured way.

Example
Assume we have created a collection named mycol as −
> use sampleDB
switched to db sampleDB
> db.createCollection("mycol")
{ "ok" : 1 }
>

And inserted two documents in it using the insert() method as shown below −


> db.mycol.insert([
   {
      title: "MongoDB Overview",
      description: "MongoDB is no SQL database",
      by: "tutorials point",
      url: "http://www.tutorialspoint.com",
      tags: ["mongodb", "database", "NoSQL"],
      likes: 100
   },
   {
      title: "NoSQL Database",
      description: "NoSQL database doesn't have tables",
      by: "tutorials point",
      url: "http://www.tutorialspoint.com",
      tags: ["mongodb", "database", "NoSQL"],
      likes: 20,
      comments: [
         {
            user: "user1",
            message: "My first comment",
            dateCreated: new Date(2013,11,10,2,35),
            like: 0
         }
      ]
   }
])

Following method retrieves all the documents in the collection −


> db.mycol.find()
{ "_id" : ObjectId("5dd4e2cc0821d3b44607534c"), "title" :
"MongoDB Overview", "description" : "MongoDB is no SQL database",
"by" : "tutorials point", "url" :
"http://www.tutorialspoint.com", "tags" : [ "mongodb",
"database", "NoSQL" ], "likes" : 100 }
{ "_id" : ObjectId("5dd4e2cc0821d3b44607534d"), "title" : "NoSQL
Database", "description" : "NoSQL database doesn't have tables",
"by" : "tutorials point", "url" :
"http://www.tutorialspoint.com", "tags" : [ "mongodb",
"database", "NoSQL" ], "likes" : 20, "comments" : [ { "user" :
"user1", "message" : "My first comment", "dateCreated" :
ISODate("2013-12-09T21:05:00Z"), "like" : 0 } ] }
>
The findOne() method
Apart from the find() method, there is the findOne() method, which returns only
one document.

Syntax
>db.COLLECTIONNAME.findOne()

Example
Following example retrieves the document with title MongoDB Overview.
> db.mycol.findOne({title: "MongoDB Overview"})
{
   "_id" : ObjectId("5dd6542170fb13eec3963bf0"),
   "title" : "MongoDB Overview",
   "description" : "MongoDB is no SQL database",
   "by" : "tutorials point",
   "url" : "http://www.tutorialspoint.com",
   "tags" : [
      "mongodb",
      "database",
      "NoSQL"
   ],
   "likes" : 100
}

AND in MongoDB
Syntax
To query documents based on the AND condition, you need to use the $and keyword.
Following is the basic syntax of AND −
>db.mycol.find({ $and: [ {<key1>:<value1>}, {<key2>:<value2>} ] })
> db.mycol.find({$and:[{"by":"tutorials point"},{"title":
"MongoDB Overview"}]}).pretty()
{
   "_id" : ObjectId("5dd4e2cc0821d3b44607534c"),
   "title" : "MongoDB Overview",
   "description" : "MongoDB is no SQL database",
   "by" : "tutorials point",
   "url" : "http://www.tutorialspoint.com",
   "tags" : [
      "mongodb",
      "database",
      "NoSQL"
   ],
   "likes" : 100
}
>
OR in MongoDB
Syntax
To query documents based on the OR condition, you need to use the $or keyword.
Following is the basic syntax of OR −
>db.mycol.find(
   {
      $or: [
         {key1: value1}, {key2: value2}
      ]
   }
).pretty()
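Applied to the mycol collection used earlier, an OR query might look like this (a sketch; here both sample documents match, since each satisfies at least one clause):

```
> db.mycol.find({
   $or: [
      {"by": "tutorials point"},
      {"title": "MongoDB Overview"}
   ]
}).pretty()
```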

NOR in MongoDB
Syntax
To query documents based on the NOR condition, you need to use the $nor keyword.
Following is the basic syntax of NOR −
>db.COLLECTION_NAME.find(
   {
      $nor: [
         {key1: value1}, {key2: value2}
      ]
   }
)

Example
Assume we have inserted 3 documents in the collection empDetails as shown
below −
db.empDetails.insertMany(
   [
      {
         First_Name: "Radhika",
         Last_Name: "Sharma",
         Age: "26",
         e_mail: "radhika_sharma.123@gmail.com",
         phone: "9000012345"
      },
      {
         First_Name: "Rachel",
         Last_Name: "Christopher",
         Age: "27",
         e_mail: "Rachel_Christopher.123@gmail.com",
         phone: "9000054321"
      },
      {
         First_Name: "Fathima",
         Last_Name: "Sheik",
         Age: "24",
         e_mail: "Fathima_Sheik.123@gmail.com",
         phone: "9000054321"
      }
   ]
)

Following example will retrieve the document(s) whose first name is not "Radhika"
and last name is not "Christopher" −
> db.empDetails.find(
   {
      $nor: [
         {"First_Name": "Radhika"},
         {"Last_Name": "Christopher"}
      ]
   }
).pretty()
{
   "_id" : ObjectId("5dd631f270fb13eec3963bef"),
   "First_Name" : "Fathima",
   "Last_Name" : "Sheik",
   "Age" : "24",
   "e_mail" : "Fathima_Sheik.123@gmail.com",
   "phone" : "9000054321"
}

NOT in MongoDB
Syntax
To query documents based on the NOT condition, you need to use the $not keyword.
Unlike $and and $or, $not is applied to a field's condition rather than to a
list of clauses. Following is the basic syntax of NOT −
>db.COLLECTION_NAME.find(
   { key: { $not: { <operator-expression> } } }
).pretty()

Example
Following example will retrieve the document(s) whose age is not greater than 25 −
> db.empDetails.find( { "Age": { $not: { $gt: "25" } } } )
{
   "_id" : ObjectId("5dd6636870fb13eec3963bf7"),
   "First_Name" : "Fathima",
   "Last_Name" : "Sheik",
   "Age" : "24",
   "e_mail" : "Fathima_Sheik.123@gmail.com",
   "phone" : "9000054321"
}
REPLICATION
Replication is the process of synchronizing data across multiple servers. Replication
provides redundancy and increases data availability with multiple copies of data on
different database servers. Replication protects a database from the loss of a single
server. Replication also allows you to recover from hardware failure and service
interruptions. With additional copies of the data, you can dedicate one to disaster
recovery, reporting, or backup.

Why Replication?
 To keep your data safe
 High (24*7) availability of data
 Disaster recovery
 No downtime for maintenance (like backups, index rebuilds, compaction)
 Read scaling (extra copies to read from)
 Replica set is transparent to the application

How Replication Works in MongoDB


MongoDB achieves replication by the use of replica sets. A replica set is a group
of mongod instances that host the same data set. In a replica set, one node is the
primary node and receives all write operations. All other instances, the secondaries,
apply operations from the primary so that they have the same data set. A replica set
can have only one primary node.
 A replica set is a group of two or more nodes (generally a minimum of 3 nodes
is required).
 In a replica set, one node is the primary node and the remaining nodes are
secondaries.
 All data replicates from the primary to the secondary nodes.
 At the time of automatic failover or maintenance, an election is held and a
new primary node is elected.
 After the recovery of a failed node, it rejoins the replica set and works as a
secondary node.
In a typical MongoDB replication setup, the client application always interacts with
the primary node, and the primary node then replicates the data to the secondary
nodes.
Replica Set Features
 A cluster of N nodes
 Any one node can be primary
 All write operations go to primary
 Automatic failover
 Automatic recovery
 Consensus election of primary

Set Up a Replica Set


In this tutorial, we will convert a standalone MongoDB instance to a replica set. To
convert to a replica set, the steps are −
 Shut down the already running MongoDB server.

 Start the MongoDB server by specifying the --replSet option. Following is the
basic syntax of --replSet −
mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet
"REPLICA_SET_INSTANCE_NAME"

Example
mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet
rs0
 It will start a mongod instance with the name rs0, on port 27017.
 Now start the command prompt and connect to this mongod instance.
 In Mongo client, issue the command rs.initiate() to initiate a new replica set.
 To check the replica set configuration, issue the command rs.conf(). To
check the status of replica set issue the command rs.status().

Add Members to Replica Set


To add members to replica set, start mongod instances on multiple machines. Now
start a mongo client and issue a command rs.add().

Syntax
The basic syntax of rs.add() command is as follows −
>rs.add(HOST_NAME:PORT)

Example
Suppose your mongod instance name is mongod1.net and it is running on
port 27017. To add this instance to replica set, issue the command rs.add() in
Mongo client.
>rs.add("mongod1.net:27017")
>

You can add a mongod instance to a replica set only when you are connected to
the primary node. To check whether you are connected to the primary or not, issue
the command db.isMaster() in the mongo client.

SHARDING
Sharding is the process of storing data records across multiple machines, and it is
MongoDB's approach to meeting the demands of data growth. As the size of the
data increases, a single machine may not be sufficient to store the data or provide
acceptable read and write throughput. Sharding solves the problem with
horizontal scaling. With sharding, you add more machines to support data growth
and the demands of read and write operations.

Why Sharding?
 In replication, all writes go to the master node
 Latency-sensitive queries still go to the master
 A single replica set is limited to 12 nodes
 Memory can't be large enough when the active dataset is big
 Local disk is not big enough
 Vertical scaling is too expensive

Sharding in MongoDB
A sharded cluster has three main components −
• Shards − Shards are used to store data. They provide high availability and
data consistency. In a production environment, each shard is a separate
replica set.
• Config Servers − Config servers store the cluster's metadata. This data
contains a mapping of the cluster's data set to the shards. The query router
uses this metadata to target operations to specific shards. In a production
environment, sharded clusters have exactly 3 config servers.
• Query Routers − Query routers are basically mongos instances that interface
with client applications and direct operations to the appropriate shard. The
query router processes and targets operations to shards and then returns
results to the clients. A sharded cluster can contain more than one query
router to divide the client request load; a client sends requests to one query
router. Generally, a sharded cluster has many query routers.
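The routing role of a query router can be sketched in plain Java: under hashed sharding, the router hashes the shard-key value and maps it to one of the shards, so the same key always lands on the same shard. This standalone sketch uses String.hashCode() rather than MongoDB's actual hash function, and all names are illustrative.

```java
public class ShardRouterSketch {
    // Routes a shard-key value to one of numShards shards by hashing,
    // the way a query router targets a shard under hashed sharding.
    public static int shardFor(String shardKeyValue, int numShards) {
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(shardKeyValue.hashCode(), numShards);
    }

    public static void main(String[] args) {
        String[] keys = { "user1", "user2", "user3" };
        for (String k : keys) {
            System.out.println(k + " -> shard " + shardFor(k, 3));
        }
        // The same key always routes to the same shard.
        System.out.println(shardFor("user1", 3) == shardFor("user1", 3)); // true
    }
}
```

Hash-based routing spreads keys evenly but makes range queries hit every shard; range-based sharding makes the opposite trade-off.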

MongoDB - Deployment
When you are preparing a MongoDB deployment, you should try to understand how
your application is going to hold up in production. It’s a good idea to develop a
consistent, repeatable approach to managing your deployment environment so that
you can minimize any surprises once you’re in production.
The best approach incorporates prototyping your set up, conducting load testing,
monitoring key metrics, and using that information to scale your set up. The key part
of the approach is to proactively monitor your entire system - this will help you
understand how your production system will hold up before deploying, and
determine where you will need to add capacity. Having insight into potential spikes
in your memory usage, for example, could help put out a write-lock fire before it
starts.
To monitor your deployment, MongoDB provides the following commands −

mongostat
This command checks the status of all running mongod instances and returns
counters of database operations. These counters include inserts, queries, updates,
deletes, and cursors. The command also shows when you are hitting page faults and
shows your lock percentage; these are signs that you are running low on memory,
hitting write capacity, or have some other performance issue.
To run the command, start your mongod instance. In another command prompt, go
to the bin directory of your MongoDB installation and type mongostat.
D:\set up\mongodb\bin>mongostat
mongotop
This command tracks and reports the read and write activity of a MongoDB instance
on a per-collection basis. By default, mongotop returns information every second;
you can change this interval. You should check that this read and write
activity matches your application's intention, and that you are not firing too many
writes at the database at a time, reading too frequently from disk, or exceeding
your working set size.
To run the command, start your mongod instance. In another command prompt, go
to the bin directory of your MongoDB installation and type mongotop.
D:\set up\mongodb\bin>mongotop
To make mongotop return information less frequently, specify a number of
seconds after the command.
D:\set up\mongodb\bin>mongotop 30
The above example will return values every 30 seconds.
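Per-collection activity of the kind mongotop reports can be modeled as a simple tally of read and write counts per "db.collection" namespace. This is a standalone sketch of the idea, not the tool's internals, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class CollectionActivitySketch {
    // Read/write counters per "db.collection" namespace, in the spirit
    // of what mongotop reports for each interval.
    private final Map<String, long[]> counters = new HashMap<>();

    public void record(String namespace, boolean isWrite) {
        long[] c = counters.computeIfAbsent(namespace, k -> new long[2]);
        c[isWrite ? 1 : 0]++;
    }

    public long reads(String namespace) {
        return counters.getOrDefault(namespace, new long[2])[0];
    }

    public long writes(String namespace) {
        return counters.getOrDefault(namespace, new long[2])[1];
    }

    public static void main(String[] args) {
        CollectionActivitySketch t = new CollectionActivitySketch();
        t.record("mydb.mycol", false); // a read
        t.record("mydb.mycol", false); // another read
        t.record("mydb.mycol", true);  // a write
        System.out.println("reads=" + t.reads("mydb.mycol")
            + " writes=" + t.writes("mydb.mycol")); // reads=2 writes=1
    }
}
```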
Apart from these tools, MongoDB, Inc. (formerly 10gen) provides a free, hosted
monitoring service, MongoDB Management Service (MMS), that provides a dashboard
and gives you a view of the metrics from your entire cluster.

MONGODB-JAVA
Installation
Before you start using MongoDB in your Java programs, you need to make sure
that you have the MongoDB Java driver and Java set up on the machine. You can
check the Java tutorial for Java installation on your machine. Now, let us check
how to set up the MongoDB Java driver.
• You need to download the jar mongodb-driver-3.11.2.jar and its
dependency mongodb-driver-core-3.11.2.jar. Make sure to download the
latest release of these jar files.
• You need to include the downloaded jar files in your classpath.

Connect to Database
To connect to a database, you need to specify the database name; if the database
doesn't exist, MongoDB creates it automatically.
Following is the code snippet to connect to the database −
import com.mongodb.client.MongoDatabase;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class ConnectToDB {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");
      System.out.println("Credentials ::"+ credential);
   }
}

Now, let's compile and run the above program to create our database myDb as
shown below.
$javac ConnectToDB.java
$java ConnectToDB
On executing, the above program gives you the following output.
Connected to the database successfully
Credentials ::MongoCredential{
mechanism = null,
userName = 'sampleUser',
source = 'myDb',
password = <hidden>,
mechanismProperties = {}
}

Create a Collection
To create a collection, createCollection() method
of com.mongodb.client.MongoDatabase class is used.
Following is the code snippet to create a collection −
import com.mongodb.client.MongoDatabase;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class CreatingCollection {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");

      // Creating a collection
      database.createCollection("sampleCollection");
      System.out.println("Collection created successfully");
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection created successfully

Getting/Selecting a Collection
To get/select a collection from the database, getCollection() method
of com.mongodb.client.MongoDatabase class is used.
Following is the program to get/select a collection −
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class selectingCollection {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");
      System.out.println("Collection created successfully");

      // Retrieving a collection
      MongoCollection<Document> collection = database.getCollection("myCollection");
      System.out.println("Collection myCollection selected successfully");
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection created successfully
Collection myCollection selected successfully

Insert a Document
To insert a document into MongoDB, the insertOne() method
of the com.mongodb.client.MongoCollection class is used.
Following is the code snippet to insert a document −
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import com.mongodb.MongoClient;

public class InsertingDocument {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");

      // Creating a collection
      database.createCollection("sampleCollection");
      System.out.println("Collection created successfully");

      // Retrieving a collection
      MongoCollection<Document> collection = database.getCollection("sampleCollection");
      System.out.println("Collection sampleCollection selected successfully");

      Document document = new Document("title", "MongoDB")
         .append("description", "database")
         .append("likes", 100)
         .append("url", "http://www.tutorialspoint.com/mongodb/")
         .append("by", "tutorials point");

      // Inserting document into the collection
      collection.insertOne(document);
      System.out.println("Document inserted successfully");
   }
}

On compiling, the above program gives you the following result −


Collection created successfully
Collection sampleCollection selected successfully
Document inserted successfully

Retrieve All Documents


To select all documents from the collection, find() method
of com.mongodb.client.MongoCollection class is used. This method returns a
cursor, so you need to iterate this cursor.
Following is the program to select all documents −
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class RetrievingAllDocuments {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");

      // Retrieving a collection
      MongoCollection<Document> collection = database.getCollection("sampleCollection");
      System.out.println("Collection sampleCollection selected successfully");

      Document document1 = new Document("title", "MongoDB")
         .append("description", "database")
         .append("likes", 100)
         .append("url", "http://www.tutorialspoint.com/mongodb/")
         .append("by", "tutorials point");
      Document document2 = new Document("title", "RethinkDB")
         .append("description", "database")
         .append("likes", 200)
         .append("url", "http://www.tutorialspoint.com/rethinkdb/")
         .append("by", "tutorials point");

      List<Document> list = new ArrayList<Document>();
      list.add(document1);
      list.add(document2);
      collection.insertMany(list);

      // Getting the iterable object
      FindIterable<Document> iterDoc = collection.find();

      // Getting the iterator
      Iterator it = iterDoc.iterator();
      while (it.hasNext()) {
         System.out.println(it.next());
      }
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection sampleCollection selected successfully
Document{{_id=5dce4e9ff68a9c2449e197b2, title=MongoDB,
description=database, likes=100,
url=http://www.tutorialspoint.com/mongodb/, by=tutorials point}}
Document{{_id=5dce4e9ff68a9c2449e197b3, title=RethinkDB,
description=database, likes=200,
url=http://www.tutorialspoint.com/rethinkdb/, by=tutorials
point}}

Update Document
To update a document in the collection, the updateOne() method
of the com.mongodb.client.MongoCollection class is used.
Following is the program to update a document −
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import java.util.Iterator;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class UpdatingDocuments {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");

      // Retrieving a collection
      MongoCollection<Document> collection = database.getCollection("sampleCollection");
      System.out.println("Collection sampleCollection selected successfully");

      // Updating the likes of the document whose title is "MongoDB"
      collection.updateOne(Filters.eq("title", "MongoDB"), Updates.set("likes", 150));
      System.out.println("Document update successfully...");

      // Retrieving the documents after the update
      FindIterable<Document> iterDoc = collection.find();
      Iterator it = iterDoc.iterator();
      while (it.hasNext()) {
         System.out.println(it.next());
      }
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection sampleCollection selected successfully
Document update successfully...
Document{{_id=5dce4e9ff68a9c2449e197b2, title=MongoDB,
description=database, likes=150,
url=http://www.tutorialspoint.com/mongodb/, by=tutorials point}}
Document{{_id=5dce4e9ff68a9c2449e197b3, title=RethinkDB,
description=database, likes=200,
url=http://www.tutorialspoint.com/rethinkdb/, by=tutorials point}}

Delete a Document
To delete a document from the collection, you need to use the deleteOne() method
of the com.mongodb.client.MongoCollection class.
Following is the program to delete a document −
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import java.util.Iterator;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class DeletingDocuments {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");

      // Retrieving a collection
      MongoCollection<Document> collection = database.getCollection("sampleCollection");
      System.out.println("Collection sampleCollection selected successfully");

      // Deleting the document whose title is "MongoDB"
      collection.deleteOne(Filters.eq("title", "MongoDB"));
      System.out.println("Document deleted successfully...");

      // Retrieving the documents after the deletion
      FindIterable<Document> iterDoc = collection.find();
      Iterator it = iterDoc.iterator();
      while (it.hasNext()) {
         System.out.println(it.next());
      }
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection sampleCollection selected successfully
Document deleted successfully...
Document{{_id=5dce4e9ff68a9c2449e197b3, title=RethinkDB,
description=database, likes=200,
url=http://www.tutorialspoint.com/rethinkdb/, by=tutorials
point}}

Dropping a Collection
To drop a collection from a database, you need to use the drop() method of
the com.mongodb.client.MongoCollection class.
Following is the program to delete a collection −
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class DropingCollection {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");

      // Retrieving a collection
      MongoCollection<Document> collection = database.getCollection("sampleCollection");
      System.out.println("Collection sampleCollection selected successfully");

      // Dropping the collection
      collection.drop();
      System.out.println("Collection dropped successfully");
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection sampleCollection selected successfully
Collection dropped successfully

Listing All the Collections


To list all the collections in a database, you need to use
the listCollectionNames() method of
the com.mongodb.client.MongoDatabase class.
Following is the program to list all the collections of a database −
import com.mongodb.client.MongoDatabase;
import com.mongodb.MongoClient;
import com.mongodb.MongoCredential;

public class ListOfCollection {
   public static void main( String args[] ) {

      // Creating a Mongo client
      MongoClient mongo = new MongoClient( "localhost" , 27017 );

      // Creating Credentials
      MongoCredential credential;
      credential = MongoCredential.createCredential("sampleUser", "myDb",
         "password".toCharArray());
      System.out.println("Connected to the database successfully");

      // Accessing the database
      MongoDatabase database = mongo.getDatabase("myDb");
      System.out.println("Collection created successfully");
      for (String name : database.listCollectionNames()) {
         System.out.println(name);
      }
   }
}

On compiling, the above program gives you the following result −


Connected to the database successfully
Collection created successfully
myCollection
myCollection1
myCollection5
MONGODB-PHP
To use MongoDB with PHP, you need to use the MongoDB PHP driver. Make sure to
download the latest release of it. Now unzip the archive, put php_mongo.dll in
your PHP extension directory ("ext" by default), and add the following line to
your php.ini file −
extension = php_mongo.dll

Make a Connection and Select a Database


To make a connection, you need to specify the database name, if the database
doesn't exist then MongoDB creates it automatically.
Following is the code snippet to connect to the database −
<?php
// connect to mongodb
$m = new MongoClient();

echo "Connection to database successfully";


// select a database
$db = $m->mydb;

echo "Database mydb selected";


?>

When the program is executed, it will produce the following result −


Connection to database successfully
Database mydb selected

Create a Collection
Following is the code snippet to create a collection −
<?php
// connect to mongodb
$m = new MongoClient();
echo "Connection to database successfully";

// select a database
$db = $m->mydb;
echo "Database mydb selected";
$collection = $db->createCollection("mycol");
echo "Collection created successfully";
?>

When the program is executed, it will produce the following result −


Connection to database successfully
Database mydb selected
Collection created successfully

Insert a Document
To insert a document into MongoDB, the insert() method is used.
Following is the code snippet to insert a document −
<?php
// connect to mongodb
$m = new MongoClient();
echo "Connection to database successfully";

// select a database
$db = $m->mydb;
echo "Database mydb selected";
$collection = $db->mycol;
echo "Collection selected successfully";

$document = array(
"title" => "MongoDB",
"description" => "database",
"likes" => 100,
"url" => "http://www.tutorialspoint.com/mongodb/",
"by" => "tutorials point"
);

$collection->insert($document);
echo "Document inserted successfully";
?>

When the program is executed, it will produce the following result −


Connection to database successfully
Database mydb selected
Collection selected successfully
Document inserted successfully

Find All Documents


To select all documents from the collection, find() method is used.
Following is the code snippet to select all documents −
<?php
// connect to mongodb
$m = new MongoClient();
echo "Connection to database successfully";
// select a database
$db = $m->mydb;
echo "Database mydb selected";
$collection = $db->mycol;
echo "Collection selected successfully";
$cursor = $collection->find();
// iterate cursor to display title of documents

foreach ($cursor as $document) {


echo $document["title"] . "\n";
}
?>

When the program is executed, it will produce the following result −


Connection to database successfully
Database mydb selected
Collection selected successfully
MongoDB

Update a Document
To update a document, you need to use the update() method.
In the following example, we will update the title of the inserted document to MongoDB
Tutorial. Following is the code snippet to update a document −
<?php
// connect to mongodb
$m = new MongoClient();
echo "Connection to database successfully";

// select a database
$db = $m->mydb;
echo "Database mydb selected";
$collection = $db->mycol;
echo "Collection selected successfully";
// now update the document
$collection->update(array("title"=>"MongoDB"),
array('$set'=>array("title"=>"MongoDB Tutorial")));
echo "Document updated successfully";

// now display the updated document


$cursor = $collection->find();

// iterate cursor to display title of documents


echo "Updated document";

foreach ($cursor as $document) {


echo $document["title"] . "\n";
}
?>

When the program is executed, it will produce the following result −


Connection to database successfully
Database mydb selected
Collection selected successfully
Document updated successfully
Updated document
MongoDB Tutorial

Delete a Document
To delete a document, you need to use remove() method.
In the following example, we will remove the documents that have the title MongoDB
Tutorial. Following is the code snippet to delete a document −
<?php
// connect to mongodb
$m = new MongoClient();
echo "Connection to database successfully";

// select a database
$db = $m->mydb;
echo "Database mydb selected";
$collection = $db->mycol;
echo "Collection selected successfully";

// now remove the document


$collection->remove(array("title"=>"MongoDB Tutorial"),false);
echo "Documents deleted successfully";

// now display the available documents


$cursor = $collection->find();

// iterate cursor to display title of documents


echo "Remaining documents";

foreach ($cursor as $document) {


echo $document["title"] . "\n";
}
?>

When the program is executed, it will produce the following result −


Connection to database successfully
Database mydb selected
Collection selected successfully
Documents deleted successfully
In the above example, the second parameter of the remove() method is of boolean
type and corresponds to its justOne flag.

ADVANCED MONGODB FEATURES


These are some important features of MongoDB:

1. Supports ad hoc queries

In MongoDB, you can search by field or range, and it also supports
regular expression searches.

2. Indexing

You can index any field in a document.

3. Replication

MongoDB supports Master Slave replication.

A master can perform reads and writes, while a slave copies data from the
master and can be used only for reads or backup (not writes).

4. Duplication of data

MongoDB can run over multiple servers. The data is duplicated to keep
the system up and also keep its running condition in case of hardware
failure.

5. Load balancing

It has an automatic load balancing configuration because data is placed in
shards.

6. Supports map reduce and aggregation tools.

7. Uses JavaScript instead of stored procedures.

8. It is a schema-less database written in C++.

9. Provides high performance.

10. Stores files of any size easily without complicating your stack.

11. Easy to administer in the case of failures.

12. It also supports:

o JSON data model with dynamic schemas


o Auto-sharding for horizontal scalability
o Built in replication for high availability

Cassandra:
Cassandra is a modern take on the relational database, albeit one where data is
grouped by column instead of by row, for fast retrieval. MongoDB, by contrast,
stores records as documents in JSON format; it has a JavaScript shell and a rich
set of functions, which makes it easy to work with.

What is difference between Cassandra and MongoDB?


Cassandra supports only the JSON data format, whereas MongoDB supports both
JSON and BSON. Cassandra does not provide ACID transactions, but it can be
tuned to support ACID properties; MongoDB provides multi-document ACID
transactions with snapshot isolation.

What is Cassandra used for?

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database


management system designed to handle large amounts of data across many commodity
servers, providing high availability with no single point of failure.

Data Model
Cassandra - Data Model

The data model of Cassandra is significantly different from what we normally see in
an RDBMS.

Cluster
The Cassandra database is distributed over several machines that operate together.
The outermost container is known as the Cluster. For failure handling, every node
contains a replica, and in case of a failure, the replica takes charge. Cassandra
arranges the nodes in a cluster in a ring format and assigns data to them.
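The ring arrangement can be sketched as follows: each node owns a token, and a key is assigned to the first node whose token is at or after the key's token, wrapping around the ring. This is a simplified standalone model (real Cassandra uses partitioners such as Murmur3Partitioner and virtual nodes), and all names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

public class TokenRingSketch {
    // Nodes keyed by their token position on the ring.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(long token, String name) {
        ring.put(token, name);
    }

    // A key goes to the first node at or after its token, wrapping
    // back to the lowest token at the end of the ring.
    public String nodeFor(long keyToken) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(0L, "nodeA");
        ring.addNode(100L, "nodeB");
        ring.addNode(200L, "nodeC");
        System.out.println(ring.nodeFor(42L));  // nodeB
        System.out.println(ring.nodeFor(150L)); // nodeC
        System.out.println(ring.nodeFor(250L)); // wraps around to nodeA
    }
}
```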

Keyspace
Keyspace is the outermost container for data in Cassandra. The basic attributes of
a Keyspace in Cassandra are −
• Replication factor − It is the number of machines in the cluster that will
receive copies of the same data.
• Replica placement strategy − It is nothing but the strategy to place replicas
in the ring. We have strategies such as simple strategy (rack-unaware
strategy), old network topology strategy (rack-aware strategy),
and network topology strategy (datacenter-shared strategy).
• Column families − Keyspace is a container for a list of one or more column
families. A column family, in turn, is a container of a collection of rows. Each
row contains ordered columns. Column families represent the structure of
your data. Each keyspace has at least one and often many column families.
The syntax of creating a Keyspace is as follows −
CREATE KEYSPACE Keyspace name
WITH replication = {'class': 'SimpleStrategy',
'replication_factor' : 3};

Column Family
A column family is a container for an ordered collection of rows. Each row, in turn, is
an ordered collection of columns. The following table lists the points that
differentiate a column family from a table of relational databases.

Relational Table:
- A schema in a relational model is fixed. Once we define certain columns for a
table, while inserting data, every row must fill all the columns, at least with a
null value.
- Relational tables define only columns, and the user fills in the table with values.

Cassandra Column Family:
- In Cassandra, although the column families are defined, the columns are not. You
can freely add any column to any column family at any time.
- In Cassandra, a table contains columns, or can be defined as a super column family.

A Cassandra column family has the following attributes −


• keys_cached − It represents the number of locations to keep cached per
SSTable.
• rows_cached − It represents the number of rows whose entire contents will
be cached in memory.
• preload_row_cache − It specifies whether you want to pre-populate the row
cache.

Data Models of Cassandra and RDBMS


The following table lists down the points that differentiate the data model of
Cassandra from that of an RDBMS.

RDBMS vs Cassandra:
- RDBMS deals with structured data; Cassandra deals with unstructured data.
- RDBMS has a fixed schema; Cassandra has a flexible schema.
- In RDBMS, a table is an array of arrays (ROW x COLUMN). In Cassandra, a table
is a list of "nested key-value pairs" (ROW x COLUMN key x COLUMN value).
- In RDBMS, the database is the outermost container that holds data
corresponding to an application; in Cassandra, the keyspace is.
- Tables are the entities of a database; tables or column families are the
entities of a keyspace.
- In RDBMS, a row is an individual record; in Cassandra, a row is a unit of
replication.
- In RDBMS, a column represents an attribute of a relation; in Cassandra, a
column is a unit of storage.
- RDBMS supports the concepts of foreign keys and joins; in Cassandra,
relationships are represented using collections.

KEY SPACE

Creating a Keyspace using Cqlsh


A keyspace in Cassandra is a namespace that defines data replication on nodes. A
cluster can contain multiple keyspaces. Given below is the syntax for creating a
keyspace using the statement CREATE KEYSPACE.

Syntax
CREATE KEYSPACE <identifier> WITH <properties>
i.e.
CREATE KEYSPACE "KeySpace Name"
WITH replication = {'class': 'Strategy name',
'replication_factor': 'No. of replicas'};

CREATE KEYSPACE "KeySpace Name"
WITH replication = {'class': 'Strategy name',
'replication_factor': 'No. of replicas'}
AND durable_writes = 'Boolean value';

The CREATE KEYSPACE statement has two
properties: replication and durable_writes.

Replication
The replication option is to specify the Replica Placement strategy and the
number of replicas wanted. The following table lists all the replica placement
strategies.

Strategy name − Description
• SimpleStrategy − Specifies a simple replication factor for the cluster.
• NetworkTopologyStrategy − Using this option, you can set the replication
factor for each data center independently.
• OldNetworkTopologyStrategy − This is a legacy replication strategy.
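SimpleStrategy's placement rule can be sketched in Java: the first replica goes on the node that owns the key's token, and the remaining replicas go on the next distinct nodes clockwise around the ring, ignoring rack and data-center topology. This is a minimal standalone model with illustrative names, not Cassandra's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleStrategySketch {
    // SimpleStrategy places the first replica on the node that owns the
    // key, then walks the ring clockwise placing the remaining replicas
    // on the next distinct nodes (rack-unaware).
    public static List<String> replicasFor(List<String> ringOrder,
                                           int ownerIndex,
                                           int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        // Never place more replicas than there are nodes.
        for (int i = 0; i < replicationFactor && i < ringOrder.size(); i++) {
            replicas.add(ringOrder.get((ownerIndex + i) % ringOrder.size()));
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("n1", "n2", "n3", "n4");
        // Key owned by n3, replication factor 3: replicas on n3, n4, n1.
        System.out.println(replicasFor(ring, 2, 3)); // [n3, n4, n1]
    }
}
```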

Using the durable_writes option, you can instruct Cassandra whether to use the
commitlog for updates on the current keyspace. This option is not mandatory and,
by default, is set to true.

Example
Given below is an example of creating a KeySpace.
• Here we are creating a KeySpace named tutorialspoint.
• We are using the replica placement strategy Simple Strategy.
• And we are setting the replication factor to 3 replicas.
cqlsh.> CREATE KEYSPACE tutorialspoint
WITH replication = {'class':'SimpleStrategy',
'replication_factor' : 3};

Verification
You can verify whether the keyspace is created or not using the command Describe. If
you use this command over keyspaces, it will display all the keyspaces created as
shown below.
cqlsh> DESCRIBE keyspaces;

tutorialspoint system system_traces


Here you can observe the newly created KeySpace tutorialspoint.

Durable_writes
By default, the durable_writes property of a keyspace is set to true; however, it can
be set to false. You cannot set this property with SimpleStrategy.

Example
Given below is the example demonstrating the usage of durable writes property.
cqlsh> CREATE KEYSPACE test
... WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
'datacenter1' : 3 }
... AND DURABLE_WRITES = false;

Verification
You can verify whether the durable_writes property of test KeySpace was set to
false by querying the System Keyspace. This query gives you all the KeySpaces
along with their properties.
cqlsh> SELECT * FROM system_schema.keyspaces;

keyspace_name | durable_writes |
strategy_class | strategy_options
----------------+----------------
+------------------------------------------------------
+----------------------------

test | False |
org.apache.cassandra.locator.NetworkTopologyStrategy |
{"datacenter1" : "3"}

tutorialspoint | True |
org.apache.cassandra.locator.SimpleStrategy |
{"replication_factor" : "3"}

system | True |
org.apache.cassandra.locator.LocalStrategy | { }

system_traces | True |
org.apache.cassandra.locator.SimpleStrategy |
{"replication_factor" : "2"}

(4 rows)
Here you can observe the durable_writes property of test KeySpace was set to
false.

Using a Keyspace
You can use a created KeySpace using the keyword USE. Its syntax is as follows −
Syntax:USE <identifier>

Example
In the following example, we are using the KeySpace tutorialspoint.
cqlsh> USE tutorialspoint;
cqlsh:tutorialspoint>

Creating a Keyspace using Java API


You can create a Keyspace using the execute() method of Session class. Follow
the steps given below to create a keyspace using Java API.

Step 1: Create a Cluster Object


First of all, create an instance of Cluster.builder class
of com.datastax.driver.core package as shown below.
//Creating Cluster.Builder object

Cluster.Builder builder1 = Cluster.builder();


Add a contact point (IP address of the node) using addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object

Cluster.Builder builder2 = build.addContactPoint( "127.0.0.1" );


Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build a cluster object in a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class
as shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, you can set it to the existing one by passing the keyspace name in string
format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");

Step 3: Execute Query


You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to
the execute() method. Whatever you pass to this method in string format will be
executed on the cqlsh.
In this example, we are creating a KeySpace named tp. We are using the first
replica placement strategy, i.e., SimpleStrategy, and we are setting the
replication factor to 1.
You have to store the query in a string variable and pass it to the execute() method
as shown below.
String query = "CREATE KEYSPACE tp WITH replication "
+ "= {'class':'SimpleStrategy', 'replication_factor':1}; ";
session.execute(query);
Step 4: Use the KeySpace
You can use a created KeySpace using the execute() method as shown below.
session.execute("USE tp");
Given below is the complete program to create and use a keyspace in Cassandra
using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Create_KeySpace {

public static void main(String args[]){

//Query
String query = "CREATE KEYSPACE tp WITH replication "
+ "= {'class':'SimpleStrategy',
'replication_factor':1};";

//creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect();

//Executing the query


session.execute(query);

//using the KeySpace


session.execute("USE tp");
System.out.println("Keyspace created");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Create_KeySpace.java
$java Create_KeySpace
Under normal conditions, it will produce the following output −
Keyspace created

Altering a KeySpace
ALTER KEYSPACE can be used to alter properties such as the number of replicas
and the durable_writes of a KeySpace. Given below is the syntax of this command.

Syntax
ALTER KEYSPACE <identifier> WITH <properties>
i.e.
ALTER KEYSPACE "KeySpace Name"
WITH replication = {'class': 'Strategy name',
'replication_factor' : 'No. of replicas'};
The properties of ALTER KEYSPACE are the same as those of CREATE KEYSPACE. It has
two properties: replication and durable_writes.

Replication
The replication option specifies the replica placement strategy and the number of
replicas wanted.

Durable_writes
Using this option, you can instruct Cassandra whether to use commitlog for updates
on the current KeySpace. This option is not mandatory and by default, it is set to
true.

Example
Given below is an example of altering a KeySpace.
 Here we are altering a KeySpace named tutorialspoint.
 We are changing the replication factor from 3 to 4.
cqlsh> ALTER KEYSPACE tutorialspoint
WITH replication = {'class':'SimpleStrategy',
'replication_factor' : 4};

Altering Durable_writes
You can also alter the durable_writes property of a KeySpace. Given below is the
durable_writes property of the test KeySpace.
SELECT * FROM system.schema_keyspaces;

 keyspace_name | durable_writes | strategy_class                                       | strategy_options
----------------+----------------+------------------------------------------------------+----------------------------
           test |          False | org.apache.cassandra.locator.NetworkTopologyStrategy | {"datacenter1":"3"}
 tutorialspoint |           True | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor":"4"}
         system |           True | org.apache.cassandra.locator.LocalStrategy           | { }
  system_traces |           True | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor":"2"}

(4 rows)
ALTER KEYSPACE test
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy',
'datacenter1' : 3}
AND DURABLE_WRITES = true;
Once again, if you verify the properties of KeySpaces, it will produce the following
output.
SELECT * FROM system.schema_keyspaces;
 keyspace_name | durable_writes | strategy_class                                       | strategy_options
----------------+----------------+------------------------------------------------------+----------------------------
           test |           True | org.apache.cassandra.locator.NetworkTopologyStrategy | {"datacenter1":"3"}
 tutorialspoint |           True | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor":"4"}
         system |           True | org.apache.cassandra.locator.LocalStrategy           | { }
  system_traces |           True | org.apache.cassandra.locator.SimpleStrategy          | {"replication_factor":"2"}

(4 rows)

Altering a Keyspace using Java API


You can alter a keyspace using the execute() method of Session class. Follow the
steps given below to alter a keyspace using Java API

Step 1: Create a Cluster Object


First of all, create an instance of Cluster.builder class
of com.datastax.driver.core package as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = build.addContactPoint( "127.0.0.1" );
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster

Cluster cluster = builder2.build();


You can build the cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class
as shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, you can set it to the existing one by passing the keyspace name in string
format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");

Step 3: Execute Query


You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on
the cqlsh.
In this example,
 We are altering a keyspace named tp. We are altering the replication option
from Simple Strategy to Network Topology Strategy.
 We are altering the durable_writes to false

You have to store the query in a string variable and pass it to the execute() method
as shown below.
//Query
String query = "ALTER KEYSPACE tp WITH replication " + "=
{'class':'NetworkTopologyStrategy', 'datacenter1':3}" +" AND
DURABLE_WRITES = false;";
session.execute(query);
Given below is the complete program to alter a keyspace in Cassandra
using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Alter_KeySpace {


public static void main(String args[]){

//Query
String query = "ALTER KEYSPACE tp WITH replication " + "=
{'class':'NetworkTopologyStrategy', 'datacenter1':3}"
+ "AND DURABLE_WRITES = false;";

//Creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect();

//Executing the query


session.execute(query);

System.out.println("Keyspace altered");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Alter_KeySpace.java
$java Alter_KeySpace
Under normal conditions, it produces the following output −
Keyspace Altered

Dropping a Keyspace
You can drop a KeySpace using the command DROP KEYSPACE. Given below is
the syntax for dropping a KeySpace.

Syntax
DROP KEYSPACE <identifier>
i.e.
DROP KEYSPACE “KeySpace name”

Example
The following code deletes the keyspace tutorialspoint.
cqlsh> DROP KEYSPACE tutorialspoint;

Verification
Verify the keyspaces using the command Describe and check whether the keyspace is
dropped as shown below.
cqlsh> DESCRIBE keyspaces;

system system_traces
Since we have deleted the keyspace tutorialspoint, you will not find it in the
keyspaces list.
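A side note on the quoted names used in the syntax above: in CQL, an unquoted identifier such as a keyspace name is case-insensitive and folded to lowercase, while a double-quoted identifier preserves its case. The following hypothetical helper (invented for illustration, not a driver API) captures that rule:

```java
// Hypothetical helper illustrating CQL identifier quoting: names that are not
// all-lowercase alphanumeric/underscore must be double-quoted to keep their case.
public class CqlIdentifier {

    public static String quoteIfNeeded(String name) {
        // lowercase letters, digits and underscores may stay unquoted
        if (name.matches("[a-z_][a-z0-9_]*")) {
            return name;
        }
        // embedded double quotes are escaped by doubling them
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("tutorialspoint"));
        System.out.println(quoteIfNeeded("TutorialsPoint"));
    }
}
```

In practice this means DROP KEYSPACE tutorialspoint and DROP KEYSPACE TutorialsPoint refer to the same keyspace unless the name was created with double quotes.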
Dropping a Keyspace using Java API
You can drop a keyspace using the execute() method of Session class. Follow the
steps given below to drop a keyspace using Java API.

Step 1: Create a Cluster Object


First of all, create an instance of Cluster.builder class
of com.datastax.driver.core package as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = build.addContactPoint( "127.0.0.1" );
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build a cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, you can set it to the existing one by passing the keyspace name in string
format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");

Step 3: Execute Query


You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on the
cqlsh.
In the following example, we are deleting a keyspace named tp. You have to store
the query in a string variable and pass it to the execute() method as shown below.
String query = "DROP KEYSPACE tp; ";

session.execute(query);
Given below is the complete program to drop a keyspace in Cassandra
using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Drop_KeySpace {

public static void main(String args[]){

//Query
String query = "Drop KEYSPACE tp";

//creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect();

//Executing the query


session.execute(query);
System.out.println("Keyspace deleted");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Drop_KeySpace.java
$java Drop_KeySpace
Under normal conditions, it should produce the following output −
Keyspace deleted

TABLE OPERATION

Creating a Table
You can create a table using the command CREATE TABLE. Given below is the
syntax for creating a table.

Syntax
CREATE (TABLE | COLUMNFAMILY) <tablename>
('<column-definition>' , '<column-definition>')
(WITH <option> AND <option>)

Defining a Column
You can define a column as shown below.
<column name1> <datatype>,
<column name2> <datatype>,

example:

age int,
name text

Primary Key
The primary key is a column that is used to uniquely identify a row.
Therefore, defining a primary key is mandatory while creating a table. A primary key
is made of one or more columns of a table. You can define a primary key of a table
as shown below.
CREATE TABLE <tablename>(
   column1 datatype PRIMARY KEY,
   column2 datatype,
   column3 datatype
)
or
CREATE TABLE <tablename>(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   PRIMARY KEY (column1)
)
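For a single-column key the two forms are equivalent. A primary key can also be compound: the first column (or parenthesized group) becomes the partition key and the remaining columns become clustering columns. The sketch below is a hypothetical builder (class and method names invented for illustration, not a driver API) that emits the form with a trailing PRIMARY KEY clause:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical CREATE TABLE builder emitting the trailing PRIMARY KEY form.
// A LinkedHashMap preserves the declared column order.
public class TableDdl {

    public static String createTable(String table,
                                     Map<String, String> columns,
                                     String... primaryKey) {
        StringBuilder cql = new StringBuilder("CREATE TABLE ").append(table).append("(");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            cql.append(col.getKey()).append(" ").append(col.getValue()).append(", ");
        }
        // For a compound key, the first column is the partition key and the
        // remaining columns are clustering columns.
        cql.append("PRIMARY KEY (").append(String.join(", ", primaryKey)).append("));");
        return cql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("emp_id", "int");
        cols.put("emp_name", "text");
        System.out.println(createTable("emp", cols, "emp_id"));
    }
}
```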

Example
Given below is an example to create a table in Cassandra using cqlsh. Here we are

 Using the keyspace tutorialspoint
 Creating a table named emp
It will have details such as employee name, id, city, salary, and phone number.
Employee id is the primary key.
cqlsh> USE tutorialspoint;
cqlsh:tutorialspoint> CREATE TABLE emp(
emp_id int PRIMARY KEY,
emp_name text,
emp_city text,
emp_sal varint,
emp_phone varint
);

Verification
A select statement on the newly created (and still empty) table shows its
columns. Verify the table using the select statement as shown below.
cqlsh:tutorialspoint> select * from emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------

(0 rows)
Here you can observe the table created with the given columns.

Creating a Table using Java API


You can create a table using the execute() method of Session class. Follow the
steps given below to create a table using Java API.

Step 1: Create a Cluster Object


First of all, create an instance of the Cluster.builder class
of com.datastax.driver.core package as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = build.addContactPoint( "127.0.0.1" );
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build a cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class
as shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, you can set it to the existing one by passing the keyspace name in string
format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");
Here we are using the keyspace named tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");
Step 3: Execute Query
You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on the
cqlsh.
In the following example, we are creating a table named emp. You have to store the
query in a string variable and pass it to the execute() method as shown below.
//Query
String query = "CREATE TABLE emp(emp_id int PRIMARY KEY, "
+ "emp_name text, "
+ "emp_city text, "
+ "emp_sal varint, "
+ "emp_phone varint );";
session.execute(query);
Given below is the complete program to create a table in Cassandra
using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Create_Table {

public static void main(String args[]){

//Query
String query = "CREATE TABLE emp(emp_id int PRIMARY KEY, "
+ "emp_name text, "
+ "emp_city text, "
+ "emp_sal varint, "
+ "emp_phone varint );";

//Creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");

//Executing the query


session.execute(query);

System.out.println("Table created");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Create_Table.java
$java Create_Table
Under normal conditions, it should produce the following output −
Table created

Altering a Table
You can alter a table using the command ALTER TABLE. Given below is the
syntax for altering a table.

Syntax
ALTER (TABLE | COLUMNFAMILY) <tablename> <instruction>
Using ALTER command, you can perform the following operations −
 Add a column
 Drop a column

Adding a Column
Using ALTER command, you can add a column to a table. While adding columns,
you have to take care that the column name is not conflicting with the existing
column names and that the table is not defined with compact storage option. Given
below is the syntax to add a column to a table.
ALTER TABLE <tablename>
ADD <new column> <datatype>;
Example
Given below is an example to add a column to an existing table. Here we are
adding a column called emp_email of text datatype to the table named emp.
cqlsh:tutorialspoint> ALTER TABLE emp
... ADD emp_email text;
Verification
Use the SELECT statement to verify whether the column is added or not. Here you
can observe the newly added column emp_email.
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_email | emp_name | emp_phone | emp_sal
--------+----------+-----------+----------+-----------+---------

(0 rows)

Dropping a Column
Using ALTER command, you can delete a column from a table. Before dropping a
column from a table, check that the table is not defined with compact storage
option. Given below is the syntax to delete a column from a table using ALTER
command.
ALTER TABLE <tablename>
DROP <column name>;
Example
Given below is an example to drop a column from a table. Here we are deleting the
column named emp_email.
cqlsh:tutorialspoint> ALTER TABLE emp DROP emp_email;
Verification
Verify whether the column is deleted using the select statement, as shown below.
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------

(0 rows)
Since the emp_email column has been deleted, you cannot find it anymore.
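Both ALTER TABLE forms follow a fixed textual shape, summarized by these hypothetical helpers (invented for illustration, not a driver API):

```java
// Hypothetical helpers mirroring the two ALTER TABLE forms shown above.
public class AlterTableDdl {

    // ALTER TABLE <tablename> ADD <new column> <datatype>;
    public static String addColumn(String table, String column, String type) {
        return "ALTER TABLE " + table + " ADD " + column + " " + type + ";";
    }

    // ALTER TABLE <tablename> DROP <column name>;
    public static String dropColumn(String table, String column) {
        return "ALTER TABLE " + table + " DROP " + column + ";";
    }

    public static void main(String[] args) {
        System.out.println(addColumn("emp", "emp_email", "text"));
        System.out.println(dropColumn("emp", "emp_email"));
    }
}
```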

Altering a Table using Java API


You can create a table using the execute() method of Session class. Follow the
steps given below to alter a table using Java API.

Step 1: Create a Cluster Object


First of all, create an instance of Cluster.builder class
of com.datastax.driver.core package as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = build.addContactPoint( "127.0.0.1" );
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build a cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, you can set it to the existing one by passing the KeySpace name in string
format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");
Here we are using the KeySpace named tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");

Step 3: Execute Query


You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on
the cqlsh.
In the following example, we are adding a column to a table named emp. To do so,
you have to store the query in a string variable and pass it to the execute() method
as shown below.
//Query
String query1 = "ALTER TABLE emp ADD emp_email text";
session.execute(query);
Given below is the complete program to add a column to an existing table.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Add_column {

public static void main(String args[]){

//Query
String query = "ALTER TABLE emp ADD emp_email text";

//Creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");

//Executing the query


session.execute(query);

System.out.println("Column added");
}
}

Dropping a Table
You can drop a table using the command DROP TABLE. Its syntax is as follows −

Syntax
DROP TABLE <tablename>
Example
The following code drops an existing table from a KeySpace.
cqlsh:tutorialspoint> DROP TABLE emp;

Verification
Use the Describe command to verify whether the table is deleted or not. Since the
emp table has been deleted, you will not find it in the column families list.
cqlsh:tutorialspoint> DESCRIBE COLUMNFAMILIES;
employee

Deleting a Table using Java API


You can delete a table using the execute() method of Session class. Follow the
steps given below to delete a table using Java API.

Step 1: Create a Cluster Object


First of all, create an instance of Cluster.builder class
of com.datastax.driver.core package as shown below −
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = build.addContactPoint( "127.0.0.1" );
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build a cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, you can set it to the existing one by passing the KeySpace name in string
format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");
Here we are using the keyspace named tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");

Step 3: Execute Query


You can execute CQL queries using execute() method of Session class. Pass the
query either in string format or as a Statement class object to the execute() method.
Whatever you pass to this method in string format will be executed on the cqlsh.
In the following example, we are deleting a table named emp. You have to store the
query in a string variable and pass it to the execute() method as shown below.
// Query

String query = "DROP TABLE emp1;”;


session.execute(query);
Given below is the complete program to drop a table in Cassandra using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Drop_Table {

public static void main(String args[]){

//Query
String query = "DROP TABLE emp1;";
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");

//Executing the query


session.execute(query);

System.out.println("Table dropped");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Drop_Table.java
$java Drop_Table
Under normal conditions, it should produce the following output −
Table dropped
CRUD OPERATIONS
Creating Data in a Table
You can insert data into the columns of a row in a table using the
command INSERT. Given below is the syntax for creating data in a table.
INSERT INTO <tablename>
(<column1 name>, <column2 name>....)
VALUES (<value1>, <value2>....)
USING <option>

Example
Let us assume there is a table called emp with columns (emp_id, emp_name,
emp_city, emp_phone, emp_sal) and you have to insert the following data into
the emp table.

emp_id emp_name emp_city emp_phone emp_sal

1 ram Hyderabad 9848022338 50000

2 robin Hyderabad 9848022339 40000

3 rahman Chennai 9848022330 45000

Use the commands given below to fill the table with required data.
cqlsh:tutorialspoint> INSERT INTO emp (emp_id, emp_name, emp_city,
emp_phone, emp_sal) VALUES(1,'ram', 'Hyderabad', 9848022338, 50000);

cqlsh:tutorialspoint> INSERT INTO emp (emp_id, emp_name, emp_city,
emp_phone, emp_sal) VALUES(2,'robin', 'Hyderabad', 9848022339, 40000);

cqlsh:tutorialspoint> INSERT INTO emp (emp_id, emp_name, emp_city,
emp_phone, emp_sal) VALUES(3,'rahman', 'Chennai', 9848022330, 45000);

Verification
After inserting data, use SELECT statement to verify whether the data has been
inserted or not. If you verify the emp table using SELECT statement, it will give you
the following output.
cqlsh:tutorialspoint> SELECT * FROM emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | Hyderabad | robin | 9848022339 | 40000
3 | Chennai | rahman | 9848022330 | 45000

(3 rows)
Here you can observe that the table has been populated with the data we inserted.
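Two details of the INSERT statements above are worth making explicit: text values are single-quoted (an embedded quote is escaped by doubling it, e.g. 'o''brien'), and the optional USING clause can attach options such as TTL, the number of seconds after which the inserted values expire. The following hypothetical builder (invented for illustration, not a driver API) captures both rules:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical INSERT builder: strings are single-quoted with '' escaping,
// numbers are emitted as-is; ttlSeconds <= 0 omits the USING clause.
public class InsertCql {

    static String literal(Object v) {
        if (v instanceof String) {
            return "'" + ((String) v).replace("'", "''") + "'";
        }
        return String.valueOf(v);
    }

    public static String insert(String table, Map<String, Object> row, int ttlSeconds) {
        StringBuilder cols = new StringBuilder();
        StringBuilder vals = new StringBuilder();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (cols.length() > 0) { cols.append(", "); vals.append(", "); }
            cols.append(e.getKey());
            vals.append(literal(e.getValue()));
        }
        String cql = "INSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ")";
        if (ttlSeconds > 0) {
            cql += " USING TTL " + ttlSeconds;   // values expire after ttlSeconds
        }
        return cql + ";";
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("emp_id", 1);
        row.put("emp_name", "ram");
        System.out.println(insert("emp", row, 0));
    }
}
```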

Creating Data using Java API


You can create data in a table using the execute() method of Session class. Follow
the steps given below to create data in a table using java API.

Step 1: Create a Cluster Object


Create an instance of Cluster.builder class of com.datastax.driver.core package
as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = build.addContactPoint("127.0.0.1");
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. The following code shows how to create
a cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build a cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, then you can set it to the existing one by passing the KeySpace name in
string format to this method as shown below.
Session session = cluster.connect("your_keyspace_name");
Here we are using the KeySpace called tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");
Step 3: Execute Query
You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on
the cqlsh.
In the following example, we are inserting data in a table called emp. You have to
store the query in a string variable and pass it to the execute() method as shown
below.
String query1 = "INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal)"
   + " VALUES(1,'ram', 'Hyderabad', 9848022338, 50000);";

String query2 = "INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal)"
   + " VALUES(2,'robin', 'Hyderabad', 9848022339, 40000);";

String query3 = "INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal)"
   + " VALUES(3,'rahman', 'Chennai', 9848022330, 45000);";

session.execute(query1);
session.execute(query2);
session.execute(query3);
Given below is the complete program to insert data into a table in Cassandra using
Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Create_Data {

public static void main(String args[]){

//queries
String query1 = "INSERT INTO emp (emp_id, emp_name,
emp_city, emp_phone, emp_sal)"

+ " VALUES(1,'ram', 'Hyderabad', 9848022338, 50000);" ;

String query2 = "INSERT INTO emp (emp_id, emp_name,


emp_city,
emp_phone, emp_sal)"

+ " VALUES(2,'robin', 'Hyderabad', 9848022339, 40000);"


;

String query3 = "INSERT INTO emp (emp_id, emp_name,


emp_city, emp_phone, emp_sal)"

+ " VALUES(3,'rahman', 'Chennai', 9848022330, 45000);" ;


//Creating Cluster object
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");

//Executing the query


session.execute(query1);

session.execute(query2);

session.execute(query3);

System.out.println("Data created");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Create_Data.java
$java Create_Data
Under normal conditions, it should produce the following output −
Data created
UPDATE
Updating Data in a Table
UPDATE is the command used to update data in a table. The following keywords
are used while updating data in a table −
 Where − This clause selects the row to be updated; it must include all the
columns composing the primary key.
 Set − Set the new value using this keyword.
While updating rows, if a given row does not exist, UPDATE creates a fresh
row. Given below is the syntax of the UPDATE command −
UPDATE <tablename>
SET <column name> = <new value>,
<column name> = <value>....
WHERE <condition>

Example
Assume there is a table named emp. This table stores the details of employees of a
certain company, and it has the following details −

emp_id emp_name emp_city emp_phone emp_sal

1 ram Hyderabad 9848022338 50000


2 robin Hyderabad 9848022339 40000

3 rahman Chennai 9848022330 45000

Let us now update emp_city of robin to Delhi, and his salary to 50000. Given below
is the query to perform the required updates.
cqlsh:tutorialspoint> UPDATE emp SET
emp_city='Delhi',emp_sal=50000
WHERE emp_id=2;

Verification
Use SELECT statement to verify whether the data has been updated or not. If you
verify the emp table using SELECT statement, it will produce the following output.
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | Delhi | robin | 9848022339 | 50000
3 | Chennai | rahman | 9848022330 | 45000

(3 rows)
Here you can observe the table data has got updated.

Updating Data using Java API


You can update data in a table using the execute() method of Session class. Follow
the steps given below to update data in a table using Java API.

Step1: Create a Cluster Object


Create an instance of Cluster.builder class of com.datastax.driver.core package
as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = builder1.addContactPoint("127.0.0.1");
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. Use the following code to create the
cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build the cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, then you can set it to the existing one by passing the KeySpace name in
string format to this method as shown below.
Session session = cluster.connect("Your keyspace name");
Here we are using the KeySpace named tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");

Step 3: Execute Query


You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on
the cqlsh.
In the following example, we are updating the emp table. You have to store the
query in a string variable and pass it to the execute() method as shown below:
String query = "UPDATE emp SET emp_city='Delhi', emp_sal=50000 WHERE emp_id=2;";
Given below is the complete program to update data in a table using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Update_Data {

public static void main(String args[]){

//query
String query = "UPDATE emp SET emp_city='Delhi', emp_sal=50000 WHERE emp_id=2;";

//Creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");
//Executing the query
session.execute(query);

System.out.println("Data updated");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Update_Data.java
$java Update_Data
Under normal conditions, it should produce the following output −
Data updated
READ DATA
Reading Data using Select Clause
SELECT clause is used to read data from a table in Cassandra. Using this clause,
you can read a whole table, a single column, or a particular cell. Given below is the
syntax of SELECT clause.
SELECT <column list | *> FROM <tablename>;

Example
Assume there is a table in the keyspace named emp with the following details −

emp_id emp_name emp_city emp_phone emp_sal

1 ram Hyderabad 9848022338 50000

2 robin null 9848022339 50000

3 rahman Chennai 9848022330 50000

4 rajeev Pune 9848022331 30000

The following example shows how to read a whole table using SELECT clause.
Here we are reading a table called emp.
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | null | robin | 9848022339 | 50000
3 | Chennai | rahman | 9848022330 | 50000
4 | Pune | rajeev | 9848022331 | 30000

(4 rows)

Reading Required Columns


The following example shows how to read a particular column in a table.
cqlsh:tutorialspoint> SELECT emp_name, emp_sal from emp;

emp_name | emp_sal
----------+---------
ram | 50000
robin | 50000
rajeev | 30000
rahman | 50000

(4 rows)

Where Clause
Using WHERE clause, you can put a constraint on the required columns. Its syntax
is as follows −
SELECT <column list | *> FROM <table name> WHERE <condition>;
Note − A WHERE clause can be used only on the columns that are a part of
primary key or have a secondary index on them.
In the following example, we are reading the details of an employee whose salary is
50000. First of all, set secondary index to the column emp_sal.
cqlsh:tutorialspoint> CREATE INDEX ON emp(emp_sal);
cqlsh:tutorialspoint> SELECT * FROM emp WHERE emp_sal=50000;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | null | robin | 9848022339 | 50000
3 | Chennai | rahman | 9848022330 | 50000

Reading Data using Java API


You can read data from a table using the execute() method of Session class. Follow
the steps given below to read data from a table using the Java API.

Step 1: Create a Cluster Object


Create an instance of Cluster.builder class of com.datastax.driver.core package
as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = builder1.addContactPoint("127.0.0.1");
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. Use the following code to create the
cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build the cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect( );
This method creates a new session and initializes it. If you already have a
keyspace, then you can set it to the existing one by passing the KeySpace name in
string format to this method as shown below.
Session session = cluster.connect("Your keyspace name");
Here we are using the KeySpace called tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");

Step 3: Execute Query


You can execute CQL queries using execute() method of Session class. Pass the
query either in string format or as a Statement class object to the execute() method.
Whatever you pass to this method in string format will be executed on the cqlsh.
In this example, we are retrieving the data from emp table. Store the query in a
string and pass it to the execute() method of session class as shown below.
String query = "SELECT * FROM emp";
session.execute(query);
Execute the query using the execute() method of Session class.

Step 4: Get the ResultSet Object


The select queries return the result in the form of a ResultSet object, therefore
store the result in an object of the ResultSet class as shown below.
ResultSet result = session.execute(query);
Given below is the complete program to read data from a table.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class Read_Data {

public static void main(String args[])throws Exception{

//queries
String query = "SELECT * FROM emp";

//Creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");

//Getting the ResultSet


ResultSet result = session.execute(query);

System.out.println(result.all());
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Read_Data.java
$java Read_Data
Under normal conditions, it should produce the following output −
[Row[1, Hyderabad, ram, 9848022338, 50000], Row[2, Delhi, robin, 9848022339, 50000],
Row[4, Pune, rajeev, 9848022331, 30000], Row[3, Chennai, rahman, 9848022330, 50000]]

DELETE DATA
Deleting Data from a Table
You can delete data from a table using the command DELETE. Its syntax is as
follows −
DELETE FROM <identifier> WHERE <condition>;
Example
Let us assume there is a table in Cassandra called emp having the following data −

emp_id emp_name emp_city emp_phone emp_sal

1 ram Hyderabad 9848022338 50000

2 robin Hyderabad 9848022339 40000

3 rahman Chennai 9848022330 45000

The following statement deletes the emp_sal column of last row −


cqlsh:tutorialspoint> DELETE emp_sal FROM emp WHERE emp_id=3;

Verification
Use SELECT statement to verify whether the data has been deleted or not. If you
verify the emp table using SELECT, it will produce the following output −
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | Delhi | robin | 9848022339 | 50000
3 | Chennai | rahman | 9848022330 | null
(3 rows)
Since we have deleted the salary of Rahman, you will observe a null value in place
of salary.

Deleting an Entire Row


The following command deletes an entire row from a table.
cqlsh:tutorialspoint> DELETE FROM emp WHERE emp_id=3;

Verification
Use SELECT statement to verify whether the data has been deleted or not. If you
verify the emp table using SELECT, it will produce the following output −
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | Delhi | robin | 9848022339 | 50000
(2 rows)
Since we have deleted the last row, there are only two rows left in the table.

Deleting Data using Java API


You can delete data in a table using the execute() method of Session class. Follow
the steps given below to delete data from a table using java API.

Step1: Create a Cluster Object


Create an instance of Cluster.builder class of com.datastax.driver.core package
as shown below.
//Creating Cluster.Builder object
Cluster.Builder builder1 = Cluster.builder();
Add a contact point (IP address of the node) using the addContactPoint() method
of Cluster.Builder object. This method returns Cluster.Builder.
//Adding contact point to the Cluster.Builder object
Cluster.Builder builder2 = builder1.addContactPoint("127.0.0.1");
Using the new builder object, create a cluster object. To do so, you have a method
called build() in the Cluster.Builder class. Use the following code to create a
cluster object.
//Building a cluster
Cluster cluster = builder2.build();
You can build the cluster object using a single line of code as shown below.
Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

Step 2: Create a Session Object


Create an instance of Session object using the connect() method of Cluster class as
shown below.
Session session = cluster.connect();
This method creates a new session and initializes it. If you already have a
keyspace, then you can set it to the existing one by passing the KeySpace name in
string format to this method as shown below.
Session session = cluster.connect("Your keyspace name");
Here we are using the KeySpace called tp. Therefore, create the session object as
shown below.
Session session = cluster.connect("tp");

Step 3: Execute Query


You can execute CQL queries using the execute() method of Session class. Pass
the query either in string format or as a Statement class object to the execute()
method. Whatever you pass to this method in string format will be executed on
the cqlsh.
In the following example, we are deleting data from a table named emp. You have
to store the query in a string variable and pass it to the execute() method as shown
below.
String query = "DELETE FROM emp WHERE emp_id=3;";
session.execute(query);
Given below is the complete program to delete data from a table in Cassandra
using Java API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Delete_Data {

public static void main(String args[]){

//query
String query = "DELETE FROM emp WHERE emp_id=3;";

//Creating Cluster object


Cluster cluster =
Cluster.builder().addContactPoint("127.0.0.1").build();

//Creating Session object


Session session = cluster.connect("tp");

//Executing the query


session.execute(query);

System.out.println("Data deleted");
}
}

Save the above program with the class name followed by .java, browse to the
location where it is saved. Compile and execute the program as shown below.
$javac Delete_Data.java
$java Delete_Data
Under normal conditions, it should produce the following output −
Data deleted
Cassandra CQL Types
Query languages are computer languages used to make queries in databases and
information systems. Cassandra Query Language is the primary query language for
communicating with the Apache Cassandra database.

CQL provides a rich set of built-in data types, including collection types. Along with
these data types, users can also create their own custom data types. The following
table provides a list of built-in data types available in CQL.
Data Type Constants Description

ascii strings Represents ASCII character string

bigint integers Represents 64-bit signed long

blob blobs Represents arbitrary bytes

boolean booleans Represents true or false

counter integers Represents counter column

decimal integers, floats Represents variable-precision decimal

double integers, floats Represents 64-bit IEEE-754 floating point

float integers, floats Represents 32-bit IEEE-754 floating point

inet strings Represents an IP address, IPv4 or IPv6

int integers Represents 32-bit signed int

text strings Represents UTF-8 encoded string

timestamp integers, strings Represents a timestamp

timeuuid uuids Represents type 1 UUID

uuid uuids Represents type 1 or type 4 UUID

varchar strings Represents UTF-8 encoded string

varint integers Represents arbitrary-precision integer

Collection Types
Cassandra Query Language also provides collection data types. The following
table provides a list of Collections available in CQL.

Collection Description

list A list is a collection of one or more ordered elements.

map A map is a collection of key-value pairs.

set A set is a collection of one or more elements.

User-defined datatypes
Cqlsh provides users the facility of creating their own data types. Given below are the
commands used while dealing with user defined datatypes.
 CREATE TYPE − Creates a user-defined datatype.
 ALTER TYPE − Modifies a user-defined datatype.
 DROP TYPE − Drops a user-defined datatype.
 DESCRIBE TYPE − Describes a user-defined datatype.
 DESCRIBE TYPES − Describes user-defined datatypes.

CQL COLLECTIONS

CQL provides the facility of using Collection data types. Using these Collection
types, you can store multiple values in a single variable. This chapter explains how
to use Collections in Cassandra.

List
List is used in the cases where
 the order of the elements is to be maintained, and
 a value is to be stored multiple times.
You can get the values of a list data type using the index of the elements in the list.
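A CQL list behaves much like a java.util.List: insertion order is preserved, duplicates are allowed, and elements can be reached by index. The following standalone sketch (it does not touch the Cassandra driver; the class and method names are illustrative) mirrors the `email = email + [...]` style of update:

```java
import java.util.ArrayList;
import java.util.List;

public class CqlListSketch {
    // analogous to: UPDATE data SET email = email + ['extra'] WHERE name = ...
    static List<String> appendEmail(List<String> current, String extra) {
        List<String> updated = new ArrayList<>(current);
        updated.add(extra);              // order preserved, duplicates allowed
        return updated;
    }

    public static void main(String[] args) {
        List<String> email = List.of("abc@gmail.com", "cba@yahoo.com");
        List<String> updated = appendEmail(email, "xyz@tutorialspoint.com");
        System.out.println(updated.get(0));   // index access, like a CQL list
    }
}
```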

Creating a Table with List


Given below is an example to create a sample table with two columns, name and
email. To store multiple emails, we are using list.
cqlsh:tutorialspoint> CREATE TABLE data(name text PRIMARY KEY,
email list<text>);

Inserting Data into a List


While inserting data into the elements in a list, enter all the values separated by
comma within square braces [ ] as shown below.
cqlsh:tutorialspoint> INSERT INTO data(name, email) VALUES
('ramu', ['abc@gmail.com','cba@yahoo.com']);

Updating a List
Given below is an example to update the list data type in a table called data. Here
we are adding another email to the list.
cqlsh:tutorialspoint> UPDATE data
... SET email = email +['xyz@tutorialspoint.com']
... where name = 'ramu';

Verification
If you verify the table using SELECT statement, you will get the following result −
cqlsh:tutorialspoint> SELECT * FROM data;

 name | email
------+--------------------------------------------------------------
 ramu | ['abc@gmail.com', 'cba@yahoo.com', 'xyz@tutorialspoint.com']

(1 rows)
HIVE
 Hive is a data warehouse infrastructure tool for querying and
analyzing structured data stored in Hadoop.
 Hive allows users to read, write, and manage
petabytes of data using SQL.
 Hive is built on top of Apache Hadoop, which is an
open-source framework used to efficiently store and
process large datasets.
 As a result, Hive is closely integrated with Hadoop, and
is designed to work quickly on petabytes of data.

DATA TYPES
This chapter takes you through the different data types in Hive, which are involved
in the table creation. All the data types in Hive are classified into four types, given
as follows:

 Column Types
 Literals
 Null Values
 Complex Types

Column Types
Column type are used as column data types of Hive. They are as follows:

Integral Types
Integer type data can be specified using integral data types, INT. When the data
range exceeds the range of INT, you need to use BIGINT and if the data range is
smaller than the INT, you use SMALLINT. TINYINT is smaller than SMALLINT.
The following table depicts various INT data types:

Type Postfix Example

TINYINT Y 10Y

SMALLINT S 10S

INT - 10

BIGINT L 10L
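These integral types correspond to Java's signed integer primitives (TINYINT to byte, SMALLINT to short, INT to int, BIGINT to long), so their ranges can be checked directly from Java:

```java
public class HiveIntegralRanges {
    public static void main(String[] args) {
        // TINYINT is 1 byte, SMALLINT 2 bytes, INT 4 bytes, BIGINT 8 bytes
        System.out.println("TINYINT : " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("SMALLINT: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("INT     : " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("BIGINT  : " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
    }
}
```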
String Types
String type data can be specified using single quotes (' ') or double quotes ("
"). Hive has two string data types: VARCHAR and CHAR. Hive follows C-style escape
characters.
The following table depicts various CHAR data types:

Data Type Length

VARCHAR 1 to 65535

CHAR 255

Timestamp
It supports traditional UNIX timestamps with optional nanosecond precision, in the
java.sql.Timestamp format "yyyy-mm-dd hh:mm:ss.fffffffff".
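As a sketch of that pattern, Java's java.time API can render an instant in the same "yyyy-mm-dd hh:mm:ss.fff" shape (millisecond precision shown here for brevity; Hive's TIMESTAMP accepts up to nine fractional digits):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class HiveTimestampSketch {
    // format an instant the way a Hive TIMESTAMP literal is written (UTC assumed)
    static String toHiveTimestamp(Instant instant) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
                                .withZone(ZoneOffset.UTC)
                                .format(instant);
    }

    public static void main(String[] args) {
        System.out.println(toHiveTimestamp(Instant.EPOCH)); // 1970-01-01 00:00:00.000
    }
}
```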

Dates
DATE values are described in year/month/day format in the form {{YYYY-MM-DD}}.

Decimals
The DECIMAL type in Hive is the same as the Big Decimal format of Java. It is used
for representing immutable arbitrary-precision values. The syntax and an example are as follows:
DECIMAL(precision, scale)
decimal(10,0)
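Since Hive's DECIMAL shares the semantics of Java's BigDecimal, the difference from DOUBLE is easy to demonstrate: decimal arithmetic is exact, while binary floating point rounding creeps in:

```java
import java.math.BigDecimal;

public class HiveDecimalSketch {
    public static void main(String[] args) {
        // DECIMAL-style arithmetic: exact, like Hive's decimal(10,2)
        BigDecimal exact = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(exact);          // 0.3

        // DOUBLE-style arithmetic: binary floating point is inexact
        System.out.println(0.1 + 0.2);      // 0.30000000000000004
    }
}
```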

Union Types
Union is a collection of heterogeneous data types. You can create an instance
using create union. The syntax and example is as follows:
UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>

{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}
Literals
The following literals are used in Hive:

Floating Point Types


Floating point types are nothing but numbers with decimal points. Generally, this
type of data is composed of DOUBLE data type.

Decimal Type
Decimal type data is a floating point value with a higher range than the DOUBLE data
type. The range of the decimal type is approximately -10^308 to 10^308.

Null Value
Missing values are represented by the special value NULL.

Complex Types
The Hive complex data types are as follows:

Arrays
Arrays in Hive are used the same way they are used in Java.
Syntax: ARRAY<data_type>

Maps
Maps in Hive are similar to Java Maps.
Syntax: MAP<primitive_type, data_type>

Structs
Structs in Hive are records of named fields, each of which can carry an optional comment.
Syntax: STRUCT<col_name : data_type [COMMENT col_comment], ...>

DATABASE OPERATIONS

CREATE DATABASE

Hive is a database technology that can define databases and tables to analyze
structured data. The theme for structured data analysis is to store the data in a
tabular manner, and pass queries to analyze it. This chapter explains how to create
Hive database. Hive contains a default database named default.

Create Database Statement


Create Database is a statement used to create a database in Hive. A database in
Hive is a namespace or a collection of tables. The syntax for this statement is as
follows:
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>

Here, IF NOT EXISTS is an optional clause that prevents an error if a database
with the same name already exists. We can use SCHEMA in place of DATABASE
in this command. The following query is executed to create a database
named userdb:
hive> CREATE DATABASE IF NOT EXISTS userdb;

or
hive> CREATE SCHEMA userdb;

The following query is used to verify a databases list:


hive> SHOW DATABASES;
default
userdb

JDBC Program
The JDBC program to create a database is given below.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveCreateDb {


private static String driverName =
"org.apache.hadoop.hive.jdbc.HiveDriver";

public static void main(String[] args) throws SQLException, ClassNotFoundException {


// Register driver and create driver instance

Class.forName(driverName);
// get connection

Connection con =
DriverManager.getConnection("jdbc:hive://localhost:10000/default"
, "", "");
Statement stmt = con.createStatement();

      stmt.execute("CREATE DATABASE userdb");

      System.out.println("Database userdb created successfully.");

con.close();
}
}

Save the program in a file named HiveCreateDb.java. The following commands are
used to compile and execute this program.
$ javac HiveCreateDb.java
$ java HiveCreateDb

Output:
Database userdb created successfully.

HIVE - DROP DATABASE


This chapter describes how to drop a database in Hive. The usage of SCHEMA and
DATABASE are same.

Drop Database Statement


Drop Database is a statement that drops all the tables and deletes the database. Its
syntax is as follows:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

The following queries are used to drop a database. Let us assume that the
database name is userdb.
hive> DROP DATABASE IF EXISTS userdb;

The following query drops the database using CASCADE. It means dropping
respective tables before dropping the database.
hive> DROP DATABASE IF EXISTS userdb CASCADE;

The following query drops the database using SCHEMA.


hive> DROP SCHEMA userdb;

This clause was added in Hive 0.6.

JDBC Program
The JDBC program to drop a database is given below.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveDropDb {


private static String driverName =
"org.apache.hadoop.hive.jdbc.HiveDriver";

public static void main(String[] args) throws SQLException, ClassNotFoundException {

// Register driver and create driver instance


Class.forName(driverName);

// get connection
Connection con =
DriverManager.getConnection("jdbc:hive://localhost:10000/default"
, "", "");
Statement stmt = con.createStatement();
      stmt.execute("DROP DATABASE userdb");

      System.out.println("Drop userdb database successful.");

con.close();
}
}

Save the program in a file named HiveDropDb.java. Given below are the commands
to compile and execute this program.
$ javac HiveDropDb.java
$ java HiveDropDb

Output:
Drop userdb database successful.

PARTITIONING
Hive organizes tables into partitions. It is a way of dividing a table into related parts
based on the values of partitioned columns such as date, city, and department.
Using partition, it is easy to query a portion of the data.
Tables or partitions are sub-divided into buckets, to provide extra structure to the
data that may be used for more efficient querying. Bucketing works based on the
value of hash function of some column of a table.
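The bucketing idea can be sketched in plain Java: a row is assigned to bucket hash(column) mod numBuckets, so rows with the same column value always land in the same bucket. (Hive's actual hash function differs; this is only an illustration of the assignment rule.)

```java
public class BucketingSketch {
    // assign a row to one of numBuckets buckets based on a column value;
    // floorMod keeps the result non-negative even for negative hash codes
    static int bucketFor(String columnValue, int numBuckets) {
        return Math.floorMod(columnValue.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor("HR", 4));
        // the same value always maps to the same bucket
        System.out.println(bucketFor("HR", 4) == bucketFor("HR", 4)); // true
    }
}
```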
For example, a table named Tab1 contains employee data such as id, name, dept,
and yoj (i.e., year of joining). Suppose you need to retrieve the details of all
employees who joined in 2012. A query searches the whole table for the required
information. However, if you partition the employee data with the year and store it in
a separate file, it reduces the query processing time. The following example shows
how to partition a file and its data:
The following file contains the employeedata table.
/tab1/employeedata/file1
id, name, dept, yoj
1, gopal, TP, 2012
2, kiran, HR, 2012
3, kaleel,SC, 2013
4, Prasanth, SC, 2013
The above data is partitioned into two files using year.
/tab1/employeedata/2012/file2
1, gopal, TP, 2012
2, kiran, HR, 2012
/tab1/employeedata/2013/file3
3, kaleel,SC, 2013
4, Prasanth, SC, 2013
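The same split can be sketched in Java: grouping rows by the partition column means a query for one year only inspects that year's group, never the whole data set. (This is a simplification of what Hive does with per-partition directories on HDFS; the class and record names are illustrative.)

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionSketch {
    record Emp(int id, String name, String dept, int yoj) {}

    // group rows by the partition column (yoj),
    // like /tab1/employeedata/2012 and /tab1/employeedata/2013
    static Map<Integer, List<Emp>> partitionByYear(List<Emp> rows) {
        return rows.stream().collect(Collectors.groupingBy(Emp::yoj));
    }

    public static void main(String[] args) {
        List<Emp> rows = List.of(
                new Emp(1, "gopal", "TP", 2012),
                new Emp(2, "kiran", "HR", 2012),
                new Emp(3, "kaleel", "SC", 2013),
                new Emp(4, "Prasanth", "SC", 2013));

        // a query for yoj = 2012 now scans only two rows instead of four
        System.out.println(partitionByYear(rows).get(2012).size()); // 2
    }
}
```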

Adding a Partition
We can add partitions to a table by altering the table. Let us assume we have a
table called employee with fields such as Id, Name, Salary, Designation, Dept, and
yoj.

Syntax:
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION
partition_spec
[LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;

partition_spec:
: (p_column = p_col_value, p_column = p_col_value, ...)

The following query is used to add a partition to the employee table.


hive> ALTER TABLE employee
    > ADD PARTITION (year='2012')
    > location '/2012/part2012';

Renaming a Partition
The syntax of this command is as follows.
ALTER TABLE table_name PARTITION partition_spec RENAME TO
PARTITION partition_spec;

The following query is used to rename a partition:


hive> ALTER TABLE employee PARTITION (year='1203')
    > RENAME TO PARTITION (Yoj='1203');

Dropping a Partition
The following syntax is used to drop a partition:
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec,
PARTITION partition_spec,...;

The following query is used to drop a partition:


hive> ALTER TABLE employee DROP IF EXISTS
    > PARTITION (year='1203');

HIVEQL

The Hive Query Language (HiveQL) is the query language for Hive to process and
analyze structured data in a Metastore.

HiveQL - Select-Where

The Hive Query Language (HiveQL) is a query language for Hive to process and
analyze structured data in a Metastore. This chapter explains how to use the
SELECT statement with WHERE clause.
SELECT statement is used to retrieve the data from a table. WHERE clause works
similar to a condition. It filters the data using the condition and gives you a finite
result. The built-in operators and functions generate an expression, which fulfils the
condition.

Syntax
Given below is the syntax of the SELECT query:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY
col_list]]
[LIMIT number];

Example
Let us take an example for SELECT…WHERE clause. Assume we have the
employee table as given below, with fields named Id, Name, Salary, Designation,
and Dept. Generate a query to retrieve the employee details who earn a salary of
more than Rs 30000.
+------+--------------+-------------+-------------------+--------+
| ID   | Name         | Salary      | Designation       | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | Gopal        | 45000       | Technical manager | TP     |
| 1202 | Manisha      | 45000       | Proofreader       | PR     |
| 1203 | Masthanvali  | 40000       | Technical writer  | TP     |
| 1204 | Krian        | 40000       | Hr Admin          | HR     |
| 1205 | Kranthi      | 30000       | Op Admin          | Admin  |
+------+--------------+-------------+-------------------+--------+
The following query retrieves the employee details using the above scenario:
hive> SELECT * FROM employee WHERE salary>30000;

On successful execution of the query, you get to see the following response:
+------+--------------+-------------+-------------------+--------+
| ID   | Name         | Salary      | Designation       | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | Gopal        | 45000       | Technical manager | TP     |
| 1202 | Manisha      | 45000       | Proofreader       | PR     |
| 1203 | Masthanvali  | 40000       | Technical writer  | TP     |
| 1204 | Krian        | 40000       | Hr Admin          | HR     |
+------+--------------+-------------+-------------------+--------+

JDBC Program
The JDBC program to apply where clause for the given example is as follows.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveQLWhere {


private static String driverName =
"org.apache.hadoop.hive.jdbc.HiveDriver";

public static void main(String[] args) throws SQLException, ClassNotFoundException {


// Register driver and create driver instance
Class.forName(driverName);

// get connection
Connection con =
DriverManager.getConnection("jdbc:hive://localhost:10000/userdb",
"", "");

// create statement
Statement stmt = con.createStatement();

// execute statement
      ResultSet res = stmt.executeQuery("SELECT * FROM employee
WHERE salary>30000;");

System.out.println("Result:");
System.out.println(" ID \t Name \t Salary \t Designation \t
Dept ");

while (res.next()) {
System.out.println(res.getInt(1) + " " +
res.getString(2) + " " + res.getDouble(3) + " " +
res.getString(4) + " " + res.getString(5));
}
con.close();
}
}

Save the program in a file named HiveQLWhere.java. Use the following commands
to compile and execute this program.
$ javac HiveQLWhere.java
$ java HiveQLWhere

Output:
ID Name Salary Designation Dept
1201 Gopal 45000 Technical manager TP
1202 Manisha 45000 Proofreader PR
1203 Masthanvali 40000 Technical writer TP
1204 Krian 40000 Hr Admin HR

HiveQL - Select-Order By

This chapter explains how to use the ORDER BY clause in a SELECT statement.
The ORDER BY clause is used to retrieve the details based on one column and sort
the result set by ascending or descending order.
Syntax
Given below is the syntax of the ORDER BY clause:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];

Example
Let us take an example for SELECT...ORDER BY clause. Assume employee table
as given below, with the fields named Id, Name, Salary, Designation, and Dept.
Generate a query to retrieve the employee details ordered by department name.
+------+--------------+-------------+-------------------+--------+
| ID   | Name         | Salary      | Designation       | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | Gopal        | 45000       | Technical manager | TP     |
| 1202 | Manisha      | 45000       | Proofreader       | PR     |
| 1203 | Masthanvali  | 40000       | Technical writer  | TP     |
| 1204 | Krian        | 40000       | Hr Admin          | HR     |
| 1205 | Kranthi      | 30000       | Op Admin          | Admin  |
+------+--------------+-------------+-------------------+--------+
The following query retrieves the employee details using the above scenario:
hive> SELECT * FROM employee ORDER BY Dept;

On successful execution of the query, you get to see the following response:
+------+--------------+-------------+-------------------+--------+
| ID   | Name         | Salary      | Designation       | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1205 | Kranthi      | 30000       | Op Admin          | Admin  |
| 1204 | Krian        | 40000       | Hr Admin          | HR     |
| 1202 | Manisha      | 45000       | Proofreader       | PR     |
| 1201 | Gopal        | 45000       | Technical manager | TP     |
| 1203 | Masthanvali  | 40000       | Technical writer  | TP     |
+------+--------------+-------------+-------------------+--------+

JDBC Program
Here is the JDBC program to apply Order By clause for the given example.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveQLOrderBy {

   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {

      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection(
         "jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      ResultSet res = stmt.executeQuery("SELECT * FROM employee ORDER BY DEPT");
      System.out.println(" ID \t Name \t Salary \t Designation \t Dept ");

      while (res.next()) {
         System.out.println(res.getInt(1) + " " + res.getString(2) + " " +
            res.getDouble(3) + " " + res.getString(4) + " " + res.getString(5));
      }

      con.close();
   }
}

Save the program in a file named HiveQLOrderBy.java. Use the following commands to compile and execute this program.
$ javac HiveQLOrderBy.java
$ java HiveQLOrderBy

Output:
ID Name Salary Designation Dept
1205 Kranthi 30000 Op Admin Admin
1204 Krian 40000 Hr Admin HR
1202 Manisha 45000 Proofreader PR
1201 Gopal 45000 Technical manager TP
1203 Masthanvali 40000 Technical writer TP

HiveQL - Select-Group By

This chapter explains the details of GROUP BY clause in a SELECT statement. The
GROUP BY clause is used to group all the records in a result set using a particular
collection column. It is used to query a group of records.

Syntax
The syntax of GROUP BY clause is as follows:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];

Example
Let us take an example of SELECT…GROUP BY clause. Assume employee table
as given below, with Id, Name, Salary, Designation, and Dept fields. Generate a
query to retrieve the number of employees in each department.
+------+--------------+-------------+-------------------+--------+
| ID   | Name         | Salary      | Designation       | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | Gopal        | 45000       | Technical manager | TP     |
| 1202 | Manisha      | 45000       | Proofreader       | PR     |
| 1203 | Masthanvali  | 40000       | Technical writer  | TP     |
| 1204 | Krian        | 45000       | Proofreader       | PR     |
| 1205 | Kranthi      | 30000       | Op Admin          | Admin  |
+------+--------------+-------------+-------------------+--------+
The following query retrieves the employee details using the above scenario.
hive> SELECT Dept,count(*) FROM employee GROUP BY DEPT;

On successful execution of the query, you get to see the following response:
+------+--------------+
| Dept | Count(*)     |
+------+--------------+
|Admin | 1            |
|PR    | 2            |
|TP    | 2            |
+------+--------------+
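The counting that GROUP BY performs can be sketched in plain Java over the same sample departments. This is an illustrative in-memory model, not Hive's actual execution; the class and field names are made up for the example.

```java
import java.util.*;

public class GroupByDemo {
    // Departments of the five sample employees above.
    static String[] depts = {"TP", "PR", "TP", "PR", "Admin"};

    // Count employees per department, mirroring
    // SELECT Dept, count(*) FROM employee GROUP BY Dept.
    static Map<String, Integer> countByDept() {
        Map<String, Integer> counts = new TreeMap<>();
        for (String d : depts) counts.merge(d, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Prints: Admin 1, PR 2, TP 2
        countByDept().forEach((d, n) -> System.out.println(d + "\t" + n));
    }
}
```

A `TreeMap` is used so the groups come out in sorted key order, matching the response table.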

JDBC Program
Given below is the JDBC program to apply the Group By clause for the given
example.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveQLGroupBy {

   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";

   public static void main(String[] args) throws SQLException, ClassNotFoundException {

      // Register driver and create driver instance
      Class.forName(driverName);

      // get connection
      Connection con = DriverManager.getConnection(
         "jdbc:hive://localhost:10000/userdb", "", "");

      // create statement
      Statement stmt = con.createStatement();

      // execute statement
      ResultSet res = stmt.executeQuery(
         "SELECT Dept, count(*) FROM employee GROUP BY DEPT");
      System.out.println(" Dept \t count(*)");

      while (res.next()) {
         System.out.println(res.getString(1) + " " + res.getInt(2));
      }

      con.close();
   }
}

Save the program in a file named HiveQLGroupBy.java. Use the following commands to compile and execute this program.
$ javac HiveQLGroupBy.java
$ java HiveQLGroupBy

Output:
Dept Count(*)
Admin 1
PR 2
TP      2

HiveQL - Select-Joins

JOIN is a clause that is used for combining specific fields from two tables by using
values common to each one. It is used to combine records from two or more tables
in the database.

Syntax
join_table:
   table_reference JOIN table_factor [join_condition]
   | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
   | table_reference LEFT SEMI JOIN table_reference join_condition
   | table_reference CROSS JOIN table_reference [join_condition]

Example
We will use the following two tables in this chapter. Consider the following table named CUSTOMERS:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Consider another table ORDERS as follows:
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
There are different types of joins given as follows:

 JOIN
 LEFT OUTER JOIN
 RIGHT OUTER JOIN
 FULL OUTER JOIN

JOIN
JOIN clause is used to combine and retrieve the records from multiple tables. JOIN is the same as INNER JOIN in SQL. A JOIN condition is raised using the primary keys and foreign keys of the tables.
The following query executes JOIN on the CUSTOMER and ORDER tables, and
retrieves the records:
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
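An inner join's row-matching logic can be sketched in plain Java against the same CUSTOMERS and ORDERS data: one output row is produced per order whose CUSTOMER_ID matches a customer. The class and method names here are illustrative, not part of Hive.

```java
import java.util.*;

public class InnerJoinDemo {
    // CUSTOMERS: id -> name (only fields needed for the join).
    static Map<Integer, String> customers = new LinkedHashMap<>();
    // ORDERS: {customerId, amount}, in the table's row order.
    static int[][] orders = {{3, 3000}, {3, 1500}, {2, 1560}, {4, 2060}};

    static {
        customers.put(1, "Ramesh"); customers.put(2, "Khilan");
        customers.put(3, "kaushik"); customers.put(4, "Chaitali");
        customers.put(5, "Hardik"); customers.put(6, "Komal");
        customers.put(7, "Muffy");
    }

    // Inner join: emit "name:amount" only when the ids match.
    static List<String> join() {
        List<String> out = new ArrayList<>();
        for (int[] o : orders) {
            String name = customers.get(o[0]);
            if (name != null) out.add(name + ":" + o[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the four matched rows, like the JOIN result above
        join().forEach(System.out::println);
    }
}
```

Customers with no orders (Ramesh, Hardik, Komal, Muffy) produce no output rows, which is exactly why they are missing from the JOIN result above.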

LEFT OUTER JOIN


The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there
are no matches in the right table. This means, if the ON clause matches 0 (zero)
records in the right table, the JOIN still returns a row in the result, but with NULL in
each column from the right table.
A LEFT JOIN returns all the values from the left table, plus the matched values from
the right table, or NULL in case of no matching JOIN predicate.
The following query demonstrates LEFT OUTER JOIN between CUSTOMER and
ORDER tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+----+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+----+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+----+----------+--------+---------------------+

RIGHT OUTER JOIN


The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if
there are no matches in the left table. If the ON clause matches 0 (zero) records in
the left table, the JOIN still returns a row in the result, but with NULL in each column
from the left table.
A RIGHT JOIN returns all the values from the right table, plus the matched values
from the left table, or NULL in case of no matching join predicate.
The following query demonstrates RIGHT OUTER JOIN between the CUSTOMER
and ORDER tables.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
RIGHT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
FULL OUTER JOIN
The HiveQL FULL OUTER JOIN combines the records of both the left and the right
outer tables that fulfil the JOIN condition. The joined table contains either all the
records from both the tables, or fills in NULL values for missing matches on either
side.
The following query demonstrates FULL OUTER JOIN between CUSTOMER and
ORDER tables:
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+------+----------+--------+---------------------+
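The outer-join semantics above can be sketched in plain Java on the same sample data: every left-side customer is emitted, with NULL standing in where no order matches. This is an illustrative in-memory model of LEFT OUTER JOIN only (the RIGHT and FULL variants apply the same idea to the other side, or both sides); the names are made up for the example.

```java
import java.util.*;

public class OuterJoinDemo {
    // Customer names, indexed so that customers[id - 1] is id's name.
    static String[] customers = {"Ramesh", "Khilan", "kaushik", "Chaitali",
                                 "Hardik", "Komal", "Muffy"};
    // ORDERS: {customerId, amount}.
    static int[][] orders = {{3, 3000}, {3, 1500}, {2, 1560}, {4, 2060}};

    // LEFT OUTER JOIN: every customer appears at least once;
    // unmatched customers get a NULL amount.
    static List<String> leftOuterJoin() {
        List<String> out = new ArrayList<>();
        for (int id = 1; id <= customers.length; id++) {
            boolean matched = false;
            for (int[] o : orders) {
                if (o[0] == id) {
                    out.add(customers[id - 1] + ":" + o[1]);
                    matched = true;
                }
            }
            if (!matched) out.add(customers[id - 1] + ":NULL");
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints 8 rows: 4 matches plus 4 NULL-padded customers
        leftOuterJoin().forEach(System.out::println);
    }
}
```

The result has eight rows (kaushik matches twice), mirroring the LEFT OUTER JOIN response table earlier in this chapter.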

ORIENTDB GRAPH DATABASE

What is OrientDB Graph Database?

OrientDB is an open source NoSQL database management system written in Java. It is a multi-model database, supporting graph, document, key/value, and object models, but the relationships are managed as in graph databases, with direct connections between records.
In graph databases, the database system graphs data into network-like structures
consisting of vertices and edges. In the OrientDB Graph model, the database
represents data through the concept of a property graph, which defines a vertex as
an entity linked with other vertices and an edge, as an entity that links two
vertices.

OrientDB ships with a generic vertex persistent class, called V, as well as a class
for edges, called E. As an example, you can create a new vertex using
the INSERT command with V.
orientdb> INSERT INTO V SET name='Jay'

Created record with RID #9:0


In effect, the Graph model database works on top of the underlying document
model. But, in order to simplify this process, OrientDB introduces a new set of
commands for managing graphs from the console. Instead of INSERT, use CREATE
VERTEX
orientdb> CREATE VERTEX V SET name='Jay'

Created vertex with RID #9:1

Social Network for Restaurant Patrons

While you have the option of working with vertexes and edges in your database as
they are, you can also extend the standard V and E classes to suit the particular
needs of your application. The advantages of this approach are,

 It grants better understanding about the meaning of these entities.
 It allows for optional constraints at the class level.
 It improves performance through better partitioning of entities.
 It allows for object-oriented inheritance among the graph elements.

For example, consider a social network based on restaurants. You need to start
with a class for individual customers and another for the restaurants they
patronize. Create these classes to extend the V class.
orientdb> CREATE CLASS Person EXTENDS V

orientdb> CREATE CLASS Restaurant EXTENDS V


Doing this creates the schema for your social network. Now that the schema is
ready, populate the graph with data.
orientdb> CREATE VERTEX Person SET name='Luca'
Created record with RID #11:0

orientdb> CREATE VERTEX Person SET name='Bill'

Created record with RID #11:1

orientdb> CREATE VERTEX Person SET name='Jay'

Created record with RID #11:2

orientdb> CREATE VERTEX Restaurant SET name='Dante', type='Pizza'

Created record with RID #12:0

orientdb> CREATE VERTEX Restaurant SET name='Charlie', type='French'

Created record with RID #12:1


This adds three vertices to the Person class, representing individual users in the
social network. It also adds two vertices to the Restaurant class, representing the
restaurants that they patronize.

Creating Edges

For the moment, these vertices are independent of one another, tied together only
by the classes to which they belong. That is, they are not yet connected by edges.
Before you can make these connections, you first need to create a class that
extends E.
orientdb> CREATE CLASS Eat EXTENDS E
This creates the class Eat, which extends the class E. Eat represents the
relationship between the vertex Person and the vertex Restaurant.
When you create the edge from this class, note that the orientation of the vertices
is important, because it gives the relationship its meaning. For instance, creating
an edge in the opposite direction, (from Restaurant to Person), would call for a
separate class, such as Attendee.
The user Luca eats at the pizza joint Dante. Create an edge that represents this
connection:
orientdb> CREATE EDGE Eat FROM ( SELECT FROM Person WHERE name='Luca' )
TO ( SELECT FROM Restaurant WHERE name='Dante' )

Creating Edges from Record ID

In the event that you know the Record ID of the vertices, you can connect them
directly with a shorter and faster command. For example, the person Bill also eats
at the restaurant Dante and the person Jay eats at the restaurant Charlie. Create
edges in the class Eat to represent these connections.
orientdb> CREATE EDGE Eat FROM #11:1 TO #12:0

orientdb> CREATE EDGE Eat FROM #11:2 TO #12:1

Querying Graphs

In the above example you created and populated a small graph of a social network
of individual users and the restaurants at which they eat. You can now begin to
experiment with queries on a graph database.

To cross edges, you can use special graph functions, such as:

 OUT() To retrieve the adjacent outgoing vertices


 IN() To retrieve the adjacent incoming vertices
 BOTH() To retrieve the adjacent incoming and outgoing vertices
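The three traversal functions can be modeled in plain Java over the small Eat graph built above. This is an illustrative in-memory sketch of what IN() and OUT() return, using names in place of OrientDB record IDs; the class and method names are made up for the example.

```java
import java.util.*;

public class GraphTraversalDemo {
    // Edges of the sample graph: Person -> Restaurant ("Eat").
    static String[][] eatEdges = {
        {"Luca", "Dante"}, {"Bill", "Dante"}, {"Jay", "Charlie"}
    };

    // IN(): adjacent vertices whose edges point at the given vertex.
    static List<String> in(String vertex) {
        List<String> result = new ArrayList<>();
        for (String[] e : eatEdges)
            if (e[1].equals(vertex)) result.add(e[0]);
        return result;
    }

    // OUT(): adjacent vertices the given vertex points at.
    static List<String> out(String vertex) {
        List<String> result = new ArrayList<>();
        for (String[] e : eatEdges)
            if (e[0].equals(vertex)) result.add(e[1]);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(in("Dante"));  // who eats at Dante
        System.out.println(out("Jay"));   // where Jay eats
    }
}
```

`in("Dante")` returns Luca and Bill, matching the IN() query on the Restaurant Dante below (records #11:0 and #11:1); BOTH() would simply be the union of the two directions.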

For example, to know all of the people who eat in the restaurant Dante, which has
a Record ID of #12:0, you can access the record for that restaurant and traverse
the incoming edges to discover which entries in the Person class connect to it.
orientdb> SELECT IN() FROM Restaurant WHERE name='Dante'

-------+----------------+
@RID | in |
-------+----------------+
#-2:1 | [#11:0, #11:1] |
-------+----------------+
This query displays the record ID's from the Person class that connect to the
restaurant Dante. In cases such as this, you can use the EXPAND() special function
to transform the vertex collection in the result-set by expanding it.
orientdb> SELECT EXPAND( IN() ) FROM Restaurant WHERE name='Dante'

-------+-------------+-------------+---------+
@RID | @CLASS | Name | out_Eat |
-------+-------------+-------------+---------+
#11:0 | Person | Luca | #12:0 |
#11:1 | Person | Bill | #12:0 |
-------+-------------+-------------+---------+

Creating Edge to Connect Users

Your application at this point shows connections between individual users and the
restaurants they patronize. While this is interesting, it does not yet function as a
social network. To do so, you need to establish edges that connect the users to
one another.

To begin, as before, create a new class that extends E:


orientdb> CREATE CLASS Friend EXTENDS E
The users Luca and Jay are friends. They have Record ID's of #11:0 and #11:2.
Create an edge that connects them.
orientdb> CREATE EDGE Friend FROM #11:0 TO #11:2
In the Friend relationship, orientation is not important. That is, if Luca is a friend of
Jay's then Jay is a friend of Luca's. Therefore, you should use the BOTH() function.
orientdb> SELECT EXPAND( BOTH( 'Friend' ) ) FROM Person WHERE name =
'Luca'

-------+-------------+-------------+---------+-----------+
@RID | @CLASS | Name | out_Eat | in_Friend |
-------+-------------+-------------+---------+-----------+
#11:2 | Person | Jay | #12:1 | #11:0 |
-------+-------------+-------------+---------+-----------+
Here, the BOTH() function takes the edge class Friend as an argument, crossing only relationships of the Friend kind (that is, it skips the Eat class for now). Note in the result-set that the relationship with Luca appears, with his Record ID of #11:0, in the in_Friend field.
You can also now view all the restaurants patronized by friends of Luca.
orientdb> SELECT EXPAND( BOTH('Friend').out('Eat') ) FROM Person
WHERE name='Luca'

-------+-------------+-------------+-------------+--------+
@RID | @CLASS | Name | Type | in_Eat |
-------+-------------+-------------+-------------+--------+
#12:1 | Restaurant | Charlie | French | #11:2 |
-------+-------------+-------------+-------------+--------+

Lightweight Edges
Beginning with version 1.4.x, OrientDB manages some edges as Lightweight Edges. Lightweight Edges do not have Record ID's, but are physically stored as links within vertices. Note that OrientDB uses a Lightweight Edge only when the edge has no properties; otherwise, it uses the standard Edge.

From a logical point of view, Lightweight Edges are Edges in all effects, so all graph functions work with them. They exist to improve performance and reduce disk space.

Because Lightweight Edges don't exist as separate records in the database, some
queries won't work as expected. For instance,
orientdb> SELECT FROM E
For most cases, an edge is used connecting vertices, so this query would not
cause any problems in particular. But, it would not return Lightweight Edges in the
result-set. In the event that you need to query edges directly, including those with
no properties, disable the Lightweight Edge feature.

To disable the Lightweight Edge feature, execute the following command.


orientdb> ALTER DATABASE CUSTOM useLightweightEdges=FALSE
You only need to execute this command once. OrientDB now generates new
edges as the standard Edge, rather than the Lightweight Edge. Note that this does
not affect existing edges.

OrientDB Features

 Many database models (graph, document, key/value, object) integrated into one database.
 Good user-profile-based security system.
 SQL engine developed from scratch to improve performance.
 Better storage caching for reduced latency.
 Remote connection with increased transaction isolation.
