Introduction To Cassandra

Chapter 7
Introduction to Cassandra
Learning Objectives and Learning Outcomes

Learning Objectives
1. To study the features of Cassandra.
2. To learn how to perform CRUD operations.
3. To learn about collections in Cassandra.
4. To import from and export to CSV format.

Learning Outcomes
a) To comprehend the reasons behind the popularity of NoSQL databases.
b) To be able to perform CRUD operations.
c) To distinguish between collection types such as SET, LIST and MAP.
d) To be able to successfully import from CSV.
e) To be able to successfully export to CSV.
Session Plan
Lecture time 45 to 60 minutes
Q/A 15 minutes
Agenda
• Apache Cassandra – An Introduction
• Features of Cassandra
• Peer-to-Peer Network
• Writes in Cassandra
• Hinted Handoffs
• Tunable Consistency: Read Consistency and Write Consistency
• CQL Data Types
• CQLSH
• CRUD: Insert, Update, Delete and Select
• Collections: Set, List and Map
• Time To Live (TTL)
• Import and Export
Apache Cassandra – An Introduction
• Apache Cassandra was born at Facebook. After Facebook open-sourced the code in 2008, Cassandra became an Apache Incubator project in 2009 and subsequently a top-level Apache project in 2010.
• It is a column-oriented database designed around symmetric peer-to-peer nodes instead of a master-slave architecture.
• Its design draws on Amazon's Dynamo and Google's Bigtable.

Features of Cassandra
• Open Source
• Distributed
• Decentralized (Server Symmetry)
• No single point of failure
• Column-oriented
• Peer to Peer
• Elastic Scalability
Peer to Peer Network
[Figure: Sample Cassandra Cluster]
Writes in Cassandra
• A client initiates a write request.
• The write is first written to the commit log. A write is considered successful only if it has been written to the commit log.
• The write is then pushed to a memory-resident data structure called the Memtable. A threshold value is defined for the Memtable.
• When the number of objects stored in the Memtable reaches the threshold, the contents of the Memtable are flushed to disk in a file called an SSTable (Sorted String Table). Flushing is a non-blocking operation.
• It is possible to have multiple Memtables for a single column family. One of them is current and the rest are waiting to be flushed.
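The write path above can be modeled in a few lines of Python. This is a toy sketch, not Cassandra's code: it assumes the Memtable threshold counts stored objects (rows), that a flush simply sorts the Memtable by key into an immutable "SSTable", and it flushes synchronously and keeps only one Memtable, unlike the real non-blocking flush. The class and attribute names are hypothetical.

```python
class ToyNode:
    """Toy model of one node's write path: commit log -> Memtable -> SSTable."""

    def __init__(self, memtable_threshold=3):
        self.memtable_threshold = memtable_threshold
        self.commit_log = []   # append-only durability log
        self.memtable = {}     # current in-memory table: key -> value
        self.sstables = []     # flushed tables, each sorted by key and never modified

    def write(self, key, value):
        # 1. Append to the commit log first; only then is the write successful.
        self.commit_log.append((key, value))
        # 2. Apply the write to the current Memtable.
        self.memtable[key] = value
        # 3. Flush once the Memtable holds `memtable_threshold` objects.
        if len(self.memtable) >= self.memtable_threshold:
            self.flush()
        return True

    def flush(self):
        # Write the Memtable out as a key-sorted, immutable SSTable and start a new one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


node = ToyNode(memtable_threshold=3)
for key, value in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    node.write(key, value)

print(node.sstables)  # [(('a', 1), ('b', 2), ('c', 3))]
print(node.memtable)  # {'d': 4}
```

The sorted tuple mirrors why the file is called a Sorted String Table: rows land on disk in key order, which is what later makes reads and merges efficient.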
Hinted Handoffs
[Figure: Hinted handoff. A client writes row K through coordinator node A. Replica node C is down, so A writes a hint for C in its system hints table and replicates row K to C once C is available again.]
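The hinted handoff scenario in the figure can be sketched as follows. This is an illustration only: the Replica and Coordinator classes are hypothetical, the list `hints` stands in for the system hints table, and real Cassandra also limits how long hints are kept, which this sketch does not model.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}


class Coordinator:
    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas
        self.hints = []  # stands in for the coordinator's system hints table

    def write(self, key, value):
        for replica in self.replicas:
            if replica.up:
                replica.rows[key] = value
            else:
                # Replica is down: store a hint instead of dropping the write.
                self.hints.append((replica, key, value))

    def replay_hints(self):
        # Hand each stored write to its replica once that replica is back up.
        still_pending = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.rows[key] = value
            else:
                still_pending.append((replica, key, value))
        self.hints = still_pending


b, c = Replica("B"), Replica("C")
a = Coordinator("A", [b, c])
c.up = False
a.write("K", "row K")      # B receives row K; a hint is stored for C
missed_while_down = "K" not in c.rows
c.up = True
a.replay_hints()           # C now receives row K from the hint
```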
Tunable Consistency
Read Consistency
ONE: Returns a response from the closest replica node holding the data.
QUORUM: Returns the data with the most recent timestamp after a quorum of replicas has responded.
LOCAL_QUORUM: Returns the data with the most recent timestamp after a quorum of replicas in the same data center as the coordinator node has responded.
EACH_QUORUM: Returns the data with the most recent timestamp after a quorum of replicas in each data center has responded.
ALL: Responds to the client's read request only after all replica nodes have responded. This level provides the highest consistency and the lowest availability of all levels.
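To make the read levels concrete, the sketch below counts how many replicas must answer a read at each level. It assumes the documented quorum formula, floor(sum of replication factors / 2) + 1, applied per data center for LOCAL_QUORUM and EACH_QUORUM; the function name and the `rf_per_dc` dictionary shape are hypothetical.

```python
def replicas_required(level, rf_per_dc, local_dc):
    """Number of replicas that must respond to a read at `level` (sketch).

    rf_per_dc maps data-center name -> replication factor.
    """
    def quorum(rf):
        # A quorum is a strict majority: floor(rf / 2) + 1.
        return rf // 2 + 1

    total_rf = sum(rf_per_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total_rf)
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "EACH_QUORUM":
        # A quorum is needed in every data center, so the counts add up.
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "ALL":
        return total_rf
    raise ValueError(f"unsupported level: {level}")


rf = {"dc1": 3, "dc2": 3}
for level in ["ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"]:
    print(level, replicas_required(level, rf, local_dc="dc1"))
```

With two data centers of three replicas each, LOCAL_QUORUM needs only 2 local answers, while QUORUM and EACH_QUORUM need 4, which is why LOCAL_QUORUM avoids cross-data-center round trips.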
Write Consistency
ALL: This is the highest level of consistency of all levels, as it requires that a write be written to the commit log and Memtable on all replica nodes in the cluster for that partition.
EACH_QUORUM: A write must be written to the commit log and Memtable on a quorum of replica nodes in all data centers.
QUORUM: A write must be written to the commit log and Memtable on a quorum of replica nodes.
LOCAL_QUORUM: A write must be written to the commit log and Memtable on a quorum of replica nodes in the same data center as the coordinator node. This avoids the latency of inter-data-center communication.
ONE: A write must be written to the commit log and Memtable of at least one replica node.
TWO: A write must be written to the commit log and Memtable of at least two replica nodes.
THREE: A write must be written to the commit log and Memtable of at least three replica nodes.
LOCAL_ONE: A write must be sent to, and successfully acknowledged by, at least one replica node in the local data center.
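The read and write levels are usually chosen together. If the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF), every read touches at least one replica that holds the latest write. The sketch below checks that rule for a single data center; the helper names are hypothetical, and it ignores node failures and timeouts.

```python
def replicas_needed(level, rf):
    """Replicas that must acknowledge at `level` in a single data center (sketch)."""
    counts = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    needed = counts[level]
    if needed > rf:
        # Not enough replicas exist to ever satisfy this level.
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def reads_see_latest_write(read_level, write_level, rf):
    """True when any read replica set must overlap any write replica set (R + W > RF)."""
    return replicas_needed(read_level, rf) + replicas_needed(write_level, rf) > rf


print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True: 2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", 3))        # False: 1 + 1 <= 3
```

QUORUM for both reads and writes is a common middle ground: it tolerates one unavailable replica at RF = 3 while still guaranteeing the overlap.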
CQL Data types
int: 32-bit signed integer
bigint: 64-bit signed long
double: 64-bit IEEE-754 floating point
float: 32-bit IEEE-754 floating point
boolean: true or false
blob: Arbitrary bytes, expressed in hexadecimal
counter: Distributed 64-bit counter value
decimal: Variable-precision decimal
list: An ordered collection of one or more elements (duplicates allowed)
map: A collection of key-value pairs, written in a JSON-style syntax
set: A collection of one or more unique elements
timestamp: Date plus time
varchar: UTF-8 encoded string
varint: Arbitrary-precision integer
text: UTF-8 encoded string
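The size limits in the table matter when choosing between int and bigint. This Python sketch uses the standard `struct` module to check whether a value fits CQL's 32-bit int or 64-bit bigint, and renders bytes as a hexadecimal blob literal. The helper names are hypothetical; Python's own int is arbitrary-precision, like CQL varint.

```python
import struct


def fits_cql_int(value):
    """CQL int is a 32-bit signed integer."""
    try:
        struct.pack(">i", value)
        return True
    except struct.error:
        return False


def fits_cql_bigint(value):
    """CQL bigint is a 64-bit signed integer."""
    try:
        struct.pack(">q", value)
        return True
    except struct.error:
        return False


def blob_literal(data):
    """Render bytes as a hexadecimal blob literal, e.g. 0x4361 for b'Ca'."""
    return "0x" + data.hex()


print(fits_cql_int(2**31 - 1), fits_cql_int(2**31))  # True False
print(blob_literal(b"Cassandra"))                    # 0x43617373616e647261
```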
CQLSH
CRUD - Keyspace
To create a keyspace named “Students” and make it the current keyspace:
CREATE KEYSPACE Students WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};
USE Students;
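SimpleStrategy places the first replica on the node that owns the row's token and any further replicas on the next nodes clockwise around the ring. The sketch below models that placement. The four-node ring and its small token values are hypothetical, and MD5 over a 0-99 range stands in for the partitioner (Cassandra's default partitioner is Murmur3).

```python
import bisect
import hashlib


def token_for(key):
    # Stand-in partitioner: MD5 mapped onto a tiny 0-99 token range (illustration only).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 100


def replicas_for_token(ring, token, replication_factor):
    """SimpleStrategy-style placement on a ring of (token, node) pairs.

    The first replica is the first node whose token is >= the row's token
    (wrapping around the ring); further replicas are the next nodes clockwise.
    """
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(replicas_for_token(ring, 30, 2))  # ['C', 'D']
print(replicas_for_token(ring, 80, 2))  # ['A', 'B'] (wraps around the ring)
print(replicas_for_token(ring, token_for("row-1"), 1))  # one node, as with replication_factor 1
```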
CRUD – Create Table
To create a table (column family) named “student_info”:
CREATE TABLE Student_Info (
  RollNo int PRIMARY KEY,
  StudName text,
  DateofJoining timestamp,
  LastExamPercent double
);
CRUD - Insert
To insert two rows into the table “student_info” as a single batch:
BEGIN BATCH
INSERT INTO student_info
(RollNo,StudName,DateofJoining,LastExamPercent)
VALUES (1,'Michael Storm','2012-03-29', 69.6)
INSERT INTO student_info
(RollNo,StudName,DateofJoining,LastExamPercent)
VALUES (2,'Stephen Fox','2013-02-27', 72.5)
APPLY BATCH;
CRUD - Select
To view the data from the table “student_info”.
SELECT *
FROM student_info;
CRUD – Create Index
To create an index on the “studname” column of the “student_info” column family, use the following statement:
CREATE INDEX ON student_info(studname);

CRUD – Update
To set the “StudName” column of the “student_info” column family to “David Sheen” for the record whose RollNo is 2:
Note: UPDATE writes one or more column values for a given row in a Cassandra table. It does not return anything.
UPDATE student_info SET StudName = 'David Sheen' WHERE RollNo = 2;
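In Cassandra, INSERT and UPDATE both behave as upserts: an UPDATE on a primary key that does not exist yet simply creates the row, and conflicting writes are resolved by timestamp (last write wins). The toy table below sketches that behavior. The class is hypothetical, timestamps are passed in explicitly, and on a timestamp tie it keeps the existing cell, which is not Cassandra's exact tie-break rule.

```python
class ToyTable:
    """Rows keyed by primary key; each cell keeps (value, write_timestamp)."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, columns, timestamp):
        # INSERT and UPDATE both land here: a missing row is simply created.
        row = self.rows.setdefault(key, {})
        for column, value in columns.items():
            current = row.get(column)
            # Last write wins: only a strictly newer timestamp replaces a cell
            # (this sketch keeps the existing cell on a tie).
            if current is None or timestamp > current[1]:
                row[column] = (value, timestamp)

    def select(self, key):
        return {column: value for column, (value, _) in self.rows.get(key, {}).items()}


table = ToyTable()
table.upsert(2, {"studname": "Stephen Fox"}, timestamp=1)   # INSERT
table.upsert(2, {"studname": "David Sheen"}, timestamp=2)   # UPDATE ... WHERE RollNo = 2
table.upsert(2, {"studname": "Late Arrival"}, timestamp=1)  # older write loses
table.upsert(3, {"studname": "New Student"}, timestamp=3)   # UPDATE on a missing row creates it
print(table.select(2))  # {'studname': 'David Sheen'}
```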

CRUD – Delete
To delete the column “LastExamPercent” from the “student_info” table for the record whose RollNo is 2:
Note: The DELETE statement removes one or more columns from one or more rows of a Cassandra table, or removes entire rows if no columns are specified.
DELETE LastExamPercent FROM student_info WHERE RollNo=2;
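Cassandra does not erase deleted data in place: a delete writes a marker called a tombstone, reads hide the marked cells, and the data is physically removed later during compaction. The sketch below models only the tombstone step; the function names and the `TOMBSTONE` marker are hypothetical.

```python
TOMBSTONE = object()  # marker written by a delete instead of erasing data in place


def delete(row, *columns):
    """Mark the named columns (or every column if none are named) as deleted."""
    for column in columns or list(row):
        row[column] = TOMBSTONE


def visible(row):
    """What a read returns: tombstoned cells are hidden (cqlsh shows them as null)."""
    return {column: value for column, value in row.items() if value is not TOMBSTONE}


student = {"studname": "David Sheen", "lastexampercent": 72.5}
delete(student, "lastexampercent")  # DELETE LastExamPercent FROM student_info WHERE RollNo=2;
print(visible(student))             # {'studname': 'David Sheen'}
```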
Collections
When should collections be used?
Use a collection when you need to store or denormalize a small amount of data.
What is the limit on the values of items in a collection?
Each item in a collection is limited to 64 KB.
Where are collections used?
Collections can be used to store, for example:
1. Phone numbers of users.
2. Email IDs of users.
Collections - Set
To alter the schema of the table “student_info” to add a set column “hobbies”:
ALTER TABLE student_info ADD hobbies set<text>;
To update the table “student_info” to provide the values for “hobbies” for the student with RollNo = 1:
UPDATE student_info
SET hobbies = hobbies + {'Chess', 'Table Tennis'}
WHERE RollNo=1;
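A CQL set behaves much like a Python set: adding an element that is already present has no effect, and `hobbies - {...}` removes elements. The sketch below mirrors the statements as comments and also shows why each hobby must be its own quoted string: `'Chess, Table Tennis'` would be a single element. Cassandra returns set elements in sorted order, so `sorted()` is used when printing.

```python
hobbies = set()

# SET hobbies = hobbies + {'Chess', 'Table Tennis'}  -> two elements
hobbies = hobbies | {"Chess", "Table Tennis"}
# {'Chess, Table Tennis'} would instead be one element containing a comma.
single = set() | {"Chess, Table Tennis"}

# Adding a duplicate leaves the set unchanged.
hobbies = hobbies | {"Chess"}
# SET hobbies = hobbies - {'Table Tennis'}
hobbies = hobbies - {"Table Tennis"}

print(sorted(hobbies))  # ['Chess']
print(len(single))      # 1
```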
Collections - List
To alter the schema of the table “student_info” to add a list column “language”:
ALTER TABLE student_info ADD language list<text>;
To append values to the list column “language” of the table “student_info”:
UPDATE student_info
SET language = language + ['Hindi', 'English']
WHERE RollNo=1;
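Unlike a set, a CQL list keeps insertion order and allows duplicates; values can be appended (`language + [...]`) or prepended (`[...] + language`), and `language - [...]` removes every occurrence of a value. The Python sketch below mirrors those operations; the value 'Sanskrit' is only a hypothetical example.

```python
language = []

# SET language = language + ['Hindi', 'English']  (append)
language = language + ["Hindi", "English"]
# SET language = ['Sanskrit'] + language  (prepend)
language = ["Sanskrit"] + language
# Lists keep duplicates.
language = language + ["Hindi"]
before_removal = list(language)

# SET language = language - ['Hindi'] removes every occurrence of 'Hindi'.
language = [item for item in language if item not in ["Hindi"]]

print(before_removal)  # ['Sanskrit', 'Hindi', 'English', 'Hindi']
print(language)        # ['Sanskrit', 'English']
```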
Collections - Map
To alter the “users” table to add a map column “todo”:
ALTER TABLE users
ADD todo map<timestamp, text>;
To set the “todo” map for the user whose user_id is 'AB' in the “users” table:
UPDATE users
SET todo =
{ '2014-9-24': 'Cassandra Session',
'2014-10-2 12:00': 'MongoDB Session' }
WHERE user_id = 'AB';
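Note that `SET todo = {...}` replaces the whole map, whereas `SET todo = todo + {...}` merges new entries (overwriting any existing key) and `DELETE todo['...'] FROM users WHERE ...` removes a single entry. The Python sketch below mirrors these three operations with a dict. For simplicity the keys are plain strings rather than timestamps, and the earlier contents and the 'MongoDB Workshop' value are hypothetical.

```python
todo = {"2014-9-1": "Old item"}  # hypothetical existing contents

# SET todo = {...} replaces the whole map.
todo = {"2014-9-24": "Cassandra Session", "2014-10-2 12:00": "MongoDB Session"}

# SET todo = todo + {...} merges; an existing key is overwritten.
todo = {**todo, "2014-10-2 12:00": "MongoDB Workshop"}

# DELETE todo['2014-9-24'] FROM users WHERE user_id = 'AB'; removes one entry.
del todo["2014-9-24"]

print(todo)  # {'2014-10-2 12:00': 'MongoDB Workshop'}
```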
Time To Live
Data in a column, other than a counter column, can have an optional expiration period called
TTL (time to live). The client request may specify a TTL value for the data. The TTL is
specified in seconds.
CREATE TABLE userlogin(
userid int primary key, password text
);
INSERT INTO userlogin (userid, password) VALUES (1,'infy') USING TTL 30;
SELECT TTL (password)
FROM userlogin
WHERE userid=1;
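The TTL behavior above can be modeled with an explicit clock. In this sketch, inserting with a TTL of 30 seconds records an expiry time; `ttl()` returns the remaining seconds (like `SELECT TTL(password)`), or None when no TTL is set or the data has expired, and an expired cell reads as absent. The class name and the passed-in `now` values are hypothetical; Cassandra uses its own write time.

```python
class ToyTTLTable:
    """Cells store (value, expiry time or None); time is passed in explicitly."""

    def __init__(self):
        self.cells = {}

    def insert(self, key, value, now, ttl=None):
        # USING TTL n: the cell expires n seconds after it is written.
        expires_at = now + ttl if ttl is not None else None
        self.cells[key] = (value, expires_at)

    def _live(self, key, now):
        value, expires_at = self.cells.get(key, (None, None))
        if expires_at is not None and now >= expires_at:
            return None, None  # expired: behaves as if deleted
        return value, expires_at

    def select(self, key, now):
        return self._live(key, now)[0]

    def ttl(self, key, now):
        # Like SELECT TTL(col): remaining seconds, or None when no TTL applies.
        value, expires_at = self._live(key, now)
        if value is None or expires_at is None:
            return None
        return expires_at - now


logins = ToyTTLTable()
logins.insert(1, "infy", now=0, ttl=30)  # INSERT ... VALUES (1,'infy') USING TTL 30;
print(logins.ttl(1, now=10))     # 20
print(logins.select(1, now=30))  # None
```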
Export to CSV
Export data to a CSV file
To export the contents of the table (column family) “elearninglists” in the “students” keyspace to the CSV file d:\elearninglists.csv:
COPY elearninglists (id, course_order, course_id, courseowner, title) TO
'd:\elearninglists.csv';
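COPY ... TO writes one CSV line per row, with fields in the order of the listed columns. The Python sketch below reproduces that shape with the standard `csv` module, writing to an in-memory buffer that stands in for d:\elearninglists.csv. The sample rows are hypothetical, and the optional header line mirrors cqlsh's `WITH HEADER = true` option (off by default).

```python
import csv
import io

COLUMNS = ["id", "course_order", "course_id", "courseowner", "title"]


def export_csv(rows, out, header=False):
    """Write rows in the column order listed in the COPY ... TO statement."""
    writer = csv.writer(out)
    if header:
        writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row[column] for column in COLUMNS])


# Hypothetical sample rows; the real table contents are not shown in the slides.
rows = [
    {"id": 1, "course_order": 1, "course_id": 101, "courseowner": "Jane", "title": "Cassandra Basics"},
    {"id": 2, "course_order": 2, "course_id": 102, "courseowner": "Raj", "title": "CQL, Part 2"},
]
buffer = io.StringIO()  # stands in for d:\elearninglists.csv
export_csv(rows, buffer)
```

A value containing a comma, such as "CQL, Part 2", is quoted automatically so that it stays one field.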
Import from CSV
Import data from a CSV file
To import data from d:\elearninglists.csv into the table “elearninglists” in the “students” keyspace:
COPY elearninglists (id, course_order, course_id, courseowner, title)
FROM 'd:\elearninglists.csv';
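COPY ... FROM does the reverse: each CSV line is split into fields that are matched, in order, to the listed columns. The sketch below does the same with Python's `csv` module and rejects lines with the wrong number of fields. The column types in `CONVERTERS` and the sample data are hypothetical, and an in-memory buffer stands in for the file.

```python
import csv
import io

COLUMNS = ["id", "course_order", "course_id", "courseowner", "title"]
# Hypothetical column types, used to convert CSV text back into values.
CONVERTERS = {"id": int, "course_order": int, "course_id": int, "courseowner": str, "title": str}


def import_csv(source, header=False):
    """Read CSV records and map each field onto COLUMNS in order."""
    reader = csv.reader(source)
    if header:
        next(reader, None)  # skip the header line
    rows = []
    for record in reader:
        if len(record) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(record)}")
        rows.append({col: CONVERTERS[col](field) for col, field in zip(COLUMNS, record)})
    return rows


sample = io.StringIO('1,1,101,Jane,Cassandra Basics\n2,2,102,Raj,"CQL, Part 2"\n')
rows = import_csv(sample)
```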
Answer a few quick questions …
Answer Me
• What is Cassandra?
• Comment on Cassandra writes.
• What is your understanding of tunable consistency?
• What are collections in CQL? Where are they used?
Summary please…
Ask a few participants of the learning program to summarize the lecture.

References …
Further Readings
• http://www.datastax.com/documentation/cassandra/2.0/cassandra/gettingStartedCassandraIntro.html
• http://www.datastax.com/documentation/cql/3.1/pdf/cql31.pdf
• http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Thank you