Cassandra Notes
org/cassandra/ArticlesAndPresentations
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
A) Cassandra Architecture: -
Cassandra
- A distributed database.
- There is no master-slave concept and each node is equal.
- A cluster can easily be across more than one data center.
Snitch
- It determines how the nodes in a cluster learn about the topology of the cluster.
- Types: Dynamic Snitching, SimpleSnitch, RackInferringSnitch, PropertyFileSnitch,
GossipingPropertyFileSnitch, Ec2Snitch, Ec2MultiRegionSnitch
Data Distribution
- It is done through consistent hashing, to strive for even distribution of data
across the nodes in cluster.
- Rather than all rows of a table existing on only one node, the rows are
distributed across the nodes in cluster, in an attempt to evenly spread out the
load of the table's data.
- To distribute the rows across the nodes, a partitioner is used. The partitioner
uses an algorithm to determine which node a given row of data will go to.
- The default partitioner in Cassandra is Murmur3Partitioner.
Murmur3: It hashes the value of the row's partition key (by default the first
column of the primary key) to generate a token, a number between -2^63 and 2^63 - 1.
Calculate the token ranges: -
<The Python formula below calculates the tokens for 4 nodes; replace 4 with the
actual number of nodes in your env.>
$ python -c 'print([str((2**64 // 4) * i - 2**63) for i in range(4)])'
['-9223372036854775808', '-4611686018427387904', '0', '4611686018427387904']
OR
Use a Murmur3 calculator
- Each node in a cluster is assigned one token range (or multiple ranges with
virtual nodes).
e.g.: Each node is responsible for the token range that starts just after the
endpoint of the previous node and ends at its own endpoint.
The endpoint of each node is defined below.
NodeA: -100
NodeB: 0
NodeC: 51
NodeD: 100
-> NodeA can store values greater than 100 and values less than or equal to -100
(its range wraps around the ring)
-> NodeB can store values from -99 to 0
-> NodeC can store values from 1 to 51
-> NodeD can store values from 52 to 100
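The lookup in the example above can be sketched in Python. This is only an illustration using the four example endpoints, not Cassandra's own code; the names RING and owner are made up for this sketch.

```python
from bisect import bisect_left

# Endpoints from the example above, sorted by token value (illustrative only).
RING = [(-100, "NodeA"), (0, "NodeB"), (51, "NodeC"), (100, "NodeD")]

def owner(token, ring=RING):
    """Return the node whose range (previous endpoint, own endpoint] holds the token."""
    endpoints = [endpoint for endpoint, _ in ring]
    # Index of the first endpoint >= token; past the last endpoint, wrap to the first node.
    idx = bisect_left(endpoints, token) % len(ring)
    return ring[idx][1]

for token in (-250, -100, -99, 0, 1, 51, 52, 100, 101):
    print(token, owner(token))
```

Tokens above the last endpoint wrap around to NodeA, which is why NodeA holds both ends of the number line in the example.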
Replication Factor:
- It must be specified whenever a database (keyspace) is defined.
- It specifies how many copies (replicas) of the data there will be within a given
database.
- Although 1 can be specified, it is common to specify 2, 3, or more so that if a
node goes down, there is at least one other replica of the data and the data
is not lost with the down node.
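A minimal sketch of what a replication factor means for placement, assuming the simplest scheme (as with SimpleStrategy): the first replica goes to the node that owns the token and the remaining replicas go to the next nodes clockwise on the ring. Racks and data centers are ignored, and the ring values and the name replicas are illustrative only.

```python
from bisect import bisect_left

# Illustrative ring: (endpoint token, node name), sorted by token.
RING = [(-100, "NodeA"), (0, "NodeB"), (51, "NodeC"), (100, "NodeD")]

def replicas(token, replication_factor, ring=RING):
    """Return the owner of the token plus the next nodes clockwise on the ring."""
    endpoints = [endpoint for endpoint, _ in ring]
    start = bisect_left(endpoints, token) % len(ring)
    # Sketch simplification: never return more replicas than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

print(replicas(75, 1))
print(replicas(75, 3))
```

With a replication factor of 3, losing any single node still leaves two copies of every row in this sketch.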
Virtual Nodes:
- They are an alternative way to assign token ranges to nodes, and "Virtual Nodes"
are now the default in Cassandra.
- With Virtual Nodes, instead of a node being responsible for just one token range,
it is responsible for many small token ranges (by default, 256 of them).
- Virtual Nodes allow for assigning a higher number of ranges (e.g. 512) to a
powerful computer and a lower number of ranges (e.g. 128) to a less powerful
computer.
- Virtual Nodes (aka vnodes) were created to make it easier to add new nodes to a
cluster while keeping the cluster balanced
- When a new node is added, it receives many small token range slices from the
existing nodes, to maintain a balanced cluster
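The effect of giving nodes different numbers of virtual nodes can be sketched as below. It assumes tokens are picked at random across the Murmur3 token range; the node names, the fixed seed, and the functions assign_vnodes and ownership are hypothetical and exist only for this illustration.

```python
import random
from collections import Counter

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def assign_vnodes(num_tokens_by_node, seed=0):
    """Give each node its configured number of random tokens; return the sorted ring."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ring = []
    for node, num_tokens in num_tokens_by_node.items():
        for _ in range(num_tokens):
            ring.append((rng.randint(MIN_TOKEN, MAX_TOKEN), node))
    return sorted(ring)

def ownership(ring):
    """Return the fraction of the token space owned by each node."""
    total = 2**64
    owned = Counter()
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - previous) % total
    return {node: span / total for node, span in owned.items()}

ring = assign_vnodes({"big": 512, "medium": 256, "small": 128})
for node, share in sorted(ownership(ring).items()):
    print(node, round(share, 3))
```

Because each node holds many small slices, its share of the data ends up roughly proportional to its number of tokens.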
===================================================================================
=========================================================================
B) Installing and Configuring
Installation: -
- http://www.planetcassandra.org/cassandra/
- Cassandra is installed in the directory where you unzip the downloaded archive.
Configuration: -
- Go inside conf directory to see configuration files.
(/Users/ashah/cassandra/dsc-cassandra-3.0.0/conf)
- cassandra.yaml is the main configuration file.
File permission: -
<if you have modified cassandra.yaml to use the directories below, create them
and give your user ownership>
- sudo mkdir /var/lib/cassandra
- sudo mkdir /var/log/cassandra
- sudo chown -R "$USER":"$(id -gn)" /var/lib/cassandra
- sudo chown -R "$USER":"$(id -gn)" /var/log/cassandra
Starting/Stopping Cassandra: -
Way 1)
-> Start
<for now it is via root user>
- $ pwd
o/p: /Users/ashah/cassandra/dsc-cassandra-3.0.0
- bin/cassandra
-> Stop
- ps aux | grep cass
- kill <pid>
Way 2)
- start: bin/cassandra -f (runs Cassandra in the foreground)
- stop: Ctrl + C
Checking Status: -
- bin/nodetool status
- bin/nodetool info [-h <host>]
- bin/nodetool ring
===================================================================================
=========================================================================
C) Communicating with Cassandra
CQLSH: -
- bin/cqlsh
- cqlsh> HELP
- cqlsh> help create_keyspace
- The semicolon (";") is optional for cqlsh shell commands (e.g. HELP) but
mandatory for CQL statements.
===================================================================================
========================================================================
D) Creating a database
Defining a keyspace: -
- A keyspace name is case sensitive only if you put it inside double quotes;
otherwise it is converted to lower case.
e.g.: a) CREATE KEYSPACE "Test" :: This will be created as Test.
b) CREATE KEYSPACE Test :: This will be created as test.
- A keyspace can be defined through the create keyspace command.
->
CREATE KEYSPACE vehicle_tracker WITH REPLICATION =
{'class':'NetworkTopologyStrategy', 'dc1':3, 'dc2':2};
<'dc1':3 means data center 1 holds 3 replicas of the data; in the same way, data
center 2 holds 2 replicas of the data>
->
CREATE KEYSPACE vehicle_tracker WITH REPLICATION = {'class':'SimpleStrategy',
'replication_factor':1};
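The case rule for keyspace names described at the start of this section can be sketched as a small Python function. This is only an illustration of the rule, not how Cassandra parses identifiers; the name stored_keyspace_name is made up, and escaped double quotes inside a quoted name are not handled.

```python
def stored_keyspace_name(identifier):
    """Return the name as it would be stored: quoted keeps its case, unquoted is lowercased."""
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if quoted:
        return identifier[1:-1]  # strip the quotes, keep the case
    return identifier.lower()

print(stored_keyspace_name('"Test"'))
print(stored_keyspace_name('Test'))
```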
Deleting a keyspace: -
- DROP KEYSPACE vehicle_tracker;
===================================================================================
=========================================================================
E) Creating a Table
Creating/dropping a Table: -
- CREATE TABLE activity
(home_id text, datetime timestamp, event text, code_used text, PRIMARY
KEY(home_id, datetime)) WITH CLUSTERING ORDER BY (datetime DESC);
- DROP TABLE activity;
===================================================================================
=========================================================================
F) Inserting Data
- The COPY command can be used to import data from (COPY FROM) or export data to
(COPY TO) a .csv file.
===================================================================================
=========================================================================
G) Modelling Data
===================================================================================
=========================================================================
H) Creating an application
===================================================================================
=========================================================================