HBase - Tutorial

Uploaded by

ucebittrichy2020

HBase - Overview

Since the 1970s, the RDBMS has been the standard solution for data storage and maintenance problems. After the
advent of big data, companies realized the benefit of processing big data and started opting for
solutions like Hadoop.

Hadoop uses a distributed file system to store big data and MapReduce to process it. Hadoop
excels at storing and processing huge volumes of data in various formats: arbitrary, semi-structured, or even
unstructured.

Limitations of Hadoop

Hadoop can perform only batch processing, and data will be accessed only in a sequential
manner. That means one has to search the entire dataset even for the simplest of jobs.

Processing a huge dataset produces another huge dataset, which must also be processed
sequentially. At this point, a new solution is needed to access any point of data in a single unit of
time (random access).

Hadoop Random Access Databases

HBase, Cassandra, CouchDB, Dynamo, and MongoDB are some of the
databases that store huge amounts of data and access the data in a random manner.

What is HBase?

HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an
open-source project and is horizontally scalable.

HBase's data model is similar to Google's Bigtable and is designed to provide quick random
access to huge amounts of structured data. It leverages the fault tolerance provided by the
Hadoop Distributed File System (HDFS).

It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in
the Hadoop File System.

One can store data in HDFS either directly or through HBase. A data consumer reads or accesses
the data in HDFS randomly using HBase. HBase sits on top of the Hadoop File System and
provides read and write access.

HBase and HDFS

HDFS | HBase
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS.
HDFS does not support fast individual record lookups. | HBase provides fast lookups for larger tables.
It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access).
It provides only sequential access to data. | HBase internally uses hash tables and provides random access, and it stores the data in indexed HDFS files for faster lookups.

Storage Mechanism in HBase

HBase is a column-oriented database and the tables in it are sorted by row. The table schema
defines only column families, whose contents are key-value pairs. A table can have multiple column
families and each column family can have any number of columns. Subsequent column values
are stored contiguously on the disk. Each cell value of the table has a timestamp. In short, in
HBase:

• Table is a collection of rows.
• Row is a collection of column families.
• Column family is a collection of columns.
• Column is a collection of key-value pairs.

Given below is an example schema of a table in HBase.

Rowid | Column Family  | Column Family  | Column Family  | Column Family
      | col1 col2 col3 | col1 col2 col3 | col1 col2 col3 | col1 col2 col3
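The storage model described above can be pictured as nested maps. The following is a toy in-memory sketch in Python, not HBase code: the class name ToyTable and its methods are hypothetical and exist only for illustration, and the timestamps are supplied by hand.

```python
from collections import defaultdict


class ToyTable:
    """Toy model: row key -> column family -> column -> {timestamp: value}."""

    def __init__(self, families):
        # The schema fixes only the column families; columns appear on write.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, column, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][column][timestamp] = value

    def get(self, row, family, column):
        """Return the value with the highest timestamp, or None if absent."""
        if row not in self.rows:
            return None
        versions = self.rows[row].get(family, {}).get(column, {})
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row, 'family:column', latest value) in sorted row-key order."""
        for row in sorted(self.rows):
            for family in sorted(self.rows[row]):
                for column in sorted(self.rows[row][family]):
                    yield row, f"{family}:{column}", self.get(row, family, column)


table = ToyTable(["personal", "professional"])
table.put("2", "personal", "name", "ravi", timestamp=1)
table.put("1", "personal", "city", "hyderabad", timestamp=1)
table.put("1", "personal", "city", "delhi", timestamp=2)
latest_city = table.get("1", "personal", "city")
scanned = list(table.scan())
```

Here row "1" is scanned before row "2" even though it was written later, because rows are kept in key order, and the read of personal:city returns the value carrying the newer timestamp.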

Column Oriented and Row Oriented

Column-oriented databases store data tables as sections of columns of data rather
than as rows of data. In short, they have column families.

Row-Oriented Database | Column-Oriented Database
It is suitable for Online Transaction Processing (OLTP). | It is suitable for Online Analytical Processing (OLAP).
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for huge tables.

[Image: column families in a column-oriented database]

HBase and RDBMS

HBase | RDBMS
HBase is schema-less; it has no concept of a fixed column schema and defines only column families. | An RDBMS is governed by its schema, which describes the whole structure of tables.
It is built for wide tables. HBase is horizontally scalable. | It is thin and built for small tables. It is hard to scale.
HBase has no multi-row transactions (operations are atomic only at the row level). | An RDBMS is transactional.
It has de-normalized data. | It has normalized data.
It is good for semi-structured as well as structured data. | It is good for structured data.

Features of HBase

• HBase is linearly scalable.
• It has automatic failure support.
• It provides consistent reads and writes.
• It integrates with Hadoop, both as a source and a destination.
• It has an easy-to-use Java API for clients.
• It provides data replication across clusters.

Where to Use HBase

• Apache HBase is used for random, real-time read/write access to Big Data.
• It hosts very large tables on top of clusters of commodity hardware.
• Apache HBase is a non-relational database modeled after Google's Bigtable. Just as Bigtable runs on top of the Google File System, Apache HBase works on top of Hadoop and HDFS.

Applications of HBase

• It is used whenever there is a need for write-heavy applications.
• HBase is used whenever we need to provide fast random access to available data.
• Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

HBase History

Year | Event
Nov 2006 | Google released the paper on Bigtable.
Feb 2007 | Initial HBase prototype was created as a Hadoop contribution.
Oct 2007 | The first usable HBase along with Hadoop 0.15.0 was released.
Jan 2008 | HBase became a subproject of Hadoop.
Oct 2008 | HBase 0.18.1 was released.
Jan 2009 | HBase 0.19.0 was released.
Sept 2009 | HBase 0.20.0 was released.
May 2010 | HBase became an Apache top-level project.

Architecture of HBase

The HBase architecture has three main components: HMaster, Region Server, and ZooKeeper.

Master Server

The master server:

• Assigns regions to the region servers, with the help of Apache ZooKeeper.
• Handles load balancing of the regions across region servers. It unloads the busy servers and shifts the regions to less occupied servers.
• Maintains the state of the cluster by negotiating the load balancing.
• Is responsible for schema changes and other metadata operations such as creation of tables and column families.
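To make the load-balancing duty concrete, here is a minimal Python sketch that shifts regions from the busiest server to the least occupied one. It balances on region count only and is not HBase's actual balancing algorithm; the function name rebalance and the server and region names are hypothetical.

```python
def rebalance(assignments):
    """Move regions from the busiest server to the least busy one until
    their region counts differ by at most one.

    `assignments` maps a server name to a list of region names and is
    modified in place. Returns the list of (region, source, target) moves.
    """
    moves = []
    while True:
        busiest = max(assignments, key=lambda s: len(assignments[s]))
        idlest = min(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[busiest]) - len(assignments[idlest]) <= 1:
            return moves
        region = assignments[busiest].pop()
        assignments[idlest].append(region)
        moves.append((region, busiest, idlest))


cluster = {"rs1": ["r1", "r2", "r3", "r4"], "rs2": ["r5"], "rs3": []}
moves = rebalance(cluster)
```

With five regions on three servers, two moves are enough to leave the servers holding two, two, and one regions.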

Regions

A region is a contiguous range of a table's rows. Tables are split into regions, which are spread across the region servers.
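Because rows are sorted by key, finding the region that holds a row amounts to a search over the regions' start keys. The sketch below illustrates that idea in Python under simplifying assumptions: the class RegionMap, the start keys, and the server names are hypothetical, and a real cluster keeps this mapping in its own metadata rather than in a client-side list.

```python
import bisect


class RegionMap:
    """Toy lookup from a row key to the region (and server) that holds it."""

    def __init__(self, regions):
        # `regions` is a list of (start_key, server). The first start key
        # must be "" so that every row key falls into some region.
        self.regions = sorted(regions)
        self.start_keys = [start for start, _ in self.regions]

    def locate(self, row_key):
        # Pick the rightmost region whose start key is <= row_key.
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.regions[index]


region_map = RegionMap([("", "rs1"), ("g", "rs2"), ("p", "rs3")])
first = region_map.locate("apple")
boundary = region_map.locate("g")
last = region_map.locate("zebra")
```

A key equal to a start key belongs to the region that starts there, so "g" resolves to the second region in this example.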

Region server

A region server hosts regions and:

• Communicates with the client and handles data-related operations.
• Handles read and write requests for all the regions under it.
• Decides the size of the region by following the region size thresholds.

When we take a deeper look into the region server, it contains regions and stores as shown below:

[Image: a region server containing regions and stores]

The store contains a MemStore and HFiles. The MemStore works like a cache memory. Anything
that is entered into HBase is stored here initially. Later, the data is transferred and saved in
HFiles as blocks and the MemStore is flushed.
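The write path just described (write to memory first, flush to files later) can be sketched in a few lines of Python. This is a toy model under stated assumptions: ToyStore is a hypothetical name, the flush is triggered by an entry count chosen for the demo, and the "HFiles" are plain sorted tuples held in memory rather than files in HDFS.

```python
class ToyStore:
    """Toy store: writes go to an in-memory MemStore, which is flushed to an
    immutable, sorted 'HFile' once it holds `flush_threshold` entries."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.hfiles = []  # oldest first; each is a sorted tuple of (key, value)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the MemStore, then HFiles newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.put("b", "1")
store.put("a", "2")   # second entry reaches the threshold and triggers a flush
store.put("a", "3")   # newer value for "a" stays in the MemStore
```

After these writes there is one flushed file holding the first two entries, and a read of "a" is answered from the MemStore because it holds the newer value.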

ZooKeeper

• ZooKeeper is an open-source project that provides services such as maintaining configuration information, naming, and distributed synchronization.
• ZooKeeper has ephemeral nodes representing the different region servers. Master servers use these nodes to discover available servers.
• In addition to availability, the nodes are also used to track server failures and network partitions.
• Clients use ZooKeeper to locate region servers and then communicate with them directly.
• In pseudo-distributed and standalone modes, HBase itself manages ZooKeeper.
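The role of ephemeral nodes in server discovery can be illustrated with a small Python stand-in. This is not a ZooKeeper client: the class ToyCoordinator, the node paths, and the session ids are hypothetical, and session expiry is triggered by hand here instead of by missed heartbeats.

```python
class ToyCoordinator:
    """Toy stand-in for ephemeral nodes in a coordination service."""

    def __init__(self):
        self.ephemeral = {}  # node path -> id of the session that created it

    def register(self, path, session_id):
        self.ephemeral[path] = session_id

    def expire_session(self, session_id):
        # An ephemeral node disappears when its creator's session ends.
        self.ephemeral = {
            path: sid for path, sid in self.ephemeral.items() if sid != session_id
        }

    def children(self, prefix):
        return sorted(p for p in self.ephemeral if p.startswith(prefix + "/"))


zk = ToyCoordinator()
zk.register("/rs/server1", session_id=1)
zk.register("/rs/server2", session_id=2)
live_before = zk.children("/rs")
zk.expire_session(2)          # server2 crashes or is partitioned away
live_after = zk.children("/rs")
```

A master that lists the children of the region-server path before and after the expiry sees server2 vanish, which is how a failure becomes visible without the master polling each server.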

HBase - General Commands

The general commands in HBase are status, version, table_help, and whoami. This chapter
explains these commands.

status

This command returns the status of the system including the details of the servers running on the
system. Its syntax is as follows:

hbase(main):009:0> status

If you execute this command, it returns the following output.

hbase(main):009:0> status
3 servers, 0 dead, 1.3333 average load

version

This command returns the version of HBase used in your system. Its syntax is as follows:

hbase(main):010:0> version

If you execute this command, it returns the following output.

hbase(main):009:0> version
0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri Nov 14 18:26:29 PST 2014

table_help

This command explains what the table-reference commands are and how to use them. Given below is the
syntax to use this command.

hbase(main):002:0> table_help

When you use this command, it shows help topics for table-related commands. Given below is
the partial output of this command.

hbase(main):002:0> table_help
Help for table-reference commands.
You can either create a table via 'create' and then manipulate the table
via commands like 'put', 'get', etc.
See the standard help information for how to use each of these commands.
However, as of 0.96, you can also get a reference to a table, on which
you can invoke commands.
For instance, you can get create a table and keep around a reference to
it via:
hbase> t = create 't', 'cf'…...

whoami

This command returns the user details of HBase. If you execute this command, it returns the
current HBase user as shown below.

hbase(main):008:0> whoami
hadoop (auth:SIMPLE)
groups: hadoop

Some of the Commands:

Creating a Table using HBase Shell

You can create a table using the create command. You must specify the table name and the
column family name. The syntax to create a table in the HBase shell is shown below.

create '<table name>', '<column family>'

Example

Given below is a sample schema of a table named emp. It has two column families: “personal
data” and “professional data”.

Row key | personal data | professional data

You can create this table in HBase shell as shown below.

hbase(main):002:0> create 'emp', 'personal data', 'professional data'

And it will give you the following output.

0 row(s) in 1.1300 seconds

=> Hbase::Table - emp

Verification

You can verify whether the table is created using the list command as shown below. Here you
can observe the created emp table.

hbase(main):002:0> list
TABLE
emp
2 row(s) in 0.0340 seconds

Listing a Table using HBase Shell

list is the command that is used to list all the tables in HBase. Given below is the syntax of the
list command.

hbase(main):001:0> list

When you type this command and execute it at the HBase prompt, it displays the list of all the
tables in HBase as shown below.

hbase(main):001:0> list
TABLE
emp

Dropping a Table using HBase Shell

Using the drop command, you can delete a table. Before dropping a table, you have to disable it.

hbase(main):018:0> disable 'emp'

0 row(s) in 1.4580 seconds

hbase(main):019:0> drop 'emp'

0 row(s) in 0.3060 seconds

Verify whether the table is deleted using the exists command.

hbase(main):020:0> exists 'emp'

Table emp does not exist
0 row(s) in 0.0730 seconds

drop_all

This command drops the tables matching the regex given in the command. Its syntax
is as follows:

hbase> drop_all 't.*'

Note: Before dropping a table, you must disable it.

Example

Assume there are tables named raja, rajani, rajendra, rajesh, and raju.

hbase(main):017:0> list
TABLE
raja
rajani
rajendra
rajesh
raju
9 row(s) in 0.0270 seconds

All these tables start with the letters raj. First of all, let us disable all these tables using the
disable_all command as shown below.

hbase(main):002:0> disable_all 'raj.*'
raja
rajani
rajendra
rajesh
raju
Disable the above 5 tables (y/n)?
y
5 tables successfully disabled

Now you can delete all of them using the drop_all command as given below.

hbase(main):018:0> drop_all 'raj.*'
raja
rajani
rajendra
rajesh
raju
Drop the above 5 tables (y/n)?
y
5 tables successfully dropped
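To see which tables a pattern such as 'raj.*' selects before running disable_all or drop_all, the matching can be previewed offline. The sketch below uses Python's re module and assumes the pattern must match the whole table name; HBase evaluates the pattern with its own regex engine, so edge cases may differ. The function name matching_tables is hypothetical.

```python
import re


def matching_tables(pattern, tables):
    """Return the table names that the regex matches in full."""
    regex = re.compile(pattern)
    return [name for name in tables if regex.fullmatch(name)]


tables = ["emp", "raja", "rajani", "rajendra", "rajesh", "raju"]
selected = matching_tables("raj.*", tables)
```

In this example the five tables starting with raj are selected and emp is left alone.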

Inserting Data using HBase Shell

This chapter demonstrates how to create data in an HBase table. To create data in an HBase
table, the following commands and methods are used:

• put command,
• add() method of Put class, and
• put() method of HTable class.

As an example, we are going to create the following table in HBase.

Using the put command, you can insert rows into a table. Its syntax is as follows:

put '<table name>','row1','<colfamily:colname>','<value>'

Inserting the First Row

Let us insert the first row values into the emp table as shown below.

hbase(main):005:0> put 'emp','1','personal data:name','raju'
0 row(s) in 0.6600 seconds
hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
0 row(s) in 0.0410 seconds
hbase(main):007:0> put 'emp','1','professional data:designation','manager'
0 row(s) in 0.0240 seconds
hbase(main):007:0> put 'emp','1','professional data:salary','50000'
0 row(s) in 0.0240 seconds

Insert the remaining rows using the put command in the same way. If you insert the whole table,
you will get the following output.

hbase(main):022:0> scan 'emp'

ROW COLUMN+CELL
1 column=personal data:city, timestamp=1417524216501, value=hyderabad
1 column=personal data:name, timestamp=1417524185058, value=raju
1 column=professional data:designation, timestamp=1417524232601, value=manager
1 column=professional data:salary, timestamp=1417524244109, value=50000
2 column=personal data:city, timestamp=1417524574905, value=chennai
2 column=personal data:name, timestamp=1417524556125, value=ravi
2 column=professional data:designation, timestamp=1417524592204, value=sr:engg
2 column=professional data:salary, timestamp=1417524604221, value=30000
3 column=personal data:city, timestamp=1417524681780, value=delhi
3 column=personal data:name, timestamp=1417524672067, value=rajesh
3 column=professional data:designation, timestamp=1417524693187, value=jr:engg
3 column=professional data:salary, timestamp=1417524702514, value=25000

Updating Data using HBase Shell

You can update an existing cell value using the put command. To do so, follow the same
syntax and supply the new value as shown below.

put '<table name>','<row>','<column family:column name>','<new value>'

The new value supersedes the existing value, updating the row. HBase stores it as a newer version of the cell with a later timestamp.

Example

Suppose there is a table in HBase called emp with the following data.

hbase(main):003:0> scan 'emp'

ROW COLUMN + CELL
row1 column = personal:name, timestamp = 1418051555, value = raju
row1 column = personal:city, timestamp = 1418275907, value = Hyderabad
row1 column = professional:designation, timestamp = 14180555,value = manager
row1 column = professional:salary, timestamp = 1418035791555,value = 50000
1 row(s) in 0.0100 seconds

The following command will update the city value of the employee named ‘Raju’ to Delhi.

hbase(main):002:0> put 'emp','row1','personal:city','Delhi'

0 row(s) in 0.0400 seconds

The updated table looks as follows where you can observe the city of Raju has been changed to
‘Delhi’.

hbase(main):003:0> scan 'emp'

ROW COLUMN + CELL
row1 column = personal:name, timestamp = 1418035791555, value = raju
row1 column = personal:city, timestamp = 1418274645907, value = Delhi
row1 column = professional:designation, timestamp = 141857555,value = manager
row1 column = professional:salary, timestamp = 1418039555, value = 50000
1 row(s) in 0.0100 seconds

Deleting a Specific Cell in a Table

Using the delete command, you can delete a specific cell in a table. The syntax of the delete
command is as follows:

delete '<table name>', '<row>', '<column name>', <time stamp>

Example

Here is an example that deletes a specific cell. Here we are deleting the city.

hbase(main):006:0> delete 'emp', '1', 'personal data:city', 1417521848375
0 row(s) in 0.0060 seconds

Deleting All Cells in a Row

Using the deleteall command, you can delete all the cells in a row. Given below is the syntax
of the deleteall command.

deleteall '<table name>', '<row>'

Example

Here is an example of the deleteall command, where we delete all the cells of row 1 of the emp
table.

hbase(main):007:0> deleteall 'emp','1'
0 row(s) in 0.0240 seconds

Verify the table using the scan command. A snapshot of the table after deleting the row is given
below.

hbase(main):022:0> scan 'emp'

ROW COLUMN + CELL
2 column = personal data:city, timestamp = 1417524574905, value = chennai
2 column = personal data:name, timestamp = 1417524556125, value = ravi
2 column = professional data:designation, timestamp = 1417524204, value = sr:engg
2 column = professional data:salary, timestamp = 1417524604221, value = 30000
3 column = personal data:city, timestamp = 1417524681780, value = delhi
3 column = personal data:name, timestamp = 1417524672067, value = rajesh
3 column = professional data:designation, timestamp = 1417523187, value = jr:engg
3 column = professional data:salary, timestamp = 1417524702514, value = 25000
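The difference between delete (one cell) and deleteall (a whole row) can be sketched on a plain dictionary. This Python toy removes data immediately, whereas HBase records delete markers and cleans up the data later; the helper names delete_cell and delete_all are hypothetical, and cell versions are left out for brevity.

```python
def delete_cell(table, row, column):
    """Remove one cell; drop the row once it has no cells left."""
    cells = table.get(row, {})
    cells.pop(column, None)
    if row in table and not cells:
        del table[row]


def delete_all(table, row):
    """Remove every cell of the row."""
    table.pop(row, None)


emp = {
    "1": {"personal data:city": "hyderabad", "personal data:name": "raju"},
    "2": {"personal data:city": "chennai", "personal data:name": "ravi"},
}
delete_cell(emp, "1", "personal data:city")
after_delete = sorted(emp["1"])
delete_all(emp, "1")
remaining_rows = sorted(emp)
```

After the single-cell delete, row 1 still has its name column; after the row-level delete, only row 2 remains, which mirrors the scan output above.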
