0% found this document useful (0 votes)
222 views4 pages

Sqoop To Hbase

Sqoop can be used to import data from a relational database into an HBase schema. This involves transforming the database schema and then using Sqoop to import each table into the corresponding HBase table and column families. The document walks through using Sqoop to import the customercontactinfo table from MySQL into a new HBase table with three column families, completing the import in three separate Sqoop jobs. An HBase scan shows the data was successfully imported from the relational table into the HBase key-value store. Sqoop provides a way to move large amounts of relational data into HBase for scalable storage and analysis.

Uploaded by

Ashish Tiwari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
222 views4 pages

Sqoop To Hbase

Sqoop can be used to import data from a relational database into an HBase schema. This involves transforming the database schema and then using Sqoop to import each table into the corresponding HBase table and column families. The document walks through using Sqoop to import the customercontactinfo table from MySQL into a new HBase table with three column families, completing the import in three separate Sqoop jobs. An HBase scan shows the data was successfully imported from the relational table into the HBase key-value store. Sqoop provides a way to move large amounts of relational data into HBase for scalable storage and analysis.

Uploaded by

Ashish Tiwari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Sqoop can be used to transform a relational database schema into an HBase schema.

Of
course, the main goal here is to demonstrate how Sqoop can import data from an RDBMS or
data warehouse directly into HBase, but its always better to see how a tool is used in context
versus how its used in the abstract.
The figure shows how the Service Order Database might look after being transformed into an
HBase schema.

For this particular import example, you want to import thecustomercontactinfo table directly into
an HBase table in preparation for building the HBase Service Order Database schema. To
complete the HBase schema, youd have to execute the same steps to import
the productinfo table, and then the serviceorders table could be built with a Java MapReduce
application.

Sqoop doesnt now permit you to import, all at once, a relational table directly into an HBase
table having multiple column families. To work around this limitation, you create the HBase table
first and then execute three Sqoop import operations to finish the task. The listing shows the
task of creating the table.

hbase(main):017:0> create 'customercontactinfo', 'CustomerName',


hbase(main):018:0*

'ContactInfo', 'ProductNums'

0 row(s) in 1.0680 seconds

In the following listing, for each Sqoop import command, note that the target HBase column
family specified by the column-family CLA and the corresponding MySQL columns specified by
the columns CLA are in bold. The customernumprimary key also becomes the HBase row key,
as specified by the hbase-row-keyCLA.
$ sqoop import
--connect jdbc:mysql://localhost/serviceorderdb
--username root -P
--table customercontactinfo
--columns "customernum,customername"
--hbase-table customercontactinfo
--column-family CustomerName
--hbase-row-key customernum -m 1
Enter password:
...
13/08/17 16:53:01 INFO mapreduce.ImportJobBase: Retrieved 5 records.
$ sqoop import
--connect jdbc:mysql://localhost/serviceorderdb
--username root -P
--table customercontactinfo
--columns "customernum,contactinfo"
--hbase-table customercontactinfo
--column-family ContactInfo
--hbase-row-key customernum -m 1
Enter password:
...
13/08/17 17:00:59 INFO mapreduce.ImportJobBase: Retrieved 5 records.

$ sqoop import
--connect jdbc:mysql://localhost/serviceorderdb
--username root -P
--table customercontactinfo
--columns "customernum,productnums"
--hbase-table customercontactinfo
--column-family ProductNums
--hbase-row-key customernum -m 1
Enter password:
...
13/08/17 17:05:54 INFO mapreduce.ImportJobBase: Retrieved 5 records.
If you were to carry out an HBase scan of your new table, youd see that the import and
translation from a relational database table on MySQL directly into HBase was a success.
The customercontactinfo table in this example is rather small, but imagine the power you now
have, using Sqoop and HBase, to quickly move relational tables that may be exceeding capacity
on your RDBMS or data warehouse into HBase, where capacity is virtually unlimited and
scalability is automatic.
hbase(main):033:0> scan 'customercontactinfo'
ROW

COLUMN+CELL

10000
column=ContactInfo:contactinfo, timestamp=1376773256317, value=1 Hadoop Lane, NY, 11111, John.Sm
ith@xyz.com
10000

column=CustomerName:customername, timestamp=1376772776684, value=John Timothy Smith

10000

column=ProductNums:productnums, timestamp=1376773551221, value=B500

10001

column=ContactInfo:contactinfo, timestamp=1376773256317, value=2 HBase Ave, CA, 22222

10001

column=CustomerName:customername, timestamp=1376772776684, value=Bill Jones

10001
0,D700

column=ProductNums:productnums, timestamp=1376773551221, value=A100,A200,A300,B400,B500,C500,C60

20000

column=ContactInfo:contactinfo, timestamp=1376773256317, value=1 Expert HBase Ave, CA, 22222

20000

column=CustomerName:customername, timestamp=1376772776684, value=Jane Ann Doe

20000

column=ProductNums:productnums, timestamp=1376773551221, value=A100,A200,A300

20001

column=ContactInfo:contactinfo, timestamp=1376773256317, value=1 Piglatin Ave, CO, 33333

20001

column=CustomerName:customername, timestamp=1376772776684, value=Joe Developer

20001

column=ProductNums:productnums, timestamp=1376773551221, value=D700

30000

column=ContactInfo:contactinfo, timestamp=1376773256317, value=1 Statistics Lane, MA, 33333

30000

column=CustomerName:customername, timestamp=1376772776684, value=Data Scientist

30000

column=ProductNums:productnums, timestamp=1376773551221, value=C500

5 row(s) in 0.1120 seconds

Importing existing relational data via Sqoop into Hive and HBase tables can potentially enable a
wide range of new and exciting data analysis workflows. If this feature is of interest to you,
check out the Apache Sqoop documentation for additional Hive and HBase command line
arguments and features.

You might also like