Sqoop To Hbase
Sqoop To Hbase
Of
course, the main goal here is to demonstrate how Sqoop can import data from an RDBMS or
data warehouse directly into HBase, but its always better to see how a tool is used in context
versus how its used in the abstract.
The figure shows how the Service Order Database might look after being transformed into an
HBase schema.
For this particular import example, you want to import thecustomercontactinfo table directly into
an HBase table in preparation for building the HBase Service Order Database schema. To
complete the HBase schema, youd have to execute the same steps to import
the productinfo table, and then the serviceorders table could be built with a Java MapReduce
application.
Sqoop doesnt now permit you to import, all at once, a relational table directly into an HBase
table having multiple column families. To work around this limitation, you create the HBase table
first and then execute three Sqoop import operations to finish the task. The listing shows the
task of creating the table.
'ContactInfo', 'ProductNums'
In the following listing, for each Sqoop import command, note that the target HBase column
family specified by the column-family CLA and the corresponding MySQL columns specified by
the columns CLA are in bold. The customernumprimary key also becomes the HBase row key,
as specified by the hbase-row-keyCLA.
$ sqoop import
--connect jdbc:mysql://localhost/serviceorderdb
--username root -P
--table customercontactinfo
--columns "customernum,customername"
--hbase-table customercontactinfo
--column-family CustomerName
--hbase-row-key customernum -m 1
Enter password:
...
13/08/17 16:53:01 INFO mapreduce.ImportJobBase: Retrieved 5 records.
$ sqoop import
--connect jdbc:mysql://localhost/serviceorderdb
--username root -P
--table customercontactinfo
--columns "customernum,contactinfo"
--hbase-table customercontactinfo
--column-family ContactInfo
--hbase-row-key customernum -m 1
Enter password:
...
13/08/17 17:00:59 INFO mapreduce.ImportJobBase: Retrieved 5 records.
$ sqoop import
--connect jdbc:mysql://localhost/serviceorderdb
--username root -P
--table customercontactinfo
--columns "customernum,productnums"
--hbase-table customercontactinfo
--column-family ProductNums
--hbase-row-key customernum -m 1
Enter password:
...
13/08/17 17:05:54 INFO mapreduce.ImportJobBase: Retrieved 5 records.
If you were to carry out an HBase scan of your new table, youd see that the import and
translation from a relational database table on MySQL directly into HBase was a success.
The customercontactinfo table in this example is rather small, but imagine the power you now
have, using Sqoop and HBase, to quickly move relational tables that may be exceeding capacity
on your RDBMS or data warehouse into HBase, where capacity is virtually unlimited and
scalability is automatic.
hbase(main):033:0> scan 'customercontactinfo'
ROW
COLUMN+CELL
10000
column=ContactInfo:contactinfo, timestamp=1376773256317, value=1 Hadoop Lane, NY, 11111, John.Sm
ith@xyz.com
10000
10000
10001
10001
10001
0,D700
20000
20000
20000
20001
20001
20001
30000
30000
30000
Importing existing relational data via Sqoop into Hive and HBase tables can potentially enable a
wide range of new and exciting data analysis workflows. If this feature is of interest to you,
check out the Apache Sqoop documentation for additional Hive and HBase command line
arguments and features.