Apache Sqoop with Apache HBase

Presentation designed & developed by Ashish Tiwari.



Apache HBase Introduction

HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS).

It is an open-source project and is horizontally scalable.

HBase's data model is similar to Google’s Bigtable and is designed to provide quick random access to huge amounts of structured data.

It leverages the fault tolerance provided by HDFS.

It is part of the Hadoop ecosystem and provides random, real-time read/write access to data stored in HDFS.


Storage Mechanism in HBase

HBase is a column-oriented database, and the rows in its tables are sorted by row key.

The table schema defines only column families; the columns inside a family are stored as key-value pairs.

A table can have multiple column families, and each column family can have any number of columns.

Values belonging to the same column family are stored contiguously on disk.

Each cell value in the table has a timestamp. In short, in HBase:

• A table is a collection of rows.
• A row is a collection of column families.
• A column family is a collection of columns.
• A column is a collection of key-value pairs.
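To make this hierarchy concrete, here is a minimal in-memory sketch of HBase's logical model: a table maps a row key to column families, each family maps a column qualifier to versioned cells, and each version is keyed by its timestamp. The class and method names (`ToyHBaseTable`, `put`, `get`, `scan_row_keys`) are hypothetical and are not the HBase client API; real HBase stores raw bytes, persists data to HDFS and spreads rows across region servers.

```python
from collections import defaultdict


class ToyHBaseTable:
    """Hypothetical in-memory model of HBase's logical layout, not the HBase client API.

    Layout: row key -> column family -> column qualifier -> {timestamp: value}
    """

    def __init__(self, column_families):
        # As in HBase, only the column families are declared up front.
        self.column_families = set(column_families)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Columns need no declaration; each write adds a timestamped version of the cell.
        self.rows[row_key][family][qualifier][timestamp] = value

    def get(self, row_key, family, qualifier):
        # dict.get avoids creating empty entries for rows or columns that do not exist.
        versions = self.rows.get(row_key, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None
        # A read returns the newest version, i.e. the one with the highest timestamp.
        return versions[max(versions)]

    def scan_row_keys(self):
        # HBase keeps rows ordered by row key; this sketch sorts on scan.
        # The ordering is lexicographic, so "10" precedes "2".
        return sorted(self.rows)


table = ToyHBaseTable(["customerInfo"])
table.put("2", "customerInfo", "customer_city", "Springfield", timestamp=1)
table.put("10", "customerInfo", "customer_city", "Shelbyville", timestamp=1)
table.put("10", "customerInfo", "customer_city", "Capital City", timestamp=2)
print(table.get("10", "customerInfo", "customer_city"))  # Capital City
print(table.scan_row_keys())  # ['10', '2']
```

The lexicographic ordering of row keys is why numeric keys are often zero-padded in HBase designs.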


Column-Oriented and Row-Oriented

Column-oriented databases store data tables as sections of columns of data, rather than as rows of data.

Row-Oriented Database | Column-Oriented Database
Suitable for Online Transaction Processing (OLTP). | Suitable for Online Analytical Processing (OLAP).
Designed for a small number of rows and columns. | Designed for huge tables.
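The difference is about physical layout. This Python sketch stores the same three made-up records both ways, row-wise as a list of tuples and column-wise as one list per column, and runs an OLAP-style aggregate and an OLTP-style point lookup against each. It is a conceptual illustration only; the records and function names are hypothetical.

```python
# Three hypothetical records: (id, name, amount)
records = [(1, "alice", 30), (2, "bob", 50), (3, "carol", 40)]

# Row-oriented layout: all fields of a record are stored together.
row_store = list(records)

# Column-oriented layout: all values of a column are stored together.
column_store = {
    "id": [r[0] for r in records],
    "name": [r[1] for r in records],
    "amount": [r[2] for r in records],
}


def average_amount_rows(rows):
    # OLAP-style query on a row store: every full record is visited to read one field.
    return sum(row[2] for row in rows) / len(rows)


def average_amount_columns(columns):
    # The same query on a column store reads only the "amount" column.
    amounts = columns["amount"]
    return sum(amounts) / len(amounts)


def fetch_record_rows(rows, record_id):
    # OLTP-style point lookup on a row store: the record comes back as one unit.
    return next(row for row in rows if row[0] == record_id)


def fetch_record_columns(columns, record_id):
    # The same lookup on a column store must gather one value from every column.
    i = columns["id"].index(record_id)
    return (columns["id"][i], columns["name"][i], columns["amount"][i])


print(average_amount_rows(row_store), average_amount_columns(column_store))  # 40.0 40.0
print(fetch_record_rows(row_store, 2), fetch_record_columns(column_store, 2))  # (2, 'bob', 50) (2, 'bob', 50)
```

Both layouts give the same answers; what changes is how much data each query has to touch, which is why column layouts favor wide analytical scans and row layouts favor whole-record transactions.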


HBase and RDBMS

HBase | RDBMS
Schema-less: there is no fixed column schema; only column families are defined. | Governed by its schema, which describes the whole structure of its tables.
Built for wide tables; horizontally scalable. | Thin and built for small tables; hard to scale horizontally.
No multi-row transactions (HBase guarantees atomicity only for operations on a single row). | Transactional.
Holds de-normalized data. | Holds normalized data.
Good for semi-structured as well as structured data. | Good for structured data.


Sqoop HBase Intro

• Sqoop supports additional import targets beyond HDFS and Hive.
• Sqoop can also import records into a table in HBase.


Sqoop imports data into the HBase table specified as the argument to --hbase-table.

Each row of the input table will be transformed into an HBase Put operation to a row of the output table. The key for each row is taken from a column of the input.

By default Sqoop will use the split-by column as the row key column. If that is not specified, it will try to identify the primary key column, if any, of the source table.

You can manually specify the row key column with --hbase-row-key.
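The row-key selection order just described can be sketched as a small Python function. It only mirrors the stated precedence (an explicit --hbase-row-key, then the --split-by column, then the source table's primary key); the function name and parameters are hypothetical and do not come from Sqoop's code base.

```python
from typing import Optional


def resolve_row_key(hbase_row_key: Optional[str],
                    split_by: Optional[str],
                    primary_key: Optional[str]) -> str:
    """Hypothetical helper: pick the HBase row key column using the precedence above."""
    if hbase_row_key:  # an explicit --hbase-row-key always wins
        return hbase_row_key
    if split_by:  # otherwise the --split-by column is used
        return split_by
    if primary_key:  # otherwise fall back to the source table's primary key
        return primary_key
    # This sketch raises an error; the message text is illustrative only.
    raise ValueError("no row key column found; specify --hbase-row-key")


print(resolve_row_key(None, None, "customer_id"))                # customer_id
print(resolve_row_key("customer_id", "customer_zipcode", None))  # customer_id
```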

Each output column will be placed in the same column family, which must be specified with --column-family.

If the input table has a composite key, the value of --hbase-row-key must be a comma-separated list of the composite key attributes.

In this case, the row key for the HBase row is generated by combining the values of the composite key attributes, using an underscore as the separator.

NOTE: A Sqoop import of a table with a composite key works only if the --hbase-row-key parameter has been specified.
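The transformation from a source row to an HBase Put, including the underscore-joined composite key, can be modelled in plain Python. This is a simplified sketch under stated assumptions: a row is a dict of column name to value, every value is converted to a string, None (SQL NULL) values are left out because HBase represents a missing value by having no cell, and the key columns are not repeated inside the column family (check this against your Sqoop version). The function `row_to_put` and its dict-shaped result are hypothetical; a real import goes through Sqoop and the HBase client.

```python
def row_to_put(row, row_key_columns, column_family):
    """Hypothetical model of how one source row becomes one HBase Put.

    row             -- dict mapping source column name to value
    row_key_columns -- comma-separated string, as passed to --hbase-row-key
    column_family   -- the single family given with --column-family
    """
    key_columns = [c.strip() for c in row_key_columns.split(",")]
    # A composite key is built by joining the key values with an underscore.
    row_key = "_".join(str(row[c]) for c in key_columns)

    cells = {}
    for column, value in row.items():
        # Assumptions of this sketch: key columns are not repeated as cells,
        # and None values are skipped, since HBase simply has no cell then.
        if column in key_columns or value is None:
            continue
        # Every remaining column goes into the same family, addressed as "family:qualifier".
        cells[f"{column_family}:{column}"] = str(value)
    return {"row_key": row_key, "cells": cells}


# Single-column key, modelled on the customers example in this deck.
print(row_to_put({"customer_id": 1, "customer_fname": "Jane", "customer_city": "Springfield"},
                 "customer_id", "customerInfo"))
# Hypothetical composite key (order_id, line_no) gives the row key "7_2".
print(row_to_put({"order_id": 7, "line_no": 2, "quantity": 3, "note": None},
                 "order_id,line_no", "orderInfo"))
```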


Argument | Description
--column-family <family> | Sets the target column family for the import.
--hbase-create-table | If specified, creates missing HBase tables.
--hbase-row-key <col> | Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite key attributes.
--hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS.
--hbase-bulkload | Enables bulk loading.

If the target table and column family do not exist, the Sqoop job will exit with an error.

You should create the target table and column family before running an import.

If you specify --hbase-create-table, Sqoop will create the target table and column family if they do not exist, using the default parameters from your HBase configuration.


Syntax:

sqoop import \
  --connect <<jdbc-uri>> \
  --table <<table-name>> \
  --hbase-table <<hbase-table-name>> \
  --hbase-row-key <<table primary key or unique key column>> \
  --column-family <<column-family-name>> \
  --hbase-create-table \
  --username root -P
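When the import is driven from a script, assembling the arguments programmatically lets the HBase-specific options be checked before Sqoop starts. The helper below is a hypothetical sketch that uses only the flags shown in this deck; it returns an argument list suitable for Python's `subprocess.run` and does not invoke Sqoop itself. The function name and its parameters are assumptions, not part of Sqoop.

```python
import shlex
from typing import List, Optional, Sequence


def build_sqoop_hbase_import(connect: str,
                             table: str,
                             hbase_table: str,
                             column_family: str,
                             row_key_columns: Sequence[str],
                             username: str,
                             columns: Optional[Sequence[str]] = None,
                             create_table: bool = False,
                             bulkload: bool = False) -> List[str]:
    """Hypothetical helper: assemble a `sqoop import` command that targets HBase."""
    if not column_family:
        raise ValueError("--column-family is required for an HBase import")
    if not row_key_columns:
        raise ValueError("give at least one row key column")
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username, "-P",  # -P prompts for the password on the console
        "--table", table,
    ]
    if columns:
        args += ["--columns", ",".join(columns)]
    args += [
        "--hbase-table", hbase_table,
        "--column-family", column_family,
        # Composite keys are passed as one comma-separated value.
        "--hbase-row-key", ",".join(row_key_columns),
    ]
    if create_table:
        args.append("--hbase-create-table")
    if bulkload:
        args.append("--hbase-bulkload")
    return args


cmd = build_sqoop_hbase_import(
    connect="jdbc:mysql://localhost/retail_db",
    table="customers",
    hbase_table="retailtbl",
    column_family="customerInfo",
    row_key_columns=["customer_id"],
    username="root",
    create_table=True,
)
print(shlex.join(cmd))
# To actually run it (requires Sqoop on PATH): subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems when the command is passed to `subprocess.run`.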


Example:

• mysql -u root -p
• Enter the password when prompted.
• show databases;
• use retail_db;
• show tables;


sqoop import \
  --connect "jdbc:mysql://localhost/retail_db" \
  --username root -P \
  --table customers \
  --columns customer_id,customer_fname,customer_lname,customer_city,customer_state \
  --hbase-create-table \
  --hbase-table retailtbl \
  --column-family customerInfo \
  --hbase-row-key customer_id


THANK YOU
PLEASE LIKE
PLEASE COMMENT
http://manotechtuts.blogspot.in/