

  • IBM Software

    Using HBase for Real-time Access to your BigData
    Using administrative and advanced features for schema creation and data retrieval

  • Copyright IBM Corporation, 2013

    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


    Contents

    Using administrative and advanced features for schema creation and data retrieval
        3.1 Creating and modifying schemas using the HBaseAdmin API
        3.2 Loading a data set into HBase
            3.2.1 Loading the data into HDFS
            3.2.2 Importing the data into HBase
        3.3 Creating and using Filters to retrieve your data
        3.4 Working with Counters
        3.5 Summary


    Using administrative and advanced features for schema creation and data retrieval

    In this lab, you will create and update schemas using the HBaseAdmin API. This will allow you to create the tables and column families for your data.

    Then you will use the Java API to take advantage of Filters and Counters. As part of the lab setup, you will see how to load a sample data set using the ImportTsv tool.

    After completing this hands-on lab, you will be able to:

    Create and Modify HBase tables and schemas using the HBaseAdmin API

    Load data into HBase using the ImportTsv tool

    Apply Filters to your Scans or Get operations to enhance the data returned from HBase

    Use Counters for statistics collection

    This lab assumes some familiarity with the Eclipse environment. The lab solution is included with the files that you downloaded in the VM image.

    Allow 60 to 90 minutes to complete this section of the lab.

    This version of the lab was designed using the InfoSphere BigInsights 2.1 Quick Start Edition. Throughout this lab you will be using the following account login information:

    Account                   Username   Password
    VM image setup screen     root       password
    Linux                     biadmin    biadmin


    3.1 Creating and modifying schemas using the HBaseAdmin API

    You have been using tables and column families by creating them directly from the shell. What if you need to create them programmatically? Using the HBaseAdmin API will allow you to do so.

    Solutions for this part of the exercise can be found in Lab_Files/LabSolutions.

    __1. Start Eclipse and go to the default workspace. Exercise 2's files will most likely still be open in your workspace if you did not close them earlier. Close them all now:

    __2. You will create a new package named: hbase.exercise3. Then you will import the partially completed classes from the lab files under Lab_Files/Exercise3 into the workspace. These are the four files you should have imported.

    __3. Create some tables using HBaseAdmin. Open up HBase_SchemaTester.java. Fill in the code needed.

    __a. HBaseAdmin admin = new HBaseAdmin(conf);

    __b. HTableDescriptor desc = new HTableDescriptor(tableName);


    __c. HColumnDescriptor colFamilyDesc = new HColumnDescriptor(columnFamily);

    __d. desc.addFamily(colFamilyDesc);

    __e. admin.createTable(desc);

    __f. Uncomment the code that tests whether the table has been created, completing the table-creation method.
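    Put together, steps __a through __f amount to code like the following sketch. This is not the exact lab solution: the `conf`, `tableName`, and `columnFamily` variables are assumed to be supplied by the lab skeleton, and the family name `cf1` is hypothetical (the table name `tableFromJava` is the one checked later with `describe`).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        String tableName = "tableFromJava";
        String columnFamily = "cf1";                      // hypothetical family name for this sketch

        HBaseAdmin admin = new HBaseAdmin(conf);          // administrative connection (pre-1.0 API)
        HTableDescriptor desc = new HTableDescriptor(tableName);
        HColumnDescriptor colFamilyDesc = new HColumnDescriptor(columnFamily);
        desc.addFamily(colFamilyDesc);                    // attach the column family to the table
        admin.createTable(desc);                          // create the table on the cluster
        System.out.println(admin.isTableAvailable(tableName)); // the lab expects true here
        admin.close();
    }
}
```

    Note that the column family must be attached to the descriptor before createTable() is called; adding a family to an existing table requires the modify steps shown later.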

    __4. Run the program to see that your table gets created. Be sure to uncomment the line that tests whether the table is available, if you haven't done so in the step above. The output should be true.

    You can also go to the HBase Shell and type the command list or describe 'tableFromJava' to see it.

    __5. Once your table has been created, comment out the code that you just wrote so that you can run the next set of code, which modifies an existing table. Go ahead and comment out these lines.


    You can leave the other parts of the method intact as you will still need them to modify the table.

    __6. Now you need to create the code to modify the table. You will type in the code below the code you just commented out.

    __a. HTableDescriptor htd1 = admin.getTableDescriptor(tableName);

    __b. long oldMaxFileSize = htd1.getMaxFileSize();

    __c. HColumnDescriptor colFamilyDesc2 = new HColumnDescriptor(Bytes.toBytes("cf2"));

    __d. htd1.addFamily(colFamilyDesc2);

    __e. htd1.setMaxFileSize(1024 * 1024 * 1024L);

    __f. admin.disableTable(tableName);

    __g. admin.modifyTable(tableName, htd1);

    __h. admin.enableTable(tableName);

    __i. HTableDescriptor htd2 = admin.getTableDescriptor(tableName);

    __j. Uncomment the System.out.println() calls to test your class.
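    Assembled, steps __a through __j modify the existing table roughly as follows. This is a sketch of the method body, not the exact solution: it assumes the `admin` instance from the creation code, an import of `org.apache.hadoop.hbase.util.Bytes`, and a String `tableName` converted with Bytes.toBytes() wherever the pre-1.0 API wants a byte[].

```java
HTableDescriptor htd1 = admin.getTableDescriptor(Bytes.toBytes(tableName));
long oldMaxFileSize = htd1.getMaxFileSize();                 // remember the current MAX_FILESIZE

htd1.addFamily(new HColumnDescriptor(Bytes.toBytes("cf2"))); // add a second column family
htd1.setMaxFileSize(1024 * 1024 * 1024L);                    // raise MAX_FILESIZE to 1 GB

admin.disableTable(tableName);                               // the table must be offline to modify it
admin.modifyTable(Bytes.toBytes(tableName), htd1);           // push the new descriptor
admin.enableTable(tableName);                                // bring the table back online

HTableDescriptor htd2 = admin.getTableDescriptor(Bytes.toBytes(tableName));
System.out.println("max file size: " + oldMaxFileSize + " -> " + htd2.getMaxFileSize());
```

    The disable/modify/enable sequence is the important pattern: schema changes are rejected while a table is serving requests.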

    __7. When you are done, run the program; you should see the modifications to the schema in the output of the series of System.out.println() calls.


    3.2 Loading a data set into HBase

    3.2.1 Loading the data into HDFS

    To prepare for this lab, you will load a sample data set into HBase using BigInsights. The data comes from the GSDB database, a rich and realistic sample database for the Great Outdoors company, a fictional outdoor equipment retailer. For simplicity, we will use only one table from this database.

    First we will load that file into HDFS. Then we will import it into an HBase table.

    You should have downloaded the lab files for this exercise already. If you did not, go to the Big Data University course page for instructions on getting your lab files. You will need the SLS_SALES_FACT.txt file.

    __8. Double-click the BigInsights

    __9. Navigate to the Files tab, go to the biadmin directory under users, and click the create new directory icon on the menu bar.


    __10. Give the new folder the name: exercise3

    __11. Select the exercise3 directory and then click the Upload icon:

    __12. Click Browse and locate the file to upload under Lab_Files/SLS_SALES_FACT.txt


    __13. Click Open and then OK to upload the file. Once the upload has been completed, you will see the file.


    3.2.2. Importing the data into HBase

    __14. The next thing you are going to do is create the table sales_fact, with a single column family that stores only one version of each value. You will do this using this shell command:

    create 'sales_fact', {NAME => 'cf', VERSIONS => 1}

    __15. Once the table has been created, go ahead and exit the HBase Shell.

    __16. We are now going to use the ImportTsv tool to load the data into the sales_fact table.


    Remember again that the column qualifiers here mirror typical column names in a traditional RDBMS. In HBase, you would not want to name your columns this way; instead, use column names that are as short as possible, since the qualifier is stored with every cell.

    __17. Run this command to add the columns and their respective values:

    $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf:ok,cf:ek,cf:rk,cf:rsk,cf:pdk,cf:pmk,cf:omk,cf:sok,cf:sdk,cf:cdk,cf:q,cf:uc,cf:up,cf:usp,cf:gm,cf:st,cf:gp \
      -Dimporttsv.skip.bad.lines=false \
      'sales_fact' \
      hdfs://bivm:9000/user/biadmin/exercise3/SLS_SALES_FACT.txt

    __18. Once it is done, count the rows in the result by using this command in the HBase shell:

    count 'sales_fact'


    You have imported 440 rows into the sales_fact table.

    The data has been loaded. Go to the next section to work with the data set.


    3.3 Creating and Using Filters to retrieve your data

    __19. We will work with Filters first. Open up AccessObject.java and go to the getInfo() method. In this method, you will create a scanner on the sales_fact table. You want to restrict the scan to only two columns, which is enough for our purposes. You will create the Filter later; for now, write the code to add those two columns and set the filter:

    __a. scan.addColumn(COLUMN_FAMILY, UNIT_PRICE);

    __b. scan.addColumn(COLUMN_FAMILY, QUANTITY);

    __c. scan.setFilter(filter);
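    With those three lines in place, getInfo() ends up looking roughly like this sketch. The COLUMN_FAMILY, UNIT_PRICE, and QUANTITY byte[] constants and the sales_fact HTable handle are assumed to be defined by the AccessObject skeleton; the loop body here is illustrative.

```java
public void getInfo(Filter filter) throws IOException {
    Scan scan = new Scan();
    scan.addColumn(COLUMN_FAMILY, UNIT_PRICE);  // restrict the scan to the unit-price column
    scan.addColumn(COLUMN_FAMILY, QUANTITY);    // ...and the quantity column
    scan.setFilter(filter);                     // apply the caller-supplied filter

    ResultScanner scanner = sales_fact.getScanner(scan);
    try {
        for (Result result : scanner) {
            System.out.println(result);         // one Result per matching row
        }
    } finally {
        scanner.close();                        // always release the scanner
    }
}
```

    Restricting the scan to two columns is a design choice worth noting: the filter still decides which rows match, but addColumn() limits how much data each matching row returns.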

    __20. Now open up HBase_FilterTester.java. In here, you will write the code to create five different filters.

    __a. Filter f1 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("20070920")));

    __b. Filter f2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("20050920")));

    __c. Filter f3 = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*2006."));

    __d. Filter f4 = new QualifierFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("q")));


    __e. Filter f5 = new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("136.90")));

    __21. Once you have written the five filters, uncomment the ao.getInfo(f1) line and run the program with each of the filters, looking at the results to validate them. You may also change the comparison operators or the comparators to see different results.


    3.4 Working with Counters

    __22. Next you will work with Counters. Go back to the AccessObject.java class. We have a method performIncrement() that increments counters. Write the appropriate code to complete this method:

    __a. Increment increment1 = new Increment(Bytes.toBytes(rowkey));

    __b. increment1.addColumn(COLUMN_FAMILY, Bytes.toBytes("ViewCount"), viewCountValue);

    __c. increment1.addColumn(COLUMN_FAMILY, Bytes.toBytes("AnotherCount"), anotherCountValue);

    __d. Result result1 = sales_fact.increment(increment1);

    __e. Uncomment the last section of the code to complete the method.
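    Together, steps __a through __d perform one atomic server-side increment of both counters. The following sketch also reads the new values back from the returned Result; the rowkey, viewCountValue, anotherCountValue, and COLUMN_FAMILY variables and the sales_fact HTable handle come from the lab skeleton, and the read-back loop is an illustrative addition.

```java
Increment increment1 = new Increment(Bytes.toBytes(rowkey));   // target row for both counters
increment1.addColumn(COLUMN_FAMILY, Bytes.toBytes("ViewCount"), viewCountValue);
increment1.addColumn(COLUMN_FAMILY, Bytes.toBytes("AnotherCount"), anotherCountValue);

Result result1 = sales_fact.increment(increment1);             // atomic read-modify-write on the server
for (KeyValue kv : result1.raw()) {
    // each returned cell holds the counter's new value as an 8-byte long
    System.out.println(Bytes.toString(kv.getQualifier()) + " = " + Bytes.toLong(kv.getValue()));
}
```

    Because the addition happens on the RegionServer, two clients incrementing the same counter at once cannot lose an update, which is the point of using Increment rather than a Get followed by a Put.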

    __23. Once you are done with AccessObject.java, open up HBase_CounterTester.java:


    We are just picking a random row to add the two counter columns that were defined earlier: ViewCount and AnotherCount. The two values to increment by are 1 and 20. Run HBase_CounterTester to see the counter results. Run it multiple times to see the counters increment.

    __24. Run it with negative values to see the counters decrease, or with 0 to read the current values:

    __25. You are done with this lab exercise. Go ahead and save your work, then close Eclipse and any other windows or terminals that you may have open.


    3.5 Summary

    Excellent work! After using tables for a few lessons already, you have now seen how to create tables programmatically using the HBaseAdmin API in your applications. It's also worth mentioning again that you could have created the tables using HBase Shell commands, but in an application you would use some sort of client API.

    You should now be familiar with some of the Filters available for filtering the data that is returned from HBase, and you should understand how Counters work. Filters are essential to HBase because they let you narrow large sets of data (we're dealing with Big Data, so you'll have LOTS of data) down to what you need quickly. Counters help you manage and collect statistics about your table.

    Also, as part of the lab setup, you saw how to use the ImportTsv tool to import data into HBase. First you loaded the data into HDFS; then you ran the tool, specifying the column families and column names for the data in the text file.


  • Copyright IBM Corporation 2013.

    The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.

    IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
