
How to Install Hadoop HBase

Posted Oct 31 2010 08:08 AM

The following excerpt from Hadoop: The Definitive Guide, Second Edition gives a step-by-step guide to installing HBase.
Download a stable release from an Apache Download Mirror and unpack it on your local filesystem. For example:

% tar xzf hbase-x.y.z.tar.gz


As with Hadoop, you first need to tell HBase where Java is located on your system. If you have the JAVA_HOME environment variable set to point to a suitable Java installation, then that will be used, and you don’t have to configure anything further. Otherwise, you can set the Java installation that HBase uses by editing HBase’s conf/hbase-env.sh and setting the JAVA_HOME variable to point to a Java 1.6.0 installation.


Note:

HBase, just like Hadoop, requires Java 6.
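
One way to check which of the two cases applies before starting HBase is sketched below. The function name check_java_home is hypothetical and used only for illustration. It tests whether JAVA_HOME points at a directory containing an executable bin/java, and otherwise prints the kind of line you would add to conf/hbase-env.sh. The path in that line is a placeholder, not a real location.

```shell
# Sketch: check whether JAVA_HOME is usable.
# check_java_home is a hypothetical helper, not part of HBase.
check_java_home() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
    echo "Using Java from JAVA_HOME: ${JAVA_HOME}"
    return 0
  fi
  # Placeholder path; substitute the location of your own Java installation.
  echo "JAVA_HOME is not set or has no bin/java; add to conf/hbase-env.sh:"
  echo "export JAVA_HOME=/path/to/java"
  return 1
}

# Usage:
# check_java_home
```

The function only inspects the environment; it does not verify the Java version.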



For convenience, add the HBase binary directory to your command-line path. For example:

% export HBASE_HOME=/home/hbase/hbase-x.y.z
% export PATH=$PATH:$HBASE_HOME/bin
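
If you put these lines in a shell startup file so that they persist across sessions, PATH can accumulate duplicate entries each time the file is read. The sketch below appends a directory only when it is not already present. The name add_to_path is a hypothetical helper, and the HBase location in the usage comment is the same placeholder as above.

```shell
# Sketch: append a directory to PATH only if it is not already there.
# add_to_path is a hypothetical helper name.
add_to_path() {
  case ":${PATH}:" in
    *":$1:"*) ;;                 # already present, leave PATH unchanged
    *) PATH="${PATH}:$1" ;;      # not present, append it
  esac
  export PATH
}

# Usage (placeholder install location):
# export HBASE_HOME=/home/hbase/hbase-x.y.z
# add_to_path "$HBASE_HOME/bin"
```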


To get the list of HBase options, type:

% hbase
Usage: hbase <command>
where <command> is one of:
  shell            run the HBase shell
  master           run an HBase HMaster node
  regionserver     run an HBase HRegionServer node
  zookeeper        run a Zookeeper server
  rest             run an HBase REST server
  thrift           run an HBase Thrift server
  avro             run an HBase Avro server
  migrate          upgrade an hbase.rootdir
  hbck             run the hbase 'fsck' tool
 or
  CLASSNAME        run the class named CLASSNAME
Most commands print help when invoked w/o parameters.


Test Drive

To start a temporary instance of HBase that uses the /tmp directory on the local filesystem for persistence, type:

% start-hbase.sh


This will launch a standalone HBase instance that persists to the local filesystem; by default, HBase will write to /tmp/hbase-${USERID}.[115]

[115] In standalone mode, HBase master, regionserver, and a ZooKeeper instance are all run in the same JVM.


To administer your HBase instance, launch the HBase shell by typing:

% hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version: 0.89.0-SNAPSHOT, ra4ea1a9a7b074a2e5b7b24f761302d4ea28ed1b2, Sun Jul 18 15:01:50 PDT 2010
hbase(main):001:0>


This will bring up a JRuby IRB interpreter that has had some HBase-specific commands added to it. Type help and then RETURN to see the list of shell commands grouped into categories. Type help COMMAND_GROUP for help by category or help COMMAND for help on a specific command and example usage. Commands use Ruby formatting to specify lists and dictionaries. See the end of the main help screen for a quick tutorial.

Now let us create a simple table, add some data, and then clean up.

To create a table, you must name your table and define its schema. A table’s schema comprises table attributes and the list of table column families. Column families themselves have attributes that you in turn set at schema definition time. Examples of column family attributes include whether the family content should be compressed on the filesystem and how many versions of a cell to keep. Schemas can be later edited by offlining the table using the shell disable command, making the necessary alterations using alter, then putting the table back online with enable.

To create a table named test with a single column family named data using defaults for table and column family attributes, enter:

hbase(main):007:0> create 'test', 'data'
0 row(s) in 1.3066 seconds



Tip:

If the previous command does not complete successfully, and the shell displays an error and a stack trace, your install was not successful. Check the master logs under the HBase logs directory—the default location for the logs directory is ${HBASE_HOME}/logs—for a clue as to where things went awry.
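
A minimal sketch for finding and reading the master log follows. The helper name tail_master_log is hypothetical, and the *master*.log filename pattern is an assumption about how the log files are named. Adjust it if your logs directory looks different.

```shell
# Sketch: print the last lines of the most recently modified master log.
# tail_master_log is a hypothetical helper; the *master*.log pattern is an
# assumption about the log file names.
tail_master_log() {
  log_dir=${1:-${HBASE_HOME}/logs}   # default logs directory from the tip above
  latest=$(ls -t "$log_dir"/*master*.log 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "No master log found in $log_dir" >&2
    return 1
  fi
  echo "== $latest =="
  tail -n "${2:-50}" "$latest"       # second argument: number of lines to show
}

# Usage:
# tail_master_log                  # last 50 lines from ${HBASE_HOME}/logs
# tail_master_log /some/dir 200    # another directory, last 200 lines
```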



See the help output for examples adding table and column family attributes when specifying a schema.
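
As an illustration of editing a schema after creation, the sketch below generates the disable, alter, enable sequence described earlier, here changing how many versions of a cell the data family keeps. The function name alter_versions_script is hypothetical, and the value 5 in the usage comment is arbitrary. Piping the result into hbase shell assumes the shell reads commands from standard input. If in doubt, type the three commands at the prompt instead.

```shell
# Sketch: emit the HBase shell commands that change the number of cell
# versions a column family keeps. alter_versions_script is a hypothetical helper.
alter_versions_script() {
  table=$1 family=$2 versions=$3
  cat <<EOF
disable '${table}'
alter '${table}', {NAME => '${family}', VERSIONS => ${versions}}
enable '${table}'
EOF
}

# Usage: review the generated commands first, then (assuming hbase shell
# accepts commands on standard input):
# alter_versions_script test data 5
# alter_versions_script test data 5 | hbase shell
```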

To prove the new table was created successfully, run the list command. This will output all tables in user space:

hbase(main):019:0> list
test                                                                                                          
1 row(s) in 0.1485 seconds


To insert data into three different rows and columns in the data column family, and then list the table content, do the following:

hbase(main):021:0> put 'test', 'row1', 'data:1', 'value1'
0 row(s) in 0.0454 seconds
hbase(main):022:0> put 'test', 'row2', 'data:2', 'value2'
0 row(s) in 0.0035 seconds
hbase(main):023:0> put 'test', 'row3', 'data:3', 'value3'
0 row(s) in 0.0090 seconds
hbase(main):024:0> scan 'test'
ROW                          COLUMN+CELL                                                                      
 row1                        column=data:1, timestamp=1240148026198, value=value1                             
 row2                        column=data:2, timestamp=1240148040035, value=value2                             
 row3                        column=data:3, timestamp=1240148047497, value=value3                             
3 row(s) in 0.0825 seconds


Notice how we added three new columns without changing the schema.
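
Because each put names its own column, more rows can be loaded the same way without touching the schema. The sketch below generates put commands in the pattern used above. The helper name gen_puts is hypothetical, and feeding its output to hbase shell assumes the shell reads commands from standard input.

```shell
# Sketch: generate put commands for rows 1..N, one new column per row.
# gen_puts is a hypothetical helper name.
gen_puts() {
  table=$1 family=$2 count=$3
  i=1
  while [ "$i" -le "$count" ]; do
    echo "put '${table}', 'row${i}', '${family}:${i}', 'value${i}'"
    i=$((i + 1))
  done
}

# Usage:
# gen_puts test data 3
# gen_puts test data 3 | hbase shell
```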

To remove the table, you must disable it before dropping it:

hbase(main):025:0> disable 'test'
09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test
0 row(s) in 6.0426 seconds
hbase(main):026:0> drop 'test'
09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test
0 row(s) in 0.0210 seconds
hbase(main):027:0> list
0 row(s) in 2.0645 seconds


Shut down your HBase instance by running:

% stop-hbase.sh


To learn how to set up a distributed HBase and point it at a running HDFS, see the Getting Started section of the HBase documentation.

Learn more about this topic from Hadoop: The Definitive Guide, 2nd Edition. 

Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. With Hadoop: The Definitive Guide, programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop is used to solve specific problems.


