O'Reilly Answers is a community site for sharing knowledge, asking questions, and providing answers that brings together our customers, authors, editors, conference speakers, and Foo (Friends of O'Reilly).
The following excerpt from Hadoop: The Definitive Guide, Second Edition gives a step-by-step guide to installing HBase. Download a stable release from an Apache Download Mirror and unpack it on your lo...
In the following excerpt from Hadoop: The Definitive Guide, Second Edition, we walk through an example of implementing Hadoop step by step. Let’s look at a simple example by writing th...
In this excerpt from the O'Reilly publication Hadoop: The Definitive Guide, Second Edition, we look at running Hadoop on Amazon EC2, which is a great way to try out your own Hadoop cluster on a lo...
In this excerpt from Hadoop: The Definitive Guide, Second Edition, we look at how MapReduce in Hadoop works in detail. This knowledge provides a good foundation for writing more advanced MapReduce pro...
There are some practical techniques that are worth knowing about
when you are developing and running Pig programs. This section covers
some of them. Parallelism: When running in Hadoop mode...
Is the cluster set up correctly? The best way to answer this
question is empirically: run some jobs and confirm that you get the
expected results. Benchmarks make good tests, as you also...
You can run a MapReduce job with a single line of code:
JobClient.runJob(conf). It’s very short, but it
conceals a great deal of processing behind the scenes. This section
uncov...
So far in this chapter, you have seen the
mechanics of writing a program using MapReduce. We haven’t yet
considered how to turn a data processing problem into the MapReduce
mode...
Hadoop has an abstract notion of filesystem, of which HDFS is just
one implementation. The Java abstract class org.apache.hadoop.fs.FileSystem represents a
filesystem in Hadoop, and ther...
To take advantage of the parallel processing that Hadoop provides,
we need to express our query as a MapReduce job. After some local,
small-scale testing, we will be able to run it on a ...