O'Reilly Answers is a community site for sharing knowledge, asking questions, and providing answers that brings together our customers, authors, editors, conference speakers, and Foo (Friends of O'Reilly).
In this excerpt from Hadoop: The Definitive Guide, Second Edition, we look at how MapReduce in Hadoop works in detail. This knowledge provides a good foundation for writing more advanced MapReduce pro...
Querying CouchDB can be confusing to someone familiar with querying a relational database. This excerpt from Anderson, Lehnardt, & Slater's CouchDB: The Definitive Guide introduces you to using ...
In this excerpt from CouchDB: The Definitive Guide, the authors urge developers to kick back and get to know this document-oriented database and its RESTful interface.
Apache CouchDB is one of a new...
Big Data is when your data is so large you seriously have to consider how you're going to organize, store, and manage it in order to gain some benefit from it. Here are a few links to get you star...
There are some practical techniques that are worth knowing about
when you are developing and running Pig programs. This section covers
some of them. Parallelism: When running in Hadoop mode...
Is the cluster set up correctly? The best way to answer this
question is empirically: run some jobs and confirm that you get the
expected results. Benchmarks make good tests, as you also...
You can run a MapReduce job with a single line of code:
JobClient.runJob(conf). It’s very short, but it
conceals a great deal of processing behind the scenes. This section
uncov...
So far in this chapter, you have seen the
mechanics of writing a program using MapReduce. We haven’t yet
considered how to turn a data processing problem into the MapReduce
mode...
To take advantage of the parallel processing that Hadoop provides,
we need to express our query as a MapReduce job. After some local,
small-scale testing, we will be able to run it on a ...