This repository was archived by the owner on Jan 29, 2022. It is now read-only.
Luke Lovett edited this page Jun 10, 2016 · 9 revisions
This page describes how to use the MongoDB Hadoop Connector with vanilla MapReduce.
Installation
Obtain the MongoDB Hadoop Connector. You can either build it or download the jars. The releases page also includes instructions for use with Maven and Gradle. For MapReduce, all you need is the "core" jar.
Get a JAR for the MongoDB Java Driver. The connector requires at least version 3.0.0 of the driver "uber" jar (called "mongo-java-driver.jar").
Each node in the cluster needs access to the MongoDB Hadoop Connector JARs as well as the JAR for the MongoDB Java Driver. You can provision each machine in the cluster with the necessary JARs (for example, by copying them into $HADOOP_PREFIX/share/hadoop/common), or you can use the Hadoop DistributedCache to ship the JARs to the nodes at job-submission time. The easiest way to do the latter is the -libjars generic option of the hadoop jar command, which is passed after the main class name (and is only honored when the job's main class parses generic options, e.g. via ToolRunner):
hadoop jar MyJob.jar com.mycompany.HadoopJob \
    -libjars mongo-hadoop-core.jar,mongo-java-driver.jar
See the instructions on the releases page for how to include the Mongo Hadoop Connector jars easily in your MapReduce project code.
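If you manage your project with Gradle, the dependency declaration might look like the fragment below. This is a hedged sketch: the exact group, artifact, and version coordinates are assumptions here, so confirm them against the releases page before using them.

```groovy
// Hypothetical build.gradle fragment; verify coordinates and versions
// on the mongo-hadoop releases page.
dependencies {
    compile 'org.mongodb.mongo-hadoop:mongo-hadoop-core:1.5.2'
    compile 'org.mongodb:mongo-java-driver:3.2.2'
}
```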
Writing MapReduce Jobs
The MongoDB Hadoop Connector can do input/output with a live MongoDB cluster or with BSON files (such as those generated by mongodump):
| Direction | MongoDB           | BSON                 |
| --------- | ----------------- | -------------------- |
| Input     | MongoInputFormat  | BSONFileInputFormat  |
| Output    | MongoOutputFormat | BSONFileOutputFormat |
These formats are found in the com.mongodb.hadoop package and target the "new" MapReduce API (org.apache.hadoop.mapreduce). For MapReduce 1.x (the old org.apache.hadoop.mapred API), use the equivalent classes in the com.mongodb.hadoop.mapred package.
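To show how these formats fit into a job, here is a minimal sketch of a driver with a word-count-style mapper and reducer. Only the com.mongodb.hadoop class names come from the table above; the connection URIs, the "category" field, and the MyMapper/MyReducer classes are hypothetical placeholders, and running it requires a Hadoop cluster plus a reachable MongoDB deployment.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class HadoopJob {

    // Hypothetical mapper: emits ("category" field value, 1) per document.
    static class MyMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object key, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(String.valueOf(doc.get("category"))), ONE);
        }
    }

    // Hypothetical reducer: sums the counts for each key.
    static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : vals) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder connection strings; point these at your own collections.
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/db.in");
        MongoConfigUtil.setOutputURI(conf, "mongodb://localhost:27017/db.out");

        Job job = Job.getInstance(conf, "mongo-hadoop example");
        job.setJarByClass(HadoopJob.class);

        // Read from and write to a live cluster; swap in BSONFileInputFormat
        // and BSONFileOutputFormat to work with BSON dump files instead.
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same driver structure works for the mapred-package classes; only the imports and the job-setup calls (JobConf instead of Job) change.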
Examples
There are a number of examples for writing MapReduce jobs using the MongoDB Hadoop Connector.