Building your first Hadoop Jar with Maven and Eclipse

This guide walks through creating your first Hadoop program, but it skips over some important details, like how to compile a jar file. It assumes that you’ve already compiled all of your Java code. That’s a small assumption, but if you come from a C# programming background it may be confusing. Don’t worry, it’s fairly easy if you use Eclipse as your IDE and Maven to manage the project dependencies.

Again, this post is complementary to the Azure walkthrough. Open it up!

First, download and install Eclipse. Then create a new project using Maven.

Choose the quickstart archetype (maven-archetype-quickstart).

Add a Group Id, which becomes the base package where your code will live, and an Artifact Id, which names your project and the jar it produces. The generated class itself is called App.
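
To make the naming concrete, here is a small sketch of how the coordinates from this step turn into names used later in the post. It assumes Eclipse's default of using groupId.artifactId as the package and Maven's default jar name of artifactId-version.jar. MavenCoordinates is a hypothetical helper for illustration only; it is not part of Maven or Eclipse.

```java
/** Hypothetical helper: derives the default names Maven and Eclipse produce from project coordinates. */
final class MavenCoordinates {
    private final String groupId;
    private final String artifactId;
    private final String version;

    MavenCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** Assumed Eclipse default: the package is groupId.artifactId. */
    String defaultPackage() {
        return groupId + "." + artifactId;
    }

    /** Fully qualified class name, as needed for the mainClass entry in pom.xml. */
    String mainClass(String simpleClassName) {
        return defaultPackage() + "." + simpleClassName;
    }

    /** Maven's default jar name when no finalName is configured. */
    String jarFileName() {
        return artifactId + "-" + version + ".jar";
    }
}
```

With the values used in this post (org.microsoft.andrewmoll, testnumber2, 0.0.1-SNAPSHOT), mainClass("App") gives the same string that appears in the mainClass element of the pom.xml.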

 


We’ll need to update the pom.xml file. Double-click the file to open it, then choose the pom.xml tab.

Copy the code below into your pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.microsoft.andrewmoll</groupId>
  <artifactId>testnumber2</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>testnumber2</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.7.0_05</version>
      <scope>system</scope>
      <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>2.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.5.1</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.3</version>
        <configuration>
          <verbose>true</verbose>
          <fork>true</fork>
          <!-- Path to the javac executable of your JDK; adjust to your install location -->
          <executable>C:\Azul\zulu1.7.0_65-7.6.0.1-win64\bin\javac</executable>
          <!-- Should match the JDK referenced above -->
          <compilerVersion>1.7</compilerVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <archive>
            <manifest>
              <!-- Fully qualified name of the class containing main() -->
              <mainClass>org.microsoft.andrewmoll.testnumber2.App</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Now copy the code from the Azure documentation into your App.java class.

Now the benefit of an IDE shows up right away 🙂

Deprecation! Noooooooo. No worries, I got you. The new Job(...) constructor is deprecated in Hadoop 2, so use this line to create your job instead:

Job job = Job.getInstance(conf, "word count");

If you are getting errors, you may need to update your Maven project (right-click the project, then choose Maven > Update Project) so that the correct references can be imported.

OK, now let’s package our jar file. Right-click your project, choose Run As, and then Maven build.

Change the goal to package.

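Voilà! If the build completes successfully, your jar file will be in your project directory under ./target. Note that this is a regular jar rather than a true “fat” jar: the maven-jar-plugin configuration above sets the main class but does not bundle dependencies. That is fine here, since the cluster already provides the Hadoop libraries.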
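
If you want to double-check the jar before uploading it, a minimal sketch like the following reads the Main-Class entry from the jar's manifest using only the standard library. The class name JarMainClass and the default path are assumptions for illustration; the path follows Maven's default of target/artifactId-version.jar, so adjust it if your build names the file differently.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Illustrative helper: reports the Main-Class entry recorded in a jar's manifest. */
final class JarMainClass {

    /** Returns the Main-Class attribute, or null if the jar has no manifest or no such entry. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default output location; pass a different path as the first argument if needed.
        String path = args.length > 0 ? args[0] : "target/testnumber2-0.0.1-SNAPSHOT.jar";
        System.out.println("Main-Class: " + mainClassOf(new File(path)));
    }
}
```

If the value printed does not match the mainClass element in your pom.xml, the jar was most likely built before the pom.xml change was saved, so run the package goal again.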

From here you can use this jar file to complete the remainder of the tutorial!

~Happy Developing

Andrew
