✅ PART 1- Install Java and Hadoop on Ubuntu

This document provides a step-by-step guide to install Java and Hadoop on Ubuntu, configure environment variables, and write a WordCount Java program. It includes instructions for compiling the program, creating a JAR file, and running a MapReduce job to count word occurrences in a text file. The final output displays the count of each word processed by the job.

Uploaded by ayeshagujrati00
Copyright © All Rights Reserved

✅ PART 1: Install Java and Hadoop on Ubuntu

🧰 Step 1: Install Java (JDK)


sudo apt update
sudo apt install openjdk-11-jdk -y
java -version

📦 Step 2: Download and Configure Hadoop (Standalone Mode)


🔽 Download Hadoop
cd ~
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop

🔧 Set Environment Variables


Edit ~/.bashrc:

nano ~/.bashrc

Add these at the end:

export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
# HADOOP_CLASSPATH is not set here: JDK 11 no longer ships lib/tools.jar, and this guide compiles with javac directly.

Apply the changes:


source ~/.bashrc

✅ Test:
hadoop version
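
The JAVA_HOME value above is specific to amd64 installs of OpenJDK 11, so it can be wrong on other machines. As a rough sanity check, the sketch below compares the JAVA_HOME environment variable with the home directory reported by the running JVM. The class name JavaHomeCheck and its matches helper are made up for this illustration, and symbolic links are not resolved, so a mismatch is only a hint to look closer.

```java
import java.nio.file.Path;

// Hypothetical helper (not part of Hadoop or the original guide).
class JavaHomeCheck {

    // True when the configured JAVA_HOME names the same directory as the
    // running JVM's home. Paths are normalized, but symlinks are NOT resolved.
    static boolean matches(String envJavaHome, String runtimeHome) {
        if (envJavaHome == null || envJavaHome.isBlank()) {
            return false;
        }
        Path configured = Path.of(envJavaHome).toAbsolutePath().normalize();
        Path running = Path.of(runtimeHome).toAbsolutePath().normalize();
        return configured.equals(running);
    }

    public static void main(String[] args) {
        String env = System.getenv("JAVA_HOME");
        String runtime = System.getProperty("java.home");
        System.out.println("JAVA_HOME = " + env);
        System.out.println("java.home = " + runtime);
        System.out.println(matches(env, runtime)
                ? "JAVA_HOME matches the running JDK"
                : "JAVA_HOME differs from the running JDK; check ~/.bashrc");
    }
}
```

With JDK 11 the file can be run directly, without compiling first: java JavaHomeCheck.java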

✅ PART 2: Write the WordCount Java Code


Create a folder and Java file:

mkdir ~/wordcount
cd ~/wordcount
nano WordCount.java

Paste the following code, then save and exit:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures the job and runs it on <input path> <output path>.
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
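
To see what the job computes without starting Hadoop, the three phases can be imitated in plain Java. This is only a sketch: LocalWordCount and its map, shuffle and reduce helpers are hypothetical names, everything runs in memory in one process, and a TreeMap stands in for Hadoop's sort-by-key step (for lowercase ASCII words the resulting order should match the job's).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the WordCount job (no Hadoop involved).
class LocalWordCount {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> count(String text) {
        return reduce(shuffle(map(text)));
    }

    public static void main(String[] args) {
        // Prints: {count=2, hadoop=2, mapreduce=1, word=2}
        System.out.println(count("hadoop mapreduce hadoop word count word count"));
    }
}
```

In the real job the combiner performs the same summation on each mapper's output before the shuffle, which reduces the data moved between phases without changing the result.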

✅ PART 3: Compile and Run the Program


🔧 Step 1: Compile
mkdir classes
javac -classpath "$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*" \
  -d classes WordCount.java

📦 Step 2: Create a JAR


jar -cvf wordcount.jar -C classes/ .

✅ PART 4: Run WordCount Job (Standalone)


📁 Step 1: Create Input File
mkdir input
echo "hadoop mapreduce hadoop word count word count" > input/test.txt
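
The sample line is all lowercase and has no punctuation, so every token is a clean word. With ordinary text the mapper's StringTokenizer still splits on whitespace only, which means "Hadoop", "hadoop," and "hadoop" are counted as three different words. The sketch below shows that behaviour. The normalize method is an assumption added for illustration and is not part of the WordCount code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical demo class (not part of Hadoop or the original guide).
class TokenizeDemo {

    // Same splitting rule as the mapper: StringTokenizer's default delimiters
    // are whitespace characters only.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    // Assumed normalization: lowercase the token and strip leading and
    // trailing characters that are not ASCII letters or digits.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT)
                    .replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
    }

    public static void main(String[] args) {
        for (String t : tokens("Hadoop hadoop, hadoop")) {
            System.out.println(t + " -> " + normalize(t));
        }
    }
}
```

If merged counts are wanted, a call like normalize could be applied to each token inside the mapper before word.set(...); whether that is desirable depends on the text being processed.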

▶️ Step 2: Run MapReduce Job


hadoop jar wordcount.jar WordCount input output
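
Hadoop refuses to write into an output directory that already exists, so running the command a second time fails until the output folder is removed. Deleting it by hand (for example with rm -r output) is enough. For readers who prefer to do the cleanup from Java, here is a minimal sketch using only the standard library; the class name CleanOutput is hypothetical, and it works on the local filesystem only, which is what standalone mode uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Hadoop): removes a local output directory.
class CleanOutput {

    // Deletes dir and everything beneath it. Returns false if dir did not exist.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of(args.length > 0 ? args[0] : "output");
        System.out.println(deleteRecursively(out)
                ? "removed " + out
                : "nothing to remove at " + out);
    }
}
```

Run it from ~/wordcount before re-running the job: java CleanOutput.java output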

📄 Step 3: View Output


cat output/part-r-00000

Expected output:

count 2
hadoop 2
mapreduce 1
word 2
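
In the actual file, each line holds a word and its count separated by a tab character, which is the default for Hadoop's text output. If the counts are needed in another program, the file can be read back as follows. This is a sketch under that assumption; ReadCounts is a hypothetical name, and the default path simply mirrors the one used in the cat command.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the job's output file (not part of Hadoop).
class ReadCounts {

    // Parses "word<TAB>count" lines, keeping the order found in the file.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab separator in: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "output/part-r-00000");
        parse(Files.readAllLines(file))
                .forEach((word, count) -> System.out.println(word + " = " + count));
    }
}
```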
