Hadoop Installation

Hadoop Installation on Ubuntu (Linux)

Live USB Installation

The following installation is extracted from


author: Thomas Rakwach

1. Install Java

Install the latest version of Java.

$ sudo apt install default-jdk default-jre -y

Verify the installed version of Java.

$ java -version
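The version matters because the Hadoop project documents the 3.3 line as running on Java 8 and Java 11, while default-jdk may install a newer release depending on the Ubuntu version. As a minimal sketch, the major version can be pulled out of the first line of the `java -version` output; the helper name `java_major` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print the Java major version found in one line of
# `java -version` output, e.g.  openjdk version "11.0.20" 2023-07-18
java_major() {
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy numbering: 1.8.0_382 means Java 8
  esac
  # Keep everything before the first ".", "_" or "-".
  printf '%s\n' "${v%%[._-]*}"
}

# Only query the real JVM when one is installed.
if command -v java >/dev/null 2>&1; then
  java_major "$(java -version 2>&1 | head -n 1)"
fi
```

If the printed number is neither 8 nor 11, it may be worth installing a specific JDK package instead of default-jdk before continuing.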

2. Create Hadoop User and Configure Password-less SSH

Add a new user named hadoop.

$ sudo adduser hadoop

Add the hadoop user to the sudo group.

$ sudo usermod -aG sudo hadoop

Switch to the created user.

$ sudo su - hadoop

Install the OpenSSH server and client.

$ sudo apt install openssh-server openssh-client -y



When you get a prompt, respond with:

keep the local version currently installed

Switch to the created user.

$ sudo su - hadoop

Generate public and private key pairs.

$ ssh-keygen -t rsa

Add the generated public key from id_rsa.pub to authorized_keys.

$ sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Change the permissions of the authorized_keys file.

$ sudo chmod 640 ~/.ssh/authorized_keys

Verify that password-less SSH works.

$ ssh localhost
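The interactive login above silently falls back to a password prompt if the key setup is wrong, which is easy to miss. A non-interactive check can make the failure explicit. The sketch below assumes OpenSSH 7.6 or later (for `accept-new`); the function name `check_passwordless_ssh` is hypothetical.

```shell
# Hypothetical helper: succeed only if key-based login needs no prompt at all.
check_passwordless_ssh() {
  host=${1:-localhost}
  # BatchMode=yes makes ssh fail rather than ask for a password or passphrase;
  # accept-new records an unknown host key instead of prompting for it.
  ssh -o BatchMode=yes \
      -o StrictHostKeyChecking=accept-new \
      -o ConnectTimeout=5 \
      "$host" true
}
```

For example, `check_passwordless_ssh localhost && echo "password-less SSH works"` prints the message only when the key is accepted without interaction.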

3. Install Apache Hadoop

Log in as the hadoop user.

$ sudo su - hadoop

Download the latest stable version of Hadoop. To find the latest version, go to the
official Apache Hadoop download page.

$ sudo wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
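Apache publishes a `.sha512` file next to each release tarball, so the download can be checked before it is extracted. The following is a sketch only: `verify_sha512` is a hypothetical helper, and the expected digest has to be copied by hand from the published checksum file.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected hex string.
verify_sha512() {
  file=$1
  # Published digests may be uppercase; sha512sum prints lowercase.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}
```

A possible use is `verify_sha512 hadoop-3.3.1.tar.gz "<digest from the .sha512 file>" || echo "checksum mismatch"`, run before the extraction step.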

Extract the downloaded file.

$ tar -xvzf hadoop-3.3.1.tar.gz



Move the extracted directory to the /usr/local/ directory.

$ sudo mv hadoop-3.3.1 /usr/local/hadoop

Create a directory to store system logs.

$ sudo mkdir /usr/local/hadoop/logs

Change the ownership of the hadoop directory.

$ sudo chown -R hadoop:hadoop /usr/local/hadoop

4. Configure Hadoop

Edit the ~/.bashrc file to configure the Hadoop environment variables.

$ sudo nano ~/.bashrc

Add the following lines to the file. Save and close the file.

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Activate the environment variables.

$ source ~/.bashrc
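The later steps run `hadoop`, `hdfs`, `start-dfs.sh` and `start-yarn.sh` by bare name, which only works if Hadoop's `bin` and `sbin` directories are on `PATH`. The two exports listed for ~/.bashrc do not set that. Assuming the standard layout of the extracted tarball, lines along these lines could be added to ~/.bashrc as well:

```shell
# Assumes Hadoop was moved to /usr/local/hadoop as in step 3.
export HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
# bin holds hadoop and hdfs; sbin holds start-dfs.sh and start-yarn.sh.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

After running `source ~/.bashrc` again, `command -v hadoop` should print a path under /usr/local/hadoop/bin.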

5. Configure Java Environment Variables

Hadoop consists of several components, including YARN, HDFS, and MapReduce. To configure
these components and other Hadoop-related project settings, define the Java environment
variables in the hadoop-env.sh configuration file.

Find the Java path.

$ which javac

Find the OpenJDK directory.

$ readlink -f /usr/bin/javac
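The resolved path ends in `/bin/javac`, and the value needed for JAVA_HOME is that path with the last two components removed. A small sketch of that derivation, using a hypothetical helper named `java_home_from`:

```shell
# Hypothetical helper: turn the resolved javac path into a JAVA_HOME candidate
# by dropping the trailing /bin/javac.
java_home_from() {
  dirname "$(dirname "$1")"
}

# Only resolve the real compiler when a JDK is installed.
if command -v javac >/dev/null 2>&1; then
  java_home_from "$(readlink -f "$(command -v javac)")"
fi
```

On a system where the JDK lives in /usr/lib/jvm/java-11-openjdk-amd64, this prints that directory; on other systems the printed path is the one to use for JAVA_HOME instead.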

Edit the hadoop-env.sh file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add the following lines to the file. Save and close the file.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"

Browse to the hadoop lib directory.

$ cd /usr/local/hadoop/lib

Download the Javax activation file.

$ sudo wget https://jcenter.bintray.com/javax/activation/javax.activation-


Verify the Hadoop version.

$ hadoop version

Edit the core-site.xml configuration file to specify the URL for your NameNode.

$ sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following lines. Save and close the file.

<description>The default file system URI</description>
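Only the description element of the property survives above. As a hedged illustration of what a complete single-node core-site.xml can look like, the sketch below prints one from the shell. The property name `fs.defaultFS` is Hadoop's current name for the default file system URI. The value `hdfs://localhost:9000` and the helper name `core_site_xml` are assumptions, not taken from the original instructions.

```shell
# Hypothetical helper: print a minimal core-site.xml for a single-node setup.
# $1 is the NameNode URI; hdfs://localhost:9000 is an assumed default.
core_site_xml() {
  uri=${1:-hdfs://localhost:9000}
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$uri</value>
    <description>The default file system URI</description>
  </property>
</configuration>
EOF
}
```

Redirecting the output, as in `core_site_xml > "$HADOOP_HOME/etc/hadoop/core-site.xml"`, replaces the shipped template, so a backup copy is advisable first.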



Create a directory for storing node metadata and change the ownership to hadoop.

$ sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}

$ sudo chown -R hadoop:hadoop /home/hadoop/hdfs

Edit the hdfs-site.xml configuration file to define the locations for storing node
metadata and the fsimage file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following lines. Close and save the file.
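The lines themselves are missing here. One plausible single-node configuration, sketched under the assumption that it should point at the namenode and datanode directories created above, is shown below. The property names are real HDFS settings. The replication factor of 1 and the helper name `hdfs_site_xml` are assumptions.

```shell
# Hypothetical helper: print a minimal hdfs-site.xml.
# $1 is the directory that contains the namenode and datanode subdirectories.
hdfs_site_xml() {
  root=${1:-/home/hadoop/hdfs}
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$root/datanode</value>
  </property>
</configuration>
EOF
}
```

A replication factor of 1 suits a single DataNode; a multi-node cluster would normally use a higher value.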




Edit the mapred-site.xml configuration file to define MapReduce values.

$ sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following lines. Save and close the file.
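The MapReduce lines are not present in this copy. A common minimal setting tells MapReduce to run its jobs on YARN through `mapreduce.framework.name`. The sketch below prints such a file; treat it as an assumption about what was intended, with `mapred_site_xml` as a hypothetical helper name.

```shell
# Hypothetical helper: print a minimal mapred-site.xml that runs jobs on YARN.
mapred_site_xml() {
  cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```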


Edit the yarn-site.xml configuration file and define YARN-related settings.

$ sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following lines. Save and close the file.
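The YARN lines are also missing. For MapReduce jobs to run on YARN, the NodeManager usually needs the shuffle auxiliary service enabled via `yarn.nodemanager.aux-services`. The following sketch prints a file with only that setting; it is an assumed minimum rather than the original content, and `yarn_site_xml` is a hypothetical helper name.

```shell
# Hypothetical helper: print a minimal yarn-site.xml enabling the shuffle service.
yarn_site_xml() {
  cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```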




Log in as the hadoop user.

$ sudo su - hadoop

Validate the Hadoop configuration and format the HDFS NameNode.

$ hdfs namenode -format
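Formatting initializes the NameNode metadata directory, and re-formatting a directory that already holds metadata discards it. A formatted directory contains a `current/VERSION` file, so a small guard can skip the format step when it has already been done. The sketch assumes the metadata lives in /home/hadoop/hdfs/namenode; the function name `namenode_is_formatted` is hypothetical.

```shell
# Hypothetical helper: succeed if the NameNode directory was already formatted.
# $1 is the directory configured as the NameNode metadata location.
namenode_is_formatted() {
  [ -f "${1:-/home/hadoop/hdfs/namenode}/current/VERSION" ]
}
```

One way to use it is `namenode_is_formatted || hdfs namenode -format`, which formats only a fresh directory.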

6. Start the Apache Hadoop Cluster

Start the NameNode and DataNode.

$ start-dfs.sh

Start the YARN resource and node managers.

$ start-yarn.sh

Verify all the running components.

$ jps
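On a single-node setup started with start-dfs.sh and start-yarn.sh, jps normally lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. Scanning that list by eye is error-prone, so the sketch below reads jps output and prints any of those daemons that is absent. The helper name `missing_daemons` is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that does not appear in it.
missing_daemons() {
  jps_out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; match the name column exactly so that
    # SecondaryNameNode is not mistaken for NameNode.
    printf '%s\n' "$jps_out" \
      | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$d"
  done
}
```

Running `jps | missing_daemons` prints nothing when all five daemons are up; any name it prints points at the log file to inspect under /usr/local/hadoop/logs.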

7. Access Apache Hadoop Web Interface

You can access the Hadoop NameNode on your browser via http://server-IP:9870. For



