Hadoop Installation


Hadoop Installation on Ubuntu (Linux)



The following installation guide is extracted from

https://www.vultr.com/docs/install-and-configure-apache-hadoop-on-ubuntu-20-04

Author: Thomas Rakwach

1. Install Java

Install the default JDK and JRE packages (OpenJDK 11 on Ubuntu 20.04).

$ sudo apt install default-jdk default-jre -y

Verify the installed version of Java.

$ java -version

2. Create Hadoop User and Configure Password-less SSH

Add a new user hadoop.

$ sudo adduser hadoop

Add the hadoop user to the sudo group.

$ sudo usermod -aG sudo hadoop

Switch to the created user.

$ sudo su - hadoop

Install the OpenSSH server and client.

$ sudo apt install openssh-server openssh-client -y



If apt prompts you about a modified SSH configuration file during the installation, respond with:

keep the local version currently installed


Generate public and private key pairs.

$ ssh-keygen -t rsa

Add the generated public key from id_rsa.pub to authorized_keys.

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Change the permissions of the authorized_keys file.

$ chmod 640 ~/.ssh/authorized_keys

Verify that password-less SSH is functional.

$ ssh localhost
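
If the login succeeds without a password prompt, the key-based setup works. You can also test it non-interactively; with BatchMode enabled, SSH fails outright instead of falling back to a password prompt:

$ ssh -o BatchMode=yes localhost 'echo password-less SSH OK'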

3. Install Apache Hadoop

Log in as the hadoop user.

$ sudo su - hadoop

Download Hadoop. This guide uses version 3.3.1; to check for the latest stable release, visit
the official Apache Hadoop download page.

$ sudo wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
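
Optionally, verify the integrity of the archive before extracting it. Apache publishes a .sha512 checksum file alongside each release; the commands below assume it is available at the same downloads.apache.org path:

$ wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
$ sha512sum -c hadoop-3.3.1.tar.gz.sha512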

Extract the downloaded file.

$ tar -xvzf hadoop-3.3.1.tar.gz



Move the extracted directory to the /usr/local/ directory.

$ sudo mv hadoop-3.3.1 /usr/local/hadoop

Create a directory to store the Hadoop logs.

$ sudo mkdir /usr/local/hadoop/logs

Change the ownership of the hadoop directory.

$ sudo chown -R hadoop:hadoop /usr/local/hadoop

4. Configure Hadoop

Edit the ~/.bashrc file to configure the Hadoop environment variables.

$ sudo nano ~/.bashrc

Add the following lines to the file. Save and close the file.

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Activate the environment variables.

$ source ~/.bashrc
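
To confirm the variables took effect, print one of them and check that the Hadoop binaries now resolve on your PATH:

$ echo $HADOOP_HOME
$ which hadoop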

5. Configure Java Environment Variables

Hadoop's components, such as YARN, HDFS, and MapReduce, all run on Java. To configure
these components and Hadoop-related project settings, you need to define the Java
environment variables in the hadoop-env.sh configuration file.

Find the Java path.

$ which javac

Find the OpenJDK directory.

$ readlink -f /usr/bin/javac
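
The readlink output ends in /bin/javac; JAVA_HOME should be the directory above that. You can strip the suffix directly:

$ readlink -f /usr/bin/javac | sed 's:/bin/javac::'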

Edit the hadoop-env.sh file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Add the following lines to the file, using the OpenJDK directory found above for JAVA_HOME. Then, save and close the file.

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH+=" $HADOOP_HOME/lib/*.jar"

Browse to the hadoop lib directory.

$ cd /usr/local/hadoop/lib

Download the Javax activation file.

$ sudo wget https://jcenter.bintray.com/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar
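
JCenter has since been sunset, so this URL may no longer resolve. The same artifact is also published on Maven Central:

$ sudo wget https://repo1.maven.org/maven2/javax/activation/javax.activation-api/1.2.0/javax.activation-api-1.2.0.jar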

Verify the Hadoop version.

$ hadoop version

Edit the core-site.xml configuration file to specify the URL for your NameNode.

$ sudo nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following lines. Save and close the file.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
    <description>The default file system URI</description>
  </property>
</configuration>



Create directories for storing node metadata and change their ownership to hadoop.

$ sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}

$ sudo chown -R hadoop:hadoop /home/hadoop/hdfs

Edit the hdfs-site.xml configuration file to define the locations for storing node metadata and
the fsimage file.

$ sudo nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Add the following lines, then save and close the file. The replication factor is set to 1 because this is a single-node cluster.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>

Edit the mapred-site.xml configuration file to define MapReduce values.

$ sudo nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Add the following lines. Save and close the file.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
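
On Hadoop 3.x, MapReduce jobs submitted to YARN can fail with missing-class errors unless the MapReduce classpath is also defined. The official Hadoop single-node setup guide adds a property along the following lines inside the same <configuration> block; verify the value against your installation before relying on it:

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>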

Edit the yarn-site.xml configuration file and define YARN-related settings.

$ sudo nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following lines. Save and close the file.



<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
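
If YARN containers later fail to inherit the Hadoop environment, the official single-node guide also whitelists the relevant variables inside the same <configuration> block; treat this as an optional addition to cross-check against the Hadoop 3.3.1 documentation:

  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>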

Log in as the hadoop user if you are not already logged in.

$ sudo su - hadoop

Validate the Hadoop configuration and format the HDFS NameNode.

$ hdfs namenode -format
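
Formatting initializes the metadata directory defined in hdfs-site.xml; re-running it later will wipe the existing HDFS metadata, so only do it once. A quick sanity check, assuming the paths configured above:

$ ls /home/hadoop/hdfs/namenode/current

The directory should contain a VERSION file and an initial fsimage.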

6. Start the Apache Hadoop Cluster

Start the NameNode and DataNode.

$ start-dfs.sh

Start the YARN resource and node managers.

$ start-yarn.sh

Verify all the running components.

$ jps
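
On a healthy single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager in addition to Jps itself. If any of them is missing, check the corresponding log file under /usr/local/hadoop/logs.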

7. Access Apache Hadoop Web Interface

You can access the Hadoop NameNode web interface in your browser via http://server-IP:9870. For
example:

http://192.0.2.11:9870
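
The YARN ResourceManager web interface is served on port 8088 by default:

http://192.0.2.11:8088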

