Hadoop commands:
cd (to the Hadoop sbin path)
start-all.cmd
jps (there should be 5 processes: NameNode, DataNode, ResourceManager, NodeManager, and Jps itself)
hdfs dfs -ls /
hdfs dfs -mkdir /el1 : make a directory
hdfs dfs -ls /
hdfs dfs -mkdir /el1/el : make a directory inside a directory
hdfs dfs -touchz /el1/e2.txt : create an empty file inside a directory
first character of each ls entry:
d : directory
- : file
privileges (characters 2-10 of the permission string):
2,3,4 : owner (user)
5,6,7 : group
8,9,10 : other
r : read (4)
w : write (2)
x : execute (1)
4+2+1 = 7 (all permissions given)
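The octal digits can be checked with a little arithmetic; a minimal sketch (the r=4, w=2, x=1 values are standard, the resulting mode 754 is just an illustration):

```shell
# Each digit of an octal mode is the sum of r=4, w=2, x=1.
r=4; w=2; x=1
owner=$((r + w + x))   # rwx -> 7
group=$((r + x))       # r-x -> 5
other=$((r))           # r-- -> 4
echo "${owner}${group}${other}"   # prints 754
```

hdfs dfs -chmod 754 <path> would grant exactly these bits.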
now go to the C: drive -> make a folder named new -> create a text file rs.txt -> write
something in it
put command -> used to upload (copy) files from the local filesystem to HDFS
get command -> used to download (copy) files from HDFS to the local filesystem
hdfs dfs -put C:\new\rs.txt /el1 (what to copy and where to copy) (The local file
rs.txt (from C:\new) is uploaded to /el1 in HDFS.)
go to http://localhost:9870/explorer.html#/ and check the files
hdfs dfs -get /el1/el.txt C:\New
hdfs dfs -cat /el1/rs.txt (displays the content of the file; the file should be present
in HDFS)
hdfs dfs -cp /el1/rs.txt /new.txt
Used to copy files/directories from one location to another within the HDFS
itself.
hdfs dfs -mv /el1/rs.txt /
hdfs dfs -rm /new.txt
-rmdir is used to delete an empty directory
-rm is used to delete a file
-rm -r is used to delete a directory that has files in it
-du : disk usage
hdfs dfs -du -s /new
The hdfs dfs -du -s command is used to display the disk usage summary of a specific
directory or file in Hadoop Distributed File System (HDFS).
du → Displays the disk usage of files/directories in HDFS.
-s → Shows only the summary (total size of the directory), rather than listing all
files inside.
<hdfs_directory_or_file_path> → Path of the directory or file in HDFS.
hdfs dfs -du -v /new (/new is the directory name; for an empty directory there is nothing to list)
Column                                   Description
SIZE                                     Actual size of the file in bytes (original, unreplicated size)
DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS    Total disk space used after replication
FULL_PATH_NAME                           Full path of the file in HDFS
Option   Description
-h       Human-readable format (e.g., KB, MB, GB)
-s       Summary: shows only the total size for the directory
-v       Verbose: shows a detailed view with replication info
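The relationship between the two size columns is simple arithmetic; a sketch using assumed values (14 bytes, replication factor 3), not output from a real cluster:

```shell
# DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS is roughly SIZE multiplied by the
# file's replication factor (both values below are assumed for illustration).
size=14          # bytes, as reported in the SIZE column
replication=3    # the HDFS default replication factor
echo $((size * replication))   # prints 42
```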
The put command is used to copy a file or directory from the local filesystem to
HDFS.
The get command is used to copy a file or directory from HDFS to the local
filesystem.
Alternative to put: You can also use copyFromLocal, which works similarly.
hdfs dfs -copyFromLocal <local_source_path> <hdfs_target_path>
Alternative to get: You can also use copyToLocal, which works similarly.
hdfs dfs -copyToLocal <hdfs_source_path> <local_target_path>
the desktop/C: drive is the local filesystem; the directories and files created through
Hadoop live in HDFS, which is where we work
hdfs dfs -chgrp el /abc1.txt (new group name, then the file whose group we want to change)
chgrp : change group
hdfs dfs -chown el /abc1.txt (change ownership of the file)
hdfs dfs -expunge : permanently deletes the files that are in the trash
hdfs dfs -ls file:///C:\new (listing out files and folders present in c drive)
hdfs dfs -count /data
//shows counts for the path
eg:
//1  : number of directories
//3  : number of files inside the directory
//14 : total size in bytes of all files
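Those three numbers come back as one whitespace-separated line, so they can be pulled apart with ordinary shell word splitting; a local sketch using a made-up output line (a real run would capture `hdfs dfs -count /data` instead):

```shell
# Hypothetical output line of `hdfs dfs -count /data`:
line="1 3 14 /data"
set -- $line   # split into DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
echo "dirs=$1 files=$2 bytes=$3 path=$4"
# prints: dirs=1 files=3 bytes=14 path=/data
```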
hdfs dfs -appendToFile "C:\Users\vaish\Desktop\heyy.txt" /dir1/put_file.txt (first the
path of the local file whose text we want to append - here it is on the desktop - then
the HDFS file we want to append it to)
The hdfs dfs -chmod command is used to change file or directory permissions in
Hadoop's HDFS (similar to the chmod command in Linux).
hdfs dfs -chmod [permissions] <hdfs_file_or_directory>
hdfs dfs -chmod 755 /user/hadoop/data.txt
hdfs dfs -chmod "g+rw,g-x" /dir1
hdfs dfs -chmod "g=w" /dir1
owner (user) : u
group : g
other : o
hdfs dfs -chmod "g=rx,o=r,u=x" /dir1
echo "hello" | hdfs dfs -appendToFile - /file1.txt (for writing in a file that is
present in Hadoop)
hdfs dfs -getmerge /hi/hello "C:/new/vaishnavi.txt" (The getmerge command in Hadoop
is used to merge multiple files from an HDFS directory into a single local file. It
is particularly useful when dealing with MapReduce output, where multiple part
files need to be combined into one.)
hdfs dfs -getmerge /hi "C:/new/vaishnavi.txt" (the content of all the files present in
this directory is concatenated into the single local file)
(the local file should be on the C: drive; use forward slashes instead of the usual
Windows backslashes)
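What getmerge does can be mimicked locally with cat; a sketch in a throwaway directory (the part-r-* file names and contents are made up to mirror MapReduce output):

```shell
# Local analog of `hdfs dfs -getmerge /hi merged.txt`: concatenate every
# matching file in a directory, in name order, into one output file.
dir=$(mktemp -d)
printf 'first\n'  > "$dir/part-r-00000"
printf 'second\n' > "$dir/part-r-00001"
cat "$dir"/part-* > "$dir/merged.txt"
cat "$dir/merged.txt"
# prints:
#   first
#   second
rm -r "$dir"
```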
Why is a .crc File Created When Using hdfs dfs -getmerge?
When you use the hdfs dfs -getmerge command to merge files from HDFS to your local
file system (C: drive in Windows), Hadoop automatically generates a CRC (Cyclic
Redundancy Check) file. This .crc file is used for data integrity verification.
Reasons for .crc File Creation
Hadoop Uses CRC for Data Integrity
Hadoop maintains CRC checksums for each file in HDFS.
When you retrieve files from HDFS to a local file system, Hadoop generates a .crc
file in the same directory as the target file.
This file helps verify that the data is not corrupted during transfer.
Hadoop Treats Local FileSystem Like HDFS
When you run hdfs dfs -getmerge, Hadoop treats your local file system (C: drive)
like HDFS.
Since HDFS uses CRC files for block verification, it applies the same logic when
storing the merged file locally.
How to Prevent .crc File Creation?
If you don’t want Hadoop to create .crc files in your local directory, you can
disable checksum verification:
1. Use -ignoreCrc Option
hdfs dfs -getmerge -ignoreCrc /hi "C:\Users\vaish\Desktop\vaishnavi.txt"
This tells Hadoop not to generate the .crc file while merging files.
2. Manually Delete .crc File
If the .crc file is already created, you can remove it manually:
del "C:\Users\vaish\Desktop\vaishnavi.txt.crc"
hdfs dfs -cat file:///"C:\Users\vaish\Desktop\heyy.txt"
hdfs dfs -ls file:///"C:\Java"
(these both commands are used when printing files or content from local /desktop)
hdfs dfs -moveFromLocal "C:\Users\vaish\Desktop\chimpanzee.txt" /
(-moveToLocal is listed in the help but is not implemented)
stop-all.cmd
hdfs dfs -test -e /hi/heyy (checks whether the path exists; the result is returned as the exit status)
echo %ERRORLEVEL%
Common Options in hdfs dfs -test
Option Description
-e Checks if the file or directory exists.
-d Checks if the path is a directory.
-f Checks if the path is a file.
-z Checks if the file is empty (zero bytes).
-s Checks if the file is non-empty (size > 0).
Examples and Exit Codes
The hdfs dfs -test command does not print output but sets an exit status (error
level):
Exit Code 0 → The test succeeds (the condition is true).
Exit Code 1 → The test fails (the condition is false).
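The same 0/1 exit-code convention is used by the ordinary `test` command, so the pattern can be tried without a cluster (on a real cluster you would substitute `hdfs dfs -test -e <path>`):

```shell
# Exit status 0 means the condition is true, 1 means it is false.
tmp=$(mktemp)
if test -e "$tmp"; then echo "exists"; fi   # prints: exists
rm "$tmp"
test -e "$tmp"
echo $?                                     # prints: 1 (the file is gone)
```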
Difference Between -e and -f in hdfs dfs -test Command
Both -e and -f are options for the hdfs dfs -test command in Hadoop, but they serve
different purposes:
-e : checks if a file or directory exists
     returns 0 (success) if the path exists (file or directory); 1 (failure) if it does not
-f : checks if the path is a regular file (not a directory)
     returns 0 (success) if the path exists and is a regular file; 1 (failure) if it does
     not exist or is a directory
Difference Between head and tail Commands in Hadoop
Both head and tail commands are used to view a portion of a file stored in HDFS,
but they serve different purposes:
Command          Function                                   Default Behavior           Common Use Cases
hdfs dfs -head   Displays the beginning of a file in HDFS   Shows first 1 KB of data   Quickly preview the beginning of a file
hdfs dfs -tail   Displays the end of a file in HDFS         Shows last 1 KB of data    Monitor the end of a log file
The hdfs dfs -setrep command in Hadoop is used to change the replication factor of
a file or directory in HDFS.
🔹 Syntax:
hdfs dfs -setrep [-R] <replication_factor> <path>
🔹 Parameters:
-R → (Optional) Recursive. Applies the replication factor to all files under
directories.
<replication_factor> → The new replication factor (e.g., 2, 3, 4).
<path> → The HDFS path to the file or directory.
🔹 Example 1: Set replication of a file
hdfs dfs -setrep 2 /user/hadoop/file1.txt
✅ Sets the replication factor of file1.txt to 2.
🔹 Example 2: Recursively set replication for all files in a directory
hdfs dfs -setrep -R 3 /user/hadoop/data/
✅ Sets the replication factor of all files inside /user/hadoop/data/ to 3.
🔹 Verify Replication:
To verify if the replication factor has changed:
hdfs fsck /user/hadoop/file1.txt -files -blocks -locations
hdfs dfs -mkdir /wordcount
hdfs dfs -put C:\new\wordcounteg.txt /wordcount
hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar
wordcount /wordcount/wordcounteg.txt /wordcount_output
hdfs dfs -cat /wordcount_output/part-r-00000