Hadoop commands:
cd (to the Hadoop sbin path)
start-all.cmd
jps (there should be 5 processes: NameNode, DataNode, ResourceManager, NodeManager, and Jps itself)
hdfs dfs -ls /
hdfs dfs -mkdir /el1 : make a directory
hdfs dfs -ls /
hdfs dfs -mkdir /el1/el : make a directory inside a directory
hdfs dfs -touchz /el1/e2.txt : create an empty file inside a directory
first character of each ls entry:
d : directory
- : file
privileges (characters 2-10 of the permission string):
2,3,4 : owner (user)
5,6,7 : group
8,9,10 : other
r : read (4)
w : write (2)
x : execute (1)
4+2+1 = 7 (all permissions given)
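The octal digits can be checked with a little arithmetic; a minimal sketch (the r=4, w=2, x=1 values are standard, the resulting mode 754 is just an illustration):

```shell
# Each digit of an octal mode is the sum of r=4, w=2, x=1.
r=4; w=2; x=1
owner=$((r + w + x))   # rwx -> 7
group=$((r + x))       # r-x -> 5
other=$((r))           # r-- -> 4
echo "${owner}${group}${other}"   # prints 754
```

hdfs dfs -chmod 754 <path> would grant exactly these bits.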
now go to the C: drive -> make a folder named new -> create a text file rs.txt -> write
something in it
put command -> used to upload (copy) files from the local filesystem to HDFS
get command -> used to download (copy) files from HDFS to the local filesystem
hdfs dfs -put C:\new\rs.txt /el1 (what to copy and where to copy) (The local file
rs.txt (from C:\new) is uploaded to /el1 in HDFS.)
go to http://localhost:9870/explorer.html#/ and check the files
hdfs dfs -get /el1/el.txt C:\New
hdfs dfs -cat /el1/rs.txt (displays the content of the file; the file should be present
in HDFS)
hdfs dfs -cp /el1/rs.txt /new.txt
Used to copy files/directories from one location to another within the HDFS
itself.
hdfs dfs -mv /el1/rs.txt /
hdfs dfs -rm /new.txt
-rmdir is used to delete an empty directory
-rm is used to delete a file
-rm -r is used to delete a directory that has files in it
-du : disk usage
hdfs dfs -du -s /new
The hdfs dfs -du -s command is used to display the disk usage summary of a specific
directory or file in Hadoop Distributed File System (HDFS).
du → Displays the disk usage of files/directories in HDFS.
-s → Shows only the summary (total size of the directory), rather than listing all
files inside.
<hdfs_directory_or_file_path> → Path of the directory or file in HDFS.
hdfs dfs -du -v /new (/new is the directory name; for an empty directory there is nothing to list)
Column                                   Description
SIZE                                     Actual size of the file in bytes (original, unreplicated size)
DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS    Total disk space used after replication
FULL_PATH_NAME                           Full path of the file in HDFS
Option   Description
-h       Human-readable format (e.g., KB, MB, GB)
-s       Summary: shows only the total size for the directory
-v       Verbose: shows a detailed view with replication info
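The relationship between the two size columns is simple arithmetic; a sketch using assumed values (14 bytes, replication factor 3), not output from a real cluster:

```shell
# DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS is roughly SIZE multiplied by the
# file's replication factor (both values below are assumed for illustration).
size=14          # bytes, as reported in the SIZE column
replication=3    # the HDFS default replication factor
echo $((size * replication))   # prints 42
```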
The put command is used to copy a file or directory from the local filesystem to
HDFS.
The get command is used to copy a file or directory from HDFS to the local
filesystem.
Alternative to put: You can also use copyFromLocal, which works similarly.
hdfs dfs -copyFromLocal <local_source_path> <hdfs_target_path>
Alternative to get: You can also use copyToLocal, which works similarly.
hdfs dfs -copyToLocal <hdfs_source_path> <local_target_path>
the desktop/C: drive is the local filesystem; the directories and files created through
Hadoop live in HDFS, which is where we work
hdfs dfs -chgrp el /abc1.txt (new group name, then the file whose group we want to change)
chgrp : change group
hdfs dfs -chown el /abc1.txt (change ownership of the file)
hdfs dfs -expunge : permanently deletes the files that are in the trash
hdfs dfs -ls file:///C:\new (listing out files and folders present in c drive)
hdfs dfs -count /data
//shows counts for the path
eg:
//1  : number of directories
//3  : number of files inside the directory
//14 : total size in bytes of all files
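Those three numbers come back as one whitespace-separated line, so they can be pulled apart with ordinary shell word splitting; a local sketch using a made-up output line (a real run would capture `hdfs dfs -count /data` instead):

```shell
# Hypothetical output line of `hdfs dfs -count /data`:
line="1 3 14 /data"
set -- $line   # split into DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
echo "dirs=$1 files=$2 bytes=$3 path=$4"
# prints: dirs=1 files=3 bytes=14 path=/data
```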
hdfs dfs -appendToFile "C:\Users\vaish\Desktop\heyy.txt" /dir1/put_file.txt (first the
path of the local file whose text we want to append - here it is on the desktop - then
the HDFS file we want to append it to)
The hdfs dfs -chmod command is used to change file or directory permissions in
Hadoop's HDFS (similar to the chmod command in Linux).
hdfs dfs -chmod [permissions] <hdfs_file_or_directory>
hdfs dfs -chmod 755 /user/hadoop/data.txt
hdfs dfs -chmod "g+rw,g-x" /dir1
hdfs dfs -chmod "g=w" /dir1
owner (user) : u
group : g
other : o
hdfs dfs -chmod "g=rx,o=r,u=x" /dir1
echo "hello" | hdfs dfs -appendToFile - /file1.txt (for writing in a file that is
present in Hadoop)
hdfs dfs -getmerge /hi/hello "C:/new/vaishnavi.txt" (The getmerge command in Hadoop
is used to merge multiple files from an HDFS directory into a single local file. It
is particularly useful when dealing with MapReduce output, where multiple part
files need to be combined into one.)
hdfs dfs -getmerge /hi "C:/new/vaishnavi.txt" (the content of all the files present in
this directory is concatenated into the single local file)
(the local file should be on the C: drive; use forward slashes instead of the usual
Windows backslashes)
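What getmerge does can be mimicked locally with cat; a sketch in a throwaway directory (the part-r-* file names and contents are made up to mirror MapReduce output):

```shell
# Local analog of `hdfs dfs -getmerge /hi merged.txt`: concatenate every
# matching file in a directory, in name order, into one output file.
dir=$(mktemp -d)
printf 'first\n'  > "$dir/part-r-00000"
printf 'second\n' > "$dir/part-r-00001"
cat "$dir"/part-* > "$dir/merged.txt"
cat "$dir/merged.txt"
# prints:
#   first
#   second
rm -r "$dir"
```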
Why is a .crc File Created When Using hdfs dfs -getmerge?
When you use the hdfs dfs -getmerge command to merge files from HDFS to your local
file system (C: drive in Windows), Hadoop automatically generates a CRC (Cyclic
Redundancy Check) file. This .crc file is used for data integrity verification.
Reasons for .crc File Creation
Hadoop Uses CRC for Data Integrity
Hadoop maintains CRC checksums for each file in HDFS.
When you retrieve files from HDFS to a local file system, Hadoop generates a .crc
file in the same directory as the target file.
This file helps verify that the data is not corrupted during transfer.
Hadoop Treats Local FileSystem Like HDFS
When you run hdfs dfs -getmerge, Hadoop treats your local file system (C: drive)
like HDFS.
Since HDFS uses CRC files for block verification, it applies the same logic when
storing the merged file locally.
How to Prevent .crc File Creation?
If you don’t want Hadoop to create .crc files in your local directory, you can
disable checksum verification:
1. Use -ignoreCrc Option
hdfs dfs -getmerge -ignoreCrc /hi "C:\Users\vaish\Desktop\vaishnavi.txt"
This tells Hadoop not to generate the .crc file while merging files.
2. Manually Delete .crc File
If the .crc file is already created, you can remove it manually:
del "C:\Users\vaish\Desktop\vaishnavi.txt.crc"
hdfs dfs -cat file:///"C:\Users\vaish\Desktop\heyy.txt"
hdfs dfs -ls file:///"C:\Java"
(these both commands are used when printing files or content from local /desktop)
hdfs dfs -moveFromLocal "C:\Users\vaish\Desktop\chimpanzee.txt" /
(-moveToLocal is listed in the help but is not implemented)
stop-all.cmd
hdfs dfs -test -e /hi/heyy (checks whether the path exists; the result is returned as the exit status)
echo %ERRORLEVEL%
Common Options in hdfs dfs -test
Option Description
-e Checks if the file or directory exists.
-d Checks if the path is a directory.
-f Checks if the path is a file.
-z Checks if the file is empty (zero bytes).
-s Checks if the file is non-empty (size > 0).
Examples and Exit Codes
The hdfs dfs -test command does not print output but sets an exit status (error
level):
Exit Code 0 → The test succeeds (the condition is true).
Exit Code 1 → The test fails (the condition is false).
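The same 0/1 exit-code convention is used by the ordinary `test` command, so the pattern can be tried without a cluster (on a real cluster you would substitute `hdfs dfs -test -e <path>`):

```shell
# Exit status 0 means the condition is true, 1 means it is false.
tmp=$(mktemp)
if test -e "$tmp"; then echo "exists"; fi   # prints: exists
rm "$tmp"
test -e "$tmp"
echo $?                                     # prints: 1 (the file is gone)
```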
Difference Between -e and -f in hdfs dfs -test Command
Both -e and -f are options for the hdfs dfs -test command in Hadoop, but they serve
different purposes:
-e : checks if a file or directory exists
     returns 0 (success) if the path exists (file or directory); 1 (failure) if it does not
-f : checks if the path is a regular file (not a directory)
     returns 0 (success) if the path exists and is a regular file; 1 (failure) if it does
     not exist or is a directory
Difference Between head and tail Commands in Hadoop
Both head and tail commands are used to view a portion of a file stored in HDFS,
but they serve different purposes:
Command          Function                                   Default Behavior           Common Use Cases
hdfs dfs -head   Displays the beginning of a file in HDFS   Shows first 1 KB of data   Quickly preview the beginning of a file
hdfs dfs -tail   Displays the end of a file in HDFS         Shows last 1 KB of data    Monitor the end of a log file
The hdfs dfs -setrep command in Hadoop is used to change the replication factor of
a file or directory in HDFS.
🔹 Syntax:
hdfs dfs -setrep [-R] <replication_factor> <path>
🔹 Parameters:
-R → (Optional) Recursive. Applies the replication factor to all files under
directories.
<replication_factor> → The new replication factor (e.g., 2, 3, 4).
<path> → The HDFS path to the file or directory.
🔹 Example 1: Set replication of a file
hdfs dfs -setrep 2 /user/hadoop/file1.txt
✅ Sets the replication factor of file1.txt to 2.
🔹 Example 2: Recursively set replication for all files in a directory
hdfs dfs -setrep -R 3 /user/hadoop/data/
✅ Sets the replication factor of all files inside /user/hadoop/data/ to 3.
🔹 Verify Replication:
To verify if the replication factor has changed:
hdfs fsck /user/hadoop/file1.txt -files -blocks -locations
hdfs dfs -mkdir /wordcount
hdfs dfs -put C:\new\wordcounteg.txt /wordcount
hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar
wordcount /wordcount/wordcounteg.txt /wordcount_output
hdfs dfs -cat /wordcount_output/part-r-00000