Linux Commands for Developers and Data Engineers
File and Directory Management
ls - Lists files and directories (in the current directory by default). Example: ls -l shows details such as permissions, owner, size, and modification time.
cd - Changes the current directory. Example: cd /home navigates to the /home directory.
pwd - Displays the current working directory.
mkdir - Creates a new directory. Example: mkdir project creates a folder named 'project'.
rm - Deletes files or directories. Example: rm file.txt deletes 'file.txt'. Use rm -r to delete a directory and everything in it; deletion is permanent.
cp - Copies files or directories. Example: cp file1.txt file2.txt copies file1.txt to file2.txt.
mv - Moves or renames files and directories. Example: mv old.txt new.txt renames old.txt to new.txt.
find - Searches for files and directories. Example: find / -name file.txt searches the entire filesystem, starting at /, for files named 'file.txt'.
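The commands above can be practiced safely in a throwaway directory. This is a minimal sketch assuming a Bash shell with standard coreutils; the names project, file1.txt, and renamed.txt are hypothetical, and find is pointed at the sandbox rather than / so it finishes instantly.

```shell
#!/usr/bin/env bash
set -euo pipefail

start_dir=$(pwd)                 # remember where we started
workdir=$(mktemp -d)             # throwaway sandbox, e.g. /tmp/tmp.AbC123
cd "$workdir"

mkdir project                    # create a folder named 'project'
cd project
pwd                              # prints the sandbox path ending in /project

echo "hello" > file1.txt         # create a small sample file
cp file1.txt file2.txt           # copy: file2.txt is an independent duplicate
mv file2.txt renamed.txt         # rename file2.txt to renamed.txt
ls -l                            # long listing: permissions, owner, size, date

# Search only inside the sandbox instead of the whole filesystem.
found=$(find "$workdir" -name 'renamed.txt')
echo "found: $found"
content=$(cat renamed.txt)       # the copy keeps the original contents

rm file1.txt                     # delete a single file
cd "$start_dir"                  # leave the sandbox before deleting it
rm -r "$workdir"                 # -r removes the directory and everything in it
```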
Text Processing (Critical for Data Engineering)
cat - Displays file contents. Example: cat file.txt shows the content of 'file.txt'.
less - Views file content page by page. Example: less file.txt.
grep - Searches files for lines matching a pattern. Example: grep 'error' log.txt prints every line of log.txt that contains 'error'.
awk - Processes and analyzes column-based text data. Example: awk '{print $1}' file.txt prints the first whitespace-separated field (column) of each line.
sed - Performs text substitution and other stream edits. Example: sed 's/old/new/g' file.txt replaces every 'old' with 'new' and prints the result; the file itself is changed only if you add -i.
cut - Extracts specific columns from files. Example: cut -d',' -f2 file.csv extracts the second column.
sort - Sorts file contents. Example: sort file.txt sorts lines alphabetically.
uniq - Removes adjacent duplicate lines, so input is usually sorted first. Example: sort file.txt | uniq outputs each distinct line once.
wc - Counts lines, words, or characters. Example: wc -l file.txt counts lines.
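These tools are usually chained with pipes. The following sketch builds a small hypothetical CSV file (columns id, city, amount) and answers a few typical questions about it; it assumes the standard GNU or BSD versions of the tools.

```shell
#!/usr/bin/env bash
set -euo pipefail

data=$(mktemp)
trap 'rm -f "$data"' EXIT        # delete the sample file when the script exits
cat > "$data" <<'EOF'
id,city,amount
1,Paris,10
2,Berlin,25
3,Paris,5
4,Madrid,40
5,Berlin,15
EOF

# Number of data rows: drop the header line with sed, then count lines.
rows=$(sed '1d' "$data" | wc -l)

# Distinct cities: take column 2, drop the header, sort so duplicates are adjacent, dedupe.
cities=$(cut -d',' -f2 "$data" | sed '1d' | sort | uniq)

# Rows per city: uniq -c prefixes each distinct line with its count.
cut -d',' -f2 "$data" | sed '1d' | sort | uniq -c

# Total amount per city; awk's for-in order is unspecified, so sort the output.
totals=$(awk -F',' 'NR > 1 { total[$2] += $3 } END { for (c in total) print c, total[c] }' "$data" | sort)

# Count lines mentioning Paris; print a substituted copy (the file is unchanged).
paris_rows=$(grep -c 'Paris' "$data")
sed 's/Paris/Lyon/g' "$data"

echo "rows=$rows"
echo "$totals"
```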
Networking
ping - Tests connectivity to a host. Example: ping google.com.
curl - Transfers data to or from URLs. Example: curl http://example.com prints the page content to standard output; add -o page.html to save it to a file.
wget - Downloads files from the internet. Example: wget http://example.com/file.zip.
scp - Securely copies files between hosts over SSH. Example: scp file.txt user@host:/path transfers file.txt to /path on the remote host.
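netstat - Displays network connections, routing tables, and interface statistics. Example: netstat -tuln lists listening ports. It belongs to the older net-tools package and is largely superseded by ss.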
ss - Shows socket statistics, such as open connections and listening sockets. Example: ss -tuln lists listening TCP and UDP ports with numeric addresses.
ftp - Transfers files using the FTP protocol. Example: ftp hostname.
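For scripted downloads, curl is usually run with flags that make failures visible instead of silently saving an error page. The function below is a minimal sketch; the name fetch is hypothetical, and the demo uses a file:// URL as a stand-in for a real http(s):// address so it runs without network access (standard curl builds support the file protocol).

```shell
#!/usr/bin/env bash
set -euo pipefail

# fetch URL DEST: download URL into DEST and return non-zero on any failure.
#   -f         treat HTTP errors (status >= 400) as failures
#   -s / -S    hide the progress meter but still print error messages
#   -L         follow redirects
#   --retry 3  retry transient errors (timeouts, HTTP 408, 429, some 5xx) up to 3 times
#   -o DEST    write the body to DEST instead of standard output
fetch() {
  curl -fsSL --retry 3 -o "$2" "$1"
}

src=$(mktemp)
dest=$(mktemp)
trap 'rm -f "$src" "$dest"' EXIT
echo "sample payload" > "$src"

fetch "file://$src" "$dest"      # with a real server: fetch http://example.com/file.zip file.zip
cat "$dest"                      # prints: sample payload

if ! fetch "file:///no/such/file" /dev/null 2>/dev/null; then
  echo "download failed as expected"
fi
```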
Data Engineering-Specific Tools
hdfs dfs - Manages Hadoop Distributed File System (HDFS). Example: hdfs dfs -ls / lists HDFS contents.
spark-submit - Submits Spark jobs. Example: spark-submit app.py runs a PySpark application.
sqoop - Transfers data between Hadoop and relational databases.
kafka-console-producer - Publishes messages to a Kafka topic.
kafka-console-consumer - Reads messages from a Kafka topic.
flume-ng - Runs Flume agents, defined in an agent configuration file, to ingest data streams.
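A typical batch flow chains these tools: stage a file in HDFS, process it with Spark, then inspect a Kafka topic. The sketch below only prints the commands by default (DRY_RUN=1), so it can be reviewed on a machine without Hadoop, Spark, or Kafka. Every path, the broker address, the topic name, and the PySpark script app.py are hypothetical placeholders. Flag spellings follow recent Apache releases (for example --bootstrap-server for the console tools); in the Apache tarball the Kafka scripts carry a .sh suffix.

```shell
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}            # 1 = print commands, 0 = execute them

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical placeholders: adjust for your cluster.
LOCAL_FILE=sales.csv
HDFS_DIR=/data/raw/sales
BROKER=localhost:9092
TOPIC=sales-events

pipeline() {
  run hdfs dfs -mkdir -p "$HDFS_DIR"                        # create the target directory
  run hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"           # upload, overwriting any old copy
  run hdfs dfs -ls "$HDFS_DIR"                              # confirm the upload
  run spark-submit --master 'local[*]' app.py "$HDFS_DIR"   # run the job on local cores
  run kafka-console-consumer --bootstrap-server "$BROKER" \
    --topic "$TOPIC" --from-beginning --max-messages 10     # peek at 10 messages
}

pipeline
```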
Process Management
ps - Displays running processes. Example: ps aux shows all processes with details.
top - Displays real-time system resource usage and running processes.
htop - An interactive process viewer (similar to top).
kill - Sends a signal to a process by its PID, SIGTERM by default. Example: kill 1234 asks the process with PID 1234 to terminate; kill -9 1234 forces it.
bg - Resumes a suspended job in the background.
fg - Resumes a job in the foreground.
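The sketch below starts a throwaway background job, inspects it, and stops it. It assumes Bash and a procps-style ps (standard on most Linux distributions); sleep 300 simply stands in for a long-running task.

```shell
#!/usr/bin/env bash
set -euo pipefail

sleep 300 &                      # start a long-running job in the background
pid=$!                           # $! is the PID of the most recent background job

ps -p "$pid" -o pid=,comm=       # show only that process: PID and command name

kill "$pid"                      # send SIGTERM, kill's default signal
status=0
wait "$pid" || status=$?         # reap the job; a signal-terminated job reports 128 + signal
echo "exit status: $status"      # 143 = 128 + 15 (SIGTERM)
```

In an interactive shell, Ctrl+Z suspends the foreground job, bg resumes it in the background, and fg brings it back. Job control is off in scripts, which is why the sketch uses & and $! instead.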
Version Control (Git)
git init - Initializes a new Git repository.
git clone - Clones an existing repository. Example: git clone <repo_url>.
git add - Stages changes for commit. Example: git add file.txt.
git commit - Commits staged changes. Example: git commit -m 'message'.
git push - Pushes changes to a remote repository. Example: git push origin main.
git pull - Fetches and merges changes from a remote repository.
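The full cycle can be rehearsed locally, with a bare repository standing in for a hosted remote. This sketch assumes Git is installed; the author identity, file names, and directory layout are hypothetical. git branch -M main names the branch explicitly because the default initial branch differs between Git versions and configurations.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical identity, used only for these commits.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

git init -q --bare "$root/remote.git"    # the stand-in "remote"
git init -q "$root/work"                 # your local repository
cd "$root/work"
echo "hello" > file.txt
git add file.txt                         # stage the new file
git commit -q -m 'Add file.txt'          # record the snapshot
git branch -M main                       # name the branch 'main'
git remote add origin "$root/remote.git"
git push -q origin main                  # publish to the remote

# A teammate clones, commits, and pushes a change.
git clone -q -b main "$root/remote.git" "$root/clone"
cd "$root/clone"
echo "from clone" > notes.txt
git add notes.txt
git commit -q -m 'Add notes.txt'
git push -q origin main

# Back in the original repository, fetch and merge the teammate's commit.
cd "$root/work"
git pull -q --ff-only origin main
git log --oneline                        # two commits, newest first
```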
System Monitoring and Disk Usage
df - Displays disk space usage. Example: df -h shows human-readable disk usage.
du - Shows the disk space used by files and directories. Example: du -sh /home gives the total size of /home.
free - Displays memory usage. Example: free -h shows human-readable memory usage.
uptime - Shows how long the system has been running.
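In scheduled jobs, df output is often parsed to catch a filling disk before a pipeline fails. A minimal sketch, assuming POSIX df -P output, where the fifth column is the percentage used (for example 42%); the function names and the 90% threshold are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# usage_pct PATH: print how full the filesystem holding PATH is, as an integer.
# -P forces one line per filesystem in the POSIX layout:
#   Filesystem  1024-blocks  Used  Available  Capacity  Mounted-on
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# check_disk PATH THRESHOLD: warn and return 1 when usage reaches THRESHOLD percent.
check_disk() {
  local pct
  pct=$(usage_pct "$1")
  if [ "$pct" -ge "$2" ]; then
    echo "WARNING: $1 is ${pct}% full (threshold ${2}%)"
    return 1
  fi
  echo "OK: $1 is ${pct}% full"
}

check_disk / 90 || echo "consider cleaning up old logs or data"
df -h /                          # human-readable view of the same filesystem
```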
Archive and Compression
tar - Archives files. Example: tar -cvf archive.tar file.txt creates an archive.
gzip - Compresses files. Example: gzip file.txt compresses 'file.txt'.
gunzip - Decompresses files. Example: gunzip file.txt.gz decompresses 'file.txt.gz'.
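A round trip shows how these tools fit together: tar bundles files (with -z it also gzips the bundle), while gzip and gunzip work on single files, replacing the original with a .gz version and back. The directory and file names below are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/src/logs"
printf 'line %s\n' 1 2 3 > "$root/src/logs/app.log"

# -c create, -z compress with gzip, -f archive file, -C switch directory first
tar -czf "$root/logs.tar.gz" -C "$root/src" logs
tar -tzf "$root/logs.tar.gz"                     # -t lists contents without extracting

mkdir "$root/restore"
tar -xzf "$root/logs.tar.gz" -C "$root/restore"  # -x extracts into restore/

cp "$root/src/logs/app.log" "$root/copy.log"
gzip "$root/copy.log"                            # replaces copy.log with copy.log.gz
ls "$root"                                       # shows copy.log.gz, not copy.log
gunzip "$root/copy.log.gz"                       # restores copy.log
```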
Development Utilities
vim - Edits text files in the terminal. Example: vim file.txt opens 'file.txt' for editing.
nano - A simple text editor. Example: nano file.txt opens 'file.txt' for editing.
ssh - Connects to remote servers securely. Example: ssh user@host.
screen - Runs terminal sessions that can be detached and reattached later. Example: screen starts a session; press Ctrl+A then D to detach, and screen -r reattaches.