Navigating Linux System Commands
A guide for beginners to the Shell and GNU coreutils
Sayan Ghosh
August 10, 2024
IIT Madras
BS Data Science and Applications
Disclaimer
This document is a companion activity book for the System Commands
(BSSE2001) course taught by Prof. Gandham Phanikumar at IIT Madras
BS Program. This book contains resources, references, questions and
solutions to some common questions on Linux commands, shell scripting,
grep, sed, awk, and other system commands.
This was prepared with the help and guidance of the course instructors:
Santhana Krishnan and Sushil Pachpinde
Copyright
© This book is released into the public domain, meaning it is freely available
for use and distribution without restriction. However, while the content itself
is not subject to copyright, it is requested that proper attribution be given if
any part of this book is quoted or referenced. This ensures recognition of the
original authorship and helps maintain transparency in the dissemination of
information.
Colophon
This document was typeset with the help of KOMA-Script and LaTeX using
the kaobook class.
The source code of this book is available at:
https://github.com/sayan01/se2001-book
(You are welcome to contribute!)
Edition
Compiled on August 10, 2024
UNIX is basically a simple operating system, but you
have to be a genius to understand the simplicity.
– Dennis Ritchie
Preface
Through this work I have tried to make learning and understanding the
basics of Linux fun and easy. I have tried to make the book as practical
as possible, with many examples and exercises. The structure of the book
follows the structure of the course BSSE2001 - System Commands, taught by
Prof. Gandham Phanikumar at IIT Madras BS Program.
The book takes inspiration from the previous works done for the course,
▶ Sanjay Kumar’s Github Repository
▶ Cherian George’s Github Repository
▶ Prabuddh Mathur’s TA Sessions
as well as external resources like:
▶ Robert Elder’s Blogs and Videos
▶ Aalto University, Finland’s Scientific Computing - Linux Shell Crash
Course
The book covers basic commands, their motivation, use cases, and examples.
The book also covers some advanced topics like shell scripting, regular
expressions, and text processing using sed and awk.
This is not a substitute for the course, but a companion to it. The book is a
work in progress and any contribution is welcome at
https://github.com/sayan01/se2001-book
Sayan Ghosh
Contents
Preface v
Contents vii
1 Essentials of Linux 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 What is Linux? . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Desktop Environments . . . . . . . . . . . . . . . . . 4
1.1.3 Window Managers . . . . . . . . . . . . . . . . . . . 5
1.1.4 Why Linux? . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.5 What is Shell? . . . . . . . . . . . . . . . . . . . . . . 7
1.1.6 Shell vs Terminal . . . . . . . . . . . . . . . . . . . . 8
1.1.7 Why the Command Line? . . . . . . . . . . . . . . . 10
1.1.8 Command Prompt . . . . . . . . . . . . . . . . . . . 11
1.2 Simple Commands in GNU Core Utils . . . . . . . . . . . . 12
1.2.1 File System Navigation . . . . . . . . . . . . . . . . . 12
1.2.2 Manuals . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.3 System Information . . . . . . . . . . . . . . . . . . . 20
1.2.4 File Management . . . . . . . . . . . . . . . . . . . . 24
1.2.5 Text Processing and Pagers . . . . . . . . . . . . . . . 29
1.2.6 Aliases and Types of Commands . . . . . . . . . . . 35
1.2.7 User Management . . . . . . . . . . . . . . . . . . . . 41
1.2.8 Date and Time . . . . . . . . . . . . . . . . . . . . . . 43
1.3 Navigating the File System . . . . . . . . . . . . . . . . . . . 47
1.3.1 What is a File System? . . . . . . . . . . . . . . . . . 47
1.3.2 In Memory File System . . . . . . . . . . . . . . . . . 51
1.3.3 Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
1.3.4 Basic Commands for Navigation . . . . . . . . . . . . 57
1.4 File Permissions . . . . . . . . . . . . . . . . . . . . . . . . . 60
1.4.1 Read . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.4.2 Write . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.4.3 Execute . . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.4.4 Interesting Caveats . . . . . . . . . . . . . . . . . . . 63
1.4.5 Changing Permissions . . . . . . . . . . . . . . . . . 65
1.4.6 Special Permissions . . . . . . . . . . . . . . . . . . . 66
1.4.7 Octal Representation of Permissions . . . . . . . . . 70
1.5 Types of Files . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
1.5.1 Regular Files . . . . . . . . . . . . . . . . . . . . . . . 73
1.5.2 Directories . . . . . . . . . . . . . . . . . . . . . . . . 73
1.5.3 Symbolic Links . . . . . . . . . . . . . . . . . . . . . 73
1.5.4 Character Devices . . . . . . . . . . . . . . . . . . . . 74
1.5.5 Block Devices . . . . . . . . . . . . . . . . . . . . . . 75
1.5.6 Named Pipes . . . . . . . . . . . . . . . . . . . . . . . 76
1.5.7 Sockets . . . . . . . . . . . . . . . . . . . . . . . . . . 77
1.5.8 Types of Regular Files . . . . . . . . . . . . . . . . . . 78
1.6 Inodes and Links . . . . . . . . . . . . . . . . . . . . . . . . . 80
1.6.1 Inodes . . . . . . . . . . . . . . . . . . . . . . . . . . 80
1.6.2 Separation of Data, Metadata, and Filename . . . . . 82
1.6.3 Directory Entries . . . . . . . . . . . . . . . . . . . . 83
1.6.4 Hard Links . . . . . . . . . . . . . . . . . . . . . . . . 84
1.6.5 Symbolic Links . . . . . . . . . . . . . . . . . . . . . 86
1.6.6 Symlink vs Hard Links . . . . . . . . . . . . . . . . . 89
1.6.7 Identifying Links . . . . . . . . . . . . . . . . . . . . 89
1.6.8 What are . and ..? . . . . . . . . . . . . . . . . . . . . 90
2 Command Line Editors 93
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.1.1 Types of Editors . . . . . . . . . . . . . . . . . . . . . 93
2.1.2 Why Command Line Editors? . . . . . . . . . . . . . 94
2.1.3 Mouse Support . . . . . . . . . . . . . . . . . . . . . 94
2.1.4 Editor war . . . . . . . . . . . . . . . . . . . . . . . . 95
2.1.5 Differences between Vim and Emacs . . . . . . . . . 95
2.1.6 Nano: The peacemaker amidst the editor war . . . . 98
2.2 Vim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.2.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.2.2 Ed Commands . . . . . . . . . . . . . . . . . . . . . . 110
2.2.3 Exploring Vim . . . . . . . . . . . . . . . . . . . . . . 121
2.3 Emacs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.3.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.3.2 Exploring Emacs . . . . . . . . . . . . . . . . . . . . 137
2.4 Nano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.4.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.4.2 Exploring Nano . . . . . . . . . . . . . . . . . . . . . 141
2.4.3 Editing A Script in Nano . . . . . . . . . . . . . . . . 142
3 Networking and SSH 145
3.1 Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
3.1.1 What is networking? . . . . . . . . . . . . . . . . . . 145
3.1.2 Types of Networks . . . . . . . . . . . . . . . . . . . . 146
3.1.3 Devices in a Network . . . . . . . . . . . . . . . . . . 147
3.1.4 IP Addresses . . . . . . . . . . . . . . . . . . . . . . . 149
3.1.5 Subnetting . . . . . . . . . . . . . . . . . . . . . . . . 150
3.1.6 Private and Public IP Addresses . . . . . . . . . . . . 153
3.1.7 CIDR . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
3.1.8 Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
3.1.9 Protocols . . . . . . . . . . . . . . . . . . . . . . . . . 156
3.1.10 Firewalls . . . . . . . . . . . . . . . . . . . . . . . . . 157
3.1.11 SELinux . . . . . . . . . . . . . . . . . . . . . . . . . 158
3.1.12 Network Tools . . . . . . . . . . . . . . . . . . . . . . 160
3.2 SSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
3.2.1 What is SSH? . . . . . . . . . . . . . . . . . . . . . . . 169
3.2.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . 169
3.2.3 How does SSH work? . . . . . . . . . . . . . . . . . . 170
3.2.4 Key-based Authentication . . . . . . . . . . . . . . . 170
3.2.5 Configuring your SSH keys . . . . . . . . . . . . . . 171
3.2.6 Sharing your public key . . . . . . . . . . . . . . . . 173
3.2.7 How to login to a remote server . . . . . . . . . . . . 174
3.2.8 Call an exorcist, there’s a daemon in my computer . 175
3.2.9 SCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
4 Process Management 181
4.1 What is sleep? . . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.1.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.1.2 Scripting with sleep . . . . . . . . . . . . . . . . . . . 181
4.1.3 Syntax and Synopsis . . . . . . . . . . . . . . . . . . 182
4.2 Different ways of running a process . . . . . . . . . . . . . . 183
4.2.1 What are processes? . . . . . . . . . . . . . . . . . . . 183
4.2.2 Process Creation . . . . . . . . . . . . . . . . . . . . . 184
4.2.3 Process Ownership . . . . . . . . . . . . . . . . . . . 184
4.2.4 Don’t kill my children . . . . . . . . . . . . . . . . . . 186
4.2.5 Setsid . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4.2.6 Nohup . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.2.7 coproc . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.2.8 at and cron . . . . . . . . . . . . . . . . . . . . . . . . 192
4.2.9 GNU parallel . . . . . . . . . . . . . . . . . . . . . . . 192
4.2.10 systemd services . . . . . . . . . . . . . . . . . . . . . 192
4.3 Process Management . . . . . . . . . . . . . . . . . . . . . . 193
4.3.1 Disown . . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.3.2 Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.3.3 Suspending and Resuming Jobs . . . . . . . . . . . . 195
4.3.4 Killing Processes . . . . . . . . . . . . . . . . . . . . 197
4.4 Finding Processes . . . . . . . . . . . . . . . . . . . . . . . . 202
4.4.1 pgrep . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
4.4.2 pkill . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
4.4.3 pidwait . . . . . . . . . . . . . . . . . . . . . . . . . . 203
4.5 Listing Processes . . . . . . . . . . . . . . . . . . . . . . . . . 204
4.5.1 ps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
4.5.2 pstree . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
4.5.3 top . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
4.5.4 htop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
4.5.5 btop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
4.5.6 glances . . . . . . . . . . . . . . . . . . . . . . . . . . 209
4.6 Exit Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
5 Streams, Redirections, Piping 213
5.1 Multiple Commands in a Single Line . . . . . . . . . . . . . 213
5.1.1 Conjunction and Disjunction . . . . . . . . . . . . . . 213
5.2 Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.3 Redirection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5.3.1 Standard Output Redirection . . . . . . . . . . . . . 221
5.3.2 Standard Error Redirection . . . . . . . . . . . . . . . 224
5.3.3 Appending to a File . . . . . . . . . . . . . . . . . . . 227
5.3.4 Standard Input Redirection . . . . . . . . . . . . . . . 228
5.3.5 Here Documents . . . . . . . . . . . . . . . . . . . . 230
5.3.6 Here Strings . . . . . . . . . . . . . . . . . . . . . . . 231
5.4 Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
5.4.1 UNIX Philosophy . . . . . . . . . . . . . . . . . . . . 233
5.4.2 Multiple Pipes . . . . . . . . . . . . . . . . . . . . . . 233
5.4.3 Piping Standard Error . . . . . . . . . . . . . . . . . . 241
5.4.4 Piping to and From Special Files . . . . . . . . . . . . 242
5.4.5 Named Pipes . . . . . . . . . . . . . . . . . . . . . . . 244
5.4.6 Tee Command . . . . . . . . . . . . . . . . . . . . . . 247
5.5 Command Substitution . . . . . . . . . . . . . . . . . . . . . 249
5.6 Arithmetic Expansion . . . . . . . . . . . . . . . . . . . . . . 250
5.6.1 Using variables in arithmetic expansion . . . . . . . 251
5.7 Process Substitution . . . . . . . . . . . . . . . . . . . . . . . 251
5.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
6 Pattern Matching 259
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
6.2 Globs and Wildcards . . . . . . . . . . . . . . . . . . . . . . 259
6.3 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . 263
6.3.1 Basic Regular Expressions . . . . . . . . . . . . . . . 264
6.3.2 Character Classes . . . . . . . . . . . . . . . . . . . . 266
6.3.3 Anchors . . . . . . . . . . . . . . . . . . . . . . . . . 274
6.3.4 Quantifiers . . . . . . . . . . . . . . . . . . . . . . . . 275
6.3.5 Alternation . . . . . . . . . . . . . . . . . . . . . . . . 281
6.3.6 Grouping . . . . . . . . . . . . . . . . . . . . . . . . . 283
6.3.7 Backreferences . . . . . . . . . . . . . . . . . . . . . . 284
6.4 Extended Regular Expressions . . . . . . . . . . . . . . . . . 287
6.5 Perl-Compatible Regular Expressions . . . . . . . . . . . . . 290
6.5.1 Minimal Matching (a.k.a. "ungreedy") . . . . . . . . 291
6.5.2 Multiline matching . . . . . . . . . . . . . . . . . . . 291
6.5.3 Named subpatterns . . . . . . . . . . . . . . . . . . . 291
6.5.4 Look-ahead and look-behind assertions . . . . . . . . 292
6.5.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . 292
6.5.6 Recursive patterns . . . . . . . . . . . . . . . . . . . . 293
6.6 Other Text Processing Tools . . . . . . . . . . . . . . . . . . . 294
6.6.1 tr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
6.6.2 cut . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
6.6.3 paste . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
6.6.4 fold . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
6.6.5 grep . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
6.6.6 sed . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
6.6.7 awk . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
7 Grep 309
7.1 Regex Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
7.2 PCRE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
7.3 Print Only Matching Part . . . . . . . . . . . . . . . . . . . . 312
7.4 Matching Multiple Patterns . . . . . . . . . . . . . . . . . . . 313
7.4.1 Disjunction . . . . . . . . . . . . . . . . . . . . . . . . 313
7.4.2 Conjunction . . . . . . . . . . . . . . . . . . . . . . . 314
7.5 Read Patterns from File . . . . . . . . . . . . . . . . . . . . . 315
7.6 Ignore Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
7.7 Invert Match . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
7.8 Anchoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
7.9 Counting Matches . . . . . . . . . . . . . . . . . . . . . . . . 318
7.10 Print Filename . . . . . . . . . . . . . . . . . . . . . . . . . . 320
7.11 Limiting Output . . . . . . . . . . . . . . . . . . . . . . . . . 321
7.12 Quiet Quitting . . . . . . . . . . . . . . . . . . . . . . . . . . 324
7.13 Numbering Lines . . . . . . . . . . . . . . . . . . . . . . . . 324
7.14 Recursive Search . . . . . . . . . . . . . . . . . . . . . . . . . 325
7.15 Context Line Control . . . . . . . . . . . . . . . . . . . . . . 326
7.16 Finding Lines Common in Two Files . . . . . . . . . . . . . . 327
8 Shell Variables 331
8.1 Creating Variables . . . . . . . . . . . . . . . . . . . . . . . . 331
8.2 Printing Variables to the Terminal . . . . . . . . . . . . . . . 333
8.2.1 Echo Command . . . . . . . . . . . . . . . . . . . . . 334
8.2.2 Accessing and Updating Numeric Variables . . . . . 338
8.3 Removing Variables . . . . . . . . . . . . . . . . . . . . . . . 343
8.4 Listing Variables . . . . . . . . . . . . . . . . . . . . . . . . . 345
8.4.1 set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
8.4.2 declare . . . . . . . . . . . . . . . . . . . . . . . . . . 346
8.4.3 env . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
8.4.4 printenv . . . . . . . . . . . . . . . . . . . . . . . . . 347
8.5 Special Variables . . . . . . . . . . . . . . . . . . . . . . . . . 348
8.5.1 PWD . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
8.5.2 RANDOM . . . . . . . . . . . . . . . . . . . . . . . . 349
8.5.3 PATH . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
8.5.4 PS1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
8.6 Variable Manipulation . . . . . . . . . . . . . . . . . . . . . . 353
8.6.1 Default Values . . . . . . . . . . . . . . . . . . . . . . 353
8.6.2 Error if Unset . . . . . . . . . . . . . . . . . . . . . . 354
8.6.3 Length of Variable . . . . . . . . . . . . . . . . . . . . 355
8.6.4 Substring of Variable . . . . . . . . . . . . . . . . . . 355
8.6.5 Prefix and Suffix Removal . . . . . . . . . . . . . . . 356
8.6.6 Replace Substring . . . . . . . . . . . . . . . . . . . . 357
8.6.7 Anchoring Matches . . . . . . . . . . . . . . . . . . . 357
8.6.8 Deleting the match . . . . . . . . . . . . . . . . . . . 358
8.6.9 Lowercase and Uppercase . . . . . . . . . . . . . . . 358
8.6.10 Sentence Case . . . . . . . . . . . . . . . . . . . . . . 359
8.7 Restrictions on Variables . . . . . . . . . . . . . . . . . . . . 359
8.7.1 Integer Only . . . . . . . . . . . . . . . . . . . . . . . 359
8.7.2 No Upper Case . . . . . . . . . . . . . . . . . . . . . 360
8.7.3 No Lower Case . . . . . . . . . . . . . . . . . . . . . 360
8.7.4 Read Only . . . . . . . . . . . . . . . . . . . . . . . . 360
8.7.5 Removing Restrictions . . . . . . . . . . . . . . . . . 361
8.8 Bash Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
8.9 Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
8.10 Brace Expansion . . . . . . . . . . . . . . . . . . . . . . . . . 363
8.10.1 Range Expansion . . . . . . . . . . . . . . . . . . . . 364
8.10.2 List Expansion . . . . . . . . . . . . . . . . . . . . . . 364
8.10.3 Combining Expansions . . . . . . . . . . . . . . . . . 365
8.11 History Expansion . . . . . . . . . . . . . . . . . . . . . . . . 367
8.12 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
8.12.1 Length of Array . . . . . . . . . . . . . . . . . . . . . 369
8.12.2 Indices of Array . . . . . . . . . . . . . . . . . . . . . 369
8.12.3 Printing all elements of Array . . . . . . . . . . . . . 370
8.12.4 Deleting an Element . . . . . . . . . . . . . . . . . . . 370
8.12.5 Appending an Element . . . . . . . . . . . . . . . . . 371
8.12.6 Storing output of a command in an Array . . . . . . 371
8.12.7 Iterating over an Array . . . . . . . . . . . . . . . . . 372
8.13 Associative Arrays . . . . . . . . . . . . . . . . . . . . . . . . 373
9 Shell Scripting 375
9.1 What is a shell script? . . . . . . . . . . . . . . . . . . . . . . 375
9.2 Shebang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
9.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
9.3.1 Multiline Comments . . . . . . . . . . . . . . . . . . 377
9.4 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
9.5 Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
9.5.1 Shifting Arguments . . . . . . . . . . . . . . . . . . . 384
9.6 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . 385
9.6.1 Reading Input . . . . . . . . . . . . . . . . . . . . . . 385
9.7 Conditionals . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
9.7.1 Test command . . . . . . . . . . . . . . . . . . . . . . 387
9.7.2 Test Keyword . . . . . . . . . . . . . . . . . . . . . . 391
9.8 If-elif-else . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
9.8.1 If . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
9.8.2 Else . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
9.8.3 Elif . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
9.9 Exit code inversion . . . . . . . . . . . . . . . . . . . . . . . . 397
9.10 Mathematical Expressions as if command . . . . . . . . . . . 398
9.11 Command Substitution in if . . . . . . . . . . . . . . . . . . 398
9.12 Switch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
9.12.1 Fall Through . . . . . . . . . . . . . . . . . . . . . . . 400
9.12.2 Multiple Patterns . . . . . . . . . . . . . . . . . . . . 400
9.13 Select Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
9.14 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
9.14.1 For loop . . . . . . . . . . . . . . . . . . . . . . . . . 403
9.14.2 C style for loop . . . . . . . . . . . . . . . . . . . . . 406
9.14.3 IFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
9.14.4 While loop . . . . . . . . . . . . . . . . . . . . . . . . 409
9.14.5 Until loop . . . . . . . . . . . . . . . . . . . . . . . . 411
9.14.6 Read in while . . . . . . . . . . . . . . . . . . . . . . 412
9.14.7 Break and Continue . . . . . . . . . . . . . . . . . . . 414
9.15 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
9.16 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
9.17 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
9.18 Shell Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . 423
9.18.1 bc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
9.18.2 expr . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
9.19 Running arbritary commands using source, eval and exec . 427
9.19.1 exec . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
9.20 Getopts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
9.20.1 With case statement . . . . . . . . . . . . . . . . . . . 430
9.21 Profile and RC files . . . . . . . . . . . . . . . . . . . . . . . 431
9.22 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
10 Stream Editor 433
10.1 Basic Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
10.2 Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
10.2.1 Negation . . . . . . . . . . . . . . . . . . . . . . . . . 436
10.3 Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
10.3.1 Command Syntax . . . . . . . . . . . . . . . . . . . . 436
10.3.2 Available Commands . . . . . . . . . . . . . . . . . . 437
10.3.3 Branching and Flow Control . . . . . . . . . . . . . . 437
10.3.4 Printing . . . . . . . . . . . . . . . . . . . . . . . . . . 438
10.3.5 Deleting . . . . . . . . . . . . . . . . . . . . . . . . . 438
10.3.6 Substitution . . . . . . . . . . . . . . . . . . . . . . . 439
10.3.7 Print Line Numbers . . . . . . . . . . . . . . . . . . . 443
10.3.8 Inserting and Appending Text . . . . . . . . . . . . . 444
10.3.9 Changing Lines . . . . . . . . . . . . . . . . . . . . . 445
10.3.10 Transliteration . . . . . . . . . . . . . . . . . . . . . . 446
10.4 Combining Commands . . . . . . . . . . . . . . . . . . . . . 446
10.5 In-place Editing . . . . . . . . . . . . . . . . . . . . . . . . . 448
10.6 Sed Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
10.6.1 Order of Execution . . . . . . . . . . . . . . . . . . . 450
10.6.2 Shebang . . . . . . . . . . . . . . . . . . . . . . . . . 451
10.7 Branching and Flow Control . . . . . . . . . . . . . . . . . . 452
10.7.1 Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
10.7.2 Branching . . . . . . . . . . . . . . . . . . . . . . . . 453
10.7.3 Appending Lines . . . . . . . . . . . . . . . . . . . . 456
10.7.4 Join Multiline Strings . . . . . . . . . . . . . . . . . . 458
10.7.5 If-Else . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
10.7.6 If-elif-else . . . . . . . . . . . . . . . . . . . . . . . . . 460
List of Figures
1.1 Linux Distributions Usage . . . . . . . . . . . . . . . . . . . . . 3
1.2 Desktop Environment Usage . . . . . . . . . . . . . . . . . . . . 5
1.3 Operating System Onion Rings . . . . . . . . . . . . . . . . . . . 7
1.4 GNU Core Utils Logo . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 ls -l Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6 Linux Filesystem Hierarchy . . . . . . . . . . . . . . . . . . . . . 48
1.7 Relative Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.8 File Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
1.9 Octal Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . 70
1.10 System Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
1.11 Inode and Directory Entry . . . . . . . . . . . . . . . . . . . . . 84
1.12 Directed Acyclic Graph . . . . . . . . . . . . . . . . . . . . . . . 85
1.13 Abstract Representation of Symbolic Links and Hard Links . . . 89
1.14 Symbolic Links and Hard Links . . . . . . . . . . . . . . . . . . 89
2.1 A Teletype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.2 Ken Thompson . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.3 Dennis Ritchie . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.4 Xerox Alto, one of the first VDU terminals with a GUI, released in
1973 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.5 A first generation Dectape (bottom right corner, white round tape)
being used with a PDP-11 computer . . . . . . . . . . . . . . . . 103
2.6 George Coulouris . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.7 Bill Joy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.9 Stevie Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
2.8 The Keyboard layout of the ADM-3A terminal . . . . . . . . . . 106
2.10 Bram Moolenaar . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.11 The initial version of Vim, when it was called Vi IMitation . . . . 108
2.12 Vim 9.0 Start screen . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.13 Neo Vim Window editing this book . . . . . . . . . . . . . . . . 110
2.14 Simplified Modes in Vim . . . . . . . . . . . . . . . . . . . . . . 122
2.15 Detailed Modes in Vim . . . . . . . . . . . . . . . . . . . . . . . 123
2.16 Vim Cheat Sheet . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
2.17 Richard Stallman - founder of GNU and FSF projects . . . . . . 133
2.18 Guy L. Steele Jr. combined many divergent TECO macros to
create EMACS . . . . . . . . . . . . . . . . . . . . . . . . . . 135
2.19 James Gosling - creator of Gosling Emacs and later Java . . . . . 136
2.20 ADM-3A terminal . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.21 Space Cadet Keyboard . . . . . . . . . . . . . . . . . . . . . . . . 136
2.22 Nano Text Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.1 Types of Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 146
3.2 Growth Rate of Different Functions - Note how quickly 𝑛² grows 146
3.3 Hub and Spoke Model Employed by Airlines . . . . . . . . . . . 146
3.4 LAN and WAN connecting to the Internet . . . . . . . . . . . . 147
3.5 Hub, Switch, Router connecting to the Internet . . . . . . . . . . 148
3.6 SELinux Context . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
3.7 Symmetric Encryption . . . . . . . . . . . . . . . . . . . . . . . . 169
3.8 Symmetric Encryption . . . . . . . . . . . . . . . . . . . . . . . . 170
3.9 Asymmetric Encryption . . . . . . . . . . . . . . . . . . . . . . . 170
4.1 Example of a process tree . . . . . . . . . . . . . . . . . . . . . . 183
5.1 Standard Streams . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.2 File Descriptor Table . . . . . . . . . . . . . . . . . . . . . . . . . 219
5.3 Pipes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6.1 Positive and Negative Look-ahead and look-behind assertions . 292
6.2 Many-to-one mapping . . . . . . . . . . . . . . . . . . . . . . . . 295
6.3 Caesar Cipher . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
7.1 The comm command . . . . . . . . . . . . . . . . . . . . . . . . . 319
9.1 Flowchart of the if, elif, and else construct . . . . . . . . . . . 396
10.1 Filtering Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
10.2 The different interfaces to sed . . . . . . . . . . . . . . . . . . . 433
List of Tables
1.1 Basic Shortcuts in Terminal . . . . . . . . . . . . . . . . . . . . . 9
1.2 Basic Commands in GNU Core Utils . . . . . . . . . . . . . . . 14
1.3 Manual Page Sections . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4 Keys in Info Pages . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 Escape Characters in echo . . . . . . . . . . . . . . . . . . . . . . 30
1.6 Date Format Specifiers . . . . . . . . . . . . . . . . . . . . . . . . 44
1.7 Linux Filesystem Hierarchy . . . . . . . . . . . . . . . . . . . . . 47
1.8 Linux Filesystem Directory Classification . . . . . . . . . . . . . 51
1.9 Octal Representation of Permissions . . . . . . . . . . . . . . . . 71
1.10 Types of Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
1.11 Metadata of a File . . . . . . . . . . . . . . . . . . . . . . . . . . 83
1.12 Symlink vs Hard Link . . . . . . . . . . . . . . . . . . . . . . . . 89
2.1 History of Vim . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.2 Ed Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.3 Commands for location . . . . . . . . . . . . . . . . . . . . . . . 111
2.4 Commands for Editing . . . . . . . . . . . . . . . . . . . . . . . 111
2.5 Ex Commands in Vim . . . . . . . . . . . . . . . . . . . . . . . . 124
2.6 Navigation Commands in Vim . . . . . . . . . . . . . . . . . . . 125
2.7 Moving the Screen Commands in Vim . . . . . . . . . . . . . . . 125
2.8 Replacing Text Commands in Vim . . . . . . . . . . . . . . . . . 126
2.9 Toggling Case Commands in Vim . . . . . . . . . . . . . . . . . 126
2.10 Deleting Text Commands in Vim . . . . . . . . . . . . . . . . . . 127
2.11 Deleting Text Commands in Vim . . . . . . . . . . . . . . . . . . 128
2.12 Address Types in Search and Replace . . . . . . . . . . . . . . . 130
2.13 Keys to enter Insert Mode . . . . . . . . . . . . . . . . . . . . . . 131
2.14 History of Emacs . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.15 Navigation Commands in Emacs . . . . . . . . . . . . . . . . . . 138
2.16 Exiting Emacs Commands . . . . . . . . . . . . . . . . . . . . . 138
2.17 Searching Text Commands in Emacs . . . . . . . . . . . . . . . . 138
2.18 Copying and Pasting Commands in Emacs . . . . . . . . . . . . 139
2.19 File Handling Commands in Nano . . . . . . . . . . . . . . . . . 141
2.20 Editing Commands in Nano . . . . . . . . . . . . . . . . . . . . 141
3.1 Private IP Address Ranges . . . . . . . . . . . . . . . . . . . . . 154
3.2 Well-known Ports . . . . . . . . . . . . . . . . . . . . . . . . . . 156
3.3 Network Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.1 Pipes, Streams, and Redirection syntax . . . . . . . . . . . . . . 257
6.1 Differences between BRE and ERE . . . . . . . . . . . . . . . . . 289
9.1 Differences between range expansion and seq command . . . . 405
9.2 Differences between seq and python’s range function . . . . . . 406
9.3 Summary of the bash constructs . . . . . . . . . . . . . . . . . . 432
1 Essentials of Linux
1.1 Introduction
1.1.1 What is Linux?
Definition 1.1.1 (Linux) Linux is a kernel that
is used in many operating systems. It is open
source and free to use. Linux is not an operating
system unto itself, but the core component of it.
So what is Ubuntu? Ubuntu is one of the many
distributions that use the Linux kernel. It is a com-
plete operating system that is free to use and open
source. It is based on the Debian distribution of
Linux. There are many other distributions of Linux,
such as:
▶ Debian - Used primarily on servers, it is
known for its stability.
• Ubuntu - A commercial distribution
based on Debian which is popular among
new users.
• Linux Mint - A distribution based on
Ubuntu which is known for its ease of
use. It is one of the distributions recom-
mended to new users.
• Pop OS - A distribution based on Ubuntu
which is known for its focus on develop-
ers, creators, and gamers.
• and many more
▶ Red Hat Enterprise Linux (RHEL) - A commercial distribution used primarily in enterprises. It is owned by Red Hat and is targeted primarily at companies with its free-OS, paid-support model.
• Fedora - A community-driven distribution sponsored by Red Hat. It is known for its cutting-edge features and is used by developers. It remains upstream of RHEL, receiving new features before RHEL.
• CentOS - A discontinued distribution based on RHEL. It was known for its stability and was used in servers. It was downstream from RHEL.
• CentOS Stream - A midstream between the upstream development in Fedora Linux and the downstream development for Red Hat Enterprise Linux.
• Rocky Linux - A distribution created by the Rocky Enterprise Software Foundation after the announcement of the discontinuation of CentOS. It is a downstream of RHEL that provides feature parity and binary compatibility with RHEL.
• Alma Linux - A distribution created by the CloudLinux team after the announcement of the discontinuation of CentOS. It is a downstream of RHEL that provides feature parity and binary compatibility with RHEL.
▶ Arch Linux - A community-driven distribution known for its simplicity and customizability. It is a rolling release distribution, which means that it is continuously updated. It is a bare-bones distribution that lets the user decide which packages they want to install.
• Manjaro - A distribution based on Arch Linux which is known for its user-friendliness. It is a rolling release distribution that is easier to install for new users. It uses a different repository for packages with additional testing.
• EndeavourOS - A distribution based on Arch Linux which is known for its simplicity and minimalism. It is a rolling release distribution that is easier to install for new users. It uses the same repository for packages as Arch Linux.
• Artix Linux - It uses the OpenRC init system instead of systemd. It also offers other init systems like runit, s6, and dinit.
▶ openSUSE - A free and open-source Linux distribution developed by the openSUSE project. It is offered in two main variations: Tumbleweed, an upstream rolling release distribution, and Leap, a stable release distribution which is sourced from SUSE Linux Enterprise.
• Tumbleweed - Rolling release, upstream.
• Leap - Stable release, downstream.
▶ Gentoo - A distribution known for its customizability and performance. It is a source-based distribution, which means that the user compiles the software from source code. It is known for its performance optimizations for the user's hardware.
▶ Void - An independent rolling-release Linux distribution that uses the X Binary Package System package manager, which was designed and implemented from scratch, and the runit init system. Excluding binary kernel blobs, a base install is composed entirely of free1 software.

1: "Free software" means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software. Thus, "free software" is a matter of liberty, not price. To understand the concept, you should think of "free" as in "free speech," not as in "free beer." We sometimes call it "libre software," borrowing the French or Spanish word for "free" as in freedom, to show we do not mean the software is gratis. You may have paid money to get copies of a free program, or you may have obtained copies at no charge. But regardless of how you got your copies, you always have the freedom to copy and change the software, even to sell copies. - GNU on Free Software

Figure 1.1: Linux Distributions Usage in 2024
1.1.2 Desktop Environments
Definition 1.1.2 (Desktop Environment) A desktop environment is a collection of software designed to give functionality and a certain look and feel to a desktop operating system. It is a combination of a window manager, a file manager, a panel, and other software that provides a graphical user interface and utilities to a regular desktop user, such as volume and brightness control, multimedia applications, settings panels, etc. It is only required by desktop (and laptop) users and is not present on server instances.
There are many desktop environments available for
Linux, but the important ones are:
▶ GNOME - One of the most popular desktop environments for Linux. It is known for its simplicity and ease of use. It is the default desktop environment for many distributions, including Ubuntu. It is based on the GTK Toolkit.2 Popular distros shipping with GNOME by default are Fedora, RHEL, CentOS, Debian, Zorin, and Ubuntu.3
▶ KDE Plasma - A highly customizable desktop environment based on the Qt Toolkit.4 Many distributions like Slackware and openSUSE ship with KDE Plasma as the default desktop environment, and most others have the option to install with KDE Plasma. Ubuntu's KDE Plasma variant is called Kubuntu.
▶ Xfce - A lightweight desktop environment known for its speed and simplicity. It is based on the GTK Toolkit. It is used in many distributions like Xubuntu, Manjaro, and Fedora.

2: GTK is a free software cross-platform widget toolkit for creating graphical user interfaces.
3: Ubuntu used to ship with Unity as the default desktop environment, but switched to GNOME in 2017.
4: Qt is a cross-platform application development framework for creating graphical user interfaces.
▶ LXQt - A lightweight desktop environment known for its speed and simplicity. It is based on the Qt Toolkit. It is used in many distributions like Lubuntu.
▶ Cinnamon
▶ MATE

Figure 1.2: Desktop Environment Usage in 2022

It is important to note that although some distributions come pre-bundled with certain Desktop Environments, it doesn't mean that you cannot use another DE with them. DEs are simply packages installed on your distribution, and almost all the popular DEs can be installed on all distributions. Many distributions also come with multiple pre-bundled desktop environments due to user preferences. Most server distributions and some enthusiast distributions come with no pre-bundled desktop environment, and let the user determine which one is required, or if one is required at all.
1.1.3 Window Managers
Definition 1.1.3 (Window Manager) A window
manager is system software that controls the
placement and appearance of windows within a
windowing system in a graphical user interface.
It is a part of the desktop environment, but can
also be used standalone. It is responsible for
the appearance and placement of windows, and
can also provide additional functionality like
virtual desktops, window decorations, window
title bars, and tiling.
Although usually bundled with a desktop environment, many window managers are also standalone and installed separately by the user if they don't want to use all the applications from a single desktop environment.
Some popular window managers are:
▶ Openbox - A lightweight window manager known for its speed and simplicity. It is used in many distributions like Lubuntu.
▶ i3 - A tiling window manager5 which is usually one of the first window managers that users try when they want to move away from a desktop environment to a tiling window manager.
▶ awesome - A tiling window manager that is highly configurable and extensible. It is written in Lua and is known for its beautiful configurations.
▶ bspwm - A tiling window manager. It is based on binary space partitioning.
▶ dwm - A dynamic tiling window manager that is known for its simplicity and minimalism. It is written in C and is highly configurable.

5: A tiling window manager is a window manager that automatically splits the screen into non-overlapping frames, which are used to display windows. Most desktop environments ship with a floating window manager instead, which users of other operating systems are more familiar with.
1.1.4 Why Linux?
You might be wondering "Why should I use Linux?" Most people use either Windows or Mac on their personal computers. Although these consumer operating systems get the job done, they don't let the user completely control their own hardware and software. Linux6 is a free and open-source operating system that gives the user complete control over their system. It is highly customizable and can be tailored to the user's needs. It is also known for its stability and security. It is used in almost all servers, supercomputers, and embedded systems. It is also used in many consumer devices like Android phones, smart TVs, and smartwatches.

6: Although Linux is just a kernel and not an entire operating system, throughout this book I will be referring to GNU/Linux, the combination of GNU core utilities and the Linux kernel, as Linux for short.

In this course we will cover how to navigate the Linux file system, how to manage files, how to manage the system, and how to write scripts to automate tasks. In the later part of the course we go over concepts such as pattern matching and text processing.

This course does not go into the details of the Linux kernel, but rather attempts to make the reader familiar with the GNU core utils and able to navigate around a Linux server easily.
1.1.5 What is Shell?
The kernel is the core of the operating system. It is responsible for managing the hardware and providing services to the user programs. The shell is the interface between the user and the kernel (Figure 1.3). Through the shell we can run many commands and utilities, as well as some inbuilt features of the shell.

Figure 1.3: Operating System Onion Rings - The layers of an operating system

Definition 1.1.4 (Shell) A shell is a command-line interpreter that provides the user with a way to interact with the operating system: it takes commands from the user and executes them.

The most popular shell on Linux is the bash shell. It is the default shell in most distributions, and it is a POSIX-compliant7 shell. There are many other shells available, such as zsh, fish8, dash, csh, ksh, and tcsh. Each shell has its own features and syntax, but most of the keywords and syntax are the same. In this course we will cover only the bash shell and its syntax, but most of what we learn here is applicable to other shells as well.

7: POSIX, or Portable Operating System Interface, is a set of standards that define the interfaces and environment that operating systems use to access POSIX-compliant applications. POSIX standards are based on the Unix operating system and were released in the late 1980s.

8: Fish is a non-POSIX-compliant shell that is known for its features like autosuggestion, syntax highlighting, and tab completions. Although a useful alternative to other shells for scripting, it should not be set as the default shell.
1.1.6 Shell vs Terminal
Definition 1.1.5 (Terminal) A terminal is a program that provides a text-based interface to the shell. It is also known as a terminal emulator.

The terminal is the window that you see when you open a terminal program, and the shell is the program that runs in that window and interprets the commands that you type. Whereas the shell is the application that parses your input and runs the commands and keywords, the terminal is the application that lets you see the shell graphically. There are many different terminal emulators, providing a lot of customization and beautification of the terminal, as well as useful features such as scrollback, copying and pasting, and so on.
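One quick way to see the two sides on your own system is to print the environment variables that conventionally identify each of them; the exact values below are illustrative and depend on your setup:

$ echo $SHELL
/bin/bash
$ echo $TERM
xterm-256color

Here the shell is bash, regardless of which terminal emulator window it happens to be running in.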
Some popular terminal emulators are:
▶ gnome-terminal - The default terminal emu-
lator for the GNOME desktop environment.
▶ konsole - The default terminal emulator for
the KDE desktop environment.
▶ xfce4-terminal - The default terminal emula-
tor for the Xfce desktop environment.
▶ alacritty - A terminal emulator known for its
speed and simplicity.
▶ terminator - A terminal emulator known for
its features like splitting the terminal into
multiple panes.
▶ tilix - A terminal emulator known for its fea-
tures like splitting the terminal into multiple
panes.
▶ st - A simple terminal emulator known for its
simplicity and minimalism.
▶ urxvt
▶ kitty
▶ terminology
In most terminal emulators, there are some basic
shortcuts that can be used to make the terminal ex-
perience more efficient. Some of the basic shortcuts
are listed in Table 1.1.
Table 1.1: Basic Shortcuts in Terminal
Shortcut Description
Ctrl + C Terminate the current process
Ctrl + D Exit the shell
Ctrl + L Clear the terminal screen
Ctrl + A Move the cursor to the beginning of the line
Ctrl + E Move the cursor to the end of the line
Ctrl + U Delete from the cursor to the beginning of the line
Ctrl + K Delete from the cursor to the end of the line
Ctrl + W Delete the word before the cursor
Ctrl + Y Paste the last deleted text
Ctrl + R Search the command history
Ctrl + Z Suspend the current process
Ctrl + \ Terminate the current process
Ctrl + S Pause the terminal output
Ctrl + Q Resume the terminal output
1.1.7 Why the Command Line?
Both the command line interface (CLI) and the
graphical user interface (GUI) are simply shells
over the operating system’s kernel. They let you
interact with the kernel, perform actions and run
applications.
GUI:
The GUI requires a mouse and a keyboard, and is
more intuitive and easier to use for beginners. But
it is also slower and less efficient than the CLI. The
GUI severely limits the user’s ability to automate
tasks and perform complex operations. The user can
only perform those operations that the developers
of the GUI have thought of and implemented.
CLI:
The CLI is faster and more efficient than the GUI
as it lets the user use the keyboard to perform
actions. Instead of clicking on pre-defined buttons,
the CLI lets you construct your instruction to the
computer using syntax and semantics. The CLI lets
you combine simple commands that do one thing
well to perform complex operations. The biggest
advantage of the CLI is that it lets you automate
tasks. It might be faster for some users to rename
a file from file1 to file01 using the GUI, but it will
always be faster to automate this using the CLI if
you want to do this for thousands of files in the
folder.
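As a sketch of what such automation can look like (the file? glob and the file-to-file0 renaming scheme are assumptions for illustration; the substitution syntax is covered in Chapter 8):

$ for f in file?; do mv "$f" "${f/file/file0}"; done

This single line renames file1 through file9 to file01 through file09; a wider glob such as file* scales the same loop to any number of files.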
In this course we will be learning how to use the CLI
to navigate the file system, manage files, manage the
system, process text, and write scripts to automate
tasks.
1.1.8 Command Prompt
The command prompt is the text that is displayed
in the terminal to indicate that the shell is ready
to accept commands. It usually ends with a $ or a
# symbol. The $ symbol indicates that the shell is
running as a normal user, and the # symbol indicates
that the shell is running as the root user. The root
user has complete control over the system and can
perform any operation on the system.
An example of a command prompt is:
username@hostname:~$
Here, username is the name of the user, hostname is
the name of the computer, and $ indicates that the
shell is running as a normal user. The ~ symbol
indicates that the current working directory is the
user’s home directory. The home directory is the
directory where the user’s files and settings are
stored. It is usually located at /home/username. This
can be shortened to ~ in the shell. This prompt can
be changed and customized according to the user’s
preferences using the PS1 variable discussed in
Chapter 8.
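As a small preview of that customization (a minimal sketch; \u, \h, and \w are standard bash prompt escapes for user, host, and working directory):

$ PS1='[\u@\h \w]\$ '
[username@hostname ~]$

The change lasts only for the current session unless it is saved in a configuration file, which is discussed later.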
1.2 Simple Commands in GNU Core Utils
Definition 1.2.1 (GNU Core Utils) The GNU Core Utilities are the basic file, shell, and text manipulation utilities of the GNU operating system. These are the utilities that are used to interact with the operating system and perform basic operations.a

a GNU Core Utils

Figure 1.4: GNU Core Utils Logo
The shell lets you simply type in the name of the command and press enter to run it. You can also pass arguments to the command to modify its behavior. Although the commands are simple, they are powerful and can be combined to perform complex operations.9

9: The combination of commands to perform complex operations is called piping. This will be covered later.

Some basic commands in the core-utils are listed in Table 1.2.
1.2.1 File System Navigation
pwd:
The pwd command prints the current working direc-
tory. The current working directory is the directory
that the shell is currently in. The shell starts in
the user’s home directory when it is opened. The
pwd command prints the full path of the current
working directory.
$ pwd
/home/username
ls:
The ls command lists the contents of a directory. By
default, it lists the contents of the current working
directory. The ls command can take arguments to
list the contents of a different directory.
$ ls
Desktop Documents Downloads Music Pictures Videos

We can also list hidden files10 using the -a flag.

$ ls -a
. .. .bashrc Desktop Documents Downloads Music Pictures Videos

10: Hidden files are files whose names start with a dot. They are hidden by default in the ls command.
Here, the . and .. directories are special directories.
The . directory is the current directory, and the ..
directory is the parent directory. The .bashrc file is
a configuration file for the shell which is a hidden
file.
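Since . and .. are real directory entries, they can be passed to commands like any other path; for example, continuing with the /home/username home directory from above:

$ ls ..
username

This lists the contents of the parent directory, /home.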
ls can also list the details of the files using the -l
flag.
$ ls -l
total 24
drwxr-xr-x 2 username group 4096 Mar 1 12:00 Desktop
drwxr-xr-x 2 username group 4096 Mar 1 12:00 Documents
drwxr-xr-x 2 username group 4096 Mar 1 12:00 Downloads
drwxr-xr-x 2 username group 4096 Mar 1 12:00 Music
drwxr-xr-x 2 username group 4096 Mar 1 12:00 Pictures
drwxr-xr-x 2 username group 4096 Mar 1 12:00 Videos
Table 1.2: Basic Commands in GNU Core Utils
Command Description
ls List the contents of a directory
cd Change the current working directory
pwd Print the current working directory
mkdir Create a new directory
rmdir Remove a directory
touch Create a new file
rm Remove a file
cp Copy a file
mv Move a file
echo Print a message
cat Concatenate and display the contents of a file
less Display the contents of a file one page at a time
head Display the first few lines of a file
tail Display the last few lines of a file
find Find files and directories
locate Find files and directories
which Find the location of a command
uname Print system information
ps Display information about running processes
kill Terminate a process
chmod Change the permissions of a file
chown Change the owner of a file
chgrp Change the group of a file
date Print the current date and time
cal, ncal Print a calendar
df Display disk space usage
du Display disk usage
free Display memory usage
top Display system information
history Display the command history
sleep Pause the shell for a specified time
true Do nothing, successfully
false Do nothing, unsuccessfully
tee Read from stdin and write to stdout and files
whoami Print the current user
groups Print the groups the user belongs to
clear Clear the terminal screen
exit Exit the shell
As seen in Figure 1.5, the first column is the file type and permissions. The second column is the number of links to the file or directory. The third and fourth columns are the owner and group of the file or directory. The fifth column is the size of the file or directory. The sixth, seventh, and eighth columns are the last modified date and time of the file or directory. The ninth column is the name of the file or directory.11

Figure 1.5: ls -l Output

11: More details about the file permissions and the file types will be covered later in the course.

We can also list the inode numbers12 using the -i flag.

$ ls -i
123456 Desktop 123457 Documents 123458 Downloads 123459 Music 123460 Pictures 123461 Videos

12: An inode is a data structure on a filesystem on Linux and other Unix-like operating systems that stores all the information about a file except its name and its actual data. This includes the file type, the file's owner, the file's group, the file's permissions, the file's size, the file's last modified date and time, and the file's location on the disk. The inode is the reference pointer to the data in the disk.

Inodes will be discussed in detail later in the course.

cd:

The cd command changes the current working directory. It takes the path to the directory as an argument.

$ cd Documents
$ pwd
/home/username/Documents

The cd command can also take the ~ symbol as an argument to change to the user's home directory. This is the default behavior of the cd command when no arguments are passed.

$ cd
$ pwd
/home/username

If we want to go back to the previous directory, we can use the - symbol as an argument to the cd command.13

13: This internally uses the OLDPWD environment variable to change the directory. More about variables will be covered later in the course.

$ cd Documents
$ pwd
/home/username/Documents
$ cd -
$ pwd
/home/username
Question 1.2.1 ls can only show files and directories in the cwd14, not subdirectories. True or False?

14: cwd means Current Working Directory

Answer 1.2.1 False. ls can show files and directories in the cwd, and also in subdirectories. The -R flag can be used to show files and directories in subdirectories, recursively.
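For example, a minimal sketch of a recursive listing (the subdirectory and file names here are hypothetical):

$ ls -R Documents
Documents:
notes  reports

Documents/notes:
todo.txt

Documents/reports:
q1.pdf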
1.2.2 Manuals
man:
How to remember so many flags and options for
each of the commands? The man command is used
to display the manual pages for a command.
Definition 1.2.2 (Manual Pages) Manual pages
are a type of software documentation that pro-
vides details about a command, utility, function,
or file format. They are usually written in a
simple and concise manner and provide infor-
mation about the command’s syntax, options,
and usage.
$ man ls
This will display the manual page for the ls com-
mand. The manual page is divided into sections,
and you can navigate through the sections using
the arrow keys. Press q to exit the manual page.
Example manual page:
LS(1)                        User Commands                       LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List information about the FILEs (the current directory by
       default).  Sort entries alphabetically if none of -cftuvSUX
       nor --sort is specified.

       Mandatory arguments to long options are mandatory for short
       options too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..
       ...
The manual page provides information about the
command, its syntax, options, and usage. It is a good
practice to refer to the manual page of a command
before using it.
To exit the manual page, press q.
There are multiple sections in the manual pages; man takes the section number as an argument to display the manual page from that section.

$ man 1 ls

This will display the manual page for the ls command from section 1. The details of the sections can be seen in Table 1.3.

Table 1.3: Manual Page Sections

Section Description
1 User Commands
2 System Calls
3 Library Functions
4 Special Files usually found in /dev
5 File Formats and conventions
6 Games
7 Miscellaneous
8 System Administration
9 Kernel Developer’s Manual
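For example, section 5 documents file formats, so the layout of the /etc/passwd file (assuming it is documented on your system, as it is on most Linux distributions) can be read with:

$ man 5 passwd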
Man pages only provide information about the
commands and utilities that are installed on the
system. They do not provide information about the
shell builtins or the shell syntax. For that, you can
refer to the shell’s documentation or use the help
command.
Some commands also have a --help flag that displays the usage and options of the command.
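For example, the first lines of ls --help look something like this (output abridged; the exact wording varies between versions):

$ ls --help | head -n 2
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).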
Some commands have their own info pages, which
are more detailed than the man pages.
To be proficient with shell commands, one needs to read the man, info, and help pages.15

15: A useful video by Robert Elder about the differences between man, info, and help can be found on YouTube.

Exercise 1.2.1 Run man, info, and --help on all the commands discussed in this section. Note
1.2 Simple Commands in GNU Core Utils 19
the differences in the information provided by
each of them. Read the documentations care-
fully and try to understand how each command
works, and the pattern in which the documen-
tations are written.
info:
The info command is used to display the info pages
for a command. The info pages are more detailed
than the man pages for some commands. It is navi-
gable like a hypertext document. There are links to
chapters inside the info pages that can be followed
using the arrow keys and entered using the enter
key. Table 1.4 lists some of the keys that
can be used to navigate the info pages.
Table 1.4: Keys in Info Pages
Key Description
h Display the help page
q Exit the info page
n Move to the next node
p Move to the previous node
u Move up one level
d Move down one level
l Move to the last node
t Move to the top node
g Move to a specific node
<enter> Follow the link
m Display the menu
s Search for a string
S Search for a string (case-sensitive)
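For example, to open the info page for ls (assuming the corresponding texinfo documentation is installed on your system):

$ info ls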
help:
The help command is a shell builtin that displays
information about the shell builtins and the shell
syntax.
1 $ help read
This will list the information about the read builtin
command.
The help command can also be used to display
information about the shell syntax.
1 $ help for
This will list the information about the for loop in
the shell.
Help pages are not paged, and the output is dis-
played in the terminal. To page the output, one can
use the less command.
1 $ help read | less
1.2.3 System Information
uname:
The uname command prints system information. It
can take flags to print specific information about
the system. By default, it prints only the kernel
name.
1 $ uname
2 Linux
The -a flag prints all the system information. 16

$ uname -a
Linux rex 6.8.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000 x86_64 GNU/Linux

16: Here rex is the hostname of the system, 6.8.2-arch2-1 is the kernel version, x86_64 is the architecture, and GNU/Linux is the operating system.
ps:

The ps command displays information about running processes. By default, it displays information about the processes run by the current user that are running from a terminal. 17

17: The PID is the process ID, the TTY is the terminal the process is running from, the TIME is the time the process has been running, and the CMD is the command that is running.

$ ps
    PID TTY          TIME CMD
  12345 pts/0    00:00:00 bash
  12346 pts/0    00:00:00 ps
There are a lot of flags that can be passed to the ps
command to display more information about the
processes. These will be covered in Chapter 4.
Remark 1.2.1 ps has three types of options:
▶ UNIX options
▶ BSD options
▶ GNU options
The UNIX options are preceded by a hyphen (-) and may be grouped. The BSD options can be grouped, but should not be preceded by a hyphen (-). The GNU options are preceded by two hyphens (--). These are also called long options.
The same action can be performed by using different options; for example, ps -ef and ps aux are equivalent, although the first uses UNIX options and the latter uses BSD options.
Another difference between GNU core utils and BSD utils is that the GNU utils have long options, whereas the BSD utils do not.
BSD utils also usually do not support having flags after the positional arguments, whereas most GNU utils are fine with this.
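As a quick illustration of the two styles (output trimmed to the first process; exact columns and values will differ on your system):

$ ps -ef | head -n 2
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 13:01 ?        00:00:01 /sbin/init
$ ps aux | head -n 2
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.1  22560 13068 ?        Ss   13:01   0:01 /sbin/init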
kill:
The kill command is used to terminate a process.
It takes the process ID as an argument.
1 $ kill 12345
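By default, kill sends the SIGTERM signal, which politely asks the process to exit. The available signal names and numbers can be listed with kill -l (output trimmed here):

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
...
15) SIGTERM     16) SIGSTKFLT   ...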
The kill command can also take the signal number as an argument to send a signal to the process. For example, the SIGKILL signal can be sent to the process to terminate it. 18

18: The SIGKILL signal is used to terminate a process immediately. It cannot be caught or ignored by the process. It is numbered as 9.

$ kill -9 12345

free:
The free command is used to display the amount
of free and used memory in the system.
$ free
               total        used        free      shared  buff/cache   available
Mem:         8167840     1234560     4567890      123456     2367890     4567890
Swap:        2097148           0     2097148
The free command can take the -h flag to display
the memory in human-readable format.
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       1.2Gi       4.3Gi       120Mi       2.3Gi       4.3Gi
Swap:          2.0Gi          0B       2.0Gi
df:
The df command is used to display the amount of
disk space available on the filesystems.
$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1       12345678 1234567  11111111  10% /
/dev/sda2       12345678 1234567  11111111  10% /home
The df command can take the -h flag to display the
disk space in human-readable format.
$ df -h
Filesystem     Size  Used Avail Use% Mounted on
/dev/sda1       12G  1.2G  9.9G  11% /
/dev/sda2       12G  1.2G  9.9G  11% /home
du:
The du command is used to display the disk usage
of directories and files. By default, it displays the
disk usage of the current directory.
$ du
4	./Desktop
4	./Documents
4	./Downloads
4	./Music
4	./Pictures
4	./Videos
28	.
The du command can take the -h flag to display the
disk usage in human-readable format. The -s flag
displays the total disk usage of the directory.
1 $ du -sh
2 28K .
Question 1.2.2 How to print the kernel version
of your system?
Answer 1.2.2 uname -r will print the kernel ver-
sion of your system. uname is a command to
print system information. The -r flag is to print
the kernel release. There are other flags to print
other system information.
We can also run uname -a to get all fields and
extract only the kernel info using commands
taught in later weeks.
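For example (using the kernel version from the uname -a output shown earlier; yours will differ):

$ uname -r
6.8.2-arch2-1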
Question 1.2.3 How to see how long your system has been running? What about the time it was booted up?

Answer 1.2.3 uptime will show how long the system has been running.
uptime -s will show the time the system was booted up.
The -s flag is to show the time of last boot.
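For illustration (values consistent with the w example later in this chapter; yours will differ):

$ uptime
 19:47:07 up  5:57,  1 user,  load average: 0.77, 0.80, 0.68
$ uptime -s
2024-05-22 13:50:07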
Question 1.2.4 How to see the amount of free memory? What about free hard disk space? If we are unable to understand the big numbers, how to convert them to human readable format? What is the difference between MB and MiB?
Answer 1.2.4 free will show the amount of free
memory.
df will show the amount of free hard disk space.
df -h and free -h will convert the numbers to
human readable format.
MB is Megabyte, and MiB is Mebibyte.
1 MB = 1000 KB, 1 GB = 1000 MB, 1 TB = 1000 GB; these are SI (decimal) units.
1 MiB = 1024 KiB, 1 GiB = 1024 MiB, 1 TiB = 1024 GiB; these are binary (2^10) units.
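As a quick sanity check using the free output shown earlier (free reports KiB by default): 8167840 KiB / 1024 ≈ 7977 MiB, and 7977 MiB / 1024 ≈ 7.8 GiB, which is exactly what free -h reported.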
1.2.4 File Management
file:
The file command is used to determine the type of
a file. It can take multiple file names as arguments.
1 $ file file1
2 file1: ASCII text
3 $ file /bin/bash
4 /bin/bash: ELF 64-bit LSB shared object, x86
-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
for GNU/Linux 3.2.0, BuildID[sha1
]=1234567890abcdef, stripped
mkdir:
The mkdir command is used to create new directo-
ries. It can take multiple directory names as argu-
ments.
1 $ mkdir a b c
2 $ ls -F
3 a/ b/ c/
Exercise 1.2.2 Run man ls to find out what the
-F flag does, and why we used it in the above
example.
touch:
The touch command is used to create new files. It
can take multiple file names as arguments. If a file
is already present, the touch command updates the
last modified date and time of the file, but does not
modify the contents of the file.
1 $ touch file1 file2 file3
2 $ ls -l
3 -rw-r--r-- 1 username group 0 Mar 1 12:00
file1
4 -rw-r--r-- 1 username group 0 Mar 1 12:00
file2
5 -rw-r--r-- 1 username group 0 Mar 1 12:00
file3
6 $ sleep 60
7 $ touch file3
8 $ ls -l
9 -rw-r--r-- 1 username group 0 Mar 1 12:00
file1
10 -rw-r--r-- 1 username group 0 Mar 1 12:00
file2
11 -rw-r--r-- 1 username group 0 Mar 1 12:01
file3
Exercise 1.2.3 Notice the difference in the last
modified date and time of the file3 file from
the other files. Also notice the sleep command
used to pause the shell for 60 seconds.
rmdir:
The rmdir command is used to remove directories. It
can take multiple directory names as arguments.
1 $ mkdir a b c d
2 $ rmdir a b c
3 $ ls -F
4 d/
Remark 1.2.2 The rmdir command can only remove empty directories. This is a safety feature so that users don't accidentally delete directories with files in them. To remove directories with files in them along with those files, use the rm command.
rm:
The rm command is used to remove files and direc-
tories. It can take multiple file and directory names
as arguments.
1 $ touch file1 file2 file3
2 $ ls -F
3 file1 file2 file3
4 $ rm file1 file2
5 $ ls -F
6 file3
However, using rm to delete a directory will give an
error.
1 $ mkdir a
2 $ rm a
3 rm: cannot remove ’a’: Is a directory
This is because the rm command does not remove
directories by default. This is a safety feature to
prevent users from accidentally deleting directories
with files in them.
To remove directories along with their files, use the -r flag.
1 $ rm -r a
To force the removal of files and directories without
a confirmation, use the -f flag.
Warning 1.2.1 The rm command is a dangerous
command. It does not move the files to the trash,
but permanently deletes them. Be extremely
careful when using the rm command. Only use
the -f flag if you are absolutely sure that you
want to delete the files.
To force rm to always ask for confirmation before
deleting files, use the -i flag.
1 $ rm -i file3
2 rm: remove regular empty file ’file3’? y
cp:
The cp command is used to copy files. It takes the
source file and the destination file as arguments.
1 $ touch file1
2 $ ls -F
3 file1
4 $ cp file1 file2
5 $ ls -F
6 file1 file2
The cp command can also take the -r flag to copy
directories.
1 $ mkdir a
2 $ touch a/file1
3 $ cp -r a b
4 $ ls -R
5 .:
6 a/ b/
7
8 ./a:
9 file1
10
11 ./b:
12 file1
Exercise 1.2.4 Why did we use the -R flag in the
above example? What does it do?
There are three ways to copy files using cp:
1 SYNOPSIS
2 cp [OPTION]... [-T] SOURCE DEST
3 cp [OPTION]... SOURCE... DIRECTORY
4 cp [OPTION]... -t DIRECTORY SOURCE...
Exercise 1.2.5 There are three ways of running
the cp command to copy a file. Here we have
demonstrated only one. Read the manual page
of the cp command to find out the other two
ways and try them out yourself.
mv:

The mv command is used to move files. The syntax is similar to the cp command. 19 It is used to move files from one location to another, or to rename files.

19: This means that mv also has three ways of running it.
1 $ touch file1
2 $ ls -F
3 file1
4 $ mv file1 file2
5 $ ls -F
6 file2
Exercise 1.2.6 Create a directory dir1 using the mkdir command, then create a file file1 inside dir1. Now move (rename) the dir1 directory to dir2 using the mv command. The newly created directory should be named dir2 and should contain the file1 file. Were you required to use the -r flag with the mv command like you would have with the cp command?
1.2.5 Text Processing and Pagers
echo:
The echo command is used to print a message to
the terminal. It can take multiple arguments and
print them to the terminal.
1 $ echo Hello, World!
2 Hello, World!
The echo command can also take the -e flag to
interpret backslash escapes.
1 $ echo -e "Hello, \nWorld!"
2 Hello,
3 World!
Some escape characters in echo are listed in Table 1.5.

Table 1.5: Escape Characters in echo

Escape    Description
\\ backslash
\a alert (BEL)
\b backspace
\c produce no further output
\e escape
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\0NNN byte with octal value NNN (1 to 3 digits)
\xHH byte with hexadecimal value HH (1 to 2 digits)
Exercise 1.2.7 Run the command echo -e "\
x41=\0101" and try to understand the output
and the escape characters used.
cat:
The cat command is used to concatenate and dis-
play the contents of files.
1 $ cat file1
2 Hello, World! from file1
The cat command can take multiple files as ar-
guments and display their contents one after an-
other.
1 $ cat file1 file2
2 Hello, World! from file1
3 Hello, World! from file2
less:

Sometimes the contents of a file are too large to be displayed at once. Nowadays modern terminal emulators can scroll up and down to view the contents of the file, but actual ttys cannot do that. To view the contents of a file one page at a time, use the less command. less is a pager program that displays the contents of a file one page at a time. 20

20: more is another pager program that displays the contents of a file one page at a time. It is older than less and has fewer features. less is an improved version of more and is more commonly used. Due to this, it is colloquially said that "less is more", as it has more features.

$ less file1

To scroll up and down, use the arrow keys, or the j and k keys. 21 Press q to exit the less command.

21: Using j and k to move the cursor up and down is a common keybinding in many terminal applications. This originates from the vi text editor, which will be covered later in the course.

head:

The head command is used to display the first few lines of a file. By default, it displays the first 10 lines of a file.

$ head /etc/passwd
root:x:0:0:root:/root:/usr/bin/bash
bin:x:1:1::/:/usr/bin/nologin
daemon:x:2:2::/:/usr/bin/nologin
mail:x:8:12::/var/spool/mail:/usr/bin/nologin
ftp:x:14:11::/srv/ftp:/usr/bin/nologin
http:x:33:33::/srv/http:/usr/bin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/usr/bin/nologin
dbus:x:81:81:System Message Bus:/:/usr/bin/nologin
systemd-coredump:x:984:984:systemd Core Dumper:/:/usr/bin/nologin
systemd-network:x:982:982:systemd Network Management:/:/usr/bin/nologin

The head command can take the -n flag to display the first n lines of a file. 22

22: We can also directly run head -5 /etc/passwd to display the first 5 lines of the file.

$ head -n 5 /etc/passwd
root:x:0:0:root:/root:/usr/bin/bash
bin:x:1:1::/:/usr/bin/nologin
daemon:x:2:2::/:/usr/bin/nologin
mail:x:8:12::/var/spool/mail:/usr/bin/nologin
ftp:x:14:11::/srv/ftp:/usr/bin/nologin
Remark 1.2.3 Here we are listing the file /etc/passwd which contains information about the users on the system. The file will usually be present on all Unix-like systems and contain a lot of system users. 23

23: A system user is a user that is used by the system to run services and daemons. It does not belong to any human and usually is not logged into. System users have a user ID less than 1000.

tail:

The tail command is used to display the last few lines of a file. By default, it displays the last 10 lines of a file.
$ tail /etc/passwd
rtkit:x:133:133:RealtimeKit:/proc:/usr/bin/nologin
sddm:x:964:964:SDDM Greeter Account:/var/lib/sddm:/usr/bin/nologin
usbmux:x:140:140:usbmux user:/:/usr/bin/nologin
sayan:x:1000:1001:Sayan:/home/sayan:/bin/bash
qemu:x:962:962:QEMU user:/:/usr/bin/nologin
cups:x:209:209:cups helper user:/:/usr/bin/nologin
dhcpcd:x:959:959:dhcpcd privilege separation:/:/usr/bin/nologin
saned:x:957:957:SANE daemon user:/:/usr/bin/nologin
The tail command can take the -n flag to display
the last n lines of a file.
$ tail -n 5 /etc/passwd
sayan:x:1000:1001:Sayan:/home/sayan:/bin/bash
qemu:x:962:962:QEMU user:/:/usr/bin/nologin
cups:x:209:209:cups helper user:/:/usr/bin/nologin
dhcpcd:x:959:959:dhcpcd privilege separation:/:/usr/bin/nologin
saned:x:957:957:SANE daemon user:/:/usr/bin/nologin
Exercise 1.2.8 Notice that the UID (3rd column) of the sayan user is 1000. The last column is /bin/bash instead of /usr/bin/nologin like the others. This is because it is a regular user and not a system user.
wc:
The wc command is used to count the number of
lines, words, and characters in a file. By default, it
displays the number of lines, words, and characters
in a file.
1 $ wc /etc/passwd
2 43 103 2426 /etc/passwd
We can also use the -l, -w, and -c flags to display
only the number of lines, words, and characters
respectively.
1 $ wc -l /etc/passwd
2 43 /etc/passwd
Question 1.2.5 Can we print contents of multi-
ple files using a single command?
Answer 1.2.5 cat file1 file2 file3 will print the contents of file1, file2, and file3 in the order given. The contents of the files will be printed one after the other.
Question 1.2.6 Can cat also be used to write to
a file?
Answer 1.2.6 Yes, cat > file1 will write to file1
. The input will be taken from the terminal and
written to file1. The input will be written to
file1 until the user presses Ctrl+D to indicate
end of input. This is redirection, which we will see in later weeks.
Question 1.2.7 How to list only first 10 lines of
a file? How about first 5? Last 5? How about
lines 105 to lines 152?
Answer 1.2.7 head filename will list the first 10 lines of filename.
head -n 5 filename will list the first 5 lines of filename.
tail -n 5 filename will list the last 5 lines of filename.
head -n 152 filename | tail -n 48 will list lines 105 to 152 of filename. This uses |, which is a pipe, which we will see in later weeks.
Question 1.2.8 Do you know how many lines a
file contains? How can we count it? What about
words? Characters?
Answer 1.2.8 wc filename will count the num-
ber of lines, words, and characters in filename.
wc -l filename will count the number of lines
in filename.
wc -w filename will count the number of words
in filename.
wc -c filename will count the number of char-
acters in filename.
Question 1.2.9 How to delete an empty direc-
tory? What about non-empty directory?
Answer 1.2.9 rmdir dirname will delete an empty
directory.
rm -r dirname will delete a non-empty direc-
tory.
Question 1.2.10 How to copy an entire folder
to another name? What about moving?
Why the difference in flags?
Answer 1.2.10 cp -r sourcefolder targetfolder
will copy an entire folder to another name.
mv sourcefolder targetfolder will move an en-
tire folder to another name.
The difference in flags is because cp is used to
copy, and mv is used to move or rename a file
or folder. The -r flag is to copy recursively, and
is not needed for mv as it is not recursive and
simply changes the name of the folder (or the
path).
1.2.6 Aliases and Types of Commands

alias:

The alias command is used to create an alias for a command. An alias is a custom name given to a command that can be used to run the command. 24

24: Aliases are used to create shortcuts for long commands. They can also be used to create custom commands.

$ alias ll='ls -l'
$ ll
total 24
drwxr-xr-x 2 username group 4096 Mar  1 12:00 Desktop
drwxr-xr-x 2 username group 4096 Mar  1 12:00 Documents
drwxr-xr-x 2 username group 4096 Mar  1 12:00 Downloads
drwxr-xr-x 2 username group 4096 Mar  1 12:00 Music
drwxr-xr-x 2 username group 4096 Mar  1 12:00 Pictures
drwxr-xr-x 2 username group 4096 Mar  1 12:00 Videos
The alias ll is created for the ls -l command.
Warning 1.2.2 Be careful when creating aliases.
Do not create aliases for existing commands.
This can lead to confusion and errors.
But this alias is temporary and will be lost when
the shell is closed. To make the alias permanent,
add it to the shell configuration file. For bash, this
is the .bashrc file in the home directory.
1 $ echo "alias ll=’ls -l’" >> ~/.bashrc
This will add the alias to the .bashrc file. To make
the changes take effect, either close the terminal
and open a new one, or run the source command.
1 $ source ~/.bashrc
Warning 1.2.3 Be careful when editing the shell
configuration files. A small mistake can lead to
the shell not working properly. Always keep a
backup of the configuration files before editing
them.
We can see the aliases that are currently set using
the alias command.
1 $ alias
2 alias ll=’ls -l’
We can also see what a particular alias expands to
using the alias command with the alias name as
an argument.
1 $ alias ll
2 alias ll=’ls -l’
To remove an alias, use the unalias command.
1 $ unalias ll
Exercise 1.2.9 Create an alias la for the ls -a
command. Make it permanent by adding it to
the .bashrc file. Check if the alias is set using
the alias command.
Exercise 1.2.10 Create an alias rm for the rm -i
command. Make it permanent by adding it to
the .bashrc file. Check if the alias is set using
the alias command. Try to delete a file using
the rm command. What happens?
which:
The which command is used to show the path of the
command that will be executed.
1 $ which ls
2 /sbin/ls
We can also list all the paths where the command
is present using the -a flag.
1 $ which -a ls
2 /sbin/ls
3 /bin/ls
4 /usr/bin/ls
5 /usr/sbin/ls
This means that if we delete the /sbin/ls file, the /bin/ls file will be executed when we run the ls command.
whatis:
The whatis command is used to show a short de-
scription of the command.
1 $ whatis ls
2 ls (1) - list directory contents
3 ls (1p) - list directory contents
Here the brackets show the section of the manual where the command is present. This short excerpt is taken from its man page itself.
whereis:
The whereis command is used to show the location
of the command, source files, and man pages.
1 $ whereis ls
2 ls: /usr/bin/ls /usr/share/man/man1/ls.1.gz /
usr/share/man/man1p/ls.1p.gz
Here we can see that the ls command is present
in /usr/bin/ls, and its man pages are present in
/usr/share/man/man1/ls.1.gz and /usr/share/man/
man1p/ls.1p.gz.
locate:
The locate command is used to find files by name.
The file can be present anywhere in the system and
if it is indexed by the mlocate database, it can be
found using the locate command.
1 $ touch tmp/48/hellohowareyou
2 $ pwd
3 /home/sayan
4 $ locate hellohowareyou
5 /home/sayan/tmp/48/hellohowareyou
Note: you may have to run updatedb to update the
database before using locate. This can only be run
by the root user or using sudo.
type:
The type command is used to show how the shell
will interpret a command. Usually some commands
are both an executable and a shell built-in. The type
command will show which one will be executed.
1 $ type ls
2 ls is hashed (/sbin/ls)
3 $ type cd
4 cd is a shell builtin
This shows that the ls command is an executable,
and the cd command is a shell built-in.
We can also use the -a flag to show all the ways the
command can be interpreted.
1 $ type -a pwd
2 pwd is a shell builtin
3 pwd is /sbin/pwd
4 pwd is /bin/pwd
5 pwd is /usr/bin/pwd
6 pwd is /usr/sbin/pwd
Here we can see that the pwd command is a shell
built-in, and is also present in multiple locations
in the system. But if we run the pwd command, the
shell built-in will be executed.
type is also useful when you are not sure whether
to use man or help for a command. Generally for a
shell built-in, help is used, and for an executable
the info and the man pages are used.
Types of Commands: A command can be an alias, a shell built-in, a shell function, a keyword, or an executable.
The type command will show which type the command is; a short demonstration follows the list below.
▶ alias: A command that is an alias to another
command defined by the user or the system.
▶ builtin: A shell built-in command is a com-
mand that is built into the shell itself. It is
executed internally by the shell. This is usu-
ally faster than an external command.
▶ file: An executable file that is stored in the file
system. It has to be stored somewhere in the
PATH variable.
▶ function: A shell function is a set of com-
mands that are executed when the function
is called.
▶ keyword: A keyword is a reserved word that
is part of the shell syntax. It is not a command,
but a part of the shell syntax.
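Below is a minimal sketch showing one command of each type. The alias ll and the function greet are placeholder names defined on the spot; the -t flag makes type print just the one-word type for each name:

$ alias ll='ls -l'             # an alias
$ greet() { echo "hi $1"; }    # a shell function
$ type -t ll greet cd ls if
alias
function
builtin
file
keyword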
Exercise 1.2.11 Find the path of the true com-
mand using which. Find a short description of
the true command using whatis. Is the exe-
cutable you found actually the one that is exe-
cuted when you run true? Check using type
true
Question 1.2.11 How to create aliases? How to
make them permanent? How to unset them?
Answer 1.2.11 alias name='command' will create an alias.
unalias name will unset the alias.
To make them permanent, add the alias to the ~/.bashrc file.
The ~/.bashrc file is a script that is executed whenever a new terminal is opened.
Question 1.2.12 How to run the normal version
of a command if it is aliased?
Answer 1.2.12 \command will run the normal
version of command if it is aliased.
Question 1.2.13 What is the difference between which, whatis, whereis, locate, and type?
Answer 1.2.13 Each of the commands serve a
different purpose:
▶ which will show the path of the command
that will be executed.
▶ whatis will show a short description of
the command.
▶ whereis will show the location of the com-
mand, source files, and man pages.
▶ locate is used to find files by name.
▶ type will show how the command will be
interpreted by the shell.
1.2.7 User Management
whoami:
The whoami command is used to print the username
of the current user.
1 $ whoami
2 sayan
groups:
The groups command is used to display the groups
that the current user belongs to.
1 $ groups
2 sys wheel rfkill autologin sayan
passwd:

The passwd command is used to change the password of the current user. The root user can also change the password of other users. 25

25: This executable is a special one, as it is a setuid program. This will be discussed in detail in Section 1.4.6.

who:

The who command is used to display the users who are currently logged in.
1 $ who
2 sayan tty2 2024-05-22 13:49
3 sayan pts/0 2024-05-22 15:58 (:0)
4 sayan pts/1 2024-05-22 15:58 (tmux
(1082933).%2)
5 sayan pts/2 2024-05-22 15:58 (tmux
(1082933).%1)
6 sayan pts/3 2024-05-22 15:58 (tmux
(1082933).%3)
7 sayan pts/4 2024-05-22 15:58 (tmux
(1082933).%4)
8 sayan pts/5 2024-05-22 15:58 (tmux
(1082933).%5)
9 sayan pts/6 2024-05-22 15:58 (tmux
(1082933).%6)
10 sayan pts/7 2024-05-22 15:58 (tmux
(1082933).%7)
11 sayan pts/8 2024-05-22 15:58 (tmux
(1082933).%8)
12 sayan pts/9 2024-05-22 15:58 (tmux
(1082933).%9)
13 sayan pts/10 2024-05-22 17:58 (:0)
14 sayan pts/11 2024-05-22 18:24 (tmux
(1082933).%10)
15 sayan pts/12 2024-05-22 18:24 (tmux
(1082933).%11)
Exercise 1.2.12 Run the who command on the
system commands VM. What is the output?
w:
The w command is used to display the users who
are currently logged in and what they are doing.
$ w
 19:47:07 up  5:57,  1 user,  load average: 0.77, 0.80, 0.68
USER     TTY      LOGIN@   IDLE   JCPU   PCPU   WHAT
sayan    tty2     13:49    5:57m  19:10  21.82s dwm
This is different from the who command as it only
considers the login shell. Here dwm is the window
manager running on the tty2 terminal.
1.2.8 Date and Time
date:
The date command is used to print formatted date
and time information. Without any arguments, it
prints the current date and time.
1 $ date
2 Mon May 20 06:23:07 PM IST 2024
We can specify the date and time to be printed using
the -d flag.
1 $ date -d "2020-05-20 00:30:45"
2 Wed May 20 12:30:45 AM IST 2020
3 $ date -d "2019-02-29"
4 date: invalid date ’2019-02-29’
5 $ date -d "2020-02-29"
6 Sat Feb 29 12:00:00 AM IST 2020
Exercise 1.2.13 Why did we get an error when
trying to print the date 2019-02-29?
We can also modify the format of the date and time by passing a format string that starts with + and contains format specifiers. Some of the important format specifiers are listed in Table 1.6. The rest of the format specifiers can be found in the date manual page.
1 $ date +"%Y-%m-%d %H:%M:%S"
2 2024-05-20 18:23:07
3 $ date +"%A, %B %d, %Y"
4 Monday, May 20, 2024
Table 1.6: Date Format Specifiers

Specifier   Description
%Y          Year
%m          Month
%d          Day
%H          Hour
%M          Minute
%S          Second
%A          Full weekday name
%B          Full month name
%a          Abbreviated weekday name
%b          Abbreviated month name
We can even mention relative dates and times using
the date command.
1 $ date -d "next year"
2 Tue May 19 06:23:07 PM IST 2025
3 $ date -d "next month"
4 Thu Jun 20 06:23:07 PM IST 2024
5 $ date -d "tomorrow"
6 Tue May 21 06:23:07 PM IST 2024
7 $ date -d "yesterday"
8 Sun May 19 06:23:07 PM IST 2024
cal:
The cal command is used to print a calendar. By de-
fault, it prints the calendar of the current month.
1 $ cal
2 May 2024
3 Su Mo Tu We Th Fr Sa
4 1 2 3 4
5 5 6 7 8 9 10 11
6 12 13 14 15 16 17 18
7 19 20 21 22 23 24 25
8 26 27 28 29 30 31
We can specify the month and year to print the
calendar of that month and year.
1 $ cal 2 2024
2 February 2024
3 Su Mo Tu We Th Fr Sa
4 1 2 3
5 4 5 6 7 8 9 10
6 11 12 13 14 15 16 17
7 18 19 20 21 22 23 24
8 25 26 27 28 29
There are multiple flags that can be passed to the cal command to display different types of calendars, covering multiple months or the entire year.
Remark 1.2.4 In Linux, there are sometimes mul-
tiple implementations of the same command.
For example, there are two implementations of
the cal command, one in the bsdmainutils pack-
age, which is the BSD implementation and also
includes another binary named ncal for printing
the calendar in vertical format. The other imple-
mentation is in the util-linux package, which
does not contain a ncal binary. The flags and the
output of the cal command can differ between
the two implementations.
Question 1.2.14 How to print the current date
and time in some custom format?
Answer 1.2.14 date -d today +%Y-%m-%d will print the current date in the format YYYY-MM-DD. The format can be changed by changing the format specifiers. The format specifiers are given in the man date page. The -d today can be dropped, but is mentioned to show that the date can be changed to any date. It can be strings like '2024-01-01' or '5 days ago' or 'yesterday', etc.
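For example (using the same session date as the earlier examples):

$ date -d today +%Y-%m-%d
2024-05-20
$ date -d '5 days ago' +%Y-%m-%d
2024-05-15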
These are some of the basic commands that are
used in the terminal. Each of these commands has
many more options and flags that can be used to
customize their behavior. It is left as an exercise to
the reader to explore the manual pages of these
commands and try out the different options and
flags.
Many of the commands that
we have discussed here are
also explained in the form
of short videos on Robert El-
der’s Youtube Channel.
1.3 Navigating the File System
1.3.1 What is a File System?
Unlike Windows which has different drive letters for different partitions, Linux follows a unified file structure. The filesystem hierarchy is a tree of directories and files. 26 The root 27 of the filesystem tree is the directory /. The basic filesystem hierarchy structure can be seen in Figure 1.6 and Table 1.7.

26: A non-directory is a leaf node of the tree.

27: The root of a tree is the first node from which the tree originates. A tree can have only one root.
But what does so many directories mean? What do
they do? What is the purpose of each directory?
Table 1.7: Linux Filesystem Hierarchy
Directory Path Description
/ Root directory
/bin Essential command binaries
/boot Static files of the bootloader
/dev Device files
/etc Host-specific system configuration
/home User home directories
/lib Essential shared libraries and kernel modules
/media Mount point for removable media
/mnt Mount point for mounting a filesystem temporarily
/opt Add-on application software packages
/proc Virtual filesystem providing process information
/root Home directory for the root user
/run Run-time variable data
/sbin Essential system binaries
/srv Data for services provided by the system
/sys Kernel and system information
/tmp Temporary files
/usr Secondary hierarchy
/var Variable data
Figure 1.6: Linux Filesystem Hierarchy
Some directories do not store data on the disk,
but are used to store information about the sys-
tem. These directories are called virtual directories.
For example, the /proc directory is a virtual direc-
tory that provides information about the running
processes. The /sys directory is another virtual di-
rectory that provides information about the system.
The /tmp is a volatile directory whose data is deleted
as soon as the system is turned off. The /run direc-
tory is another volatile directory that stores runtime
data.
The rest of the directories are stored on the disk. The reason for having so many directories is to categorize the type of files they store. For example, all the executable binaries of the different applications and utilities installed on the system are stored in the /bin and /sbin directories. All the shared libraries installed on the system are stored in the /lib directory. Sometimes applications that are not installed directly by the package manager are installed in the /opt directory. 28

28: In Linux, you do not install applications by downloading them from the internet and running an installer like in Windows. You use a package manager to install applications. The package manager downloads the application from the internet and installs it on your system automatically. This way the package manager can also keep track of the installed applications and their dependencies and whether they should be updated. This is similar to the Play Store on mobile phones.

We also need to store the user's documents and files. This is done in the /home directory. Each user has their own directory in the /home directory. The root user's directory is /root. All the applications' configuration files are stored in the user's home directory in the /home directory itself. This separation of application binary and per-user application settings helps people to easily change systems but keep their own /home directory constant, and in turn, also all their application settings.

Some settings however need to be applied system-wide and for all users. These settings are stored in the /etc directory. This directory contains all the system-wide configuration files.
To boot up the system, the bootloader needs some files. These files are stored in the /boot directory. 29 The bootloader is the first program that runs when the computer is turned on. It loads the operating system into memory and starts it.

29: Modern systems use UEFI instead of BIOS to boot up the system. The bootloader is stored in the /boot/EFI directory or in the /efi directory directly.
Although the file system is a unified tree hierarchy,
this doesn’t mean that we cannot have multiple
partitions on Linux: au contraire, it is easier to
manage partitions on Unix. We simply need to
mention which empty directory in the hierarchy
should be used to mount a partition. As soon as
that partition is mounted, it gets populated with
the data stored on that disk with all the files and
subdirectories, and when the device is unmounted
the directory again becomes empty. Although a
partition can be mounted on any directory, there
are some dedicated folders in / as well for this
purpose. For example, the /mnt directory is used
to mount a filesystem temporarily, and the /media
directory is used to mount removable media like
USB drives, however it is not required to strictly
follow this.
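A minimal sketch of mounting and unmounting (the device name /dev/sdb1 is hypothetical; mounting requires root privileges):

$ sudo mount /dev/sdb1 /mnt
$ ls /mnt        # the directory now shows the files on that partition
$ sudo umount /mnt
$ ls /mnt        # empty again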
Finally, the biggest revelation in Linux is that everything is a file. Not only are all the system configurations stored as plain text files which can be read by humans, but the processes running on your system are also stored as files in /proc. Your kernel's interfaces to the applications or users are also simple files stored in /sys. Biggest of all, even your hardware devices are stored as files in /dev. 30

30: Device files are not stored as normal files on the disk, but are special files that the kernel uses to communicate with the hardware devices. These are either block or character devices. They are used to read and write data to and from the hardware devices.

The /usr directory is a secondary hierarchy that contains subdirectories similar to those present in /. This was created as older systems had started running out of disk space for the /bin and /lib directories. Thus another directory named usr was made, and subdirectories like /usr/bin and /usr/lib were made to store half of the binaries and libraries. There wasn't however any rigid definition of which binary should go where. Modern day systems have more than enough disk space to store everything on one partition, thus /bin and /lib don't really exist any more. If they do, they are simply shortcuts 31 to the /usr/bin and /usr/lib directories, which are still kept for backwards compatibility.

31: Shortcuts in Linux are called symbolic links or symlinks.
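A quick way to verify this on a modern merged-/usr system (illustrative output; the date will differ):

$ ls -ld /bin
lrwxrwxrwx 1 root root 7 Apr  2  2024 /bin -> usr/bin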
These can also be loosely classified into sharable and non-sharable directories and static and variable directories, as shown in Table 1.8.

Table 1.8: Linux Filesystem Directory Classification

            Sharable       Non-sharable
Static      /usr, /opt     /etc, /boot
Variable    /var/spool     /tmp, /var/log
1.3.2 In Memory File System
Some file systems like proc, sys, dev, run, and
tmp are not stored on the disk, but are stored in
memory.
They have a special purpose and are used to store
information about the system. These are called
virtual directories.
These cannot be stored on a disk as it would be too slow to access them. Many of these files are very short-lived yet are accessed very frequently, so they are stored in memory to speed up the access.
/dev and /run are mounted as tmpfs filesystems.
This can be seen by running the mount command or
the df command.
1 $ mount
2 /dev/sda1 on / type ext4 (rw,noatime)
3 devtmpfs on /dev type devtmpfs (rw,nosuid,size
=4096k,nr_inodes=990693,mode=755,inode64)
4 tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,
inode64)
5 devpts on /dev/pts type devpts (rw,nosuid,
noexec,relatime,gid=5,mode=620,ptmxmode
=000)
6 sysfs on /sys type sysfs (rw,nosuid,nodev,
noexec,relatime)
7 securityfs on /sys/kernel/security type
securityfs (rw,nosuid,nodev,noexec,
relatime)
8 cgroup2 on /sys/fs/cgroup type cgroup2 (rw,
nosuid,nodev,noexec,relatime,nsdelegate,
memory_recursiveprot)
9 pstore on /sys/fs/pstore type pstore (rw,
nosuid,nodev,noexec,relatime)
10 efivarfs on /sys/firmware/efi/efivars type
efivarfs (rw,nosuid,nodev,noexec,relatime)
11 bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,
noexec,relatime,mode=700)
12 configfs on /sys/kernel/config type configfs (
rw,nosuid,nodev,noexec,relatime)
13 proc on /proc type proc (rw,nosuid,nodev,
noexec,relatime)
14 tmpfs on /run type tmpfs (rw,nosuid,nodev,size
=1590108k,nr_inodes=819200,mode=755,
inode64)
15 systemd-1 on /proc/sys/fs/binfmt_misc type
autofs (rw,relatime,fd=36,pgrp=1,timeout
=0,minproto=5,maxproto=5,direct,pipe_ino
=5327)
16 hugetlbfs on /dev/hugepages type hugetlbfs (rw
,nosuid,nodev,relatime,pagesize=2M)
17 mqueue on /dev/mqueue type mqueue (rw,nosuid,
nodev,noexec,relatime)
18 debugfs on /sys/kernel/debug type debugfs (rw,
nosuid,nodev,noexec,relatime)
19 tracefs on /sys/kernel/tracing type tracefs (
rw,nosuid,nodev,noexec,relatime)
20 fusectl on /sys/fs/fuse/connections type
fusectl (rw,nosuid,nodev,noexec,relatime)
21 systemd-1 on /data type autofs (rw,relatime,fd
=47,pgrp=1,timeout=60,minproto=5,maxproto
=5,direct,pipe_ino=2930)
22 tmpfs on /tmp type tmpfs (rw,noatime,inode64)
23 /dev/sda4 on /efi type vfat (rw,relatime,fmask
=0137,dmask=0027,codepage=437,iocharset=
ascii,shortname=mixed,utf8,errors=remount-
ro)
24 /dev/sda2 on /home type ext4 (rw,noatime)
25 binfmt_misc on /proc/sys/fs/binfmt_misc type
binfmt_misc (rw,nosuid,nodev,noexec,
relatime)
26 tmpfs on /run/user/1000 type tmpfs (rw,nosuid,
nodev,relatime,size=795052k,nr_inodes
=198763,mode=700,uid=1000,gid=1001,inode64
)
27 portal on /run/user/1000/doc type fuse.portal
(rw,nosuid,nodev,relatime,user_id=1000,
group_id=1001)
28 /dev/sdb3 on /data type ext4 (rw,noatime,x-
systemd.automount,x-systemd.idle-timeout=1
min)
Here we can see that the /dev directory is mounted
as a devtmpfs filesystem. The /run directory is
mounted as a tmpfs filesystem. The /proc directory
is mounted as a proc filesystem. The /sys directory
is mounted as a sysfs filesystem.
These are all virtual filesystems that are stored in
memory.
proc:
Proc is an old filesystem that is used to store in-
formation about the running processes. The /proc
directory contains a directory for each running pro-
cess. The directories are named as the process id of
the process.
1 $ ls -l /proc | head
2 total 0
3 dr-xr-xr-x 9 root root 0 May 23 13:01 1
4 dr-xr-xr-x 9 root root 0 May 23 13:01 100
5 dr-xr-xr-x 9 sayan sayan 0 May 23 13:06 1004
6 dr-xr-xr-x 9 sayan sayan 0 May 23 13:06 1009
7 dr-xr-xr-x 9 root root 0 May 23 13:01 102
8 dr-xr-xr-x 9 root sayan 0 May 23 13:06 1029
9 dr-xr-xr-x 9 sayan sayan 0 May 23 13:06 1038
10 dr-xr-xr-x 9 sayan sayan 0 May 23 13:06 1039
11 dr-xr-xr-x 9 sayan sayan 0 May 23 13:06 1074
These folders are simply for information and do
not store any data. This is why they have a size of 0.
Each folder is owned by the user who started the
process.
Inside each of these directories, there are files that
contain information about the process.
You can enter the folder of a process that is started
by you and see the information about the process.
1 $ cd /proc/301408
2 $ ls -l | head -n15
3 total 0
4 -r--r--r-- 1 sayan sayan 0 May 23 16:55
arch_status
5 dr-xr-xr-x 2 sayan sayan 0 May 23 16:55 attr
6 -rw-r--r-- 1 sayan sayan 0 May 23 16:55
autogroup
7 -r-------- 1 sayan sayan 0 May 23 16:55 auxv
8 -r--r--r-- 1 sayan sayan 0 May 23 16:55
cgroup
9 --w------- 1 sayan sayan 0 May 23 16:55
clear_refs
10 -r--r--r-- 1 sayan sayan 0 May 23 16:55
cmdline
11 -rw-r--r-- 1 sayan sayan 0 May 23 16:55 comm
12 -rw-r--r-- 1 sayan sayan 0 May 23 16:55
coredump_filter
13 -r--r--r-- 1 sayan sayan 0 May 23 16:55
cpu_resctrl_groups
14 -r--r--r-- 1 sayan sayan 0 May 23 16:55
cpuset
15 lrwxrwxrwx 1 sayan sayan 0 May 23 16:55 cwd
-> /home/sayan/docs/projects/sc-handbook
16 -r-------- 1 sayan sayan 0 May 23 16:55
environ
17 lrwxrwxrwx 1 sayan sayan 0 May 23 13:41 exe
-> /usr/bin/entr
Here you can see that the command line of the pro-
cess is stored in the cmdline file. Here the process
is of a command called entr.
You can also see the current working directory
(cwd) of the process.
There are some other files in the /proc directory that contain information about the system; one of them is shown right after this list.
▶ cpuinfo - stores cpu information.
▶ version - stores system information, content
similar to uname -a command.
▶ meminfo - Diagnostic information about mem-
ory. Check free command.
▶ partitions - Disk partition information. Check
df.
▶ kcore - Its astronomical size (2^47 bytes, i.e. 128 TiB) reflects the maximum virtual memory (a 47-bit address space) that the current Linux kernel will handle.
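For instance (illustrative output consistent with the uname -a example earlier; yours will differ, and the compiler details are elided):

$ cat /proc/version
Linux version 6.8.2-arch2-1 ... #1 SMP PREEMPT_DYNAMIC Thu, 28 Mar 2024 17:06:35 +0000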
sys:

Sys is a newer filesystem that is used to store information about the system. It is neatly organized and is easier to navigate than proc. It makes heavy use of symlinks 32 to organize the folders while also maintaining redundancy.

32: A symlink is a special type of file that points to another file or directory. It is similar to a shortcut in Windows. This will be discussed in detail in Section 1.6.

Try running the following code snippet in a terminal if you have a caps lock key on your keyboard and are running Linux directly on your bare metal. 33

33: This will only work on your own system, not on the system commands VM, since you do not have the privilege to modify the files there. Make sure you have the ability to run commands as root and are able to use sudo. It is also unlikely to work on a virtual machine. It will also not work on Linux systems older than 2.6.

$ cd /sys/class/leds
$ echo 1 | sudo tee *capslock/brightness
If you are running a linux system directly on your
hardware, you will see the caps lock key light up.
Most modern keyboards will quickly turn off the
light again as the capslock is technically not turned
on, only the led was turned on manually by you.
/sys vs /proc:

The /proc tree originated in System V Unix 34, where it only gave information about each running process, using a /proc/$PID/ format. Linux greatly extended that, adding all sorts of information about the running kernel's status. In addition to these read-only information files, Linux's /proc also has writable virtual files that can change the state of the running kernel. BSD-type 35 OSes generally do not have /proc at all, so much of what you find under /proc is non-portable.

34: Unix System V is one of the first commercial versions of the Unix operating system. It was originally developed by AT&T and first released in 1983.

35: BSD, or Berkeley Software Distribution, is a Unix-like operating system that was developed at the University of California, Berkeley. It was first released in 1977 and was based on the original Unix source code from AT&T. BSD is not Linux; it is a totally different kernel, with core utils similar to GNU's.

The intended solution for this mess in Linux's /proc is /sys. Ideally, all the non-process information that got crammed into the /proc tree should have moved to /sys by now, but historical inertia has kept a lot of stuff in /proc. Often there are two ways to effect a change in the running kernel: the old /proc way, kept for backwards compatibility, and the new /sys way that you're supposed to be using now.
1.3.3 Paths
Whenever we open a terminal on a Linux system,
we are placed in a directory. This is called the current
working directory. All shells and applications have
a current working directory from where they are
launched.
To refer to and identify the directory you are talking
about, we use a path.
Definition 1.3.1 (Path) Path is a traversal in the
filesystem tree. It is a way to refer to a file or
directory.
Absolute Path:
The traversal to the directory from the root directory
is called the absolute path. For example, if we want
to refer to the directory named alex inside the
directory home in the root of the file system, then
it is qualified as:
1 /home/alex
Relative Path:
The traversal to the directory from the current work-
ing directory is called the relative path. For ex-
ample, if we want to refer to the directory named
alex inside the directory home from the /usr/share
directory, then it will be qualified as:
../../home/alex

Figure 1.7: Relative Path
Remark 1.3.1 The .. in the path refers to the par-
ent directory. It is used in relative paths to refer
to directories whose path requires travelling up
the tree.
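A small sketch tying the two together (assuming a user alex exists):

$ cd /usr/share          # absolute path
$ cd ../../home/alex     # relative path from /usr/share
$ pwd
/home/alex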
1.3.4 Basic Commands for Navigation
The file system can be navigated in the Linux com-
mand line using the following commands:
▶ pwd: Print the current working directory
▶ ls: List the contents of the current directory
▶ cd: Change the current working directory
▶ mkdir: Create a new directory
▶ rmdir: Remove a directory
▶ touch: Create a new file
▶ rm: Remove a file
▶ pushd: Push the current directory onto a stack
▶ popd: Pop the top directory off the stack 36 (see the short example after this list)

36: pushd and popd are useful for quickly switching between directories in scripts.
More details about these commands can be found
in their respective man pages. For example, to find
more about the ls command, you can type man ls.
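A minimal sketch of the directory stack (paths are illustrative):

$ pwd
/home/username
$ pushd /etc
/etc ~
$ pushd /var/log
/var/log /etc ~
$ popd
/etc ~
$ pwd
/etc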
Question 1.3.1 What is the command to list the
contents of the current directory?
Answer 1.3.1 ls
Question 1.3.2 What is the command to list
the contents of the current directory including
hidden files?
Answer 1.3.2 ls -a
Question 1.3.3 What is the command to list the
contents of the current directory in a long list
format? (show permissions, owner, group, size,
and time)
Answer 1.3.3 ls -l
Question 1.3.4 What is the command to list the
contents of the current directory in a long list
format and also show hidden files?
Answer 1.3.4 ls -al or ls -la or ls -l -a or
ls -a -l
Question 1.3.5 The output of ls gives multiple
files and directories in a single line. How can
you make it print one file or directory per line?
Answer 1.3.5 ls -1 37

37: that is a one, not an L

This can also be done by passing the output of ls to cat, or storing the output of ls in a file and then using cat to print it. We will see these in later weeks.
1.4 File Permissions
Figure 1.8: File Permissions
Definition 1.4.1 (File Permissions) File permis-
sions define the access rights of a file or directory.
There are three basic permissions: read, write,
and execute. These permissions can be set for
the owner of the file, the group of the file, and
others.
We have already briefly seen how to see the permis-
sions of a file using the ls -l command.
1 $ touch hello.txt
2 $ mkdir world
3 $ ls -l
4 total 4
5 -rw-r--r-- 1 sayan sayan 0 May 21 15:20
hello.txt
6 drwxr-xr-x 2 sayan sayan 4096 May 21 15:21
world
Here, the first column of the output of ls -l shows the permissions of the file or directory. As seen in Figure 1.8, the permissions are divided into four parts:

▶ The first character shows the type of the file: - for a regular file, d for a directory, and more. 38
▶ The next three characters show the permissions for the owner of the file.
▶ The next three characters show the permissions for the group of the file.
▶ The last three characters show the permissions for others.

38: There are other types of files as well, like l for a symbolic link, c for a character device, b for a block device, s for a socket, and p for a pipe. These will be discussed later.
Definition 1.4.2 (Owner) Although this can be
changed, the owner of a file is usually the user
who created it. All files in the filesystem have
an owner. This is symbolically coded as u.
Definition 1.4.3 (Group) The group of a file is usually the group of the user who created it. But it can also be changed to any other existing group in the system. All users in the group (except the owner of the file) have the same permissions on the file. This is symbolically coded as g.
Definition 1.4.4 (Others) Others are all the
users who are not the owner of the file and
are not in the group of the file. This is symboli-
cally coded as o.
There are three actions that can be performed on a
file: read, write, and execute.
▶ Read: The read permission allows the file to
be read. This is symbolically coded as r.
▶ Write: The write permission allows the file to
be modified. This is symbolically coded as w.
▶ Execute: The execute permission allows the file to be executed. 39 This is symbolically coded as x.

39: Executing a file means running the file as a program. For a directory, the execute permission allows the directory to be traversed into.

These, however, have different meanings for files and directories.
1.4.1 Read
▶ For a file, the read permission allows the file
to be read. You can use commands like cat or
less to read the contents of the file if the user
has read permissions.
▶ For a directory, the read permission allows
the directory to be listed using ls.
1.4.2 Write
▶ For a file, the write permission allows the file to be modified. You can use commands like echo along with redirection 40 or a text editor like vim or nano to write to the file if the user has write permissions.

40: Redirection is a way to send the output of a command to a file.
▶ For a directory, the write permission allows
the directory to be modified. You can create,
delete, or rename files in the directory if the
user has write permissions.
1.4.3 Execute
▶ For a file, the execute permission allows the
file to be executed. This is usually only needed
for special files like executables, scripts, or
libraries. You can run the file as a program if
the user has execute permissions.
▶ For a directory, the execute permission allows
the directory to be traversed into. You can
change to the directory if the user has execute
permissions using cd. You can also long-list (ls -l) the contents of a directory only if the user has execute permissions on that directory.
1.4.4 Interesting Caveats
This causes some interesting edge-cases that one
needs to be familiar with.
Cannot modify a file? Think again!
If you have write and execute permissions on a
directory, even if you do not have write permission
on a file inside the directory, you can delete the file
due to your write permission on the directory, and
then re-create the modified version of the file with
the same name. But if you try to simply modify the file directly, you will get a permission error.
$ mkdir test
$ cd test
$ echo "hello world" > file1
$ chmod 400 file1                  # 400 means read permission only
$ cat file1
hello world
$ echo "hello universe" > file1    # unable to write
-bash: file1: Permission denied
$ rm file1                         # can remove as we have write permission on the folder
rm: remove write-protected regular file 'file1'? y
$ echo "hello universe" > file1    # can create a new file
$ cat file1
hello universe
However, this only works on files. You cannot re-
move a directory if you do not have write per-
mission on the directory, even if you have write
permission on its parent directory.
Can list names but not metadata?
If you have read permission on a directory but not
execute permission, you cannot traverse into the
directory, but you can still use ls to list the contents
of the directory. However, you cannot use ls -l to
long-list the contents of the directory. That is, you
only have access to the name of the files inside, not
their metadata.
$ mkdir test
$ touch test/1 test/2
$ chmod 600 test    # removing execute permission from the folder
$ ls test           # we can still list the files due to read permission
1 2
$ ls -l test        # but cannot long-list the files
ls: cannot access 'test/2': Permission denied
ls: cannot access 'test/1': Permission denied
total 0
-????????? ? ? ? ? ? 1
-????????? ? ? ? ? ? 2
Cannot list names but can traverse?
If you have execute permission on a directory but
not read permission, you can traverse into the di-
rectory but you cannot list the contents of the direc-
tory.
$ mkdir test
$ touch test/1 test/2
$ chmod 300 test    # removing read permission from the folder
$ ls test           # we cannot list the files
ls: cannot open directory 'test': Permission denied
$ cd test           # but we can traverse into the folder
$ pwd
/home/sayan/test
Subdirectories with all permissions, still cannot
access?
If you have all the permissions on a directory, but don't have execute permission on its parent directory, you cannot access the subdirectory, or even list its contents.
$ mkdir test
$ mkdir test/test2       # subdirectory
$ touch test/test2/1     # file inside the subdirectory
$ chmod 700 test/test2   # all permissions on the subdirectory
$ chmod 600 test         # removing execute permission from the parent directory
$ ls test
test2
$ cd test/test2          # cannot access the subdirectory
-bash: cd: test/test2: Permission denied
$ ls test/test2          # cannot even list the contents of the subdirectory
ls: cannot access 'test/test2': Permission denied
1.4.5 Changing Permissions
The permissions of a file can be changed using the
chmod command.
Synopsis:
1 chmod [OPTION]... MODE[,MODE]... FILE...
2 chmod [OPTION]... OCTAL-MODE FILE...
OCTAL-MODE is a 3 or 4 digit octal number; in the 3-digit form, the first digit is for the owner, the second digit is for the group, and the third digit is for others. We will discuss how the octal representation of permissions is calculated in the next section.
The MODE can be in the form of ugoa+-=rwxXst (a short example follows the list below) where:
▶ u is the user who owns the file
▶ g is the group of the file
▶ o is others
▶ a is all
▶ + adds the permission
▶ - removes the permission
▶ = sets the permission
▶ r is read
▶ w is write
▶ x is execute
▶ X is execute only if it is a directory or already has execute permission
▶ s is setuid/setgid
▶ t is restricted deletion flag or sticky bit
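A brief illustration of both the symbolic and the octal forms (the file names are placeholders):

$ chmod u+x script.sh     # add execute permission for the owner
$ chmod go-w notes.txt    # remove write permission from group and others
$ chmod a=r notes.txt     # give everyone read permission only
$ chmod 644 notes.txt     # octal form: rw- for the owner, r-- for group and others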
We are already familiar with what r, w, and x permis-
sions mean, but what are the other permissions?
1.4.6 Special Permissions

Definition 1.4.5 (SetUID/SetGID) The setuid and setgid bits are special permissions that can be set on executable files. When an executable file has the setuid bit set, the file will be executed with the privileges of the owner of the file. When an executable file has the setgid bit set, the file will be executed with the privileges of the group of the file.
SetUID:
This is useful for programs that need to access
system resources that are only available to the owner
or group of the file.
A very notable example is the passwd command. This command is used to set the password of a user. Although changing the password of a user is a privileged action that only the root user can do, the passwd command can be run by any user to change their own password. This is possible due to the setuid bit set on the passwd command. When the passwd command is run, it runs with the privileges of the root user, and thus can change the password of that user.
You can check this out by running ls -l /usr/bin/passwd and seeing the s in the permissions.

1 $ ls -l /usr/bin/passwd
2 -rwsr-xr-x 1 root root 80912 Apr 1 15:49 /usr/bin/passwd
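As a hedged sketch of how the bit is set yourself (myprog is a hypothetical executable; the output line is illustrative), the octal 4755 or the symbolic u+s both work. Note the s replacing the x in the owner triplet:

1 $ chmod 4755 myprog # or: chmod u+s myprog, once it is executable
2 $ ls -l myprog
3 -rwsr-xr-x 1 sayan sayan 16384 May 22 16:05 myprog

One detail worth knowing: Linux ignores the setuid bit on interpreted scripts, so this only has an effect on compiled binaries.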
SetGID:
The behaviour of SetGID is similar to SetUID, but
the file is executed with the privileges of the group
of the file.
However, SetGID can also be applied to a directory. When a directory has the SetGID bit set, all the files and directories created inside that directory inherit the group of the directory, not the primary group of the user who created them. This is highly useful when you have a directory where multiple users need to work on the same files and directories, but you want to restrict access to a certain group of users. Each user has a different primary group, but since they are all also members of a common group (the group owner of the directory), they are able to read and write the files present in the directory. However, without SetGID, when a user creates a file in the directory, the file is owned by the user's primary group, not the group of the directory, so the other users would not be able to access it. This is fixed by setting the SetGID bit on the directory.
1 $ mkdir test
2 $ ls -ld test # initially the folder is owned by the user's primary group
3 drwxr-xr-x 2 sayan sayan 4096 May 22 16:27 test
4 $ chgrp wheel test # we change the group of the folder to wheel, a group that the user is part of
5 $ ls -ld test
6 drwxr-xr-x 2 sayan wheel 4096 May 22 16:27 test
7 $ whoami # this is the current user
8 sayan
9 $ groups # these are the user's groups, the first one is the primary group
10 sayan wheel
11 $ touch test/file1 # before setting the SetGID bit, a new file gets the creating user's primary group
12 $ ls -l test/file1 # notice the group owner is sayan
13 -rw-r--r-- 1 sayan sayan 0 May 22 16:29 test/file1
14 $ chmod g+s test # we set the SetGID bit on the directory
15 $ ls -ld test # now the folder has an s in the group permissions
16 drwxr-sr-x 2 sayan wheel 4096 May 22 16:29 test
17 $ touch test/file2 # a new file now gets the group of the directory
18 $ ls -l test/file2 # notice the group owner is wheel
19 -rw-r--r-- 1 sayan wheel 0 May 22 16:29 test/file2
Restricted Deletion Flag or Sticky Bit:
The restricted deletion flag or sticky bit is a special
permission that can be set on directories.
Historically, this bit was applied to executable files to keep the program in memory after it had finished executing. This was done to speed up later executions of the program, as the program would not have to be loaded again. It was called the sticky bit because the program would stick in memory. 41

41: More precisely, the program's text segment was retained in swap space, from where it could be reloaded quickly.

However, this is no longer how this bit is used.

When the sticky bit is set on a directory, only the owner of the file, the owner of the directory, or the root user can delete or rename files in the directory.
This is useful when you have a directory where
multiple users need to write files, but you want to
restrict the deletion of files to only the owner of the
file or the owner of the directory.
The most common example of this is the /tmp directory, where temporary files are stored. You want to let any user create files in the /tmp directory, but you do not want any user to be able to delete files created by other users.
1 $ ls -ld /tmp
2 drwxrwxrwt 20 root root 600 May 22 16:43 /tmp
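A minimal sketch of creating such a shared directory yourself (shared is a hypothetical name; the listing is illustrative). The leading 1 in the octal mode sets the sticky bit, which shows up as a t in place of the final x:

1 $ mkdir shared
2 $ chmod 1777 shared # or: chmod a+rwx,+t shared
3 $ ls -ld shared
4 drwxrwxrwt 2 sayan sayan 4096 May 22 16:45 shared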
Exercise 1.4.1 Log into the system commands VM and cd into the /tmp directory. Create a file in the /tmp directory. Try to find if there are files created by other users in the /tmp directory using the ls -l command. If there are files created by other users, try to delete them. a

a: You can create a file normally, or using the mktemp command.

Figure 1.9: Octal Permissions
1.4.7 Octal Representation of Permissions

The permissions of a file for the file's owner, group, and others can be represented as a 3 or 4 digit octal number. 42 Each octal digit is the sum of the permission values for one of the owner, the group, and others respectively.

42: If the octal number is 4 digits, the first digit is for special permissions like setuid, setgid, and sticky bit.
▶ Read permission is represented by 4
▶ Write permission is represented by 2
▶ Execute permission is represented by 1
Thus if a file has read, write, and execute permis-
sions for the owner, read and execute permissions
for the group, and only read permission for others,
the octal representation of the permissions would
be 754.
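Working that example out digit by digit with the sum rule above:

1 owner:  rwx = 4+2+1 = 7
2 group:  r-x = 4+0+1 = 5
3 others: r-- = 4+0+0 = 4

which gives the mode 754.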
Table 1.9: Octal Representation of Permissions
Octal  Read  Write  Execute  Representation  Description
0      0     0      0        ---             No permissions
1      0     0      1        --x             Execute only
2      0     1      0        -w-             Write only
3      0     1      1        -wx             Write and execute
4      1     0      0        r--             Read only
5      1     0      1        r-x             Read and execute
6      1     1      0        rw-             Read and write
7      1     1      1        rwx             Read, write, and execute
The octal format is used more often than the symbolic format, as it is easier to remember and more concise.
1 $ chmod 754 myscript.sh # this sets the permissions of myscript.sh to rwxr-xr--
2 $ ./myscript.sh
3 Hello World!
However, if you want to add or remove a permis-
sion without changing the other permissions, the
symbolic format is more useful.
1 $ chmod u+x myscript.sh # this adds execute permission for the owner of myscript.sh
2 $ ./myscript.sh
3 Hello World!
Question 1.4.1 How to list the permissions of a
file?
Answer 1.4.1 ls -l
The permissions are the first 10 characters of
the output.
stat -c %A filename will list only the permissions of a file.
There are other format specifiers of stat to show different statistics, which can be found in man stat.
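For instance, a short sketch using GNU stat's %A (symbolic) and %a (octal) specifiers on the myscript.sh file from earlier:

1 $ stat -c '%A %a' myscript.sh
2 -rwxr-xr-- 754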
Question 1.4.2 How to change permissions of
a file? Let’s say we want to change file1’s per-
missions to rwxr-xr-- What is the octal form of
that?
Answer 1.4.2 chmod u=rwx,g=rx,o=r file1 will
change the permissions of file1
The octal form of rwxr-xr-- is 754.
So we can also use chmod 754 file1
Providing the octal form is the same as using = to set the permissions.
We can also use + to add permissions and - to
remove permissions.
1.5 Types of Files
We have briefly seen that the output of ls -l shows the type of the file as the first character of the permissions.

There are 7 types of files in a Linux file system, as shown in Table 1.10.

Table 1.10: Types of Files
Type               Symbol
Regular Files      -
Directories        d
Symbolic Links     l
Character Devices  c
Block Devices      b
Named Pipes        p
Sockets            s

1.5.1 Regular Files

Regular files are the most common type of file. Almost all files are regular files. Scripts and executable binaries are also regular files. All the configuration files of the system are regular files as well. Regular files are actually the only files that contain data stored on the disk.
1.5.2 Directories
Directories are files that contain a list of other files. Directories do not contain data; they contain references to other files. Usually the size of a directory is equal to the block size of the filesystem. Directories have some special permissions that are different from regular files, as discussed in Section 1.4.
1.5.3 Symbolic Links

Symbolic links are files that point to other files. They only consume the space of the path they are pointing to. Symlinks 43 are useful to create shortcuts to files or directories. They are dependent on the original file and will stop working if the original file is deleted or moved. They are discussed in detail in Section 1.6.

43: Symlinks is short for symbolic links. There is another type of link called hard links. However, hard links are not separate files; they are pointers to the same inode. They do not consume extra space, and are not dependent on the original file. Hard links do not have a separate type; they are just regular files.
1.5.4 Character Devices
Character devices are files that represent devices that are accessed as a stream of bytes. For example, the keyboard, mouse, webcams, and most USB devices are character devices. These are not real files stored on the disk, but files that represent devices. They can be interacted with like a file, using the read and write system calls to talk to the hardware directly. These files are made available by the kernel and are stored in the /dev directory. Any read/write operation on a character device is monitored by the kernel and the data is sent to the device.
1 $ cd /dev/input
2 $ ls -l
3 total 0
4 drwxr-xr-x 2 root root 220 May 22 13:49 by-id
5 drwxr-xr-x 2 root root 420 May 22 13:49 by-path
6 crw-rw---- 1 root input 13, 64 May 22 13:49 event0
7 crw-rw---- 1 root input 13, 65 May 22 13:49 event1
8 crw-rw---- 1 root input 13, 74 May 22 13:49 event10
9 crw-rw---- 1 root input 13, 75 May 22 13:49 event11
10 crw-rw---- 1 root input 13, 76 May 22 13:49 event12
11 crw-rw---- 1 root input 13, 77 May 22 13:49 event13
12 crw-rw---- 1 root input 13, 78 May 22 13:49 event14
13 crw-rw---- 1 root input 13, 79 May 22 13:49 event15
14 crw-rw---- 1 root input 13, 80 May 22 13:49 event16
15 crw-rw---- 1 root input 13, 81 May 22 13:49 event17
16 crw-rw---- 1 root input 13, 82 May 22 13:49 event18
17 crw-rw---- 1 root input 13, 83 May 22 13:49 event19
18 crw-rw---- 1 root input 13, 66 May 22 13:49 event2
19 crw-rw---- 1 root input 13, 84 May 22 13:49 event20
20 crw-rw---- 1 root input 13, 67 May 22 13:49 event3
21 crw-rw---- 1 root input 13, 68 May 22 13:49 event4
22 crw-rw---- 1 root input 13, 69 May 22 13:49 event5
23 crw-rw---- 1 root input 13, 70 May 22 13:49 event6
24 crw-rw---- 1 root input 13, 71 May 22 13:49 event7
25 crw-rw---- 1 root input 13, 72 May 22 13:49 event8
26 crw-rw---- 1 root input 13, 73 May 22 13:49 event9
27 crw-rw---- 1 root input 13, 63 May 22 13:49 mice
28 crw-rw---- 1 root input 13, 32 May 22 13:49 mouse0
29 crw-rw---- 1 root input 13, 33 May 22 13:49 mouse1
Here the event and mouse files are character de-
vices that represent input devices like the keyboard
and mouse. Note the c in the permissions, which
indicates that these are character devices.
1.5.5 Block Devices
Block devices are files that represent devices that are accessed as blocks of data. For example, hard drives, SSDs, and USB drives are block devices. These files also do not store actual data on the disk, but represent devices. A block device can be mounted as a filesystem. We can interact with block devices using the read and write system calls to interact with the hardware directly. For example, the /dev/sda file represents the first hard drive in the system.

This makes it easy to write an image to a disk directly using the dd command. 44

44: The dd command is a powerful tool that can be used to copy and convert files. Its name is an acronym for data duplicator. However, it is also known as the disk destroyer command, as it can overwrite an entire disk if you are not careful about which disk you are writing the image to.

The following example shows how we can use the dd command to write an image 45 to a USB drive. It is this easy to create a bootable USB drive for Linux.

45: An ISO file.

1 $ dd if=~/Downloads/archlinux.iso of=/dev/sdb bs=4M status=progress

Here if is the input file, of is the output file, bs is the block size, and status=progress shows the progress of the operation.
Warning 1.5.1 Be very careful when using the
dd command. Make sure you are writing to the
correct disk. Writing to the wrong disk can cause
data loss.
1.5.6 Named Pipes

Named pipes 46 are files that are used for inter-process communication. They do not store the data that you write to them, but instead pass the data to another process. A process can only write data to a named pipe if another process is reading from it, and vice versa.

46: Also known as FIFOs.
1 $ mkfifo pipe1
2 $ ls -l pipe1
3 prw-r--r-- 1 sayan sayan 0 May 22 18:22 pipe1
Here the p in the permissions indicates that this is
a named pipe. If you now try to write to the named
pipe, the command will hang until another process
reads from the named pipe. Try the following in
two different terminals:
Terminal 1:
1 $ echo "hello" > pipe1
Terminal 2:
1 $ cat pipe1
You will notice that whichever command you run
first will hang until the other command is run.
1.5.7 Sockets
Sockets are a special file type, similar to TCP/IP
sockets, providing inter-process networking pro-
tected by the file system’s access control.
This is similar to named pipes, but sockets are bidirectional and use the same programming interface as network sockets: while named pipes are meant for IPC between processes on the same machine, the socket API can also be used for communication across machines.
Try out the following in two different terminals:
Terminal 1:
1 $ nc -lU socket.sock
Terminal 2:
1 $ echo "hello" | nc -U socket.sock
Notice here that if you run the command in terminal 2 first, it will error out with the text:

1 nc: socket.sock: Connection refused

Only if we run them in the correct order can you see the message "hello" being printed in terminal 1. 48

48: The nc command is the netcat command. It is a powerful tool for network debugging and exploration. It can be used to create sockets, listen on ports, and send and receive data over the network. This will be discussed in more detail in the networking section and in the Modern Application Development course.

You can press Ctrl+C to stop the nc command in both terminals.

1.5.8 Types of Regular Files

Regular files can be further classified into different types based on the data they contain. In Linux systems, the type of a file is determined by its contents and is reported as a MIME type. The extension of a file does not determine its type; the contents do. It is thus common to have files without extensions in Linux systems, as extensions provide no value.

The file command can be used to determine the type of a file. The bytes at the start of a file used to identify the type of a file are called the magic bytes. More details can be found at: https://en.wikipedia.org/wiki/List_of_file_signatures

1 $ file /etc/passwd
2 /etc/passwd: ASCII text
3 $ file /bin/bash
4 /bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=165d3a5ffe12a4f1a9b71c84f48d94d5e714d3db, for GNU/Linux 4.4.0, stripped
Question 1.5.1 What types of files are possible
in a linux file system?
Answer 1.5.1 There are 7 types of files in a linux
file system:
▶ Regular Files (starts with -)
▶ Directories (starts with d)
▶ Symbolic Links (starts with l)
▶ Character Devices (starts with c)
▶ Block Devices (starts with b)
▶ Named Pipes (starts with p)
▶ Sockets (starts with s)
Question 1.5.2 How to know what kind of file
a file is? Can we determine using its extension?
Can we determine using its contents? What
does MIME mean? How to get that?
Answer 1.5.2 The file command can be used
to determine the type of a file.
The extension of a file does not determine its
type.
The contents of a file can be used to determine
its type.
MIME stands for Multipurpose Internet Mail
Extensions.
It is a standard that indicates the nature and
format of a document.
file -i filename will give the MIME type of
filename.
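A short sketch of what that looks like for a plain text file (the exact charset reported may vary by system):

1 $ file -i /etc/passwd
2 /etc/passwd: text/plain; charset=us-ascii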
1.6 Inodes and Links
1.6.1 Inodes
Definition 1.6.1 (Inodes) An inode is an in-
dex node. It serves as a unique identifier for a
specific piece of metadata on a given filesystem.
Whenever you run ls -l and see all the details of a
file, you are seeing the metadata of the file. These
metadata, however, are not stored in the file itself.
These data about the files are stored in a special
data structure called an inode.
Each inode is stored in a common table, and each filesystem mounted on your computer has its own inodes. An inode number may be reused across filesystems, but never twice within the same filesystem. The filesystem id combined with the inode number creates a unique identification label.
You can check how many inodes are used in a
filesystem using the df -i command.
1 $ df -i
2 Filesystem      Inodes   IUsed    IFree IUse% Mounted on
3 /dev/sda1      6397952  909213  5488739  15% /
4 /dev/sda4            0       0        0    - /efi
5 /dev/sda2     21569536 2129841 19439695  10% /home
6 /dev/sdb3     32776192    2380 32773812   1% /data
7 $ df
8 Filesystem     1K-blocks      Used Available Use% Mounted on
9 /dev/sda1      100063312  63760072  31174076  68% /
10 /dev/sda4        1021952    235760    786192  24% /efi
11 /dev/sda2      338553420 273477568  47805068  86% /home
12 /dev/sdb3      514944248 444194244  44518760  91% /data
You can see the number of inodes present, the number of inodes used, and the number of inodes that are free. The IUse% column shows the percentage of inodes used. This, however, does not indicate how much space is used, but how many more files can be created.

Observe that although the /data partition has only 1% of its inodes used, it has 91% of its space used. This is because the files in the /data partition are large files, so fewer inodes are needed. Remember that a file takes up exactly one inode, no matter how large it is; the space it takes up is the size of the file.
We can also see the inode number of a file using the
ls -i command.
1 $ ls -i
2 1234567 file1
3 1234568 file2
4 1234569 file3
Here the first column is the inode number of the
file.
Remark 1.6.1 The inode number is unique only
within the filesystem. If you copy a file from
one filesystem to another, the inode number will
change.
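A minimal sketch of this within one filesystem (the inode numbers shown are illustrative): cp allocates a new inode for the copy, while mv within the same filesystem keeps the original one:

1 $ touch file1
2 $ ls -i file1
3 1234567 file1
4 $ cp file1 copy1 && mv file1 moved1
5 $ ls -i copy1 moved1
6 1234570 copy1
7 1234567 moved1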
1.6.2 Separation of Data, Metadata, and Filename

In UNIX systems, the data of a file and the metadata of a file are stored separately. The inodes are stored in an inode array or table, and contain the metadata of the file and the pointer to its data in the storage blocks. This metadata can be retrieved using the stat system call. 49

49: A system call is a request in an operating system, made via a software interrupt by an active process, for a service performed by the kernel. The diagram in Figure 1.10 shows how system calls work.

Figure 1.10: System Calls

Conveniently, a userland utility to list the metadata of a file is also called stat.

1 $ stat /etc/profile
2   File: /etc/profile
3   Size: 993         Blocks: 8          IO Block: 4096   regular file
4 Device: 8,1         Inode: 2622512     Links: 1
5 Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
6 Access: 2024-05-21 18:30:27.000000000 +0530
7 Modify: 2024-04-07 23:32:30.000000000 +0530
8 Change: 2024-05-21 18:30:27.047718323 +0530
9  Birth: 2024-05-21 18:30:27.047718323 +0530

We can also specify the format of the output of the stat command using the --format or -c flag to print only the metadata we want.
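For example, a short sketch printing just the name, inode number, hard-link count, and size (all standard GNU stat format specifiers):

1 $ stat -c '%n %i %h %s' /etc/profile
2 /etc/profile 2622512 1 993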
The data of the file is stored in the storage block.
The inode number indexes a table of inodes on the
file system. From the inode number, the kernel’s
file system driver can access the inode contents,
including the location of the file, thereby allowing
access to the file.
On many older file systems, inodes are stored in one
or more fixed-size areas that are set up at file system
creation time, so the maximum number of inodes is
fixed at file system creation, limiting the maximum
number of files the file system can hold.
Table 1.11: Metadata of a File
Metadata Description
Size Size of the file in bytes
Blocks Number of blocks used by the file
IO Block Block size of the file system
Device Device ID of the file system
Inode Inode number of the file
Links Number of hard links to the file
Access Access time of the file (atime)
Modify Modification time of the file (mtime)
Change Change time of the inode (ctime)
Birth Creation time of the file
Some Unix-style file systems such as JFS, XFS, ZFS,
OpenZFS, ReiserFS, btrfs, and APFS omit a fixed-
size inode table, but must store equivalent data in
order to provide equivalent capabilities. Common
alternatives to the fixed-size table include B-trees
and the derived B+ trees.
Remark 1.6.2 Although the inodes store the metadata of the file, the filename is not stored in the inode. It is stored in the directory entry. Thus the filename, file metadata, and file data are all stored separately.
1.6.3 Directory Entries
Unix directories are lists of association structures,
each of which contains one filename and one inode
number. The file system driver must search a di-
rectory for a particular filename and then convert
the filename to the correct corresponding inode
number.
Thus, to read a file from a directory, first the directory's entry is read, which stores the name of the file and its inode number. The kernel then follows the inode number to find the inode of the file. The inode stores all the metadata of the file, and the location of the data of the file. The kernel then follows the inode to find the data of the file. This is shown in Figure 1.11.

Figure 1.11: Inodes and Directory Entry
So what happens if two directory entries point to
the same inode? This is called a hard link.
1.6.4 Hard Links
If multiple directory entries point to the same inode, they are called hard links. Hard links can have different names, but they are the same file. As they point to the same inode, they also have the same metadata.

This is useful if you want to have the same file in multiple directories without taking up more space. It is also useful if you want to keep a backup of an important file which is accessed by many people. If someone accidentally deletes one of the names, the other hard links will still be there and the file remains accessible.
Definition 1.6.2 (Hard Links) Hard Links are
just pointers to the same inode. They are the
same file. They are not pointers to the path of
the file. They are pointers to the file itself. They
are not affected by the deletion of the other file.
When creating a hard link, you need to provide
the path of the original file, and thus it has to
be either absolute path, or relative from the
current working directory, not relative from the
location of the hard link.
Hard links can be created for files only, not directories. They are created using the ln command.
1 $ ln file1 file2
This will create a hard link named file2 that points
to the same inode as file1.
Remark 1.6.3 Hard links are not dependent
on the original file. They are the same file and
equivalent. The first link to be created has no
special status.
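A minimal sketch of this equivalence: after removing the original name, the data is still reachable through the other link:

1 $ echo "important" > a
2 $ ln a b
3 $ rm a
4 $ cat b
5 important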
Historically, directories could also have hard links, but this would cause the file tree to stop being a Directed Acyclic Graph 50 and become cyclic if a hard link to an ancestor was placed inside a subdirectory. This would create confusion and infinite walks in the file system. Modern systems generally prohibit this confusing state, except that the parent of root is still defined as root. 51

50: A Directed Acyclic Graph is a graph that has no cycles, as seen in Figure 1.12.
Figure 1.12: Directed Acyclic Graph
51: The most notable exception to this prohibition is found in Mac OS X (versions 10.5 and higher), which allows hard links of directories to be created by the superuser.

As hard links depend on the inode, they can only exist within a single filesystem, as inodes are unique to a filesystem only.

If we want to create shortcuts across filesystems, or if we want to create a link to a directory, we can use symbolic links.
1.6.5 Symbolic Links
A symbolic link contains a text string that is auto-
matically interpreted and followed by the operating
system as a path to another file or directory. This
other file or directory is called the "target". The
symbolic link is a second file that exists indepen-
dently of its target. If a symbolic link is deleted, its
target remains unaffected. If a symbolic link points
to a target, and sometime later that target is moved,
renamed or deleted, the symbolic link is not auto-
matically updated or deleted, but continues to exist
and still points to the old target, now a non-existing
location or file. Symbolic links pointing to moved or
non-existing targets are sometimes called broken,
orphaned, dead, or dangling.
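A quick sketch of a dangling symlink (target is a hypothetical file name): the link survives the deletion of its target, but following it now fails:

1 $ echo "data" > target
2 $ ln -s target link
3 $ rm target
4 $ cat link
5 cat: link: No such file or directory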
Definition 1.6.3 (Soft Links) Soft Links are spe-
cial kinds of files that just store the path given
to them. Thus the path given while making
soft links should either be an absolute path, or
relative from the location of the soft link to the
location of the original file. It should not be
relative from current working directory.a
a This is a common mistake.
Symlinks are created using the symlink system call.
This can be done using the ln -s command.
1 $ echo "hello" > file1
2 $ ln -s file1 file2
3 $ ls -l
4 total 4
1.6 Inodes and Links 87
5 -rw-r--r-- 1 sayan sayan 6 May 23 15:27 file1
6 lrwxrwxrwx 1 sayan sayan 5 May 23 15:27 file2
-> file1
7 $ cat file2
8 hello
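To illustrate the relative-path pitfall from Definition 1.6.3, here is a hedged sketch (dir and the file names are hypothetical): the target path must be relative to the symlink's own location, not to the directory you ran ln from:

1 $ mkdir dir
2 $ echo "hello" > file1
3 $ ln -s file1 dir/link1    # wrong: resolves to dir/file1, which does not exist
4 $ cat dir/link1
5 cat: dir/link1: No such file or directory
6 $ ln -s ../file1 dir/link2 # correct: relative from the symlink's location
7 $ cat dir/link2
8 hello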
Interesting Observation:
Usually we have seen that if we use ls -l with a
directory as its argument, it lists the contents of the
directory.
The only way to list the directory itself is to use
ls -ld.
But if a symlink is made to a directory, then ls -l
on that symlink will list only the symlink.
To list the contents of the symlinked directory we
have to append a / to the symlink.
1 $ ln -s /etc /tmp/etc
2 $ ls -l /tmp/etc
3 lrwxrwxrwx 1 sayan sayan 4 May 23 15:30 /tmp/etc -> /etc
4 $ ls -l /tmp/etc/ | head -n5
5 total 1956
6 -rw-r--r-- 1 root root 44 Mar 18 21:50 adjtime
7 drwxr-xr-x 3 root root 4096 Nov 17 2023 alsa
8 -rw-r--r-- 1 root root 541 Apr 8 20:53 anacrontab
9 drwxr-xr-x 4 root root 4096 May 19 00:44 apparmor.d
Here I used head to limit the number of lines shown, as the directory is large. 52

52: This way of combining commands will be discussed later.

The symlink file stores only the path provided to it when it was created. Historically this was stored in the data block pointed to by the inode, but this made it slower to access the symlink.
Modern systems store the symlink value in the inode itself if it's not too large. Inodes usually have a limited amount of space allocated to each of them, so a symlink with a small target path is stored directly in the inode. This is called a fast symlink.
However if the target path is too large, it is stored
in the data block pointed to by the inode. This is
retroactively called a slow symlink.
This act of storing the target path in the inode is
called inlining.
Symlinks do not have a permission set, thus they
always report lrwxrwxrwx as their permissions.
The size reported of a symlink file is independent
of the actual file’s size.
1 $ echo "hello" > file1
2 $ ln -s file1 file2
3 $ ls -l
4 -rw-r--r-- 1 sayan sayan 6 May 23 15:27 file1
5 lrwxrwxrwx 1 sayan sayan 5 May 23 15:27 file2
-> file1
6 $ echo "a very big file" > file2
7 $ ls -l
8 -rw-r--r-- 1 sayan sayan 16 May 23 15:40 file1
9 lrwxrwxrwx 1 sayan sayan 5 May 23 15:27 file2
-> file1
Rather, the size of a symlink is the length of the
target path.
1 $ ln -s /a/very/long/and/non-existant/path link1
2 $ ln -s small link2
3 $ ls -l
4 total 0
5 lrwxrwxrwx 1 sayan sayan 34 May 23 15:41 link1 -> /a/very/long/and/non-existant/path
6 lrwxrwxrwx 1 sayan sayan 5 May 23 15:41 link2 -> small
Notice that the size of link1 is 34, the length of the
target path, and the size of link2 is 5, the length of
the target path.
1.6.6 Symlink vs Hard Links
Figure 1.13: Abstract Representation of Symbolic Links and Hard Links
Figure 1.14: Symbolic Links and Hard Links
The difference between a symlink and a hard link is
that a symlink is a pointer to the original file, while
a hard link is the same file. Other differences are
listed in Table 1.12.
Table 1.12: Symlink vs Hard Link
Property     Symlink                         Hard Link
File Type    Special file                    Regular file
Size         Length of the target path       Size of the file
Permissions  lrwxrwxrwx                      Same as the original file
Inode        Different                       Same
Dependency   Dependent on the original file  Independent of the original file
Creation     Across filesystems              Only in the same filesystem
Target       Can point to directories        Can only point to files
1.6.7 Identifying Links
Soft Links:
To identify if a file is a symlink or a hard link, you
can use the ls -l command. If the file is a symlink,
the first character of the permissions will be l. ls -l will also show the target of the symlink after a -> symbol. However, you cannot ascertain whether a file has a soft link pointing to it from somewhere else.

Hard Links:

To identify if a file is a hard link, you can use the ls -i command. Hard links will have the same inode number as each other. The inode number is the first column of the output of ls -i.

Also, the number of links to the file will be more than 1. The number of links is the second 53 column of the output of ls -l.

53: Third if using ls -li.

Even if a hard link is not present in the current directory, you can ascertain that a file has a hard link pointing to it somewhere else using the hard-link count column of ls -l.
1 $ touch file1
2 $ ln -s file1 file2
3 $ ln file1 file3
4 $ ls -li
5 total 0
6 4850335 -rw-r--r-- 2 sayan sayan 0 May 23 15:56 file1
7 4851092 lrwxrwxrwx 1 sayan sayan 5 May 23 15:56 file2 -> file1
8 4850335 -rw-r--r-- 2 sayan sayan 0 May 23 15:56 file3
1.6.8 What are . and ..?
. and .. are special directory entries. They are
hard links to the current directory and the parent
directory respectively. Each directory has a . entry
pointing to itself and a .. entry pointing to its parent
directory.
Due to this, the number of hard links to a directory is exactly equal to the number of its subdirectories plus 2.

The directory's name in its parent directory is 1 link, the . entry inside the directory itself is 1 link, and each subdirectory contributes 1 more link through its .. entry pointing back up.

Number of links to a directory = number of subdirectories + 2

This formula always holds, because a user cannot create additional hard links to a directory.
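A minimal sketch of the formula (parent is a hypothetical name): two subdirectories give 2 + 2 = 4 links, visible in the second column of ls -ld:

1 $ mkdir -p parent/sub1 parent/sub2
2 $ ls -ld parent
3 drwxr-xr-x 4 sayan sayan 4096 May 23 16:00 parent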
Question 1.6.1 How to list the inodes of a file?
Answer 1.6.1 ls -i will list the inode numbers of files. The inode number is the first column of the output of ls -i. This can be combined with other flags like -l or -a to show more details.
Question 1.6.2 How to create soft link of a file?
Answer 1.6.2 ln -s sourcefile targetfile will create a soft link of sourcefile named targetfile. The soft link is a pointer to the original file.
Question 1.6.3 How to create hard link of a file?
Answer 1.6.3 ln sourcefile targetfile will create a hard link of sourcefile named targetfile. The hard link is the same as the original file. It does not depend on the original file after creation. They are equals; both are hard links of each other, and there is no parent-child relationship. Either name can be deleted and the file will still be accessible through the other.
Question 1.6.4 How to get the real path of a
file?
Assume three files:
▶ file1 is a soft link to file2
▶ file2 is a soft link to file3
▶ file3 is a regular file
Real path of all these three should be the same.
How to get that?
Answer 1.6.4 realpath filename will give the
real path of filename.
You can also use readlink -f filename to get
the real path.
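A short sketch of the chain described in the question (the resolved path shown is illustrative):

1 $ touch file3
2 $ ln -s file3 file2
3 $ ln -s file2 file1
4 $ realpath file1
5 /home/sayan/file3
6 $ readlink -f file2
7 /home/sayan/file3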
2 Command Line Editors
2.1 Introduction
Now that we know how to navigate Linux-based operating systems, we might want to view and edit files. This is where command line editors come in.
Definition 2.1.1 A command line editor is a
type of text editor that operates entirely from
the command line interface. They usually do not
require a graphical user interface or a mouse. a
a: This means that CLI editors are often the only way to edit files when you are connected to a remote server. Since remote servers usually do not run a graphical server like X11, you cannot use graphical editors like gedit or kate.
2.1.1 Types of Editors
▶ Graphical Editors: These are editors that require a graphical user interface. Examples include gedit*, kate†, vs code, etc.
▶ Command Line Editors: These are editors that operate entirely from the command line interface.

We will be only discussing command line editors in this chapter.

*: Gedit is the default text editor in the GNOME desktop environment.
†: Kate is the default text editor in the KDE desktop environment.
2.1.2 Why Command Line Editors?

Command line editors are very powerful and efficient. They let you edit files without having to leave the terminal. This is usually faster than opening a graphical editor. In many cases, like sshing 1 into a remote server, command line editors are the only way to edit files. Another reason for the popularity of command line editors is that they are very lightweight.

1: SSH stands for Secure Shell. It is a cryptographic network protocol for operating network services securely over an unsecured network. This will be discussed in detail in a later chapter.
2.1.3 Mouse Support
Most command line editors do not have mouse support, and others do not encourage it. But won't it be difficult to navigate without a mouse? Not really. Once you get used to the keyboard shortcuts, you will find that you can navigate far faster than with a mouse.

Mouse-driven editors usually require the user to click on certain buttons, or to follow multi-click procedures in nested menus to perform certain tasks. In keyboard-based editors, on the other hand, all the actions that can be performed are mapped to keyboard shortcuts. Modern CLI editors usually also allow the user to fully customize these keyboard shortcuts.

This being said, most modern CLI editors do have mouse support as well, if the user is running them in a terminal emulator that supports the mouse over an X11 or Wayland display server. 2

2: X11 and Wayland are display servers that are used to render graphical applications. Although not directly covered here, these can be explored in detail on the internet.
2.1.4 Editor war
Although there are many command line editors available, the most popular ones are vim and emacs.

Definition 2.1.2 The editor war is the rivalry between users of the Emacs and vi (now usually Vim, or more recently Neovim) text editors. The rivalry has become an enduring part of hacker culture and the free software community. 3

3: More on this, including the history and the humor, can be found on the internet.

Vim

Vim is a modal editor, meaning that it has different modes for different tasks. Most editors are modeless, which makes vim a bit difficult to learn. However, once familiar with it, it is very powerful and efficient. Vim heavily relies on alphanumeric keys for navigation and editing. Vim keybindings are so popular that many other editors and even some browsers 4 have vim-like keybindings.

4: qutebrowser is a browser that uses vim-like keybindings. Firefox and Chromium based browsers also have extensions that provide vim-like keybindings. These allow the user to navigate the browser using vim-like keybindings without ever touching the mouse.

Emacs

Emacs is a modeless editor, meaning that it does not have different modes for different tasks. Emacs is also very powerful and efficient. It uses multi-key combinations for navigation and editing.
2.1.5 Differences between Vim and Emacs
Keystroke execution
Emacs commands are key combinations for which
modifier keys are held down while other keys are
pressed; a command gets executed once completely
typed.
Vim retains each permutation of typed keys (i.e., order matters). This creates a path in the decision tree which unambiguously identifies any command.
Memory usage and customizability
Emacs executes many actions on startup, many
of which may execute arbitrary user code. This
makes Emacs take longer to start up (even com-
pared to vim) and require more memory. However,
it is highly customizable and includes a large num-
ber of features, as it is essentially an execution
environment for a Lisp program designed for text-
editing.
Vi is a smaller and faster program, but with less
capacity for customization. vim has evolved from
vi to provide significantly more functionality and
customization than vi, making it comparable to
Emacs.
User environment
Emacs, while also initially designed for use on a
console, had X11 GUI support added in Emacs 18,
and made the default in version 19. Current Emacs
GUIs include full support for proportional spac-
ing and font-size variation. Emacs also supports
embedded images and hypertext.
Vi, like emacs, was originally exclusively used in-
side of a text-mode console, offering no graphical
user interface (GUI). Many modern vi derivatives,
e.g. MacVim and gVim, include GUIs. However,
support for proportionally spaced fonts remains
absent. Also lacking is support for different sized
fonts in the same document.
Function/navigation interface
Emacs uses metakey chords. Keys or key chords
can be defined as prefix keys, which put Emacs into
a mode where it waits for additional key presses
that constitute a key binding. Key bindings can be
mode-specific, further customizing the interaction
style. Emacs provides a command line accessed
by M-x that can be configured to autocomplete in
various ways. Emacs also provides a defalias macro,
allowing alternate names for commands.
Vi uses distinct editing modes. Under "insert mode",
keys insert characters into the document. Under
"normal mode" (also known as "command mode",
not to be confused with "command-line mode",
which allows the user to enter commands), bare
keypresses execute vi commands.
Keyboard
The expansion of one of Emacs’ backronyms is
Escape, Meta, Alt, Control, Shift, which neatly sum-
marizes most of the modifier keys it uses, only
leaving out Super. Emacs was developed on Space-
cadet keyboards that had more key modifiers than
modern layouts. There are multiple emacs pack-
ages, such as spacemacs or ergoemacs that replace
these key combinations with ones easier to type, or
customization can be done ad hoc by the user.
Vi does not use the Alt key and seldom uses the
Ctrl key. vi’s keyset is mainly restricted to the al-
phanumeric keys, and the escape key. This is an
enduring relic of its teletype heritage, but has the
effect of making most of vi’s functionality accessible
without frequent awkward finger reaches.
Language and script support
Emacs has full support for all Unicode-compatible
writing systems and allows multiple scripts to be
freely intermixed.
Vi has rudimentary support for languages other
than English. Modern Vim supports Unicode if
used with a terminal that supports Unicode.
2.1.6 Nano: The peacemaker amidst the editor war

Nano is a simple command line editor that is easy to use. It does not have the steep learning curve of vim or emacs, but it is not as powerful as them either. It is a common choice for beginners who just want to append a few lines to a file or make a few changes. Like emacs, it is a non-modal editor that uses modifier chording. However, it mostly uses the Control key for this purpose, and has only simple keybindings such as Ctrl+O to save and Ctrl+X to exit.
2.2 Vim
2.2.1 History
The history of Vim is a very long and interesting
one.
Teletypes
Definition 2.2.1 A teletype (TTY) or a teleprinter
is a device that can send and receive typed mes-
sages from a distance.
Table 2.1: History of Vim
QED text editor by Butler
Lampson and Peter Deutsch
1967 · · · · · ·•
for Berkeley Timesharing
System.
Ken Thompson and Dennis
1967 · · · · · ·• Ritchie’s QED for MIT CTSS,
Multics, and GE-TSS.
Ken Thompson releases ed -
1969 · · · · · ·•
The Standard Text Editor.
George Coulouris and Patrick
1976 · · · · · ·• Mullaney release em - The
Editor for Mortals.
Bill Joy and Chuck Haley
1976 · · · · · ·• build upon em to make en,
which later becomes ex.
Bill Joy adds visual mode to
1977 · · · · · ·•
ex.
Bill Joy creates a hardlink ‘vi‘
1979 · · · · · ·•
for ex’s visual mode.
Tim Thompson develops a vi
clone for the Atari ST named
1987 · · · · · ·•
STevie (ST editor for VI
enthusiasts).
Bram Moolenaar makes a
1988 · · · · · ·• stevie clone for the Amiga
named Vim (Vi IMitation).
Very early computers used to use teletypes as the
output device. These were devices that used ink and
paper to actually print the output of the computer.
These did not have an automatic refresh rate like
modern monitors. Only when the computer sent a
signal to the teletype, would the teletype print the
output.
Figure 2.1: A Teletype
Due to these restrictions it was not economical or
practical to print the entire file on the screen. Thus
most editors used to print only one line at a time on
the screen and did not have updating graphics.
QED
QED was a text editor developed by Butler Lampson
and Peter Deutsch in 1967 for the Berkeley Time-
sharing System. It was a character-oriented editor
that was used to create and edit text files. It used
to print or edit only one character at a time on the
screen. This is because the computers at that time
used to use a teletype machine as the output device,
and not a monitor.
Figure 2.2: Ken Thompson

Ken Thompson used QED at Berkeley before he came to Bell Labs, and among the first things he did on arriving was to write a new version for the MIT CTSS system. Written in IBM 7090 assembly language, it differed from the Berkeley version most notably in introducing regular expressions 5 for specifying strings to seek within the document being edited, and to specify a substring for which a substitution should be made. Until that time, text editors could search for a literal string, and substitute for one, but not specify more general strings.

5: Regular expressions are a sequence of characters that define a search pattern. Usually this pattern is used by string-searching algorithms for "find" or "find and replace" operations on strings. This will be discussed in detail in a later chapter.

Ken not only introduced a new idea, he found an inventive implementation: on-the-fly compiling. Ken's QED compiled machine code for each regular expression, creating an NDFA (non-deterministic finite automaton) to do the search. He published
this in C. ACM 11 #6, and also received a patent for
the technique: US Patent #3568156.
While the Berkeley QED was character-oriented, the CTSS version was line-oriented. Ken's CTSS qed adopted from the Berkeley one the notion of multiple buffers to edit several files simultaneously and to move and copy text among them, and also the idea of executing a given buffer as editor commands, thus providing programmability.

When developing the MULTICS project, Ken Thompson wrote yet another version of QED for that system, now in BCPL 6, and now created trees for regular expressions instead of compiling to machine language.

Figure 2.3: Dennis Ritchie

6: BCPL ("Basic Combined Programming Language") is a procedural, imperative, and structured programming language. Originally intended for writing compilers for other languages, BCPL is no longer in common use.

In 1967, when Dennis Ritchie joined the project, Bell Labs had slowly started to move away from Multics. While he was developing the initial stages of Unix, he rewrote QED yet again, this time for the GE-TSS system in assembly language. This was well documented, and was originally intended to be published as a paper. 7

7: At that time, systems did not have a standardized CPU architecture or a generalized low-level compiler. Due to this, applications were not portable across systems; each machine needed its own version of the application, written from scratch, mostly in assembly language. The reference manual for GE-TSS QED can still be found on Dennis Ritchie's website. Much of this information is taken from his blog.

ED

After their experience with multiple implementations of QED, Ken Thompson wrote ed in 1969. This was now written in the newly developed B language, a predecessor to C. This implementation was much simpler than QED, and was line-oriented. It stripped out much of the regular expression support, keeping only *. It also got rid of multiple buffers and of executing the contents of a buffer.

Slowly, with time, Dennis Ritchie created the C language, which is widely in use even today.
Ken Thompson re-wrote ed in C, and added back
some of the complex features of QED, like back
references in regular expressions.
Ed ended up being the Standard Text Editor for
Unix systems.
Remark 2.2.1 Since all of Bell-Labs and AT&T’s
software was proprietary, the source code for
ed was not available to the public. Thus, the ed
editor accessible today in GNU/Linux, is another
implementation of the original ed editor by the
GNU project.
However, ed was not very user friendly and it was
very terse. Although this was originally intented,
since it would be very slow to print a lot of diagnos-
tic messages on a teletype, slowly, as people moved
to faster computers and monitors, they wanted a
more user friendly editor.
VDU Terminals
Definition 2.2.2 A terminal that uses video display technology like cathode ray tubes (CRT) or liquid crystal displays (LCD) to display the terminal output is called a VDU (Video Display Unit) terminal.

These terminals were able to show video output, instead of just printing the output on paper. Although initially very expensive and far from a household item, they were present in research parks like Xerox PARC.

Figure 2.4: Xerox Alto, one of the first VDU terminals with a GUI, released in 1973

EM
George Coulouris (not the actor) was one of the people who had access to these terminals in his work at Queen Mary College in London.

The drawbacks of ed were very apparent to him when using it on these machines. He found that UNIX's raw mode, which was at that time totally unused, could be used to give some of the convenience and immediacy of feedback for text editing.

He claimed that although the ed editor was groundbreaking in its time, it was not very user friendly. He termed it as not being an editor for mortals. He thus wrote em in 1976, which was an Editor for Mortals. 8

8: George named em Editor for Mortals because Ken Thompson visited his lab at QMC while he was developing it and said something like: "yeah, I've seen editors like that, but I don't feel a need for them, I don't want to see the state of the file when I'm editing". This made George think that Ken was not a mortal, and thus he named it Editor for Mortals.

Although em added a lot of features to ed, it was still a line editor, that is, you could only see one line at a time. The difference from ed was that it allowed visual editing, meaning you can see the state of the line as you are editing it.

Whereas most of the development of Multics and Unix was done in the United States, the development of em was done in the United Kingdom, at Queen Mary College, which was the first college in the UK to have UNIX.

EN

In the summer of 1976, George Coulouris was a visiting professor at the University of California, Berkeley. With him, he had brought a copy of his em editor on a Dectape 9 and had installed it there on their departmental computers, which were still using teletype terminals. Although em was designed for VDU terminals, it was still able to run (albeit slowly) on the teletype terminals by printing the current line every time.

9: A Dectape is a magnetic tape storage device that was used in the 1970s. It was used to store data and programs.
Figure 2.5: A first generation Dectape (bottom right corner, white round tape) being used with a PDP-11 computer
Figure 2.6: George Coulouris
There he met Bill Joy, who was a PhD student at Berkeley. On seeing the editor, Bill Joy was very impressed and wanted to use it on the PDP-11 computers at Berkeley. The system support team at Berkeley were using PDP-11s with VDU terminals, an environment where em would really shine.

George explained that em was an extension of ed that gave key-stroke level interaction for editing within a single line, displaying the up-to-date line on the screen (a sort of single-line screen editor). This was achieved by setting the terminal mode to 'raw', so that single characters could be read as they were typed - an eccentric thing for a program to do in 1976.

Although the system support team at Berkeley were impressed by this editor, they knew that if it was made available to the general public, it would take up too many resources by going into raw mode on every keypress. But Bill and the team took a copy of the source code just to see if they might use it.

George then took a vacation for a few weeks, but when he returned, he found that Bill had taken his em as a starting point and had added a lot of features to it. Bill called it en initially, which later became ex.
EX
Bill Joy took inspiration from several other ed clones
as well, and their own tweaks to ed, although the
primary inspiration was em. Bill and Chuck Haley
built upon em to make en, which later became ex.
This editor had a lot of improvements over em, such
as adding the ability to add abbreviations (using the
ab command), and adding keybindings (maps).
Figure 2.7: Bill Joy
It also added the ability to mark some line using
the k key followed by any letter, and then jump to
that line from any arbitrary line using the ’ key
followed by the letter.
Slowly, with time, systems became more and more able to handle raw mode and real-time editing. This led to the natural progression: what if we could see the entire file at once, and not just one line at a time?
VI

Bill added the visual mode to ex in 1977. 10 This was not a separate editor, but rather just another mode of the ex editor. You could open ex in visual mode using the -v flag to ex.

10: This visual mode is not the same as the visual mode in vim.

1 $ ex -v filename

This visual mode was the first time a text editor was modal. This means that the editor had different modes for different tasks. When you want to edit text, you go to the insert mode and type the text. When you want to navigate, you go to the normal mode and use the navigation keys and other motions defined in vi.

Slowly, as the visual mode became more and more popular, Bill added a hardlink to ex called vi. 11

11: This means that it did not take up additional space on the disk; it was just another directory entry that pointed to the same inode which stored the ex binary. Upon execution, ex would detect if it was called as vi and would start in visual mode by default. We have covered hardlinks in Chapter 1.

The modal design of vi was also inspired by another editor called bravo, which was developed at Xerox PARC. 12

12: Xerox PARC has always been ahead of its time. The first graphical user interface was developed at Xerox PARC. The bravo editor used bitmapped graphics to display the text, and had extensive mouse support. The overdependence on the mouse at such an early time was one of the reasons that the bravo editor was not as popular as vi.

If you use vi/vim, you may notice that the key to exit the insert mode is Esc. This may seem inconveniently placed at the top left corner, but this is because the original vi was developed on an ADM-3A terminal, which had the Esc key to the left of the Q key, where modern keyboards have the Tab key. 13

13: Since the placement of the Escape key is inconvenient in modern keyboard layouts, many people remap the Escape key to the Caps Lock key, either in vim or in the operating system itself.

Also, the choice of h,j,k,l for navigation was because the ADM-3A terminal did not have arrow keys; rather, its h,j,k,l keys doubled as navigation keys.
This can be seen in Figure 2.8.
Figure 2.8: The Keyboard lay-
out of the ADM-3A terminal
Bill Joy was also one of the people working on the
Berkeley Software Distribution (BSD) of Unix.
Thus he bundled vi with the first BSD distribution
of UNIX released in 1978. The pre-installed nature of
vi in the BSD Distribution made it very popular.
However, since both the source code of ed was
restricted by Bell Labs - AT&T, and the source code
of vi was restricted by the University of California,
Berkeley, they could not be modified by the users
or distributed freely.
This gave birth to a lot of clones of vi.
Vi Clones
The vi clones were written because the source code for the original version was not freely available until recently. This made it impossible to extend the functionality of vi. It also precluded porting vi to other operating systems, including Linux.

▶ calvin: a freeware "partial clone of vi" for use on MS-DOS. It has the advantages of small size (the .exe file is only 46.1KB!) and fast execution, but the disadvantage that it lacks many of the ex commands, such as search and replace.

Figure 2.9: Stevie Editor
▶ lemmy: a shareware version of vi implemented for the Microsoft Windows platforms, which combines the interface of vi with the look and feel of a Windows application.
▶ nvi: a re-implementation of the classic Berkeley vi editor, derived from the original 4.4BSD version of vi. It is the "official" Berkeley clone of vi, and it is included in FreeBSD and the other BSD variants.
▶ stevie: 'ST Editor for VI Enthusiasts' was developed by Tim Thompson for the Atari ST. It is a clone of vi that runs on the Atari ST. Tim Thompson wrote the code from scratch (not based on vi) and posted its source code as free software to comp.sys.atari.st in June 1987. Later it was ported to UNIX, OS/2, and Amiga. Because of this independence from vi and ed's closed source license, most vi clones would base their work off of stevie to keep it free and open source.
▶ elvis: Elvis's creator, Steve Kirkendall, started thinking of writing his own editor after Stevie crashed on him, causing him to lose hours of work and damaging his confidence in the editor. Stevie stored the edit buffer in RAM, which Kirkendall believed to be impractical on the MINIX operating system. One of Kirkendall's main motivations for writing his own vi clone was that his new editor stored the edit buffer in a file instead of storing it in RAM. Therefore, even if his editor crashed, the edited text could still be retrieved from that external file. Elvis was one of the first vi clones to offer support for GUI and syntax highlighting.
The clones add numerous new features which make them significantly easier to use than the original vi, especially for neophytes. A particularly useful feature in many of them is the ability to edit files in multiple windows. This facilitates working on more than one file at the same time, including cutting and pasting text among them.
Many of the clones also offer GUI versions of vi that
operate under the X Windows system and can take
advantage of bit-mapped (high resolution) displays
and the mouse.
Vim

Bram Moolenaar, a Dutch programmer, was impressed by STeVIe, a vi clone for the Atari ST. But he was working with the Commodore Amiga at that time, and there was no vi clone for the Amiga. So Bram began working on a stevie clone for AmigaOS in 1988.

Figure 2.10: Bram Moolenaar
Figure 2.11: The initial version of Vim, when it was called Vi IMitation

He released the first public version (v1.14) in 1991, as visible in Figure 2.11.

Since Vim was based off of Stevie, and not ed or vi, it could be freely distributed. It was licensed under a charityware license, named the Vim License. The license stated that if you liked the software, you should consider making a donation to a charity of your choice.
Moolenaar was an advocate of a NGO based in
Kibaale, Uganda, which he founded to support
children whose parents have died of AIDS. In 1994,
he volunteered as a water and sanitation engineer
for the Kibaale Children’s Centre and made several
return trips over the following twenty-five years.
Later Vim was re-branded as ‘Vi IMproved‘ as seen
in Figure 2.12.
Vim has been in development for over 30 years now,
and is still actively maintained. It has added a lot of
features over the years, such as syntax highlighting,
plugins, completion, PCRE support, mouse support,
etc.
Figure 2.12: Vim 9.0 Start screen

neovim
Recently there have been efforts to modernize the
vim codebase. Since it is more than 30 years old, it
has a lot of legacy code. The scripting language of
vim is also not a standard programming language,
but rather a custom language called vimscript.
To counter this, a new project called neovim has
been started. It uses lua as the scripting language,
and has a lot of modern features like out of the box
support for LSP, 14 better mouse integration, etc.

14: LSP stands for Language Server Protocol. It is a
protocol that allows the editor to communicate with a
language server to provide features like autocompletion,
go to definition, etc. This makes vim more like an IDE.
This book is written using neovim.

In this course, we will be learning only about basic
vi commands and we will be using vim as the
editor.
Figure 2.13: Neovim window editing this book
2.2.2 Ed Commands
Before we move on to Vi Commands, let us first
learn about the basic ed commands. These will also
be useful in vim, since the ex mode of vim is based
on ed/ex where we can directly use ed commands
on our file in the buffer.
Table 2.2: Ed Commands
Description                       Commands
Show the prompt                   P
Command format                    [addr[,addr]]cmd[params]
Commands for location             1 . $ % + - , ; /RE/
Commands for editing              f p a c d i j s m u
Execute a shell command           !command
Edit a file                       e filename
Read file contents into buffer    r filename
Read command output into buffer   r !command
Write buffer to filename          w filename
Quit                              q
Commands for location
Table 2.3: Commands for location
Commands  Description
2         a number refers to that line of the file (e.g. 2 is the second line)
.         refers to the current line
$         refers to the last line
%         refers to all the lines
+         the line after the current line
-         the line before the current line
,         refers to the whole buffer (1,$); alone it moves to the last line
;         refers to the range from the current line to the end of the file
/RE/      refers to the line matched by the pattern specified by 'RE'
Commands for Editing

Table 2.4: Commands for Editing
Commands  Description
f         show name of file being edited
p         print the current line
a         append text after the current line
c         change the current line
d         delete the current line
i         insert a line at the current position
j         join lines
s         search and replace (substitute) a regex pattern
m         move current line to position
u         undo latest change
Let us try out some of these commands in the ed
editor.
Let's start by creating a file, which we will then
open in the ed editor.
1 $ echo "line-1 hello world
2 line-2 welcome to line editor
3 line-3 ed is perhaps the oldest editor out
there
4 line-4 end of file" > test.txt
This creates a file in the current working directory
with the name test.txt and the contents as given
above.
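We can verify the contents with cat:

1 $ cat test.txt
2 line-1 hello world
3 line-2 welcome to line editor
4 line-3 ed is perhaps the oldest editor out there
5 line-4 end of file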
We invoke ed by using the executable ed and pro-
viding the filename as an argument.
1 $ ed test.txt
2 117
As soon as we run it, we see a number, which
is the number of characters in the file. The terminal
may seem hung, since there is no prompt, either
from the bash shell or from the ed editor. This is because
ed is a line editor, and it does not print the contents
of the file on the screen.
Off the bat, we can observe the terseness of the
ed editor since it does not even print a prompt. To
turn it on, we can use the P command. The default
prompt is *.
1 ed test.txt
2 117
3 P
4 *
Now we can see the prompt * is always present,
whenever the ed editor expects a command from
the user.
Let's go to the first line of the file using the 1 com-
mand. We can also go to the last line of the file using
the $ command.
1 *1
2 line-1 hello world
3 *$
4 line-4 end of file
5 *
To print out all the lines of the file, we can use the ,
or % address with the p command.
1 *,p
2 line-1 hello world
3 line-2 welcome to line editor
4 line-3 ed is perhaps the oldest editor out
there
5 line-4 end of file
1 *%p
2 line-1 hello world
3 line-2 welcome to line editor
4 line-3 ed is perhaps the oldest editor out
there
5 line-4 end of file
However, if we use the , command without the p
command, it will not print all the lines. Rather, it
will just move the cursor to the last line and print
the last line.
1 *,
2 line-4 end of file
We can also print any arbitrary line range using the
line numbers separated by a comma and followed
by the p command.
1 *2,3p
2 line-2 welcome to line editor
3 line-3 ed is perhaps the oldest editor out
there
One of the pioneering features of ed was the ability
to search for a pattern in the file. Let us quickly
explain the syntax of the search command. 15

15: The details of regular expressions will be covered in
a later chapter.
1 */hello/
2 line-1 hello world
We may or may not include the p command after
the last / in the search command.
We can advance to the next line using the + com-
mand.
1 *p
2 line-1 hello world
3 *+
4 line-2 welcome to line editor
And go to the previous line using the - command.
1 *p
2 line-2 welcome to line editor
3 *-
4 line-1 hello world
We can also print all the lines from the current line
to the end of the file using the ;p command.
1 *.
2 line-2 welcome to line editor
3 *;p
4 line-2 welcome to line editor
5 line-3 ed is perhaps the oldest editor out
there
6 line-4 end of file
We can also run arbitrary shell commands using
the ! command.
1 *! date
2 Mon Jun 10 11:36:34 PM IST 2024
3 !
The output of the command is shown on the screen;
however, it is not saved in the buffer.
To read the output of a command into the buffer,
we can use the r command.
1 *r !date
2 32
3 *%p
4 line-1 hello world
5 line-2 welcome to line editor
6 line-3 ed is perhaps the oldest editor out
there
7 line-4 end of file
8 Mon Jun 10 11:37:42 PM IST 2024
The output after running the r !date command
is the number of characters read into the buffer.
We can then print the entire buffer using the %p
command.
The read data is appended to the end of the file.
We can write the buffer 16 to the disk using the w
command.

16: Remember that the buffer is the in-memory copy
of the file, and any changes made to the buffer are not
saved to the file until we write the buffer to the file.

1 *w
2 149
The output of the w command is the number of
characters written to the file.
To exit ed, we can use the q command.
1 *q
To delete a line, we can use the d command. Let's say
we do not want the date output in the file. We can
re-open the file in ed and remove the last line.
1 $ ed test.txt
2 149
3 P
4 *$
5 Mon Jun 10 11:38:49 PM IST 2024
6 *d
7 *%p
8 line-1 hello world
9 line-2 welcome to line editor
10 line-3 ed is perhaps the oldest editor out
there
11 line-4 end of file
12 *wq
13 117
We can add lines to the file using the a command.
This appends the line after the current line. On
entering this mode, the editor will keep on taking
input for as many lines as we want to add. To end
the input, we can use the . command on a new
line.
1 $ ed test.txt
2 117
3 P
4 *3
5 line-3 ed is perhaps the oldest editor out
there
6 *a
7 perhaps not, since we know it was inspired
from QED
8 which was made multiple times by thompson and
ritchie
9 before ed was made.
10 .
11 *%p
12 line-1 hello world
13 line-2 welcome to line editor
14 line-3 ed is perhaps the oldest editor out
there
15 perhaps not, since we know it was inspired
from QED
16 which was made multiple times by thompson and
ritchie
17 before ed was made.
18 line-4 end of file
19 *
We can also utilize the regular expression support
in ed to perform search and replace operations. This
lets us either search for a fixed string and replace
with another fixed string, or search for a pattern
and replace it with a fixed string.
Let us change hello world to hello universe.
1 *1
2 line-1 hello world
3 *s/world/universe/
4 line-1 hello universe
5 *%p
6 line-1 hello universe
7 line-2 welcome to line editor
8 line-3 ed is perhaps the oldest editor out
there
9 perhaps not, since we know it was inspired
from QED
10 which was made multiple times by thompson and
ritchie
11 before ed was made.
12 line-4 end of file
13 *
We can print the name of the currently opened file
using the f command.
1 *f
2 test.txt
If we wish to join two lines, we can use the j com-
mand. Let us join lines 4, 5, and 6.
1 *4
2 perhaps not, since we know it was inspired
from QED
3 *5
4 which was made multiple times by thompson and
ritchie
5 *6
6 before ed was made.
7 *4,5j
8 *4
9 perhaps not, since we know it was inspired
from QEDwhich was made multiple times by
thompson and ritchie
10 *5
11 before ed was made.
12 *4,5j
13 *4
14 perhaps not, since we know it was inspired
from QEDwhich was made multiple times by
thompson and ritchiebefore ed was made.
15 *
Here we can see that the joining happens in two steps:
first lines 4 and 5 are joined, and then the newly
modified line 4 and the new line 5 are joined.
We can move a line from its current position to
another line using the m command.
Let's insert a line-0 at the end of the file and then
move it to the beginning of the file.
1 *7
2 line-4 end of file
3 *a
4 line-0 in the beginning, there was light
5 .
6 *8
7 line-0 in the beginning, there was light
8 *m0
9 *1,4p
10 line-0 in the beginning, there was light
11 line-1 hello universe
12 line-2 welcome to line editor
13 line-3 ed is perhaps the oldest editor out
there
14 *
We can also undo the last change using the u com-
mand. 17

17: The undo command in ed is not as powerful as the
undo command in vim. In vim, we can undo multiple
changes using the u command. In ed, we can only undo
the last change. If we run the u command multiple
times, it will undo the last change of undoing the last
change, basically redoing the last change.

1 *1
2 line-0 in the beginning, there was light
3 *s/light/darkness
4 line-0 in the beginning, there was darkness
5 *u
6 *.
7 line-0 in the beginning, there was light
8 *
If search and replace is not exactly what we want,
and we want to totally change the line, we can use
the c command. It will let us type a new line, which
will replace the current line.
1 *%p
2 line-0 in the beginning, there was light
3 line-1 hello universe
4 line-2 welcome to line editor
5 line-3 ed is perhaps the oldest editor out
there
6 perhaps not, since we know it was inspired
from QEDwhich was made multiple times by
thompson and ritchiebefore ed was made.
7 line-4 end of file
8 *4
9 line-3 ed is perhaps the oldest editor out
there
10 *c
11 line-4 ed is the standard editor for UNIX
12 .
13 *4
14 line-4 ed is the standard editor for UNIX
15 *
Just like the a command, we can also use the i
command to insert a line at the current position.
This will move the current line to the next line.
1 *6
2 line-4 end of file
3 *i
4 before end of file
5 .
6 *6,$p
7 before end of file
8 line-4 end of file
9 *
Finally, we can also number the lines using the n
command.
1 *%p
2 line-0 in the beginning, there was light
3 line-1 hello universe
4 line-2 welcome to line editor
5 line-4 ed is the standard editor for UNIX
6 perhaps not, since we know it was inspired
from QEDwhich was made multiple times by
thompson and ritchiebefore ed was made.
7 before end of file
8 line-4 end of file
9 *%n
10 1 line-0 in the beginning, there was
light
11 2 line-1 hello universe
12 3 line-2 welcome to line editor
13 4 line-4 ed is the standard editor for
UNIX
14 5 perhaps not, since we know it was
inspired from QEDwhich was made multiple
times by thompson and ritchiebefore ed was
made.
15 6 before end of file
16 7 line-4 end of file
17 *
2.2.3 Exploring Vim
There is a plethora of commands in vim. We won't
be able to cover all of them in this course. Only the
basic commands required to get started with using
vim as your primary editor will be covered. A
detailed tutorial on vim can be found by running
the command vimtutor in your terminal.
1 $ vimtutor
This opens a temporary file that goes through a
lot of sections of vim, explaining the commands
in detail. It opens the text file in vim itself, so
you can actually try out each exercise as and when
you read it. Many exercises are present in this file
to help you remember and master commands. Feel
free to modify the file: since it is a temporary file,
any changes made are lost if the command is
re-run.
To open a file in vim, we provide the filename as an
argument to the vim executable.
1 $ vim test.txt
Modal Editor

Vim is a modal editor, which means that it has
different modes that it operates in. The primary
modes are:

▶ Normal/Command Mode - The default mode
where we can navigate around the file and
run vim commands.
▶ Insert Mode - The mode where we can type
text into the file.
▶ Visual Mode - The mode where we can select
text to copy, cut, or delete.
▶ Ex Mode - The mode where we can run ex
commands.

Sometimes the normal mode is called command mode
or escape mode, since we can run commands in this
mode and we press the Esc key to go to this mode.
However, the ex mode is also called command mode,
since we can run ex commands in this mode. To avoid
confusion, we will refer to the navigational (default)
mode as normal mode, since vim internally also refers
to it as normal mode, and we will refer to the ex mode
as ex mode.
Pressing the Esc key takes you to the normal mode
from any other mode.
Figure 2.14: Simplified Modes in Vim

Figure 2.14 demonstrates how to switch
between the different modes in vim. 18

18: This is a simplified version of the modes in vim.
There are other interim modes and other keystrokes
that toggle the modes. This is shown in detail in
Figure 2.15.

Commands in Ex mode

Since we are already familiar with commands in
ed, most of the commands are the same or similar
in the ex mode of vim.
There are many more commands in the ex mode
of vim. Along with implementing the original ed
commands, it also has a lot of additional commands
to make it more integrated with the vim editor,
such as the ability to open split windows, new tabs,
buffers, and perform normal mode commands in
ex mode.
Basic Navigation

The basic keys for moving around in a text file in
vim are the h,j,k,l keys. They move the cursor
one character to the left, down, up, and right re-
spectively. These keys are chosen because they are
present on the home row of the keyboard, and do
not require the user to move their hands from the
home row to navigate. 19

19: These were historically chosen because the ADM-3A
terminal had these keys for navigation, as seen in Figure
2.8.
Figure 2.15: Detailed Modes in Vim
Along with these, we have keys to navigate word
by word, or to move to the next pattern match, or
the next paragraph, spelling error, etc.
Wait, you left your cursor behind!

All of the above commands move the cursor to the
mentioned location. However, if you want to move
the entire screen while keeping the cursor at its
current position, you can use the z command: zt
moves the current line to the top of the screen, zb
to the bottom, and zz to the center.

There are other commands using the Ctrl key that
move the screen, and not the cursor.
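For example (these are all standard vim commands;
the Ctrl variants are listed in full in Table 2.7):

1 zt   redraw with the current line at the top of the screen
2 zz   redraw with the current line at the center of the screen
3 zb   redraw with the current line at the bottom of the screen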
Replacing Text
Usually in other text editors, if you have a word,
phrase, or line which you want to replace with
Table 2.5: Ex Commands in Vim
Key              Description
:f               show name of file
:p               print current line
:a               append at current line
:c               change current line
:d               delete current line
:i               insert line at current position
:j               join lines
:s               search and replace regex pattern in current line
:m               move current line to position
:u               undo latest change
:w [filename]    write buffer to filename
:q               quit if there are no unsaved changes
:wq              write buffer to file and quit
:x               write buffer to file and quit
:q!              quit without saving
:r filename      read file contents into buffer
:r !command      read command output into buffer
:e filename      edit a file
:sp [filename]   split the screen and open another file
:vsp [filename]  vertically split the screen and open another file
another, you would either press the backspace or
delete key to remove the text, and then type the
new text. However, in vim, there is a more efficient
way to replace text.
Toggling Case
You can toggle the case of a character, word, line,
or any arbitrary chunk of the file using the ~ or the
g~ command.
You might start to see a pattern emerging here.
Many commands in vim perform a particular operation,
and the text on which they operate is determined by
the character that follows them: c to change,
d to delete, y to yank, etc. The text on which the
command operates is specified using w for the
word, 0 for till the beginning of the line, etc.

Table 2.6: Navigation Commands in Vim
Key  Description
h    move cursor left
j    move cursor down
k    move cursor up
l    move cursor right
w    move to the beginning of the next word
e    move to the end of the current word
b    move to the beginning of the previous word
%    move to the matching parenthesis, bracket, or brace
0    move to the beginning of the current line
$    move to the end of the current line
/    search forward for a pattern
?    search backward for a pattern
n    repeat the last search in the same direction
N    repeat the last search in the opposite direction
gg   move to the first line of the file
G    move to the last line of the file
1G   move to the first line of the file
1gg  move to the first line of the file
:1   move to the first line of the file
{    move to the beginning of the current paragraph
}    move to the end of the current paragraph
fg   move cursor to next occurrence of 'g' in the line
Fg   move cursor to previous occurrence of 'g' in the line

Table 2.7: Moving the Screen Commands in Vim
Key     Description
Ctrl+F  move forward one full screen
Ctrl+B  move backward one full screen
Ctrl+D  move forward half a screen
Ctrl+U  move backward half a screen
Ctrl+E  move screen up one line
Ctrl+Y  move screen down one line
Table 2.8: Replacing Text Commands in Vim
Key  Description
r    replace the character under the cursor
R    replace characters from the cursor onwards until Esc is pressed
cw   change the word under the cursor
c4w  change the next 4 words
C    delete from cursor till end of line and enter insert mode
cc   delete entire line and enter insert mode
5cc  delete next 5 lines and enter insert mode
S    delete entire line and enter insert mode
s    delete character under cursor and enter insert mode
Table 2.9: Toggling Case Commands in Vim
Key   Description
~     toggle the case of the character under the cursor
g~w   toggle the case of the word under the cursor
g~0   toggle the case from cursor till beginning of line
g~$   toggle the case from cursor till end of line
g~{   toggle the case from cursor till previous empty line
g~}   toggle the case from cursor till next empty line
g~%   toggle the case from the bracket, brace, or parenthesis till its pair
This is not a coincidence, but rather a design of vim
to make it more efficient to use. The first command
is called the operator command, and the second
command is called the motion command.
Vim follows an operator-count-motion pattern. For
example, d2w deletes the next 2 words. This makes it
very easy to learn and remember commands, since
you are literally typing out what you want to do.
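A few more combinations built the same way (all
standard normal-mode commands):

1 3dd   delete the next 3 lines
2 y2w   yank (copy) the next 2 words
3 c3w   change the next 3 words
4 g~2w  toggle the case of the next 2 words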
Deleting or Cutting Text
In Vim, the delete command is used to cut text from
the file.
Motion - till, in, around
Table 2.10: Deleting Text Commands in Vim
Key  Description
x delete the character under the cursor
X delete the character before the cursor
5x delete the next 5 characters
dw delete the word under the cursor
d4w delete the next 4 words
D delete from cursor till end of line
dd delete entire line
6dd delete next 6 lines
By now you should notice that dw doesn’t always
delete the word under the cursor. Technically dw
means delete till the beginning of the next word. So
if you press dw at the beginning of a word, it will
delete the word under the cursor. But if your cursor
is in the middle of a word and you type dw, it will
only delete the part of the word till the beginning
of the next word from the cursor position.
To delete the entire word under the cursor, regard-
less of where the cursor is in the word, you can use
diw, which means delete inside word.
However, now you may notice that diw doesn't
delete the space after the word. This leaves two
consecutive spaces behind: the one from before the
word and the one from after it. To delete the space
as well, you can use daw, which means delete
around word.
This works not just with w but with any other motion
such as delete inside paragraph, which will delete
the entire paragraph under the cursor, resulting
in two empty lines being left behind, and delete
around paragraph, which will delete the entire
paragraph under the cursor, and only one empty
line being left behind.
Try out the same with deleting inside other items,
such as brackets, parentheses, braces, quotes, etc.
The syntax remains the same: di{, di[, di(, di", di',
etc.
Yanking and Pasting Text
Yes, copying is called yanking in vim. The command
to yank is y and to paste is p. You can combine y
with all the motions and in and around motions as
earlier. You can also add the count to yank multiple
lines or words.
Table 2.11: Yanking and Pasting Commands in Vim
Key  Description
yy   yank the entire line
yw   yank the word under the cursor
...  ...
p    paste the yanked text after the cursor
P    paste the yanked text before the cursor
Remark 2.2.2 It is important to note that the com-
mands
1 yy
and
1 0y$
are not the same. The first command yanks the
entire line, including the newline character at the
end of the line. The second one yanks the entire
line, but does not include the newline character
at the end of the line. Thus if you directly press
p after the first command, it will paste the line
below the current line, and if you press p after
the second command, it will paste the line at the
end of the current line.
Undo and Redo
The undo command in vim is u and the redo com-
mand is Ctrl+R. You can undo multiple changes,
unlike ed.
Remark 2.2.3 If you want to use vim as your
primary editor, it is highly recommended to
install the mbbill/undotree plugin. This plugin
will show you a tree of all the changes you have
made in the current buffer, and you can go to
any point in the tree and undo or redo changes.
This becomes very useful if you undo too many
changes and then by mistake make a new change:
this switches your branch in the undo tree, and you
can no longer redo the changes you undid. With the
undotree plugin, you can switch branches of the
undo tree and redo the changes.
Searching and Replacing
The search command in vim is / for forward search
and ? for backward search. The n command repeats
the last search in the same direction, and the N
command repeats it in the opposite direction. For
example, after a forward search, n finds the next
occurrence of the pattern (moving forward) and N
finds the previous occurrence (moving backward).
After a backward search with ?, the roles flip: n
finds the previous occurrence (moving backward)
and N finds the next occurrence (moving forward).
You can also use the * command to search forward
for the word under the cursor, and the # command
to search backward for the word under the
cursor.
You can perform search and replace using the :s
command. The command takes a line address on
which to perform the search and replace. Usually
you can use the % address to search in the entire
file, or the .,$ address to search from the cursor till
the end of the file.
You can also use any line number to specify the
address range, similar to the ed editor.
1 :[addr]s/pattern/replace/[flags]
The flags at the end of the search and replace com-
mand can be g to replace all occurrences in the line,
and c to confirm each replacement.
The address can be a single line number, a range of
line numbers, or a pattern to search for. The pattern
can be a simple string, or a regular expression.
Some examples of addresses are shown in Table
2.12.
Table 2.12: Address Types in Search and Replace
Key Description
m,n from line m to line n
m line m
m,$ from line m to end of file
.,$ from current line to end of file
1,n from line 1 to line n
/regex/,n from line containing regex to line n
m,/regex/ from line m to line containing regex
.,/regex/ from current line to line containing regex
/regex/,. from line containing regex to current line
1,/regex/ from the first line to line containing regex
/regex/,$ from line containing regex to the last line
/regex1/;/regex2/ from line containing regex1 to line containing regex2
%                  entire file
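As an illustration, here are a few substitutions using
the placeholder patterns old and new:

1 :%s/old/new/g    replace every occurrence of old in the file
2 :.,$s/old/new/c  replace from the cursor to the end, confirming each
3 :5,10s/old/new/  replace the first occurrence on each of lines 5-10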
Insert Mode
Table 2.13: Keys to enter Insert Mode
Key Description
i enter insert mode before the cursor
a enter insert mode after the cursor
I enter insert mode at the beginning of the line
A enter insert mode at the end of the line
o add new line below the current line and enter insert mode
O add new line above the current line and enter insert mode
You can enter insert mode from escape mode using
the keys listed in Table 2.13. In insert mode, if you
want to insert any non-graphical character, you
can do that by pressing Ctrl+V followed by the key
combination for the character. For example, to insert
a newline character, you can press Ctrl+V followed
by Enter.
These are just the basic commands to get you started
with vim. You can refer to vim cheat sheets present
online to get more familiar with the commands.
▶ https://vim.rtorr.com/ is a good text based
HTML cheat sheet for vim.
▶ https://vimcheatsheet.com/ is a paid graph-
ical cheat sheet for vim. 20

20: A free version of the graphical cheat sheet is
shown in Figure 2.16.
Figure 2.16: Vim Cheat Sheet
2.3 Emacs
2.3.1 History
Emacs was mostly developed by Richard Stallman
and Guy Steele.
Table 2.14: History of Emacs
1962  TECO (Tape Editor and Corrector) was developed at MIT.
1976  Richard Stallman visits Stanford AI Lab and sees Fred Wright's E editor.
1978  Guy Steele accumulates a collection of TECO macros into EMACS.
1979  EMACS becomes MIT's standard text editor.
1981  James Gosling writes Gosling Emacs that runs on UNIX.
1984  Richard Stallman starts GNU Emacs - a free software alternative to Gosling Emacs.
TECO
TECO was developed at MIT in 1962. It was a text
editor used to correct the output of the PDP-1 com-
puter. It is short for Tape Editor and Corrector.

Figure 2.17: Richard Stallman - founder of the GNU
and FSF projects

Unlike most modern text editors, TECO used sep-
arate modes in which the user would either add
text, edit existing text, or display the document. One
could not place characters directly into a document
by typing them into TECO, but would instead enter
a character ('i') in the TECO command language
telling it to switch to input mode, enter the required
characters, during which time the edited text was
not displayed on the screen, and finally enter a char-
acter (<esc>) to switch the editor back to command
mode. This is very similar to how vi works.
Stallman’s Visit to Stanford
In 1976, Richard Stallman visited the Stanford AI
Lab, where he saw Fred Wright's E editor. He was
impressed by E's WYSIWYG 21 interface, where
you do not need to tackle multiple modes to edit
a text file. This is the default behaviour of most
modern editors now. He then returned to MIT,
where he found that Carl Mikkelsen had added
to TECO a combined display/editing mode called
Control-R that allowed the screen display to be
updated each time the user entered a keystroke.
Stallman reimplemented this mode to run efficiently
and added a macro feature to the TECO display-
editing mode that allowed the user to redefine any
keystroke to run a TECO program.

21: What You See Is What You Get
Initially, TECO was only able to edit the file sequen-
tially, page by page. This was due to the memory
restrictions of the early PDP-1. Stallman modified TECO
to read the entire file into the buffer and then edit
the buffer in memory, allowing for random access
to the file.
Too Many Macros!
The new version of TECO quickly became popular at
the AI Lab and soon accumulated a large collection
of custom macros whose names often ended in MAC
or MACS, which stood for macro. This quickly got
out of hand as there were many divergent macros,
and a user would be totally lost when using a co-
worker's terminal.
In 1979, Guy Steele combined many of the pop-
ular macros into a single file, which he called
EMACS, which stood for Editing MACroS, or E
with MACroS.
To prevent thousands of forks of EMACS, Stallman
declared that 'EMACS was distributed on a basis of
communal sharing, which means all improvements
must be given back to me to be incorporated and
distributed.'

Figure 2.18: Guy L. Steele Jr. combined many divergent
TECO macros to create EMACS

Until this point, EMACS, like TECO, ran on the PDP-
10, which ran the ITS operating system and not
UNIX.
EINE, ZWEI, SINE, and other clones

No, that is not German. These are some of the
popular clones of EMACS made for other operating
systems.

EINE 22 was a text editor developed in the late
1970s. In terms of features, its goal was to 'do what
Stallman's PDP-10 (original) Emacs does'. Unlike
the original TECO-based Emacs, but like Multics
Emacs, EINE was written in Lisp. It used Lisp
Machine Lisp.

22: EINE stands for Eine Is Not EMACS

In the 1980s, EINE was developed into ZWEI 23.
Innovations included programmability in Lisp Ma-
chine Lisp, and a new and more flexible dou-
bly linked list method of internally representing
buffers.

23: ZWEI stands for ZWEI Was Eine Initially. These
kinds of recursive acronyms are common in the *nix
world. For example, GNU stands for GNU's Not Unix,
and WINE (a compatibility layer to run Windows
applications) is short for WINE Is Not an Emulator.

SINE 24 was written by Owen Theodore Anderson
in 1981.

24: SINE stands for SINE Is Not EINE
In 1978, Bernard Greenberg wrote a version of
EMACS for the Multics operating system called
Multics EMACS. This used Multics Lisp.
Gosling Emacs
In 1981, James Gosling wrote Gosling Emacs for
UNIX. It was written in C and used Mocklisp, a
language with lisp-like syntax, but not a lisp. It was
not free software.
GNU Emacs

Figure 2.19: James Gosling - creator of Gosling Emacs
and later Java

In 1983, Stallman started the GNU project to create
free software alternatives to proprietary software,
and ultimately to create a free 25 operating sys-
tem.

25: Recall from the previous chapter that free software
does not mean software provided gratis, but software
which respects the user's freedom to run, copy,
distribute, and modify the software. It is like free
speech, not free beer.

In 1984, Stallman started GNU Emacs, a free soft-
ware alternative to Gosling Emacs. It was written
in C and used a true Lisp dialect, Emacs Lisp, as
the extension language. Emacs Lisp was also im-
plemented in C. This is the version of Emacs that is
most popular today and available in most operating
systems' repositories.
How the developer's keyboard influences the edi-
tors they make

Remember that vi was made while using the ADM-3A,
which looked like Figure 2.20.

Figure 2.20: ADM-3A terminal

Whereas emacs was made while the Knight key-
board and the Space Cadet keyboard were in use,
which can be seen in Figure 2.21.

Figure 2.21: Space Cadet Keyboard

Notice how the ADM-3A has very limited modifier
keys, and does not even have arrow keys. Instead
it uses the h,j,k,l keys as arrow keys with a modifier.
This is why vi uses mostly key combinations and a
modal interface. Vi also uses the Esc key to switch
between modes; on the ADM-3A, Esc sat conveniently
in the place occupied by the Caps Lock or Tab key on
modern keyboard layouts.
The Space Cadet keyboard has many modifier keys,
and even a key for the Meta key. This is why emacs
uses many key modifier combinations, and has a
lot of keybindings.
2.3.2 Exploring Emacs
This is not a complete overview of Emacs, or even
its keybindings. A more detailed reference card can
be found on their website.
Opening a File
We can open a file in emacs by providing its filename
as an argument to the emacs executable.
1 $ emacs test.txt
Most of emacs keybindings use modifier keys such
as the Ctrl key, and the Meta key. The Meta key is
usually the Alt key in modern keyboards. In the
reference manual and here, we will be representing
the Meta key as M- and the Ctrl key as C-.
Basic Navigation
These keys are used to move around in the file.
Like vim, emacs also focuses on keeping the hands
off the mouse and on the keyboard. All
navigation can be done through the keyboard.
Exiting Emacs

We can exit emacs either with or without saving
the file. We can also suspend emacs and return to
the shell; suspending is handled by the shell's job
control, not by emacs itself.
Searching Text
Table 2.15: Navigation Commands in Emacs
Key Description
C-p move up one line
C-b move left one char
C-f move right one char
C-n move down one line
C-a goto beginning of current line
C-e goto end of current line
C-v move forward one screen
M-< move to first line of the file
M-b move left to previous word
M-f move right to next word
M-> move to last line of the file
M-a move to beginning of current sentence
M-e move to end of current sentence
M-v move back one screen
Table 2.16: Exiting Emacs Commands
Key Description
C-x C-s save buffer to file
C-z suspend emacs
C-x C-c exit emacs and stop it
Emacs can search for a fixed string, or a regular
expression and replace it with another string.
Table 2.17: Searching Text Commands in Emacs
Key                 Description
C-s                 search forward
C-r                 search backward
M-x replace-string  replace a fixed string
Copying and Pasting

Copying can be done by marking the region, and then
copying it.
Table 2.18: Copying and Pasting Commands in Emacs
Key Description
M-backspace cut the word before cursor
M-d cut the word after cursor
M-w copy the region
C-w cut the region
C-y paste the region
C-k cut from cursor to end of line
M-k cut from cursor to end of sentence
2.4 Nano

Although vim and emacs are the most popular
command line text editors, nano is also a very
useful text editor for beginners. It is very simple
and does not have a steep learning curve.

Figure 2.22: Nano Text Editor

It is a non-modal text editor, which means that it
does not have different modes for different actions.
You can directly start typing text as soon as you
open nano.

Although it uses modifier keys to invoke commands,
it does not have as many commands as vim or
emacs.
2.4.1 History

Pine 26 was a text-based email client developed at
the University of Washington. It was created in 1989.
The email client also had a built-in text editor called
Pico.

26: It is believed that pine stands for Pine Is Not Elm,
Elm being another text-based email client. However,
the author clarifies that it was not named with that in
mind, although if a backronym was to be made, he
preferred 'Pine Is Nearly Elm' or 'Pine Is No-longer
Elm'.

Although the license of Pine and Pico may seem
open source, it was not. The license was restrictive
and did not allow for modification or redistribution.
27
Due to this, many people created clones of Pico
with free software licenses. One of the most popular
clones was TIP (TIP Isn't Pico), which was created
by Chris Allegretta in 1999. Later, in 2000, the name
was changed to Nano. 28 In 2001, nano became part
of the GNU project.

27: Up to version 3.91, the Pine license was similar to
BSD, and it stated that 'Permission to use, copy, modify,
and distribute this software and its documentation for
any purpose and without fee to the University of
Washington is hereby granted ...' The university
registered a trademark for the Pine name with respect
to 'computer programs used in communication and
electronic mail applications' in March 1995. From
version 3.92, the holder of the copyright, the University
of Washington, changed the license so that even if the
source code was still available, they did not allow
modifications and changes to Pine to be distributed by
anyone other than themselves. They also claimed that
even the old license never allowed distribution of
modified versions.

28: Mathematically, nano is 10^-9 or one billionth, and
pico is 10^-12 or one trillionth. Put relatively, nano is
1000 times bigger than pico, although the size of the
nano binary is smaller than that of pico.

GNU nano implements several features that Pico
lacks, including syntax highlighting, line numbers,
regular expression search and replace, line-by-line
scrolling, multiple buffers, indenting groups of
lines, rebindable key support, and the undoing and
redoing of edit changes.

In most modern linux systems, the nano binary is
present along with the pico binary, which is actually
a symbolic link to the nano binary.

You can explore this by finding the path of the
executable using the which command and long-
listing the executable.

1 $ which pico
2 /usr/bin/pico
3 $ ls -l /usr/bin/pico
4 lrwxrwxrwx 1 root root 22 Sep 6 2023 /usr/bin/pico -> /etc/alternatives/pico
5 $ ls -l /etc/alternatives/pico
6 lrwxrwxrwx 1 root root 9 Sep 6 2023 /etc/alternatives/pico -> /bin/nano
Remark 2.4.1 Note that here we have a sym-
link to another symlink. Theoretically, you can
extend to as many levels of chained symlinks
as you want. Thus, to find the final sink of the
symlink chain, you can use the readlink -f com-
mand or the realpath command.
1 $ realpath $(which pico)
2 /usr/bin/nano
2.4.2 Exploring Nano
In nano, the Control key is represented by the
^ symbol. The Meta or Alt key is represented by
M-.
File Handling
You can open a file in nano by providing the file-
name as an argument to the nano executable.
1 $ nano test.txt
Table 2.19: File Handling Commands in Nano
Key Description
^S save the file
^O save the file with a new name
^X exit nano
Editing
Nano is a simple editor, and you can do without
learning any more commands than the ones listed
above, but here are some more basic commands for
editing text.
Table 2.20: Editing Commands in Nano
Key Description
^K cut current line and save in cutbuffer
M-6 copy current line and save in cutbuffer
^U paste contents of cutbuffer
M-T cut until end of buffer
^] complete current word
M-U undo last action
M-E redo last undone action
^J justify the current paragraph
M-J justify the entire file
M-: start/stop recording a macro
M-; run the last recorded macro
F12 invoke the spell checker, if available
There are many more commands in nano, but they
are omitted from here for brevity. You can find the
complete list of keybindings by pressing the ^G key
in nano, or by running info nano. You can also find
third-party cheat sheets online.
2.4.3 Editing A Script in Nano
Since learning nano is mostly to be able to edit a
text file even if you are not familiar with either vim
or emacs, let us try to edit a simple script file to
confirm that you can use nano.
1 $ touch myscript.sh
2 $ chmod u+x myscript.sh
3 $ nano myscript.sh
Now try to write a simple script in the file. An
example script is shown below.
1 #!/bin/bash
2 read -rp ’What is your name? ’ name
3 echo "Hello $name"
4 date=$(date "+%H:%M on a %A")
5 echo "Currently it is $date"
Now save the file by pressing ^S.

If you do not understand how the script works, do
not worry. It will be covered in depth in later
chapters.
Remark 2.4.2 In some systems, the ^S key will
freeze the terminal. Any key you press after this
will seem to not have any effect. This is because
it is interpreted as the XOFF and is used to lock
the scrolling of the terminal. To unfreeze the
terminal, press ^Q. In such a system, you can
save the file by pressing ^O and then typing out
the name of the file if not present already, and
pressing Enter. To disable this behaviour, you
can add the line
1 stty -ixon
to your .bashrc file.
and exit nano by pressing ^X.
Now you can run the script by typing
1 $ ./myscript.sh
2 What is your name? Sayan
3 Hello Sayan
4 Currently it is 21:50 on a Tuesday
Now that we are able to edit a text file using text
editors, we are ready to write scripts to solve prob-
lems.
3 Networking and SSH
3.1 Networking
3.1.1 What is networking?
Have you ever tried to get some work done on
a computer while the internet was down? It's a
nightmare. Modern-day computing relies heavily on
networking. But what is networking?
Definition 3.1.1 (Networking) A computer net-
work comprises two or more computers that
are connected—either by cables (wired) or wifi
(wireless)—with the purpose of transmitting,
exchanging, or sharing data and resources.
We have been using the computer, and linux, for a
while now, but the utility of a computer increases
exponentially when it is connected to a network. It
allows computers to share files and resources, and
to communicate with each other. The present-day
World Wide Web is built on the internet.
Definition 3.1.2 (Internet) The Internet is a global
network of networks that connects millions of
computers worldwide. It allows computers to
connect to other computers across the world
through a hierarchy of routers and servers.
Learning about networking and how networking
works is useful, although we won't be delving into
details in this book. It is left as an exercise for the
reader to explore external resources if they are
interested.

One succinct blogpost explaining how the internet
works, from which Figure 3.1 is taken, is available at
https://www.highspeedinternet.com/resources/how-the-internet-works

3.1.2 Types of Networks

Figure 3.1: Types of Networks

If the end goal is to connect computers with each
other, one naive solution might be to connect all the
computers with each other. Although this might
seem intuitive at first, this quickly gets out of hand
when the number of computers keeps increasing.

If we have n computers, then the number of con-
nections required to connect all the computers with
each other is given by the formula

\[
\frac{n(n-1)}{2} = \frac{n^2 - n}{2}
\]
This is a quadratic function and grows very quickly.
This means it will cost a lot to connect all the com-
puters to each other.

Figure 3.2: Growth Rate of Different Functions - Note
how quickly n^2 grows

This is applicable not only in computer networking
with physical wires, but in many other fields. Take
the example of airlines and airplane routes. If there
were n airports, then the number of routes required
to connect all the airports would be given by the
same formula. This would be disastrous for the
economy and the environment if we ran so many
airplanes daily. So what gives?
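As a quick sanity check with shell arithmetic, fully
meshing even 50 computers would already need
1225 links:

1 $ n=50
2 $ echo $(( n * (n - 1) / 2 ))
3 1225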
Hub and Spoke Network
The solution to this problem is to use a hub and
spoke model, where there are one, or multiple,
central hubs which connect to many other nodes.
Any path from any node to another goes through
one or more hubs. This reduces the number of
connections required to connect all the nodes.

Figure 3.3: Hub and Spoke Model Employed by Airlines
This is the solution used in airlines, and also in most
computer networks. 1

1: Although computer networks use a hub model for
the local area network, the network of networks,
especially the gateway routers, follows a mesh model
to ensure redundancy and make the network more
robust.

Due to this, networks can be classified into three
broad categories based on their geographical area
coverage.

▶ Local Area Network (LAN): A network that
covers a small geographical area, like a home,
office, or a building.
▶ Metropolitan Area Network (MAN): A net-
work that covers a larger geographical area,
like a city or a town.
▶ Wide Area Network (WAN): A network that
covers a large geographical area, like a coun-
try or the entire world.
Figure 3.4: LAN and WAN connecting to the Internet
To connect these networks to computers and also
to each other, we require some special devices.
3.1.3 Devices in a Network
In computer networks, this hub of the Hub and
Spoke Model can either be a level 1 hub, a level 2
switch, or a level 3 router.
Hub
A hub will simply broadcast the message to all the
connected nodes. This causes a lot of traffic to be
generated and is not very efficient. A hub does not
have the capability to identify which node is which.
This is called a level 1 hub. 2

2: To understand more about the levels, refer to the
OSI Model.
Switch
A switch is smarter than a hub. It can identify each
device connected to it and can send the packets of
data only to the intended recipient. This is more
efficient than a hub. This is called a level 2 switch
since it uses the level 2 of the OSI model (Data
Link Layer) to identify the devices. This means that
the devices are identified by their MAC addresses.
Using this, you can only communicate with devices
in your local network. This is useful for a home
network or an office network. But we cannot com-
municate with the entire world using this, since it
doesn't understand IP addresses.
Router
A router is even smarter than a switch. It can under-
stand IP addresses and can route the packets from
one network to another. This is called a level 3 router
since it uses level 3 of the OSI model (Network
Layer) to identify the devices. This means that the
networks are identified by their IP addresses. This is
what we use to connect to the internet. The internet
is nothing but a whole lot of routers communicating
with each other to find the optimal path to send
packets to their destination. Border Gateway Protocol
(BGP) is the protocol used by routers to commu-
nicate with each other and find the optimal path.
They are usually connected in a mesh network to
ensure redundancy and robustness.

Figure 3.5: Hub, Switch, Router connecting to the
Internet
Level 3 Switch
A level 3 switch, or a routing switch, is a switch
with the capabilities of a router. It can understand
the language of IP addresses and can route the
packets to different networks. This is useful in large
organizations where there are many networks and
each network can be divided into subnetworks
called VLANs. You can read more about the
differences between these devices online.

So, in short, the internet is a network of networks of
... networks. It is a hierarchical structure connecting
all the computers in the world. Some computers
are connected earlier in the hierarchy (usually the
ones closer geographically) and some are connected
later.
3.1.4 IP Addresses
So how do routers know where to send the
data packets? This is where IP addresses come in.
To communicate over the internet, two computers
need to know their public IP addresses. The routers
then find the optimal path to send the data packets
to the destination network.
IP addresses are of two types: IPv4 and IPv6. The
most common one is IPv4, which is a 32-bit address
represented as four octets separated by dots. For
example,

162.136.73.21

Here, each octet can take values from 0 to 255. 3

3: An octet is a group of 8 bits. Since an IP address is
32 bits, it is represented as 4 groups of 8 bits. 8 bits
can only represent numbers from 0 to 255, since
$2^8 = 256$.

Technically all such combinations are possible IP
addresses, resulting in $2^{32} = 4,294,967,296$ possi-
ble IP addresses. That is a lot of IP addresses, but
not enough for the growing number of devices in
the world.

This is where IPv6 comes in. IPv6 is a 128-bit address
represented as 8 groups of 4 hexadecimal digits
separated by colons. For example,

2001:0db8:85a3:0000:0000:8a2e:0370:7334

Notice that there are some groups of zeros in the
address. These can be compressed by writing only
one zero in place of multiple zeros in each group.
Further, any leading zeros can be omitted from each
group, making the above address
2001:db8:85a3:0:0:8a2e:370:7334. If there are
multiple consecutive groups of zeros, they can be
compressed to ::, which can be used only once in an
address. Doing this, the above address can be
compressed further to 2001:db8:85a3::8a2e:370:7334.

This results in

$2^{128} = 340,282,366,920,938,463,463,374,607,431,768,211,456$

possible IP addresses, which is a lot more than
IPv4.
3.1.5 Subnetting
Legacy Classes
10.125.42.62 → 00001010.01111101.00101010.00111110
Recall that an IP address, although represented as
four octets, is actually a 32-bit address. This means
that in binary form, an IP address is a string of 32
1s and 0s. Using the first four bits, we can classify
an IP address into five classes.
▶ Class A: The first bit is '0'. IP addresses
in the range 0.0.0.0 to 127.255.255.255.
▶ Class B: The first two bits are '10'. IP addresses
in the range 128.0.0.0 to 191.255.255.255.
▶ Class C: The first three bits are '110'. IP ad-
dresses in the range 192.0.0.0 to 223.255.255.255.
▶ Class D: The first four bits are '1110'. IP ad-
dresses in the range 224.0.0.0 to 239.255.255.255.
These are reserved for multicast addresses.
▶ Class E: The first four bits are '1111'. IP ad-
dresses in the range 240.0.0.0 to 255.255.255.255.
These are reserved for experimental purposes.
However, these classes do not simply assign an IP
to each machine. They are further divided into the
network part and the host part.
Class A assigns the first octet to the network part;
this is used to identify which network the machine
is in. The remaining three octets are used to identify
the host in that network. This means that a class A
network can have $2^{24} - 2 = 16,777,214$ hosts. How-
ever, there can only be $2^7 = 128$ class A networks.
Thus, class A networks are used by large organiza-
tions which have many hosts, but not many large
organizations exist, so 128 networks are enough.
Similarly, class B assigns the first two octets to
identify the network and the remaining two octets to
identify the host. This means that a class B network
can have $2^{16} - 2 = 65,534$ hosts. And there can be
$2^{14} = 16,384$ class B networks. These are used by
medium-sized organizations, which are plenty in
number, and have a moderate number of hosts.
The same goes for class C networks, where the
first three octets are used to identify the network
and the last octet is used to identify the host. Such a
network can have $2^8 - 2 = 254$ hosts. And there
can be $2^{21} = 2,097,152$ class C networks. These are
used by small organizations, which are plenty in
number, and have a small number of hosts.
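These host counts are easy to verify with shell
arithmetic:

1 $ echo $(( 2**24 - 2 ))  # hosts in a class A network
2 16777214
3 $ echo $(( 2**16 - 2 ))  # hosts in a class B network
4 65534
5 $ echo $(( 2**8 - 2 ))   # hosts in a class C network
6 254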
Subnet Masks
Definition 3.1.3 (Subnetting) The process of di-
viding a network into smaller network sections
is called subnetting.
Usually, each network has only one subnet, which
contains all the hosts in that network. However,
the network can be further divided into smaller
subnetworks, each containing a subset of the hosts.
This is useful in large organizations where the
network is divided into departments, and each
department is given a subnetwork.
To indicate which part of the IP address is the
network part and which part is the host part, we
use a subnet mask. A subnet mask is a 32-bit number
where the first n bits are 1s and the remaining bits
are 0s. The number of 1s in the subnet mask indicates
the number of bits used to identify the network. For
example, the IP address 192.168.0.15 can be
written in binary as

11000000.10101000.00000000.00001111

As we know, it belongs to a class C network,
where the first three octets are used to identify the
network, and the rest is used to identify the host.
So the default network mask is

11111111.11111111.11111111.00000000
or 255.255.255.0 in decimal. The network portion
of the IP address is found by taking the bitwise
AND 4 of the IP address and the subnet mask. This
results in the network address

11000000.10101000.00000000.00000000

which is 192.168.0.0, and the host address is
00001111, which is 15.

4: The bitwise AND operation is a binary operation
that takes two equal-length binary representations and
performs the logical AND operation on each pair of
corresponding bits. The result in each position is 1 if
the first bit is 1 and the second bit is 1; otherwise, the
result is 0.

However, if we do not require all 8 bits in the
host space 5 then we can use some of the initial bits
of the host space to identify subnetworks. This is
called subnetting.

5: That is, if we have fewer than 254 hosts in the
network.

For example, the netmask of 255.255.255.0 leaves
8 bits for the host space, or $2^8 - 2 = 254$ 6 hosts. If
we want to split this network into two subnets, we
can use the MSB of the host space for representing
the subnetworks. This results in each subnet having
$2^7 - 2 = 126$ hosts.

6: We subtract 2 from the total number of hosts to
account for the network address and the broadcast
address, which are the first (0) and the last (255)
addresses in the network.
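As a small illustration, we can compute the network
address from an IP and a netmask in the shell itself,
using bash's bitwise AND on each octet (a minimal
sketch, using the address and mask from the example
above):

1 $ IFS=. read -r o1 o2 o3 o4 <<< "192.168.0.15"
2 $ IFS=. read -r m1 m2 m3 m4 <<< "255.255.255.0"
3 $ echo "$((o1 & m1)).$((o2 & m2)).$((o3 & m3)).$((o4 & m4))"
4 192.168.0.0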
Remark 3.1.1 Observe that we effectively lost
two available addresses from the total number of
hosts in the network. Earlier we could have 254
hosts, but now we can have only 126 × 2 = 252
hosts. This is because each subnet also reserves
the first and the last address for the network
address and the broadcast address.
To do this, we change the subnet mask to

11111111.11111111.11111111.10000000

which can be represented as 255.255.255.128.
This gives us two subnets: one spanning 192.168.0.0
to 192.168.0.127 (usable hosts 192.168.0.1 to
192.168.0.126), and another spanning 192.168.0.128
to 192.168.0.255 (usable hosts 192.168.0.129 to
192.168.0.254).
3.1.6 Private and Public IP Addresses

But what if we want to communicate with comput-
ers in our local network? This is where private IP
addresses come in. Some ranges of IP addresses are
reserved for private networks. These are not routable
over the internet. Each LAN has a private IP address
range, and the router translates these private
addresses to the public IP address when sending the
packets over the internet. The assignment of these
private IP addresses is done by the DHCP server 7
in the router.

7: Dynamic Host Configuration Protocol (DHCP) is a
network management protocol used on Internet Protocol
networks whereby a DHCP server dynamically assigns
an IP address and other network configuration parame-
ters to each device on a network so they can communi-
cate with other IP networks.

Each class of networks has a range of IP addresses
that are reserved for private networks.
Table 3.1: Private IP Address Ranges
Class    Network Bits  Address Range                  Number of Addresses
Class A  8             10.0.0.0 - 10.255.255.255      16,777,216
Class B  12            172.16.0.0 - 172.31.255.255    1,048,576
Class C  16            192.168.0.0 - 192.168.255.255  65,536
3.1.7 CIDR
However, this practice of subdividing IP addresses
into classes is a legacy concept and is no longer
followed. Instead, we use CIDR (Classless Inter-
Domain Routing) to announce how many bits are
used to identify the network and how many bits
are used to identify the host.
For example, we could express the idea that the
IP address 192.168.0.15 is associated with the net-
mask 255.255.255.0 by using the CIDR notation of
192.168.0.15/24. This means that the first 24 bits of
the IP address given are considered significant for
the network routing.
This is helpful because not all organizations fit into
the tight categorization of the legacy classes.
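On a Linux machine you can see CIDR notation in
everyday use: the ip command (from the iproute2
package) prints each address with its prefix length.
The interface name and address below are
illustrative:

1 $ ip -4 addr show
2 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
3     inet 192.168.0.15/24 brd 192.168.0.255 scope global eth0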
3.1.8 Ports
Ports usually refer to physical holes in a computer
where you can connect a cable. However, in net-
working, ports refer to logical endpoints for com-
munication.
Definition 3.1.4 (Port) A port or port number
is a number assigned to uniquely identify a
connection endpoint and to direct data to a
specific service. At the software level, within an
operating system, a port is a logical construct
that identifies a specific process or a type of
network service.
For any communication between two computers,
the data needs to be sent to a specific port. This is
because there are multiple services running on a
computer, and the operating system needs to know
which service to direct the data to.
There are $2^{16} = 65,536$ ports available for use. How-
ever, the first 1024 ports are reserved for well-known
services. These are called the well-known ports
and are used by services like HTTP, FTP, SSH, etc.
The well-known ports can be found in Table 3.2.
Other ports, from 1024 to 49151, are registered
ports, these ports can be registered with the Internet
Assigned Numbers Authority (IANA) by anyone
who wants to use them for a specific service.
Ports from 49152 to 65535 are dynamic ports, these
are used by the operating system for temporary
connections and are not registered.
Whenever you send a request to a web server, for
example, the request is sent to the server's IP
address and the port number 80, which is the default
port for HTTP 8, but the port number from your (the
client) side is usually a random port number from
the dynamic port range. This is a short-lived port
number and is used to establish a connection with
the server.
8: or 443 for HTTPS.
These kinds of ports which are short-lived and
used to establish a connection are called ephemeral
ports.
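You can inspect the ephemeral port range your kernel
actually uses; on many Linux systems the default is
narrower than the IANA-suggested 49152 to 65535 range.
The numbers below are a common default and may differ
on your machine.
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999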
Table 3.2: Well-known Ports
Port Number Service Protocol
20 File Transfer Protocol (FTP) Data Transfer TCP
21 File Transfer Protocol (FTP) Command Control TCP
22 Secure Shell (SSH) Secure Login TCP
23 Telnet remote login service TCP
25 Simple Mail Transfer Protocol (SMTP) TCP
53 Domain Name System (DNS) service TCP/UDP
67, 68 Dynamic Host Configuration Protocol (DHCP) UDP
80 Hypertext Transfer Protocol (HTTP) TCP
110 Post Office Protocol (POP3) TCP
119 Network News Transfer Protocol (NNTP) TCP
123 Network Time Protocol (NTP) UDP
143 Internet Message Access Protocol (IMAP) TCP
161 Simple Network Management Protocol (SNMP) UDP
194 Internet Relay Chat (IRC) TCP
443 HTTP Secure (HTTPS) HTTP over TLS/SSL TCP
546, 547 DHCPv6 IPv6 version of DHCP UDP
3.1.9 Protocols
Definition 3.1.5 (Protocols) In computing, a
protocol is a set of rules that define how data is
transmitted between devices in a network.
There are many protocols used in networking, some
of the most common ones are
▶ HTTP: HyperText Transfer Protocol
▶ HTTPS: HyperText Transfer Protocol Secure
▶ FTP: File Transfer Protocol
▶ SSH: Secure Shell
▶ SMTP: Simple Mail Transfer Protocol
▶ POP3: Post Office Protocol
▶ IMAP: Internet Message Access Protocol
▶ DNS: Domain Name System
▶ DHCP: Dynamic Host Configuration Protocol
▶ NTP: Network Time Protocol
▶ SNMP: Simple Network Management Proto-
col
▶ IRC: Internet Relay Chat
▶ BGP: Border Gateway Protocol
▶ TCP: Transmission Control Protocol
▶ UDP: User Datagram Protocol
These protocols act on the different layers of the
OSI model. For example, HTTP, HTTPS, FTP, SSH,
etc. are application layer protocols, while TCP, UDP,
etc. are transport layer protocols.
3.1.10 Firewalls
Definition 3.1.6 (Firewall) A firewall is a net-
work security system that monitors and controls
incoming and outgoing network traffic based
on predetermined security rules.
A firewall acts as a barrier between your computer
and the internet. It monitors the incoming and out-
going traffic and blocks any traffic that does not
meet the security rules. This is useful to prevent
unauthorized access to your computer and to pre-
vent malware from entering your computer and/or
communicating with the outside world.
1 $ sudo ufw enable # Enable the firewall
2 $ sudo ufw allow 22 # Allow SSH
3 $ sudo ufw allow 80 # Allow HTTP
4 $ sudo ufw allow 443 # Allow HTTPS
5 $ sudo ufw status # Check the status of the firewall
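ufw also accepts a protocol suffix and can remove
rules again; a few more illustrative invocations (the
port numbers here are arbitrary examples):
$ sudo ufw allow 8080/tcp   # allow TCP only, on port 8080
$ sudo ufw deny 23          # block telnet
$ sudo ufw delete allow 80  # remove a previously added rule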
3.1.11 SELinux
We can have additional security by using SELinux
in addition to the firewall. SELinux is a security
module that provides access control security poli-
cies. SELinux is short for Security-Enhanced Linux.
It provides a flexible Mandatory Access Control
(MAC) that restricts the access of users and pro-
cesses to files and directories.
Least Privilege Principle
Definition 3.1.7 (Least Privilege Principle) The
principle of least privilege (POLP) is an impor-
tant concept in computer security, promoting
minimal user profile privileges on computers
based on users’ job necessity.
This principle states that a user should have only
the minimum privileges required to perform their
job. This principle is applied throughout linux, and
also in SELinux.
You can check if SELinux is enabled by running
1 $ sestatus
If SELinux is enabled, you can check the context of
a file or a directory using
1 $ ls -lZ
However, if SELinux is not enabled, it will show a ?
in the context.
If SELinux is enabled, you can set the context of a
file or a directory using the chcon command.
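As a sketch of how chcon is typically used: the type
httpd_sys_content_t is the conventional label for web
content under Red Hat-style policies, and the file
path here is hypothetical. restorecon undoes a manual
change by reapplying the policy default.
$ sudo chcon -t httpd_sys_content_t /var/www/html/index.html  # set the SELinux type
$ sudo restorecon -v /var/www/html/index.html                 # revert to the policy default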
RBAC Items
Definition 3.1.8 (Role-Based Access Control
(RBAC)) Role-Based Access Control (RBAC) is
a policy-neutral access control mechanism de-
fined around roles and privileges. The compo-
nents of RBAC such as role-permissions, user-
role, and role-role relationships make it simple
to perform user assignments.
SELinux uses the concept of RBAC to control the
access of users and processes to files and directo-
ries.
There are four components in the SELinux context
that are used to control the access of users and
processes to files and directories, as shown in Figure
3.6.
Figure 3.6: SELinux Context
▶ User: The user who is trying to access the file
or directory.
▶ Role: The role of the user, which defines the
permissions of the user.
▶ Type: The type of the file or directory.
▶ Domain: The domain 9 of the process trying
to access the file or directory.
9: or type or sensitivity
Modes of SELinux
▶ Enforcing: In this mode, SELinux is enabled
and actively enforcing the security policies.
▶ Permissive: In this mode, SELinux is enabled
but not enforcing the security policies. It logs
the violations but does not block them.
▶ Disabled: In this mode, SELinux is disabled
and not enforcing any security policies.
You can change the mode of SELinux by editing the
/etc/selinux/config file.
1 $ sudo vim /etc/selinux/config
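You can also query and switch the mode at runtime,
without editing the config file, using getenforce and
setenforce. A change made with setenforce lasts only
until reboot.
$ getenforce
Enforcing
$ sudo setenforce 0   # 0 = Permissive, 1 = Enforcing
$ getenforce
Permissive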
Tools for SELinux
▶ sestatus: Check the status of SELinux.
▶ semanage: Manage the SELinux policy.
▶ restorecon: Restore the context of files and
directories.
3.1.12 Network Tools
There are a lot of tools in GNU/Linux used for man-
aging, configuring, and troubleshooting networks.
Some of the important tools are listed in Table 3.3.
Table 3.3: Network Tools
Tool Description
ip Show / manipulate routing, devices, policy routing and tunnels
ping To see if the remote machine is up
traceroute Diagnose the hop timings to the remote machine
nslookup Ask for conversion of a domain name to an IP address
dig DNS lookup utility
netstat Print network connections
mxtoolbox Public accessibility of your server
whois Information about the domain
nmap Network port scanner
wireshark Network protocol analyzer and packet sniffer
ip
To find out the private IP address of the NICs of
your system, you can run the ip addr command. 10
10: ip a also works.
1 $ ip addr
2 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc
noqueue state UNKNOWN group default qlen
1000
3 link/loopback 00:00:00:00:00:00 brd
00:00:00:00:00:00
4 inet 127.0.0.1/8 scope host lo
5 valid_lft forever preferred_lft forever
6 inet6 ::1/128 scope host noprefixroute
7 valid_lft forever preferred_lft forever
8 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu
1500 qdisc fq_codel state UP group
default qlen 1000
9 link/ether 1c:1b:0d:e1:5d:61 brd ff:ff:ff:
ff:ff:ff
10 altname enp3s0
11 inet 192.168.0.109/24 brd 192.168.0.255
scope global dynamic eno1
12 valid_lft 7046sec preferred_lft 7046sec
13 inet6 fe80::68e2:97e0:38ec:4abc/64 scope
link noprefixroute
14 valid_lft forever preferred_lft forever
Here you can see there are two interfaces, lo and
eno1. The lo interface is the loopback interface, and
the eno1 interface is the actual network interface.
The IP address of the lo interface is almost always
127.0.0.1. This address is used to refer to the same
system in terms of IP address without knowing the
actual private IP of the system in the LAN.
The IP address of the eno1 interface is the private
IP address allocated by your router. This is not
your public IP address, which is the address of
your router on the internet. Usually public IPs are
statically assigned by ISPs and are not changed
often. It is configured in your router.
Private IPs, however, often need to be assigned dy-
namically, since devices can connect and disconnect
from the network at any time. This is done by the
DHCP server in your router.
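The ip command can also show how packets leave your
LAN: the default route points at the router's private
address. The output below is illustrative, for the
same machine and router as above.
$ ip route show default
default via 192.168.0.1 dev eno1 proto dhcp metric 100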
Remark 3.1.2 The NIC name can be different on
different systems. For an Ethernet connection, it
is usually eno1, or eth0 which is the legacy name.
For a Wi-Fi NIC, it is usually wlan0.
Remark 3.1.3 Earlier the tool used to check the
network status was ifconfig. However, this tool
is deprecated now and should not be used. The
new tool to check the network status is ip.
ping
The ping command is used to check if a remote
server is up and running. It sends an ICMP 11 packet
to the remote server and waits for a response.
11: Internet Control Message Protocol (ICMP) is a supporting
protocol in the Internet protocol suite. It is used by network
devices, including routers, to send error messages and
operational information indicating success or failure when
communicating with another IP address.
Remark 3.1.4 Only a positive response from the
server indicates that the server is up and running.
A negative response does not necessarily mean that
the server is down. Servers can be configured to
not respond to ICMP packets.
1 $ ping -c 4 google.com # Send 4 ICMP packets
to google.com
2 PING google.com (172.217.163.206) 56(84) bytes
of data.
3 64 bytes from maa05s06-in-f14.1e100.net
(172.217.163.206): icmp_seq=1 ttl=114 time
=45.6 ms
4 64 bytes from maa05s06-in-f14.1e100.net
(172.217.163.206): icmp_seq=2 ttl=114 time
=45.4 ms
5 64 bytes from maa05s06-in-f14.1e100.net
(172.217.163.206): icmp_seq=3 ttl=114 time
=45.3 ms
6 64 bytes from maa05s06-in-f14.1e100.net
(172.217.163.206): icmp_seq=4 ttl=114 time
=45.8 ms
7
8 --- google.com ping statistics ---
9 4 packets transmitted, 4 received, 0% packet
loss, time 3004ms
10 rtt min/avg/max/mdev =
45.316/45.524/45.791/0.181 ms
The response of the ping command shows the time
taken for the packet to reach the server and also the
resolved IP address of the server.
nslookup
Another tool to lookup the associated IP address of
a domain name is the nslookup command.
1 $ nslookup google.com
2 Server: 192.168.0.1
3 Address: 192.168.0.1#53
4
5 Non-authoritative answer:
6 Name: google.com
7 Address: 172.217.163.206
8 Name: google.com
9 Address: 2404:6800:4007:810::200e
Here you can see the resolved IP address of the
domain is 172.217.163.206. If you copy this IP
address and paste it into your browser, you can see
that Google's website opens up. The second address
returned is the IPv6 address.
The first lines, mentioning the Server, identify the
DNS server which returned the resolution of the IP
address from the queried domain name.
Remark 3.1.5 Notice that the DNS Server men-
tioned in the above output is actually a private
IP. This is the IP address of the router in the LAN
which acts as the DNS Server cache. However
if you type the domain of a website which you
have not visited, or have visited long ago into
nslookup, then the DNS Server mentioned will
be the public address of the DNS Server, which
might be your ISP’s DNS Server, or some other
public DNS Server.
You can also use mxtoolbox to check the IP address
of your server from the public internet.
dig
Another tool to lookup the associated IP address
of a domain name is the dig command. It can also
reverse lookup the IP address to find the associated
domain name.
1 $ dig google.com
2
3 ; <<>> DiG 9.18.27 <<>> google.com
4 ;; global options: +cmd
5 ;; Got answer:
6 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR
, id: 31350
7 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1,
AUTHORITY: 4, ADDITIONAL: 9
8
9 ;; OPT PSEUDOSECTION:
10 ; EDNS: version: 0, flags:; udp: 4096
11 ; COOKIE: 3
e5ff6a57c0fe2b3b5ce91b3666ae859ec9b6471261cecef
(good)
12 ;; QUESTION SECTION:
13 ;google.com. IN A
14
15 ;; ANSWER SECTION:
16 google.com. 50 IN A 172.217.163.206
17
18 ;; AUTHORITY SECTION:
19 google.com. 162911 IN NS ns1.google.com
.
20 google.com. 162911 IN NS ns3.google.com
.
21 google.com. 162911 IN NS ns4.google.com
.
22 google.com. 162911 IN NS ns2.google.com
.
23
24 ;; ADDITIONAL SECTION:
25 ns2.google.com. 163913 IN A
216.239.34.10
26 ns4.google.com. 163913 IN A
216.239.38.10
27 ns3.google.com. 337398 IN A
216.239.36.10
28 ns1.google.com. 340398 IN A
216.239.32.10
29 ns2.google.com. 163913 IN AAAA
2001:4860:4802:34::a
30 ns4.google.com. 163913 IN AAAA
2001:4860:4802:38::a
31 ns3.google.com. 2787 IN AAAA
2001:4860:4802:36::a
32 ns1.google.com. 158183 IN AAAA
2001:4860:4802:32::a
33
34 ;; Query time: 3 msec
35 ;; SERVER: 192.168.0.1#53(192.168.0.1) (UDP)
36 ;; WHEN: Thu Jun 13 18:18:52 IST 2024
37 ;; MSG SIZE rcvd: 331
And we can then feed the IP address to dig again,
to find the domain name associated with the IP
address.
1 $ dig -x 172.217.163.206
2
3 ; <<>> DiG 9.18.27 <<>> -x 172.217.163.206
4 ;; global options: +cmd
5 ;; Got answer:
6 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR
, id: 15781
7 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1,
AUTHORITY: 4, ADDITIONAL: 9
8
9 ;; OPT PSEUDOSECTION:
10 ; EDNS: version: 0, flags:; udp: 4096
11 ; COOKIE: 78
a3c6e4e4b103f501380f62666ae89b3a3b52e8be388fe0
(good)
12 ;; QUESTION SECTION:
13 ;206.163.217.172.in-addr.arpa. IN PTR
14
15 ;; ANSWER SECTION:
16 206.163.217.172.in-addr.arpa. 83966 IN PTR
maa05s06-in-f14.1e100.net.
17
18 ;; AUTHORITY SECTION:
19 217.172.in-addr.arpa. 78207 IN NS ns4.
google.com.
20 217.172.in-addr.arpa. 78207 IN NS ns2.
google.com.
21 217.172.in-addr.arpa. 78207 IN NS ns1.
google.com.
22 217.172.in-addr.arpa. 78207 IN NS ns3.
google.com.
23
24 ;; ADDITIONAL SECTION:
25 ns1.google.com. 340332 IN A
216.239.32.10
26 ns2.google.com. 163847 IN A
216.239.34.10
27 ns3.google.com. 337332 IN A
216.239.36.10
28 ns4.google.com. 163847 IN A
216.239.38.10
29 ns1.google.com. 158117 IN AAAA
2001:4860:4802:32::a
30 ns2.google.com. 163847 IN AAAA
2001:4860:4802:34::a
31 ns3.google.com. 2721 IN AAAA
2001:4860:4802:36::a
32 ns4.google.com. 163847 IN AAAA
2001:4860:4802:38::a
33
34 ;; Query time: 3 msec
35 ;; SERVER: 192.168.0.1#53(192.168.0.1) (UDP)
36 ;; WHEN: Thu Jun 13 18:19:58 IST 2024
37 ;; MSG SIZE rcvd: 382
Note that the answer we got after running google.com
through dig and then through dig -x (maa05s06-in-
f14.1e100.net) is different from the original domain
name.
This is because the domain name is resolved to an
IP address, and then the IP address is resolved to
a different domain name: the domain name is actually
an alias for the canonical name.
Remark 3.1.6 The IP address you would get
by running dig or nslookup on google would
be different from the IP address you get when
using mxtoolbox. This is because google is a
large company and they have multiple servers
which are load balanced. So someone in India
might get a different IP address compared to
someone in the US.
To get the output of dig in a more readable and
concise format, you can use the +short or +noall
option.
1 $ dig +noall +answer google.com
2 google.com. 244 IN A 172.217.163.206
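The +short option is terser still, printing only the
resolved addresses (the address shown will of course
vary, as discussed above):
$ dig +short google.com
172.217.163.206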
netstat
The netstat command is used to print network
connections, routing tables, interface statistics, mas-
querade connections, and multicast memberships.
It is useful to find what connections are open on
your system, and what ports are being used by
which applications.
1 $ netstat | head
2 Active Internet connections (w/o servers)
3 Proto Recv-Q Send-Q Local Address
Foreign Address State
4 tcp 0 0 rex:53584
24.224.186.35.bc.:https TIME_WAIT
5 tcp 0 0 rex:56602
24.224.186.35.bc.:https TIME_WAIT
6 tcp 0 0 localhost:5037
localhost:43267 TIME_WAIT
7 tcp 0 0 localhost:5037
localhost:46497 TIME_WAIT
8 tcp 0 0 rex:35198
24.224.186.35.bc.:https TIME_WAIT
9 tcp 0 0 rex:44302
24.224.186.35.bc.:https TIME_WAIT
10 tcp 0 0 localhost:5037
localhost:55529 TIME_WAIT
11 tcp 0 0 localhost:5037
localhost:38005 TIME_WAIT
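Like ifconfig, netstat comes from the older net-tools
package; its modern replacement from the iproute2
suite is ss, which takes similar flags. A quick
sketch:
$ ss -tuln   # -t TCP, -u UDP, -l listening sockets, -n numeric ports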
3.2 SSH
3.2.1 What is SSH?
The Secure Shell (SSH) Protocol is a protocol for
secure communication between two computers over a
compromised or untrusted network. 12 SSH uses
encryption and authentication to secure the
communication between the two computers.
12: like the internet.
SSH is now the ubiquitous protocol for secure re-
mote access to servers, and is used by system ad-
ministrators all over the world to manage their
servers.
SSH lets a user log into a computer from another
computer over the network, and execute any command
in the terminal that they have access to.
SSH can also be used to transfer files between two
computers using the scp command.
3.2.2 History
It was initially developed by Tatu Ylönen in 1995
as a replacement for the insecure Telnet and FTP
protocols when he found that someone had installed
a packet sniffer on the server of his university.
There are multiple implementations of the SSH
protocol, the most popular being OpenSSH, developed
by the OpenBSD project. This is the implementation
that is used in most of the linux distributions as
well.
Figure 3.7: Symmetric Encryption
170 3 Networking and SSH
3.2.3 How does SSH work?
SSH works by using symmetric and asymmetric
encryption. The data packets sent over the network
are encrypted, usually using AES symmetric
encryption. This ensures that even if the data
packets are intercepted by a man-in-the-middle
attacker, they cannot be read since they are
encrypted.
To login into a remote server, all you need to do
is provide the username and the IP address or the
domain name of the server to the ssh command.
1 $ ssh username@ipaddress
OR
1 $ ssh username@domainname
Figure 3.8: Symmetric Encryption
SSH allows users to log in to a remote server using
their username and password, but this is not
encouraged since it leaves the user vulnerable to
brute-force attacks.
Another way to authenticate is by using public-
private key pairs.
3.2.4 Key-based Authentication
One of the most powerful features of SSH is its
ability to use public-private key pairs for
authentication. In our course, we emphasize the
importance of this method. Instead of relying on
passwords, which can be vulnerable to brute-force
attacks, a pair of cryptographic keys is used. The
public key is stored on the server, while the
private key is kept secure on your local machine.
This ensures a highly secure and convenient way of
accessing remote servers without the need for
constantly entering passwords.
Figure 3.9: Asymmetric Encryption
3.2.5 Configuring your SSH keys
For this course, it is a must for you to not only
create, but also understand SSH keys. Let us quickly
see how to create an SSH key pair which can be used
to log in to a remote server.
We need to use the ssh-keygen command to create
a new public-private key pair.
1 $ ssh-keygen
2 Generating public/private ed25519 key pair.
3 Enter file in which to save the key (/home/
test1/.ssh/id_ed25519):
Here you have to either simply press enter to con-
tinue with the default location, or you can also
type a custom location where you want to save
the key. If this is your first time creating keys, it is
recommended to use the default location.
Remark 3.2.1 There are multiple algorithms
that can be used to generate a key pair. The most
common ones are RSA, DSA, and ED25519. The
ED25519 algorithm is the new default algorithm
used by OpenSSH since it is shorter yet more
secure than RSA. If you have an outdated version
of OpenSSH, you might get the default RSA
algorithm. To change the algorithm, you can use
the -t flag along with the ssh-keygen command.
1 $ ssh-keygen -t rsa
will create an RSA key pair, and using
1 $ ssh-keygen -t ed25519
will create an ED25519 key pair.
Next, it will ask you to enter a passphrase. You can
enter a passphrase for added security, or you can
simply press enter to continue without a passphrase.
If you do add a passphrase, you will have to always
enter the passphrase whenever you use the key.
We can continue without a passphrase for now by
pressing enter.
1 Enter passphrase (empty for no passphrase):
2 Enter same passphrase again:
3
4 Your identification has been saved in /home/
username/.ssh/id_ed25519
5 Your public key has been saved in /home/
username/.ssh/id_ed25519.pub
6 The key fingerprint is:
7 SHA256:
n4ounQd6v9uWXAtMyyq7CdncMsh1Zuac5jesWXrndeA
test1@rex
8 The key’s randomart image is:
9 +--[ED25519 256]--+
10 | |
11 | |
12 | |
13 | . |
14 | . S+ . . |
15 | . *.O o=... . |
16 | =o=o*+++ .E .|
17 | o.=Bo*O o. . |
18 | +**XB.+. |
19 +----[SHA256]-----+
Our key pair has been generated. The private key
is stored in /home/username/.ssh/id_ed25519 and
the public key is stored in
/home/username/.ssh/id_ed25519.pub.
Make sure to never share your private key with
anyone. Ideally, you don't even need to see the
private key yourself. You should only share the
public key with the server you want to log in to.
3.2.6 Sharing your public key
ssh-copy-id
Finally, to share the public key with the server, there
are usually multiple ways. If the server allows you
to login using a password, you can simply use the
ssh-copy-id command. This command will take
your username and password to login to the server,
and then copy the public key which you provide to
the server.
1 $ ssh-copy-id -i /path/to/public/key
username@ipaddress
Remark 3.2.2 The -i flag is used to specify the
path to the public key. You can drop the .pub
from the path as well (making it the path to the
private key), since ssh-copy-id will automati-
cally look for the public key. However, this flag
is not required if you are using the default lo-
cation. This is why using the default location
is recommended for beginners. The simplified
syntax then becomes
1 $ ssh-copy-id username@ipaddress
The same applies for logging into the server
using the ssh command.
manual install
However, most servers do not allow password login
at all, since it defeats the purpose of using a public-
private key pair. In such cases, you need to somehow
copy the public key to the server.
If you have physical access to the server, you can
simply copy the public key to the server in the
~/.ssh/authorized_keys file of the server.
1 $ file ~/someoneskey.pub
2 ~/someoneskey.pub: OpenSSH ED25519 public key
3 $ cat ~/someoneskey.pub >> ~/.ssh/
authorized_keys
Remark 3.2.3 Make sure to use the >> operator
and not the > operator. The >> operator appends
the new contents to the end of the file, while
the > operator overwrites the file, which we do
not want.
System Commands Course
However, in the case of our course, you do not have
access to the server either. To submit your public
key, you have to login into the website https://
se2001.ds.study.iitm.ac.in/passwordless us-
ing your institute credentials, and then submit your
public key in the form provided.
You can print out the contents of the public key
using the cat command and copy the contents into
the form.
1 [test1@rex ~]$ cat .ssh/id_ed25519.pub
2 ssh-ed25519
AAAAC3NzaC1lZDI1NTE5AAAAIDxh5EuvzQkGvsqlMQW3rOkY
+wyo+2d6Y5CSqNGlLs2a test1@rex
You should copy the entire contents of the file,
including your username and hostname.
3.2.7 How to login to a remote server
You can then log in to the server using the ssh
command.
1 $ ssh rollnumber@se2001.ds.study.iitm.ac.in
OR, if not using the default location of key
1 $ ssh -i /path/to/private/key
rollnumber@se2001.ds.study.iitm.ac.in
If successful, you will be logged into the server and
the prompt will change to the server’s prompt.
1 [test1@rex ~]$ ssh 29f1001234@se2001.ds.study.
iitm.ac.in
2 Last login: Mon Jun 3 07:43:22 2024 from
192.168.2.3
3 29f1001234@se2001:~$
Notice that the prompt has changed from test1@rex
which was the prompt of your local machine,
to rollnumber@se2001 which is the prompt of the
server.
3.2.8 Call an exorcist, there’s a daemon
in my computer
What is sshd? It is a daemon.
Definition 3.2.1 (Daemon) In multitasking com-
puter operating systems, a daemon is a com-
puter program that runs as a background pro-
cess, rather than being under the direct control
of an interactive user.
There are many daemons running in your computer.
You can use systemctl status to see the loaded and
active daemons in your computer.
1 $ systemctl status
2 * rex
3 State: running
4 Units: 419 loaded (incl. loaded aliases)
5 Jobs: 0 queued
6 Failed: 0 units
7 Since: Thu 2024-06-13 12:55:42 IST; 7h ago
8 systemd: 255.6-1-arch
9 CGroup: /
10 |-init.scope
11 | ‘-1 /usr/lib/systemd/systemd --
switched-root --system --deserialize=43
12 |-system.slice
13 | |-NetworkManager.service
14 | | ‘-547 /usr/bin/NetworkManager
--no-daemon
15 | |-adb.service
16 | | ‘-558 adb -L tcp:5037 fork-
server server --reply-fd 4
17 | |-avahi-daemon.service
18 | | |-550 "avahi-daemon: running [
rex.local]"
19 | | ‘-557 "avahi-daemon: chroot
helper"
20 | |-cronie.service
21 | | ‘-621 /usr/sbin/crond -n
22 | |-cups.service
23 | | ‘-629 /usr/bin/cupsd -l
24 | |-dbus-broker.service
25 | | |-545 /usr/bin/dbus-broker-
launch --scope system --audit
Here you can see some of the important daemons
running, such as NetworkManager which is used to
manage the network connections, cronie which is
used to run scheduled tasks, cups which is used to
manage printers, etc.
sshd
sshd is the daemon that runs on the server and
listens to any incoming SSH connections. It is the
daemon that lets you login into the server using the
SSH protocol.
Your own system might not be running the sshd
daemon, since you are not running a server. How-
ever, you can check if the sshd daemon is running
using the systemctl command.
1 $ systemctl status sshd
2 * sshd.service - OpenSSH Daemon
3 Loaded: loaded (/usr/lib/systemd/system/
sshd.service; disabled; preset: disabled)
4 Active: inactive (dead)
Here you can see that the sshd daemon is currently
inactive. This is because I am not running a server
and don’t usually login remotely to my system.
However, the output of the same command would be
something like the one shown below if it is enabled
on your system.
1 $ systemctl status sshd
2 * sshd.service - OpenSSH Daemon
3 Loaded: loaded (/usr/lib/systemd/system/
sshd.service; disabled; preset: disabled)
4 Active: active (running) since Thu
2024-06-13 19:48:44 IST; 12min ago
5 Main PID: 3583344 (sshd)
6 Tasks: 1 (limit: 9287)
7 Memory: 2.1M (peak: 2.3M)
8 CPU: 8ms
9 CGroup: /system.slice/sshd.service
10 ‘-3583344 "sshd: /usr/bin/sshd -D
[listener] 0 of 10-100 startups"
11
12 Jun 13 19:48:44 rex systemd[1]: Started
OpenSSH Daemon.
13 Jun 13 19:48:45 rex sshd[3583344]: Server
listening on 0.0.0.0 port 22.
14 Jun 13 19:48:45 rex sshd[3583344]: Server
listening on :: port 22.
If we run the same command on the server, we
can see that it is running. However, we won't be
able to read the logs of the server, since we are
not authorized.
I have set the LC_ALL environment variable to the
locale C while generating the above outputs to
prevent latex errors. Ideally if you run the
command, you will see a prettier unicode output.
1 $ ssh username@se2001.ds.study.iitm.ac.in
2 username@se2001:~$ systemctl status sshd
3 * ssh.service - OpenBSD Secure Shell server
4 Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
5 Active: active (running) since Thu 2024-06-13 12:32:47 UTC; 1h 57min ago
6 Docs: man:sshd(8)
7 man:sshd_config(5)
8 Process: 732 ExecStartPre=/usr/sbin/sshd -
t (code=exited, status=0/SUCCESS)
9 Main PID: 745 (sshd)
10 Tasks: 1 (limit: 4557)
11 Memory: 22.0M
12 CPU: 8.769s
13 CGroup: /system.slice/ssh.service
14 ‘-745 "sshd: /usr/sbin/sshd -D [
listener] 0 of 10-100 startups"
15
16 Warning: some journal files were not opened
due to insufficient permissions.
Remark 3.2.4 Notice that there are some
differences in the output when run from my local
system and from the system commands server. For
example, the name of the service is ssh on the
server, while it is sshd on my local system. Also
the full name is OpenBSD Secure Shell server
on the server, while it is OpenSSH Daemon on my
local system. The path of the service file is also
different. This is because the server is running
an Ubuntu distribution whereas my local system
runs an Arch distribution. They have different
packages for ssh, and hence the differences.
3.2.9 SCP
scp is a command used to copy files between
computers, local or remote. It uses the SSH protocol
to copy files in an encrypted manner over the
network.
The syntax of the scp command is similar to the cp
command.
1 $ scp username@ipaddress:/path/to/file /path/
to/destination
This will copy a file from the remote server to your
local machine.
1 $ scp /path/to/file username@ipaddress:/path/
to/destination
This will copy a file from your local machine to the
remote server.
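A couple of commonly used scp flags, shown with
hypothetical paths: -r copies directories
recursively, and -P (capital, unlike ssh's
lowercase -p) selects a non-default port.
$ scp -r project/ username@ipaddress:/path/to/destination      # copy a whole directory
$ scp -P 2222 file.txt username@ipaddress:/path/to/destination # server listens on port 2222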
4 Process Management
4.1 What is sleep?
sleep is a command that is used to delay the
execution of a process for a specified amount of
time. sleep itself is a no-op command, * but it
takes a variable amount of time to execute,
depending on the argument of the command. This is
useful when you want to delay the execution of
another command or chain of commands by a certain
amount of time.
4.1.1 Example
1 $ sleep 5
2 $ echo "Hello, World!"
3 Hello, World!
4.1.2 Scripting with sleep
If you run the above snippet, you will see that
the output is delayed by 5 seconds. Moreover, the
prompt itself will not be available for 5 seconds, as
the shell is busy with executing the sleep command.
To run the entire snippet as one process, simply put
the two commands on separate lines of a file (say,
hello.sh), and run the file as a script.
1 $ cat hello.sh
2 sleep 5
3 echo "Hello, World!"
4 $ bash hello.sh
5 Hello, World!
* NO-OP stands for No Operation. It is a command that does
nothing. More reading on NO-OP can be found here.
We will be using sleep in the examples throughout
this chapter to demonstrate process management
since it is a simple command that can be used
to quickly spawn an idempotent process for any
arbitrary amount of time.
4.1.3 Syntax and Synopsis
1 sleep NUMBER[SUFFIX]...
Here the NUMBER is the amount of time to sleep. The
SUFFIX can be s for seconds, m for minutes, h for
hours, and d for days.
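GNU sleep also accepts fractional values and
multiple arguments, which are added together. These
are GNU extensions; POSIX only guarantees whole
seconds.
$ sleep 0.5      # half a second
$ sleep 1m 30s   # 90 seconds in total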
4.2 Different ways of running a
process
4.2.1 What are processes?
Definition 4.2.1 (Process) A process is an in-
stance of a program that is being executed. It
contains the program code and its current ac-
tivity. Depending on the operating system (OS),
a process may be made up of multiple threads
of execution that execute instructions concur-
rently. Several processes may be associated with
the same program; for example, opening up
several instances of the same program often
means more than one process is being executed.
Each process has its own 'process id' or PID to
uniquely identify it.
Whenever we run an application, or even a com-
mand on the linux shell, it spawns a process.
Processes are always created by an already existing
process. 1 This creates a tree-like structure of
processes, where each process has a parent process
and can have multiple child processes. When the
parent of a process dies, the child processes are
adopted by the init process. init is thus the root
of the process tree.
1: Other than the very first process, which is always the init
process. In most distributions, this is done by systemd, which
is an init system that does a lot of other things as well. You
can learn more about systemd and what all it does here.
Figure 4.1: Example of a process tree
4.2.2 Process Creation
In linux systems, processes are managed by the ker-
nel. The kernel is responsible for creating, schedul-
ing, and destroying processes. The user can interact
with the kernel using system calls to create, manage,
and destroy processes. Creating processes is simple,
and can be done using the fork() system call. This
is used when any process wants to create a new
process.
To create a new process for a command, we can
simply type in the command and press enter. This
will not only fork a new process from the terminal
or terminal emulator as the parent process, but
also tie the standard input, standard output, and
standard error of the child process to the terminal
or terminal emulator. 2
2: Standard Input, Output, and Error are the default streams
that are used by the shell to interact with the user. Standard
Input is used to take input from the user, Standard Output is
used to display output to the user, and Standard Error is used
to display errors to the user. We will cover these in detail
in the next chapter.
1 $ sleep 5
This will create a new process that will sleep for 5
seconds.
Remember that each process has a unique process
id (PID). Each process also has a parent process id
(PPID), which is the PID of the parent process. If
a process is created by the shell, the shell will be
the parent process. If the shell’s process is killed,
the child process will also be killed, as the child
process is owned by the shell.
4.2.3 Process Ownership
If you are using a linux operating system with a
GUI server (X or Wayland), try the following to
understand how process ownership works.
4.2 Different ways of running a process 185
Open two terminals; in the first one, run echo $$
to see the process ID of that shell. It should print
out a random string of digits; that is the PID of
the shell. Then run a GUI application, such as
firefox. 3 This will block your terminal and open a
new window of firefox.
3: Make sure you are running something that is not
running already.
1 $ echo $$
2 2277503
3 $ firefox
Now in the other terminal, which is free, run pgrep
firefox. 4 It should print out another random string
of digits; it is the PID of firefox.
4: or whatever was your process's name
Now you can use the following command to find
the parent process’s process ID (PPID) to verify it is
the same as the output of $$ in the first terminal.
1 $ pgrep firefox
2 2278276
3 $ ps -efj | awk '$2==2278276;NR==1'
4 UID PID PPID PGID SID C
STIME TTY TIME CMD
5 sayan 2278276 2277503 2278276 2277503 12
16:59 pts/5 00:00:03 /usr/lib/firefox/
firefox
Here we can see that the PPID of firefox is the PID
of the shell.
Note that in the second command you should put the
PID of firefox, which we got from the previous
command. This can also be done in a single command
which you can directly copy and paste into your
terminal.
1 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
2 UID PID PPID PGID SID C
STIME TTY TIME CMD
3 sayan 2278276 2277503 2278276 2277503 1
16:59 pts/5 00:00:04 /usr/lib/firefox/
firefox
Now, what happens if we kill the parent process?
To kill a process, all we need to do is use the kill
command with the PID of the process.
1 $ kill -9 2277503
The -9 flag is used to send a SIGKILL signal to the
process. This signal is used to kill a process
immediately. We will cover signals later.
If you have been following along, you will see that
both the terminal and firefox disappear from your
screen. You will also notice that if you run the
same command to print the PID and PPID of firefox,
it does not show anything. This is because the
process is killed and the process tree is destroyed,
so even firefox, being the child of the shell
process, is killed.
4.2.4 Don’t kill my children
However, there are also ways to create a new pro-
cess in the background. The easiest way to do this
is to append an ampersand (&) to the end of the
command. This is a shell syntax that tells the shell
to fork the command as a child process and run it in
the background. What this means is the shell will
not wait for the command to finish, and will return
the prompt to the user immediately. However, the
standard output and standard error may still be
tied to the terminal or terminal emulator. So if the
process writes something to the standard output or
standard error, it will be displayed on the terminal.
Furthermore, the process is still owned by the shell,
and if the shell is killed, the process’s parent will
be changed to the init process.
Let's try the same exercise as earlier, but now with
the & at the end.
Open two terminals, and in the first one, execute
the following command.
1 $ echo $$
2 2400520
3 $ firefox &
4 [1] 2401297
5 $ echo "hello"
6 hello
7 $
8 ATTENTION: default value of option
mesa_glthread overridden by environment.
9 $
You can observe that the firefox window opens up
similar to last time, but now the prompt returns
immediately. You can also see that the output of the
echo command is displayed on the terminal.
If you try to perform some operations in the browser,
it may also print some messages to the terminal
screen, even though it is not waiting for the com-
mand to finish. The "ATTENTION" message is an
example of this.
Also observe that as soon as we launched firefox,
it printed out two numbers, [1] and 2401297. The
number in the square brackets is the job id of the
process, and the number after that is the PID of the
process. So now we don't even need to use pgrep to
find the PID of the process.
Now in the other terminal, run the following com-
mand.
1 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
2 UID PID PPID PGID SID C
STIME TTY TIME CMD
3 sayan 2401297 2400520 2401297 2400520 3
17:13 pts/5 00:00:08 /usr/lib/firefox/
firefox
Still we can see that the PPID of firefox is the PID
of the shell.
Now, if we kill the parent process, the child process
will be adopted by the init process, and will continue
to run.
1 $ kill -9 2400520
If you re-run the command to print the PID and
PPID of firefox, you will see that the PPID of fire-
fox is now set to 1, which is the PID of the init
command.
1 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
2 UID PID PPID PGID SID C
STIME TTY TIME CMD
3 sayan 2401297 1 2401297 2400520 3
17:13 ? 00:00:09 /usr/lib/firefox/
firefox
You can also see that the TTY column is now set to
?, which means that the process is no longer tied to
the terminal.
However, if instead of killing the parent process
using the SIGKILL signal you send the SIGHUP
signal to the parent, the child process will still
be terminated, as the parent will propagate the
hang-up signal to the child process.
4.2.5 Setsid
So how do we start a process directly in a way that
it is not tied to the terminal? Many times we would
require to start a process in the background to run
asynchronously, but not always do we want to see
the output of the process in the terminal from where
we launched it. We may also want the process to be
owned by the init process from the get go.
To do this, we can use the setsid command. This
command is used to run a command in a new
session. This will create a new process group and
set the PPID of the process to the init process. The
TTY will also be set to ?.
Let's try the same exercise with the setsid
command. Open two terminals; in one of them, run
the following command.
1 $ echo $$
2 2453741
3 $ setsid -f firefox
4 $
Observe that firefox will open up, but the prompt
will return immediately.
In another terminal, run the following command.
1 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
2 UID PID PPID PGID SID C
STIME TTY TIME CMD
3 sayan 2454452 1 2454452 2454452 2
17:19 ? 00:00:07 /usr/lib/firefox/
firefox
Observe that even without killing the parent pro-
cess, the PPID of firefox is already set to 1, which is
the PID of the init process. So the process will not
be killed if the shell is killed.
The signal sent when a process's parent terminal
closes is called a hang-up signal. We can still
artificially send this SIGHUP signal, which tells
firefox that its parent has stopped, by using the
kill -1 command.
1 $ kill -1 2454452
This will still close firefox, even though the parent
process (init) didn’t actually get killed.
4.2.6 Nohup
If you do not want to give up the ownership of a
child process, don't really need to get the prompt
back, but also do not want to see the output of the
command in your terminal, you can use the nohup
command followed by the command you want to run. It
will still be tied to the terminal, and you can use
Ctrl+C to stop it, Ctrl+Z to pause it, etc. The
prompt will also be blocked till the process runs.
However, the input given to the terminal will not
be sent to the process, and the output of the
process will not be shown on the terminal. Instead,
the output will be saved in a file named nohup.out
in the current directory.
However, this is different from simply running the
command with a redirection operator (>) at the
end, 5 because the nohup command also makes the
process immune to the hang-up signal.
5: We will cover redirection operators in the next chapter.
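nohup is commonly combined with & to detach fully. A
small sketch reusing the hello.sh script from
earlier; the job number and informational message
are illustrative, though the wording shown is what
GNU coreutils prints.
$ nohup bash hello.sh &
[1] 12346
nohup: ignoring input and appending output to 'nohup.out'
$ cat nohup.out   # after the 5 second sleep
Hello, World!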
Exercise 4.2.1 Try the same exercise as before,
but this time use the nohup to run firefox, then
in another terminal, find the PID and PPID of
firefox. Then try to kill the parent process and
see if firefox dies or not.
4.2.7 coproc
The coproc command is used to run a command
in the background and tie the standard input and
standard output of the command to a file descriptor.
This is useful when you want to run a command
in the background, but still want to interact with it
using the shell. This creates a two way pipe between
the shell and the command.
4.2 Different ways of running a process 191
Syntax
1 $ coproc [NAME] command [redirections]
This creates a coprocess named NAME and runs the
command in the background. If the NAME is not
provided, the default name is COPROC.
However, the recommended way to use coproc is to
use it in a subshell, so that the file descriptors are
automatically closed when the subshell exits.
1 $ coproc [NAME] { command; }
coproc can execute simple commands or compound
commands. For simple commands, a NAME cannot be
specified. Compound commands like loops or
conditionals can be executed using coproc in a
subshell.
The name that is set becomes an array variable in
the shell, and can be used to access the file
descriptors for stdin, stdout, and stderr.
For example, to provide input to the command, you
can use echo and redirection operators to write to
the file descriptor.
Similarly you can use the read command to read
from the file descriptor.
1 $ coproc BC { bc -l; }
2 $ jobs
3 [1]+ Running coproc BC { bc -
l; } &
4 $ echo 22/7 >&"${BC[1]}"
5 $ read output <&"${BC[0]}"
6 $ echo $output
7 3.14285714285714285714
This uses concepts from redirection and shell vari-
ables, which we will cover in later weeks.
4.2.8 at and cron
Processes can also be scheduled to be launched at a
later time. This is usually done using cron or the at
command. We will cover these in depth later.
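As a tiny preview (assuming the atd daemon is
running on your system), at reads commands from
standard input and runs them once at the given time,
here reusing the hello.sh script from earlier:
$ echo "bash ~/hello.sh" | at now + 5 minutes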
4.2.9 GNU parallel
GNU parallel is a shell tool for executing jobs in
parallel using one or more computers. A job can be
a single command or a small script that has to be
run for each of the lines in the input. The typical
input is a list of files, a list of hosts, a list of users,
a list of URLs, or a list of tables. A job can also be
a command that reads from a pipe. GNU parallel
can then split the input into blocks and pipe a block
into each command in parallel.
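A minimal illustration, assuming GNU parallel is
installed: compress every .log file in the current
directory, running one gzip job per CPU core by
default. The ::: separator supplies the argument
list.
$ parallel gzip ::: *.log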
4.2.10 systemd services
Finally, the best way to run a background process
or a daemon is to use systemd services. systemd
is an init system that is used by most modern
linux distributions. You can create a service file
declaring the process, the command line arguments,
the environment variables, and the user and group
that the process should run as. You can also specify
if the process should be restarted if it crashes, or if
it should be started at boot time.
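A minimal sketch of such a unit file; the service
name, binary, and user here are hypothetical.
$ cat /etc/systemd/system/myapp.service
[Unit]
Description=My background service

[Service]
ExecStart=/usr/local/bin/myapp --serve
Restart=on-failure
User=myuser

[Install]
WantedBy=multi-user.target
$ sudo systemctl enable --now myapp.service   # start it now and at every boot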
4.3 Process Management
4.3.1 Disown
Disown is a shell builtin command that is used to
remove a job from the shell's job table. This is
useful when you have started a process in the
background and you want to remove it from the
shell's job table, so that it is not killed when the
shell is killed. What this means is that if the
parent process receives a hang-up signal, it will
not propagate it to the child job if it is removed
from the job table. This is applicable only for
processes started from a shell.
Open two terminals; in one, open firefox in the
background using the &
1 $ firefox &
2 $
and then in the other terminal, run the following
command.
1 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
2 UID PID PPID PGID SID C
STIME TTY TIME CMD
3 sayan 3216429 3215856 3216429 3215856 69
18:45 pts/5 00:00:02 /usr/lib/firefox/
firefox
4 $ kill -1 3215856
Observe that firefox will close, even though it was
running in the background. This is because the
shell will propagate the hang-up signal to the child
process. If the parent shell was forcefully killed
using the SIGKILL signal, then it won't have the
opportunity to propagate the hang-up signal to the
child process. This is separate from the natural
killing of a foreground firefox, which happens even
when the shell is killed with the SIGKILL signal.
Now, to fix this, we can simply run the disown
command in the terminal where we started the
firefox process.
Again open a terminal emulator and run the follow-
ing command.
1 $ firefox &
2 $ disown
3 $
Now, in the other terminal, run the following com-
mand.
1 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
2 UID PID PPID PGID SID C
STIME TTY TIME CMD
3 sayan 3216429 3215856 3216429 3215856 69
18:45 pts/5 00:00:02 /usr/lib/firefox/
firefox
4 $ kill -1 3215856
5 $ ps -efj | awk "\$2==$(pgrep firefox);NR==1"
6 UID PID PPID PGID SID C
STIME TTY TIME CMD
7 sayan 3216429 1 3216429 3215856 10
18:45 ? 00:00:03 /usr/lib/firefox/
firefox
Firefox does not close anymore, even when the
parent process receives the hang-up signal.
4.3.2 Jobs
To list the jobs that are running in a shell, you can
use the jobs command.
1 $ firefox &
2 $ sleep 50 &
3 $ jobs
4 [1]- Running firefox &
5 [2]+ Running sleep 50 &
Here + denotes the current job, and - denotes the
previous job. The first column is the job number, it
can also be used to refer to the job inside that same
shell. The process ID of a process can be used to
refer to the process from anywhere, but the job ID
is only valid in the shell where it is created.
The process id can be listed using the jobs -l com-
mand.
1 $ jobs -l
2 [1]- 3303198 Running firefox &
3 [2]+ 3304382 Running sleep 50
&
Using disown removes the job from this table. We
can selectively remove only some jobs from the
table as well.
1 $ jobs
2 [1]- Running firefox &
3 [2]+ Running sleep 50 &
4 disown %1
5 $ jobs
6 [2]+ Running sleep 50 &
Whereas using disown -a will remove all jobs from
the table. disown -r will remove only running jobs
from the table.
If you don't really want to lose the job from the
table, but you want to prevent it from being killed
when the shell is killed, you can use disown -h to
mark the jobs to be ignored by the hang-up signal.
It will have the same effect as the last exercise,
but the job will still be present in the output of
the jobs command.
4.3.3 Suspending and Resuming Jobs
Sometimes you may want to pause a job and resume
it later. This is supported directly by the linux
kernel. To pause any process you can send it the
SIGSTOP or SIGTSTP signal. 6 This can be done
using the same kill command. The signal number
for SIGSTOP is 19, and for SIGTSTP is 20.
6: The difference between SIGSTOP and SIGTSTP is that SIGSTOP
is a signal that cannot be caught or ignored by the process, so
the process will be paused immediately. SIGTSTP is a signal
that can be caught or ignored by the process, so the process
can do some cleanup before pausing. The default action of
SIGTSTP is to pause the process.
To resume the process, you can send it the
SIGCONT signal. The signal number for SIGCONT is
18.
Exercise 4.3.1 Try to pause a job using the
SIGSTOP signal, then resume it using the SIGCONT
signal. Open firefox from a terminal using
firefox & and note the PID, then pause it
using kill -19 <PID>, try to click on the firefox
window, and see if it responds. Then resume it
using kill -18 <PID>. Does the firefox window
respond now?
If you start a command from the shell without using
the & operator, you can pause the command using
Ctrl+Z and resume it using the fg command. This
sends the same signals as above, and uses the shell’s
job table to keep track of the jobs.
Remark 4.3.1 Just like disown, the fg command
can also take the job number as an argument to
bring that job to the foreground. The default job
is the current job (marked with a + in the jobs
command).
You can also use the bg command to resume a job,
but in the background. This has the same effect as
using the & operator at the end of the command.
Remark 4.3.2 Since the disown, fg, and bg
commands work on the shell's job table, they are
shell builtins, and not executable binaries. You
can verify this using the type command.
You cannot perform job control on a process that is
not started from the shell, or if you have disowned
the process.
4.3.4 Killing Processes
We have been using the kill command to send
signals to processes. The kill command is a shell
builtin and an executable command that is used to
send signals to processes. The default signal that
is sent is the SIGTERM signal, which is signal
number 15. The kill command is the user-land way
to communicate with the kernel that some process
needs to be given a certain signal.
Syntax
1 $ kill [-signal|-s signal] PID|name...
Let us also briefly discuss what the synopsis of the
command means, and how to interpret it.
The first word is the name of the command, which is
kill in this case. The argument signal inside square
brackets means that it is optional. The argument
PID is the process ID of the process that you want to
kill. The argument name is the name of the process
that you want to kill. The pipe between PID and
name means that you can provide either the PID
or the name of the process. The ellipsis (...) after
PID|name means that you can provide as many
PIDs or names as you want.
Remark 4.3.3 As mentioned, kill is also a shell
builtin command. This means that the synopsis
seen in the man page of kill is not the same as
the synopsis of the builtin. The bash builtin of
kill does not support providing names of the
processes, only the PIDs. Hence if you want to
kill a process by its name, you will either have to
use the path of the kill binary, or use the pkill
command.
So for example, we can run kill in the following
manners.
1 $ kill 2452
2 $ kill -9 2452
3 $ kill -9 2452 62
4 $ kill -SIGKILL 2525
5 $ kill -SIGKILL 2525 732
The kill command can also be used to send other
non-terminating signals to the process. For example,
the SIGTSTP signal can be used to pause a process,
and the SIGCONT signal can be used to resume a
process. Similarly, there are some user-defined
signals that can be used to send custom signals to
the process.
SIGUSR1 and SIGUSR2 are signals that do not
have any predefined behaviour, and can be used
by the user to send custom signals to the process.
The behaviour of the process on receipt of these
signals is decided by the process and told to the
user by the process documentation. The user can
then send these signals to the process using the kill
command. This helps user interact with processes
that are not running directly in the foreground of a
shell.
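A well-known example: GNU dd on Linux prints its I/O
statistics to stderr when it receives SIGUSR1, so
you can check on a long-running copy without
interrupting it. The PID below is illustrative.
$ dd if=/dev/zero of=/dev/null &
[1] 4242
$ kill -USR1 4242   # dd reports records read/written and throughput
$ kill 4242         # terminate it with the default SIGTERM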
Processes can also trap signals, which means that
they can catch a signal and run a custom handler
function. This is useful when you want to do some
cleanup before the process is killed. This can also
be used to totally change how the process behaves on
receipt of a signal. However, to prevent malicious
code from running, the SIGKILL signal cannot be
trapped, and the process will be killed immediately.
Similarly the SIGSTOP signal, which is similar in
definition to the SIGTSTP signal, cannot be trapped.
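In shell scripts, trapping is done with the trap
builtin. A minimal sketch of a script that cleans up
on SIGTERM; the PID shown is illustrative.
$ cat trap.sh
trap 'echo "caught SIGTERM, cleaning up"; exit 0' TERM
echo "PID: $$"
while true; do sleep 1; done
$ bash trap.sh &
[1] 4321
PID: 4321
$ kill 4321   # default SIGTERM; the handler runs instead of the default action
caught SIGTERM, cleaning up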
To see the list of the signals, we can run kill -l.
1 $ kill -l
2 1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
3 6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
4 11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
5 16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
6 21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
7 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
8 31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
9 38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
10 43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
11 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
12 53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
13 58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
Some of the important signals are:
▶ SIGHUP - Hangup signal. This is sent to a
process when the terminal is closed. This is
used to tell the process that the terminal is no
longer available.
▶ SIGINT - Interrupt signal. This is sent to a
process when the user presses Ctrl+C. This
is used to tell the process to stop what it is
doing and exit.
▶ SIGKILL - Kill signal. This is used to kill a
process immediately. This signal cannot be
caught or ignored by the process.
▶ SIGTERM - Terminate signal. This is used to
tell the process to exit gracefully. The process
can catch this signal and do some cleanup
before exiting. This is the default signal sent
by the kill command.
▶ SIGTSTP - Terminal stop signal. This is used to
pause a process. This is sent when the user
presses Ctrl+Z. This signal can be caught or
ignored by the process.
▶ SIGSTOP - Stop signal. This is used to pause
a process. This signal cannot be caught or
ignored by the process.
▶ SIGCONT - Continue signal. This is used to
resume a process that has been paused using
the SIGSTOP signal. This is sent when the
user runs fg or bg.
▶ SIGUSR1 - User defined signal 1. This is a
signal that can be used by the user to send a
custom signal to the process.
▶ SIGUSR2 - User defined signal 2. This is a
signal that can be used by the user to send a
custom signal to the process.
▶ SIGCHLD - Child signal. This is sent to the
parent process when a child process exits.
This is used to tell the parent process that the
child process has exited.
▶ SIGSEGV - Segmentation fault signal. This
is sent to a process when it tries to access
memory that it is not allowed to access.
▶ SIGPIPE - Pipe signal. This is sent to a process
when it tries to write to a pipe that has been
closed.
4.4 Finding Processes
As we saw, managing a process is easy once we
know the PID of the process. However, it is not
always easy to find the PID of a process. To do this,
there are multiple tools in linux that can be used.
4.4.1 pgrep
The pgrep command is used to find the PID of a
process based on its name. It can take the name of
the process as an argument, and will print the PID of
the processes that match the name. The search can
also be a regex pattern. 7 We have already seen how
pgrep can be used to find the PID of a process.
7: We will discuss regex patterns in the next chapter.
1 $ pgrep firefox
2 526272
We can also use it for any process run from the
terminal.
1 $ sleep 50 &
2 $ sleep 20 &
3 $ sleep 10 &
4 $ pgrep sleep
5 98963
6 99332
7 99526
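pgrep has a few flags worth knowing: -l prints the
process name next to the PID, -f matches against the
full command line instead of just the name, and -u
restricts matches to a user. The output shown
continues the sleep example above.
$ pgrep -l sleep
98963 sleep
99332 sleep
99526 sleep
$ pgrep -f "sleep 50"    # match the full command line
98963
$ pgrep -u $USER sleep   # only processes owned by you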
4.4.2 pkill
Similarly, we also have the pkill command, which
is used to kill a process based on its name. It can
take the name of the process as an argument, and
will send the SIGTERM signal to the processes that
match the name. Other signals can also be sent using
the -signal flag, similar to the kill command.
1 $ pkill firefox
4.4.3 pidwait
The pidwait command is used to wait for a process
to exit. It searches for the process using its name,
a part of its name, or any regex matching its name,
and waits for the process to exit.
Exercise 4.4.1 Open firefox from a terminal in
the background, then use the pidwait to wait
till the process exits. After some time, close the
firefox window and observe the terminal.
4.5 Listing Processes
Sometimes we may not even know the name of the
process, and we may want to list all the processes
running on the system. This can be done using many
commands.
4.5.1 ps
ps is an ancient command 8 that is used to list the
processes running on the system.
8: It exists in the Unix V7 manual, which was released in 1979.
It has BSD-like options, GNU-like options, and System V-like
options.
There are a lot of options and flags that can be used
with the ps command. The flags are of multiple
types, and some flags perform the same function
but are named differently. This is because the ps
command has been around for a long time, and
has been implemented in multiple ways in different
systems. There are also different formats in which
the output can be displayed.
The most common flags used with the ps command
are:
▶ ps - This will get a snapshot of the processes
owned by the user tied to the TTY.
▶ ps -e - This will show all the processes.
▶ ps -f - This will show full format listing.
▶ ps -l - This will show long format listing.
▶ ps u - This will show user-oriented format
listing.
▶ ps x - This will show processes without con-
trolling terminals.
▶ ps -A - This will show all processes.
▶ ps aux - This is a common command to see
all processes owned by all users with and
without TTY associations and showing the
user who owns them.
▶ ps --forest - This will show the processes in a
tree form.
There are hundreds of flags that can be used with
the ps command.
Exercise 4.5.1 Try to use the ps command with
the flags mentioned above, and see the output
of the command.
4.5.2 pstree
The pstree command is used to display the pro-
cesses in a tree form. Although the ps command
can also display the processes in a tree form us-
ing the --forest flag, the pstree command is more
suited for this purpose. It has many features that
ps lacks, such as collapsing branches of identical
processes, better ASCII art, Unicode support, etc.
If the system and the terminal support Unicode,
the pstree command will automatically use VT100
box-drawing characters to make the tree look better.
We can still force it to use ASCII with the -A flag.
We can also disable the clubbing of identical pro-
cesses using the -c flag.
The pstree command optionally takes a PID as an
argument, and will display the tree rooted at that
PID. If no PID is provided, it will display the tree
rooted at the init process.
I can find the PID of the tmux server I am running
to develop this book using the pgrep command.
1 $ pgrep tmux
2 62957
And then use the pstree command to display the
tree rooted at that PID.
1 $ pstree -A 62957
2 tmux: server-+-bash---nvim-+-nvim-+-node---14*[{node}]
3              |             |      |-texlab---9*[{texlab}]
4              |             |      |-xsel
5              |             |      `-9*[{nvim}]
6              |             `-2*[{nvim}]
7              |-bash---watch.sh---entr
8              |-bash---zathura---8*[{zathura}]
9              |-3*[bash]
10             |-2*[bash---man---less]
11             |-bash---nvim-+-nvim-+-2*[node---9*[{node}]]
12             |             |      `-{nvim}
13             |             `-2*[{nvim}]
14             `-bash-+-pstree
15                    `-xsel
This helps us easily find out which processes are
running under which process, and helps us under-
stand the process tree.
4.5.3 top
The top command is used to display the processes
that are running in real time. It is an interactive
command that displays the processes in a table
format, and updates the table every few seconds.
It also displays the CPU and memory usage of the
processes.
The top command is very useful when you want to
monitor the processes and the resources they are
using in real time. It is also useful when you want
to find out which process is using the most CPU or
memory.
Since it is an interactive command, it has keyboard
shortcuts as well, along with runtime options that
can be used to change the behaviour of the com-
mand.
Note: The output of the top command is not static, and it contains control characters and ANSI escape codes to make the output beautiful and interactive. This is why the output is not suitable for use in scripts, and is only meant for human consumption. I have removed the control characters and ANSI escape codes from the output shown here.
1 $ top
2 top - 15:44:55 up 40 min, 1 user, load average: 2.03, 1.37, 1.02
3 Tasks: 271 total, 1 running, 270 sleeping, 0 stopped, 0 zombie
4 %Cpu(s): 9.8 us, 4.9 sy, 0.0 ni, 85.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
5 MiB Mem : 7764.2 total, 604.5 free, 5663.2 used, 1919.6 buff/cache
6 MiB Swap: 20002.0 total, 19901.2 free, 100.8 used. 2101.0 avail Mem
7
8 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9 631 sayan 20 0 1248084 81976 44580 S 18.2 1.0 1:39.82 Xorg
10 1072 sayan 20 0 3159356 229336 89816 S 9.1 2.9 0:44.01 spotify
11 1079 sayan 20 0 1123.5g 128252 65776 S 9.1 1.6 0:09.37 Discord
12 1 root 20 0 22076 12840 9476 S 0.0 0.2 0:03.62 systemd
13 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
14 3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_wo+
4.5.4 htop
The htop command is an interactive process viewer
for Unix systems. It is inspired from the top com-
mand, but has a lot more features such as scrolling
the process list, searching for processes, killing
processes, tree view of processes, etc.
Exercise 4.5.2 Run (Install if not present) the
htop command and see the output. Notice how
the output is more interactive and colourful
than the top command. Run something heavy
in the background, and see how the CPU and
the memory usage changes in real time.
One such command to simulate heavy CPU usage
is the following.
1 $ cat /dev/urandom | gzip > /dev/null &
This will compress the random data from /dev/
urandom and write it to /dev/null. This will use a lot
of CPU, and you can see the CPU usage spike up.
But this is a single core process, so the CPU usage
will be limited to only one core. However, the CPU
core used may keep changing.
Remark 4.5.1 The above command is a simple
way to generate CPU usage. It will not write any
data to disk and will not eat any disk space. However,
the command should be typed carefully: gzip
refuses to write compressed data to a terminal
unless -f is given, but if you forget the > /dev/null
part and force it, the terminal will be filled with
random data and get messed up. In this case
you can type tput reset to reset the terminal
after killing the process.
4.5.5 btop
The btop command is a terminal based graphical
process viewer. It is inspired from the htop
command, but has a more graphical interface. It is
written in C++; its predecessor bpytop was written
in Python.
4.5.6 glances
The glances command is a cross-platform monitor-
ing tool that is used to monitor the system resources
in real time. It is written in Python, and uses the
curses library to draw the interface.
4.6 Exit Codes
Every process that runs in Linux has an exit code
when it terminates. This is a number between 0 and
255 that is returned by the process to its parent
process when it exits, and it tells the parent whether
the child exited successfully or not.
The exit code of the last run process is stored in a
variable called $? in the shell. Successful processes
return 0, whereas unsuccessful processes return a
non-zero number. The exact return code is decided
by the process itself, and usually has some meaning
to the process. Some common exit codes are:
▶ 0 - Success
▶ 1 - General error
▶ 2 - Misuse of shell builtins
▶ 126 - Command invoked cannot execute
▶ 127 - Command not found
▶ 128 - Invalid argument to exit
▶ 128+n - Fatal error signal "n"
▶ 130 - Script terminated by Ctrl+C
▶ 137 - Process killed with SIGKILL
▶ 255 - Exit status out of range
Remark 4.6.1 Note that 130 = 128 + 2, and 2 is
the signal number for SIGINT, thus, the exit
code for a process that is terminated by Ctrl+C
is 130. Similarly, any other signal sent to the
process causing it to exit abnormally will have
an exit code of 128 + 𝑛 , where 𝑛 is the signal
number. Similarly 137 = 128 + 9, and 9 is the
signal number for SIGKILL.
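We can verify the 128 + n rule from the shell. In the
following sketch (the job PID is illustrative, and
bash's job-control notices are omitted), wait reports
the exit status of the killed job:
1 $ sleep 100 &
2 [1] 12345
3 $ kill -TERM %1
4 $ wait %1
5 $ echo $?
6 143
Here 143 = 128 + 15, and 15 is the signal number
for SIGTERM.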
To return any exit status from your script, you can
use the exit command.
1 $ cat myscript.sh
2 #!/bin/bash
3 echo "hello"
4 exit 25
5 $ ./script.sh
6 bash: ./script.sh: No such file or directory
7 $ echo $?
8 127
9 $ ./myscript.sh
10 bash: ./myscript.sh: Permission denied
11 $ echo $?
12 126
13 $ chmod u+x myscript.sh
14 $ ./myscript.sh
15 hello
16 $ echo $?
17 25
The exit code is how shell constructs like if, while,
and until evaluate their conditions. If the exit code
is 0, the condition is considered true, and if the exit
code is non-zero, it is considered false.
1 $ if ./myscript.sh; then echo "success"; else
echo "failure $?"; fi
2 hello
3 failure 25
4 $ if ls /bin/bash; then echo "success"; else
echo "failure $?"; fi
5 /bin/bash
6 success
5 Streams, Redirections, Piping
5.1 Multiple Commands in a
Single Line
Sometimes we may want to run multiple commands
in a single line. For example, we may want to run
two commands ls and wc in a single line. We can do
this by separating the commands with a semicolon.
This helps us see the output of both the commands
without having the prompt in between.
For example, the following command will run ls
and wc in a single line.
1 $ ls; wc .bashrc
2 docs down music pics programs scripts
tmp vids
3 340 1255 11238 .bashrc
In this way of executing commands, the success or
failure of one command does not affect the other
command. Concisely, the commands are executed
independently and sequentially. Even if a command
fails, the next command will be executed.
1 $ date; ls /nonexistant ; wc .bashrc
2 Wed Jul 3 06:54:45 PM IST 2024
3 ls: cannot access ’/nonexistant’: No such file
or directory
4 340 1255 11238 .bashrc
5.1.1 Conjunction and Disjunction
We can also run multiple commands in a single line
using conjunction and disjunction. The conjunction
operator && is used to run the second command only
if the first command is successful. The disjunction
operator || is used to run the second command
only if the first command fails. In computer science,
these operators are also known as short-circuit
logical AND and OR operators. 1
1: A short-circuit logical AND operator returns true if both the operands are true. If the first operand is false, it does not evaluate the second operand and returns false directly.
1 $ ls /nonexistant && echo "ls successful"
2 ls: cannot access ’/nonexistant’: No such file or directory
3 $ ls /home && echo "ls successful"
4 lost+found sayan test1
5 ls successful
In the first command, the ls command fails, so
the echo command is not executed. In the second
command, the ls command is successful, so the
echo command is executed.
The success or failure of a command is determined
by the exit status of the command. If the exit status
is 0, the command is successful. If the exit status
is non-zero, the command is considered to have
failed.
The exit status of the last command can be accessed
using the special variable $?. This variable contains
the exit status of the last command executed.
1 $ ls /nonexistant
2 ls: cannot access ’/nonexistant’: No such file
or directory
3 $ echo $?
4 2
Here, the exit status of the ls command is 2, because
the file /nonexistant does not exist.
1 $ ls /home
2 lost+found sayan test1
3 $ echo $?
4 0
Here, the exit status of the ls command is 0, because
the directory /home exists, and the command is
successful.
Similarly, we can use the disjunction operator || to
run the second command only if the first command
fails.
1 $ ls /nonexistant || echo "ls failed"
2 ls: cannot access ’/nonexistant’: No such file
or directory
3 ls failed
In this case, the ls command fails, so the echo com-
mand is executed. However, since the disjunction
operator is a short-circuit operator, the echo com-
mand is executed only if the ls command fails. If
the ls command is successful, the echo command
is not executed.
1 $ ls /home || echo "ls failed"
2 lost+found sayan test1
In this case, the ls command is successful, so the
echo command is not executed.
We can also chain multiple commands using con-
junction and disjunction.
1 $ date && ls /hello || echo "echo"
2 Thu Jul 4 06:53:08 AM IST 2024
3 ls: cannot access ’/hello’: No such file or directory
4 echo
In this case, the date command is successful, so the
ls is executed. The ls command fails, so the echo
command is executed.
However, even if the first command fails, and the
second command is skipped, the third command is
executed.
1 $ ls /hello && date || echo echo
2 ls: cannot access ’/hello’: No such file or
directory
3 echo
In this case, the ls command fails, so the date com-
mand is not executed. However, the echo command
is executed as the exit status of the ls command is
non-zero.
To make the echo command execute only if the date
command is run and fails, we can use parenthe-
ses.
1 $ ls /hello && (date || echo echo)
2 ls: cannot access ’/hello’: No such file or
directory
Although the parentheses look like grouping a
mathematical expression, they are actually a sub-
shell. The commands inside the parentheses are
executed in a subshell. The exit status of the sub-
shell is the exit status of the last command executed
in the subshell.
We can see the subshell nesting level using echo
$BASH_SUBSHELL.
1 $ echo $BASH_SUBSHELL
2 0
3 $ (echo $BASH_SUBSHELL)
4 1
5 $ (:;(echo $BASH_SUBSHELL ))
6 2
7 $ (:;(:;(echo $BASH_SUBSHELL )))
8 3
Remark 5.1.1 When nesting in more than one
subshell, simply putting two parentheses side by
side will not work. This is because the shell will
interpret that as the mathematical evaluation
of the expression inside the parentheses. To
avoid this, we can use a colon : no-op command
followed by a semicolon ; to separate the two
parentheses.
Remark 5.1.2 Setting up an environment takes
up time and resources. Thus, it is better to avoid
creating subshells unless necessary.
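One way to group commands without paying the
subshell cost is bash's brace grouping, which runs
the commands in the current shell. A minimal
sketch (the date shown is illustrative):
1 $ { date; echo $BASH_SUBSHELL; }
2 Thu Jul 4 06:53:08 AM IST 2024
3 0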
5.2 Streams
There are three standard streams in Unix-like oper-
ating systems:
1. Standard Input (stdin): This is the stream
where the input is read from. By default, the
standard input is the keyboard.
2. Standard Output (stdout): This is the stream
where the output is written to. By default, the
standard output is the terminal.
3. Standard Error (stderr): This is the stream
where the error messages are written to. By
default, the standard error is the terminal.
There are also other numbered streams, such as 3,
4, etc., which can be used to read from or write to
files.
However, sometimes a process may need to take
input from a file or send output to a file. When this
is required, the standard stream is mapped to a file.
This is known as redirection.
To maintain which file or resource is mapped to
which stream, the operating system maintains a
table known as the file descriptor table. The file
descriptor table is a table that maps the file
descriptors to the files or resources.
Figure 5.1: Standard Streams
Definition 5.2.1 (File Descriptor) A file descrip-
tor is a process-unique identifier that the operat-
ing system assigns to a file or resource. The file
descriptor is an integer that is used to identify
the file or resource.
The default file descriptors are:
1. Standard Input (stdin): File descriptor 0
2. Standard Output (stdout): File descriptor 1
3. Standard Error (stderr): File descriptor 2
In the traditional implementation of Unix, file de-
scriptors index into a per-process file descriptor
table maintained by the kernel, that in turn indexes
into a system-wide table of files opened by all pro-
cesses, called the file table. This table records the
mode with which the file (or other resource) has
been opened: for reading, writing, appending, and
possibly other modes. It also indexes into a third
table called the inode table 2 that describes the
actual underlying files.
2: We have covered inodes and inode tables in detail in Chapter 1.
To perform input or output,
the process passes the file descriptor to the kernel
through a system call, and the kernel will access
the file on behalf of the process. The process does
not have direct access to the file or inode tables.
The file descriptors of a process can be inspected in
the /proc directory. The /proc directory is a pseudo-
filesystem that provides an interface to kernel data
structures. The /proc/PID/fd directory contains the
file descriptors of the process with that PID.
Figure 5.2: File Descriptor Table
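For example, we can look at our own shell's file
descriptor table ($$ expands to the PID of the
current shell; the exact output will differ from
system to system):
1 $ ls -l /proc/$$/fd
2 lrwx------ 1 sayan sayan 64 Jul 4 12:00 0 -> /dev/pts/1
3 lrwx------ 1 sayan sayan 64 Jul 4 12:00 1 -> /dev/pts/1
4 lrwx------ 1 sayan sayan 64 Jul 4 12:00 2 -> /dev/pts/1
Descriptors 0, 1, and 2 all point to the terminal, as
expected.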
Most GNU core utilities that accept a file as an
argument will also work without the argument and
will read from the standard input. This behaviour
lets us chain commands together using pipes easily
without any explicit file handling.
1 $ cat
2 hello
3 hello
4 This command is repeating the input
5 This command is repeating the input
6 Press Ctrl+D to exit
7 Press Ctrl+D to exit
The cat command is usually used to print the con-
tents of one or more files. However, when no file
is provided, it reads from the standard input. In
this case, the standard input is the keyboard. This
is very useful when we want to repeat the input or
when we want to read from the standard input.
Question 5.2.1 Can we also use cat to write to a
file?
Answer 5.2.1 Yes, we can use the cat command
to write to a file. When no file is provided, the
cat command reads from the standard input. So
the input will be printed back to the standard
output. However, if we can somehow change
the standard output to a file, then the input will
be written to the file.
5.3 Redirection
Redirection is the process of changing the standard
streams of a process. This is done by the shell before
the process is executed. The shell uses the <, >, and
2> operators to redirect the standard input, standard
output, and standard error streams of a process.
As the shell is responsible for redirection, the pro-
cess is not aware of the redirection. The process
reads from or writes to the file descriptor it is given
by the shell. The process does not know whether
the file descriptor is a file, a terminal, or a pipe.
However, there are ways for a process to guess
whether the file descriptor is a terminal or a file.
This is done by using the isatty() function. The
isatty() function returns 1 if the file descriptor
refers to a terminal, and 0 otherwise.
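The shell exposes the same check through the -t
flag of test, which succeeds if the given file
descriptor refers to a terminal. A minimal sketch:
1 $ [ -t 1 ] && echo "stdout is a terminal"
2 stdout is a terminal
3 $ ( [ -t 1 ] && echo "stdout is a terminal" ) > out.txt
4 $ cat out.txt
5 $
When stdout is redirected to a file, the test fails and
nothing is written.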
5.3.1 Standard Output Redirection
The standard output of a process can be redirected
to a file using the > operator. Observe the following
example.
1 $ date > date.txt
2 $ cat date.txt
3 Thu Jul 4 07:37:27 AM IST 2024
Here, the output of the date command is redirected
to the file date.txt. The cat command is then used
to print the contents of the file date.txt.
The process did not print the output to the terminal.
Instead, the shell changed the standard output of
the process to the file date.txt.
We can also use redirection to create a file. If the file
does not exist, the shell will create the file. If the file
exists, the shell will truncate the file.
1 $ > empty.txt
This command creates an empty file empty.txt. The
> operator is used to redirect the output of the
command to the file empty.txt. Since there is no
output, the file is empty.
We can also use redirection along with the echo
command to create a file with some content.
1 $ echo "hello" > hello.txt
2 $ cat hello.txt
3 hello
Here, the output of the echo command is redirected
to the file hello.txt. The cat command is then used
to print the contents of the file hello.txt.
Recall that the cat command will read from the
standard input if no file is provided. We can use
this to create a file with the input from the standard
input.
1 $ cat > file.txt
2 hello, this is input typed from the keyboard
3 I am typing more input
4 all this is being written to the file
5 to stop, press Ctrl+D
6 $ cat file.txt
7 hello, this is input typed from the keyboard
8 I am typing more input
9 all this is being written to the file
10 to stop, press Ctrl+D
It is important to note that the process of creating
the file if it does not exist, and truncating the file
if it exists, along with redirecting the output, is
done by the shell, not the process. All of this is
done before the process is executed. This leads to an
interesting effect when using redirection with
the ls command.
Output contains output.txt
1 $ ls
2 hello
3 $ ls > output.txt
4 $ cat output.txt
5 hello
6 output.txt
Since the file is created even before the process is
executed, the file is also a part of the output of the
ls command. This is why the file output.txt is also
printed by the cat command.
Output format changes
Another interesting example using the ls command
is when the directory contains more than one file.
The output of ls is usually formatted to fit the
terminal. However, when the output is redirected
to a file, the output is not in a column format.
Instead, each file is printed on a new line.
1 $ ls
2 hello output.txt
3 $ ls > output.txt
4 $ cat output.txt
5 hello
6 output.txt
Observe that the output of the ls command is not
in a column when redirected to a file. But how does
ls know that the output is not a terminal? It uses
the file descriptor to guess whether the output is a
terminal or a file. If the file descriptor is a terminal,
then the output is formatted to fit the terminal. If
the file descriptor is a file, then the output is not
formatted.
However, we can always force the output to be
single-column using the -1 option.
1 $ ls -1
2 hello
3 output.txt
This behaviour is not limited to the ls command.
Most commands usually strip out the formatting
when the output is redirected to a file.
If you are using a terminal that supports ANSI
escape codes 3 , you can use the ls command with the
--color=auto option to get colored output. However,
when the output is redirected to a file, the ANSI
escape codes are not printed. This is because the
ANSI escape codes are not printable characters.
They are control characters that tell the terminal to
change the color of the text.
3: ANSI escape codes are special sequences of characters that are used to control the terminal. For example, to change the color of the text, the ANSI escape code [31m is used. Similarly other ANSI escape codes are used to change the text style, and the position and state of the cursor.
Output and Input from same file
If you try to redirect the output of a command to
a file, and then also try to read from the same file,
you will get an empty file. This is because the file is
truncated before the command is executed.
1 $ echo "hello" > output.txt
2 $ cat output.txt
3 hello
4 $ cat output.txt > output.txt
5 $ cat output.txt
6 $
Remark 5.3.1 Although we can simply use >
to redirect the output to a file, the full syntax
is 1>. This is because the file descriptor for the
standard output is 1. The 1> is used to redirect
the standard output to a file. However, since the
standard output is the default output, we can
omit the 1 and use only >.
5.3.2 Standard Error Redirection
The standard error of a process can be redirected to
a file using the 2> operator. Observe the following
example.
1 $ ls /nonexistant 2> error.txt
2 $ cat error.txt
3 ls: cannot access ’/nonexistant’: No such file
or directory
Here, the error message of the ls command is
redirected to the file error.txt. The cat command
is then used to print the contents of the file error.
txt.
It is important to realise that each process has two
streams to output to, the standard output and the
standard error. The standard output is usually used
to print the output of the process, while the standard
error is used to print the error messages.
This helps us differentiate between the output and
the error messages. Also, if the output of a process
is redirected to a file, the error will still be printed
to the terminal. This is because the standard error
is not redirected to the file. This makes debugging
easier, as the error messages are not lost.
1 $ ls -d /home /nonexistant > output.txt
2 ls: cannot access ’/nonexistant’: No such file
or directory
3 $ cat output.txt
4 /home
Here, the output of the ls command is redirected
to the file output.txt. However, the error message
is still printed to the terminal.
We can redirect both the standard output and the
standard error to files using the > and 2> opera-
tors.
1 $ ls -d /home /nonexistant > output.txt 2>
error.txt
2 $ cat output.txt
3 /home
4 $ cat error.txt
5 ls: cannot access ’/nonexistant’: No such file
or directory
Redirecting both streams to the same file
Let's try to redirect both the standard output and
the standard error to the same file.
1 $ ls -d /home /nonexistant > output.txt 2>
output.txt
2 $ cat output.txt
3 /home
4 nnot access ’/nonexistant’: No such file or directory
Why did the error message get mangled? This is
because the shell truncates the file before the process
is executed. So the error is written to the file, and
then the output message is written to the same
file, overwriting the error partially. Observe that
only the first six characters of the error message are
mangled, the same size as the output. 4
4: The output message is /home, followed by a newline character. This makes the output message 6 characters long. The shell first writes the error message to the file (because that is printed first by the ls command), and then overwrites the first 6 bytes of the file with the output message. Since the 6th byte is a newline character, it looks like there are two lines in the file.
The correct way to redirect both the standard output
and the standard error to the same file is to use the
2>&1 operator. This means that the standard error
is redirected to the standard output. Here, the 1 is
the file descriptor for the standard output. The & is
used to tell the shell that the 1 is a file descriptor,
not a file.
1 $ ls -d /home /nonexistant > output.txt 2>&1
2 $ cat output.txt
3 ls: cannot access ’/nonexistant’: No such file
or directory
4 /home
However, the order is important. The 2>&1 operator
should be placed at the end of the command. If it
is placed at the beginning, then the standard error
will be redirected to the standard output, which,
at that point, is the terminal. Then the standard
output will be redirected to the file. Thus, only the
standard output will be redirected to the file.
1 $ ls -d /home /nonexistant 2>&1 > output.txt
2 ls: cannot access ’/nonexistant’: No such file
or directory
3 $ cat output.txt
4 /home
5.3.3 Appending to a File
The > operator is used to redirect the output to a
file. If the file does not exist, the shell will create the
file. If the file exists, the shell will truncate the file.
However, if we want to append the output to the
file, we can use the >> operator.
1 $ echo "hello" > hello.txt
2 $ cat hello.txt
3 hello
4 $ echo "world" > hello.txt
5 $ cat hello.txt
6 world
7 $ echo "hello" > hello.txt
8 $ echo "world" >> hello.txt
9 $ cat hello.txt
10 hello
11 world
Observe that the > operator truncates the file, while
the >> operator appends to the file.
We can also append the standard error to a file
using the 2>> operator.
1 $ ls /nonexistant 2> error.txt
2 $ cat error.txt
3 ls: cannot access ’/nonexistant’: No such file
or directory
4 $ daet 2>> error.txt
5 $ cat error.txt
6 ls: cannot access ’/nonexistant’: No such file
or directory
7 bash: daet: command not found
This is useful when we want to append the error
messages to a file, like in a log file.
Circular Redirection
If we try to redirect the output of a command to the
same file that we are reading from, we will get an
empty file. However, if we append the output to the
file, we will get an infinite loop of the output. This
should not be done, as it will fill up the disk space.
However, the GNU coreutils cat is smart enough
to detect this and will not read from the file.
1 $ echo hello > hello
2 $ cat hello >> hello
3 cat: hello: input file is output file
4 $ cat < hello > hello
5 cat: -: input file is output file
However, it can still be tricked by using a pipe. 5
5: We will cover pipes in the next section.
1 $ echo hello > hello
2 $ cat hello | cat >> hello
3 $ cat hello
4 hello
5 hello
Here we don't get into an infinite loop because the
first cat command reads from the file and writes to
the pipe. The second cat command reads from the
pipe and writes to the file. Since the file is not read
from and written to at the same time, we don't get
into an infinite loop.
However, BSD implementations of cat do not have
this check, so giving the same file as both input and
output will result in an infinite loop.
5.3.4 Standard Input Redirection
The standard input of a process can be redirected
from a file using the < operator. Most commands
accept a filename as an argument to read from, so
we can directly provide the file in the argument.
However, some commands do not accept a filename
and only read from the standard input. In such
cases, we can use the < operator to redirect the
standard input from a file.
1 $ wc < ~/.bashrc
2 340 1255 11238
Although the wc command accepts a filename as an
argument, we can also redirect the standard input
from a file using the < to the command. However,
now the output of the wc command is different than
when we provide the filename as an argument.
1 $ wc ~/.bashrc
2 340 1255 11238 /home/sayan/.bashrc
When we provide the filename as an argument,
the wc command prints the number of lines, words,
and characters in the file, followed by the filename.
However, when we redirect the standard input
from a file, the wc command prints only the number
of lines, words, and characters in the file. This is
because it does not know that the input is coming
from a file. For wc, the input is just a stream coming
from the standard input.
1 $ wc
2 Hello, this is input from the keyboard
3 I can simply type more input here
4 All of this is taken as standard input
5 by wc command
6 Stop by pressing Ctrl+D
7 5 29 150
Another example where giving filename is not pos-
sible is the read command. The read command
reads a line from the standard input and assigns it
to a variable. The read command does not accept
a filename as an argument. So we have to use the
< operator to redirect the standard input from a
file.
1 $ read myline < /etc/hostname
2 $ echo $myline
3 rex
Here, the read command reads a line from the file
/etc/hostname and assigns it to the variable myline.
The echo command is then used to print the value
of the variable myline. To access the value of the
variable, we use the $ operator. 6
6: This will be covered in depth in Chapter 8.
5.3.5 Here Documents
A here document is a way to provide input to a
command without using a file. The input is pro-
vided directly in the command itself. This is done
by using the << operator followed by a delimiter.
1 $ wc <<EOF
2 This is acting like a file
3 however the input is directly provided
4 to the shell or the script
5 this is mostly used in scripts
6 we can use any delimiter instead of EOF
7 to stop, type the delimiter on a new line
8 EOF
9 6 41 206
We can optionally provide a hyphen - after the <<
to ignore leading tabs. This is useful when the here
document is indented.
1 $ cat <<EOF
2 > hello
3 > this is space
4 > this is tab
5 > EOF
6 hello
7 this is space
8 this is tab
9 $ cat <<-EOF
10 hello
11 this is space
12 this is tab
13 EOF
14 hello
15 this is space
16 this is tab
The delimiter can be any string. However, it is
common to use EOF as the delimiter. The delimiter
should be on a new line.
1 $ wc <<END
2 If I want to have
3 EOF
4 as part of the input
5 then I can simply change the delimiter
6 END
7 4 18 82
Here-documents also support variable expansion.
The shell will expand the variables before passing
the input to the command. Thus it is really useful
in scripts if we want to provide a template which is
filled with the values of the variables.
1 $ name="Sayan"
2 $ cat <<EOF
3 Hello, $name
4 This is a here document
5 EOF
6 Hello, Sayan
7 This is a here document
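The opposite is also possible: quoting the delimiter
is standard shell behaviour that suppresses variable
expansion inside the here document. A quick sketch:
1 $ cat <<’EOF’
2 Hello, $name
3 EOF
4 Hello, $name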
5.3.6 Here Strings
A here string is a way to provide input to a com-
mand without using a file. The input is provided
directly in the command itself. This is done by us-
ing the <<< operator followed by the input. It is
very similar to the here document, but the input is
provided in a single line. Due to this, a delimiter is
not required.
1 $ wc <<< "This is a here string"
2 1 5 22
It can also perform variable expansion.
1 $ cat <<< "Hello, $USER"
2 Hello, sayan
Remark 5.3.2 Both heredocs and herestrings
are simply syntactic sugar. They are similar to
using echo and piping the output to a command.
However, it requires one less process to be exe-
cuted.
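As a sketch of the equivalence the remark describes,
the earlier here string behaves just like piping the
output of echo:
1 $ echo "This is a here string" | wc
2 1 5 22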
5.4 Pipes
Pipes are the holy grail of Unix-like operating sys-
tems. They are the most important concept to un-
derstand in Unix-like operating systems.
Definition 5.4.1 (Pipe) A pipe is a way to connect
the standard output of one process to the
standard input of another process. This is done
by using the | operator.
Figure 5.3: Pipes
Think of the shell as a factory, and the commands as
machines in the factory. Pipes are like conveyor belts
that connect the machines. It would be a pain if we
had to manually collect the produce of one machine
and then feed it to the next machine. Conveyors
make it easy by automatically taking the output of
one machine and feeding it to the next machine.
Pipes are the same, but for processes in shell.
1 $ date
2 Thu Jul 4 09:05:30 AM IST 2024
3 $ date | wc
4 1 7 32
5.4.1 UNIX Philosophy
Each process simply takes the standard input and
writes to the standard output. How those streams
are connected is not the concern of the process.
This way of simply doing one thing and doing it
well is the Unix philosophy 7 , which says that each
program should do one thing and do it well. Each
process should take input from the standard input
and write to the standard output. The output of the
process should be easy to parse by another process.
The output of the process should not contain
unnecessary information. This makes it easy to chain
commands together using pipes.
7: The Unix philosophy, originated by Ken Thompson, is a set of cultural norms and philosophical approaches to minimalist, modular software development. It is based on the experience of leading developers of the Unix operating system. Read more online.
5.4.2 Multiple Pipes
There is no limit to the number of pipes that can be
chained together. It simply means that the output
of one process is fed to the input of the next process.
This simple but powerful construct lets the user do
any and all kinds of data processing.
Imagine you have a file which contains a lot of
words, you want to find which word is present
the most number of times. You can either write a
program in C or Python, etc., or you can use the
power of pipes to do it in a single line using simple
GNU coreutils.
I have a file alice_in_wonderland.txt which con-
tains the entire text of the book Alice in Wonderland.
I want to find the words that are present the most
number of times.
There are some basic preprocessing you would do
even if you were writing a program. You would
convert all the words to lowercase, and remove
any punctuation. This can be done using the tr
command. Then you would split the text into each
word. This can be done using the tr command.
Then you would find the count of each word. There
are two ways of doing this, either you can use a
dictionary 8 to store the frequency of each word,
and increase it as you iterate over the entire text
once. Or you can first sort the words, then simply
count the number of times each word is repeated.
8: A dictionary in Python is a hash map. It is a data structure that maps keys to values. The keys are unique, and the values can be accessed using the keys. It has amortized constant time complexity for insertion, deletion, and lookup.
Since repeated words would always be consecutive,
you can simply count the number of repeated words
and lookup. without having to store the frequency of each word.
This can be done using the sort and uniq commands.
Finally, you would sort the words based on the
frequency and print the top 10 words. This can be
done using the sort and head commands.
Let's see how we actually do this using pipes.
1 $ ls alice_in_wonderland.txt
2 alice_in_wonderland.txt
3 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt |
tr -cd ’a-z ’ | tr ’ ’ ’\n’ | grep . |
sort | uniq -c | sort -nr | head
4 1442 the
5 713 and
6 647 to
7 558 a
8 472 she
9 463 it
10 455 of
11 435 said
12 357 i
13 348 alice
Let’s go over each command one by one and see
what it does.
tr ’A-Z’ ’a-z’
tr is a command that translates characters. The first
argument is the set of characters to be translated,
and the second argument is the set of characters to
translate to. Here, we are translating all uppercase
letters to lowercase. This is done to make the words
case-insensitive.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt | head -n20
2 alice’s adventures in wonderland
3
4 alice’s adventures in wonderland
5
6 lewis carroll
7
8 the millennium fulcrum edition 3.0
9
10
11
12
13 chapter i
14
15 down the rabbit-hole
16
17
18 alice was beginning to get very tired of sitting by her sister
19 on the bank, and of having nothing to do: once or twice she had
20 peeped into the book her sister was reading, but it had no
21 pictures or conversations in it, ‘and what is the use of a book,’
Remark 5.4.1 Since the text is really big, I am
going to filter all the intermediate outputs also
through the head command.
tr -cd ’a-z ’
The -c option is used to complement the set of char-
acters. The -d option is used to delete the characters.
Here, we are telling the tr command to delete all
characters except lowercase letters and spaces. This
is done to remove all punctuation and special char-
acters. Observe the difference in output now.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt |
tr -cd ’a-z ’ | head -c500
2 alices adventures in wonderland
alices adventures in wonderland
lewis carroll
the millennium fulcrum edition
chapter i
down the rabbithole alice was
beginning to get very tired of sitting by
her sisteron the bank and of having
nothing to do once or twice she hadpeeped
into the book her sister was reading but
it had nopictures or conversations in it
and what is the use of a bookthought alice
w
Remark 5.4.2 After we have removed all the
punctuation, the text is now only a single line.
This is because we have removed all the char-
acters, including the newline characters. Thus
to restrict the output, I am using the head -c500
instead of head -n20.
tr ’ ’ ’\n’
The tr command is used to translate characters.
Here, we are translating spaces to newline charac-
ters. This is done to split the text into words. We are
doing this so that each word is on a new line. This
is helpful as the sort and uniq commands work on
lines.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt |
tr -cd ’a-z ’ | tr ’ ’ ’\n’ | head -20
2 alices
3 adventures
4 in
5 wonderland
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21 alices
grep .
Now we have each word on a new line. But observe
that there are many empty lines. This is because
of the multiple spaces between words and spaces
around punctuation. We can remove these empty
lines using the grep . command. 9
9: We will learn more about regular expressions in Chapter 6.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt | tr -cd ’a-z ’ | tr ’ ’ ’\n’ | grep . | head -20
2 alices
3 adventures
4 in
5 wonderland
6 alices
7 adventures
8 in
9 wonderland
10 lewis
11 carroll
12 the
13 millennium
14 fulcrum
15 edition
16 chapter
17 i
18 down
19 the
20 rabbithole
21 alice
Now we are almost there. We have each word on
a new line. Now we can pass this entire stream
of words to the sort command, this will sort the
words.
sort
Sort by default sorts the words in lexicographical
order. This is useful as repeated words would be
consecutive. This is important as the uniq command
works only on consecutive lines.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt |
tr -cd ’a-z ’ | tr ’ ’ ’\n’ | grep . |
sort | head -20
2 a
3 a
4 a
5 a
6 a
7 a
8 a
9 a
10 a
11 a
12 a
13 a
14 a
15 a
16 a
17 a
18 a
19 a
20 a
21 a
If you run the command yourself without the head
command, you can see that the words are sorted.
The repeated words are consecutive. In the above
output we can see that the word a is repeated many
times. Now we can use the uniq command to count
the number of times each word is repeated.
uniq -c
The uniq command is used to remove consecu-
tive duplicate lines. However, it also can count the
number of times each line is repeated using the -c
option.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt |
tr -cd ’a-z ’ | tr ’ ’ ’\n’ | grep . |
sort | uniq -c | head -20
2 558 a
3 1 abarrowful
4 1 abat
5 1 abidefigures
6 1 able
7 79 about
8 1 aboutamong
9 1 aboutand
10 1 aboutby
11 3 abouther
12 3 aboutit
13 1 aboutthis
14 1 abouttrying
15 2 above
16 1 abranch
17 1 absenceand
18 2 absurd
19 1 acceptance
20 2 accident
21 1 accidentally
Great! Now we have the count of each word. How-
ever, the words are still sorted by the word. We
want to sort the words by the count of the word.
This can be done using the sort command.
sort -nr
Sort by default sorts in lexicographical order. How-
ever, we want to sort by the count of the word. The
-n option is used to sort the lines numerically. The
-r option is used to sort in reverse order.
Exercise 5.4.1 Try to run the same command
without the -n option. Observe how the sorting
makes sense alphabetically, but not numerically.
1 $ tr ’A-Z’ ’a-z’ < alice_in_wonderland.txt |
tr -cd ’a-z ’ | tr ’ ’ ’\n’ | grep . |
sort | uniq -c | sort -nr | head -20
2 1442 the
3 713 and
4 647 to
5 558 a
6 472 she
7 463 it
8 455 of
9 435 said
10 357 i
11 348 alice
12 332 you
13 332 in
14 313 was
15 241 that
16 237 as
17 202 her
18 190 at
19 169 on
20 161 all
21 158 with
Finally, we have the top 20 words in the file
alice_in_wonderland.txt along with the count of each
word.
Although the above command is a single line 10 , it
is doing a lot of processing. Still, it is very readable.
This is the power of pipes.
10: Such commands are called one-liners.
Remark 5.4.3 Usually, when we come across
such one-liners, they may initially seem too
complicated to understand. The key to understanding
such one-liners is to break them down into
smaller parts and understand them one component
at a time from left to right. Feel free to
execute each command separately and observe
the output like we did above.
5.4.3 Piping Standard Error
Pipes are used to connect the standard output of one
process to the standard input of another process.
However, the standard error is not connected and
remains mapped to the terminal. This is because
the standard error is a separate stream from the
standard output.
However, we can connect the standard error to the
standard input of another process using the 2>&1
operator. This is useful when we want to process
the error messages of a command.
1 $ ls -d /home /nonexistant | wc
2 "/nonexistant": No such file or directory (os
error 2)
3 1 1 6
4 $ ls -d /home /nonexistant 2>&1 | wc
5 2 10 61
This is the same as redirecting both the streams to a
single file as demonstrated earlier.
However, there is a shorter way to do this using the
|& syntactic sugar.
1 $ ls -d /home /nonexistant |& wc
2 2 10 61
This does the exact same thing as 2>&1 followed
by the pipe.
5.4.4 Piping to and From Special Files
As we discussed earlier, there are some special files
in the /dev directory.
/dev/null
The /dev/null file is a special file that discards all
the data that is written to it. It is like a black hole.
All the data that is not needed can be written to
this file. Usually errors are written to the /dev/null
file.
1 $ ls -d /nonexistant /home 2> /dev/null
2 /home
The error is not actually stored in any file, thus the
storage space is not wasted.
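Combined with the 2>&1 redirection from earlier, a
common idiom (a sketch) is to silence a command
entirely while still keeping its exit status:
1 $ ls -d /nonexistant /home > /dev/null 2>&1
2 $ echo $?
3 2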
/dev/zero
The /dev/zero file is a special file that provides an
infinite stream of null bytes. This is useful when
you want to provide an infinite stream of data to a
process.
1 $ head -c1024 /dev/zero > zero.txt
2 $ ls -lh zero.txt
3 -rw-r--r-- 1 sayan sayan 1.0K Jul 4 15:04
zero.txt
Here we are taking the first 1024 bytes from the
/dev/zero and writing it to the file zero.txt. The
file zero.txt is 1.0K in size. This is because the /dev
/zero file provides an infinite stream of null bytes,
of which 1024 bytes are taken.
Warning 5.4.1 Make sure to always use head or
any other limiter when working with /dev/zero
or /dev/random as they are infinite streams.
Forgetting this can lead to the disk being filled
up. head with its default parameters will also
not work, since it depends on the presence of
newline characters, which are absent in an infinite
stream of zeros. That is why we are using a byte
count limiter with the -c option. If you forget to
add the limiter, you can press Ctrl+C as quickly
as possible to stop the process and then remove
the file using rm.
/dev/random and /dev/urandom
The /dev/random and /dev/urandom files are special
files that are infinite suppliers of random bytes.
The /dev/random file is a blocking random number
generator. This means that it will block if there is
not enough entropy. The /dev/urandom file is a non-
blocking random number generator. This means
that it will not block even if there is not enough
entropy. Both can be used to generate random num-
bers.
1 $ head -c1024 /dev/random > random.txt
2 $ ls -lh random.txt
3 -rw-r--r-- 1 sayan sayan 1.0K Jul 4 15:04
random.txt
Observe that here too the file is of size 1.0K. This is
because we are still taking only the first 1024 bytes
from the infinite stream of random bytes. However,
if we gzip the data, we can see that the zeros file is
much smaller than the random file.
1 $ gzip random.txt zero.txt
2 $ ls -lh random.txt.gz zero.txt.gz
3 -rw-r--r-- 1 sayan sayan 1.1K Jul 4 15:11
random.txt.gz
4 -rw-r--r-- 1 sayan sayan 38 Jul 4 15:10
zero.txt.gz
The random file is 1.1K in size, while the zero file is
only 38 bytes in size. This is because the random file
has more entropy and thus cannot be compressed
as much as the zero file.
5.4.5 Named Pipes
A named pipe is a special file that provides a way
to connect the standard output of one process to
the standard input of another process. This is done
by creating a special file in the filesystem. We have
already covered named pipes in Chapter 1.
Although a pipe is faster than a named pipe, a
named pipe can be used to connect processes that
are not started at the same time or from the same
shell. This is because a named pipe is a file in the
filesystem and can be accessed by any process that
has the permission to access the file.
Try out the following example. First create a named
pipe using the mkfifo command.
1 $ mkfifo pipe1
Then run two processes, one that writes to the
named pipe, another that reads from the named
pipe. The order of running the processes is not
important.
Terminal 1:
1 $ cat /etc/profile > pipe1
Terminal 2:
1 $ wc pipe1
2 47 146 993 pipe1
Exercise 5.4.2 After you have created the named
pipe, try changing the order of running the
other two processes. Observe that whatever is
run first will wait for the other process to start.
This is because a named pipe is not storing the
data piped to it in the filesystem. It is simply a
buffer in the memory.
A named pipe is more useful over regular files
when two processes want to communicate with
each other. This is because a named pipe is
1. Faster than a regular file as it does not store
the data in the file system.
2. Independent of the order of launching the
processes. The reader can be launched first
and it will still wait for the writer to send the
data.
3. Works concurrently. The writer does not need
to be done writing the entire data before the
reader can start reading. The reader can start
as soon as the writer starts writing. The faster
process will simply block until the slower
process catches up.
To demonstrate the last point, try running the fol-
lowing commands.
Ensure that a named pipe is created.
Terminal 1:
1 $ grep linux pipe1
This process is looking for lines containing the word
linux in the file pipe1. Initially it will simply block.
Grep is a command that does not wait for the entire
file to be read. It starts printing output as soon as a
line containing the pattern is read.
Terminal 2:
1 $ tree / > pipe1
This command is attempting to list out each and
every file on the filesystem. This takes a lot of time.
However, since we are using a named pipe, the grep
command will start running as soon as the first line
is written to the pipe.
You can now observe the first terminal will start
printing some lines containing the word linux as
soon as the second terminal starts writing to the
pipe.
Now try the same with a regular file.
1 $ touch file1 # ensure its a normal file
2 $ tree / > file1
3 $ grep linux file1
Since this is a regular file, we cannot start reading
from the file before the entire file is written. If we
do that, the grep command will quit as soon as it
catches up with the writer; it will not block and
wait.
Observe that to start getting output on the screen
takes a lot longer in this method. Not to mention
the disk space wasted due to this.
Remark 5.4.4 Remember that when we use redi-
rection (>) to write to a file, the shell truncates
the file. But when we use a named pipe, the shell
knows that the file is a named pipe and does not
truncate the file.
5.4.6 Tee Command
The tee command reads from the standard input
and writes to both the standard output and a file.
It is very useful when you want
to save the output of a command but also want to
see the output on the terminal.
1 $ ls -d /home /nonexistant | tee output.txt
2 ls: cannot access ’/nonexistant’: No such file
or directory
3 /home
4 $ cat output.txt
5 /home
Observe that only the standard output is written to
the file, and not the standard error. This is because
pipes only connect the standard output to the stan-
dard input of the next command, and the standard
error remains mapped to the terminal.
Thus in the above output, although it may look like
both the standard output and the standard error
are written to the terminal by the same command,
it is not so.
The standard error of the ls command remains
mapped to the terminal, and thus gets printed
directly, whereas the standard output is redirected
to the standard input of the tee command, which
then prints that to the standard output (and also
writes it to the file).
You can also mention multiple files to write to.
1 $ ls -d /home | tee output1.txt output2.txt
2 $ cat output1.txt
3 /home
4 $ cat output2.txt
5 /home
6 $ diff output1.txt output2.txt
7 $
Remark 5.4.5 The diff command is used to
compare two files. If the files are the same, then
the diff command will not print anything. If
the files are different, then the diff command
will print the lines that are different.
We can also append to the file using the -a option.
1 $ ls -d /home | tee output.txt
2 $ ls -d /etc | tee -a output.txt
3 $ cat output.txt
4 /home
5 /etc
5.5 Command Substitution
We have already seen that we can run multiple
commands in a subshell in bash by enclosing them
in parentheses.
1 $ (whoami; date)
2 sayan
3 Thu Jul 4 09:05:30 AM IST 2024
This is useful when you simply want to print the
standard output of the commands to the terminal.
However, what if you want to store the output of
the commands in a variable? Or what if you want
to pass the standard output of the commands as an
argument to another command?
To do this, we use command substitution. Com-
mand substitution is a way to execute one or more
processes in a subshell and then use the output of
the subshell in the current shell.
There are two ways to do command substitution.
1. Using backticks ‘command‘ - this is the legacy
way of doing command substitution. It is not
recommended to use this as it is difficult to
read and can be confused with single quotes.
It is also harder to nest.
2. Using the $(command) syntax - this is the
recommended way of doing command substitution.
It is easier to read and nest.
Throughout this book, we will use the $(command) syntax and not the backticks.
1 $ echo "Today is $(date)"
2 Today is Thu Jul 4 09:05:30 AM IST 2024
Here we are using the $(date) command substitu-
tion to get the current date and time and then using
it as an argument to the echo command.
1 $ myname="$(whoami)"
2 $ mypc="$(hostname)"
3 $ echo "Hello, $myname from $mypc"
4 Hello, sayan from rex
We can store the output of the command in a vari-
able and then use it later. This is useful when you
want to use the output of a command multiple
times.
Remark 5.5.1 Although you do not need to
use the quotes with the command substitution
in this case, it is always recommended to use
quotes around the variable assignment, since
if the output is multiword or multiline, it will
throw an error.
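A small sketch of why the quotes matter when
expanding the variable later (the directory names
are illustrative):
1 $ files="$(ls)"
2 $ echo $files
3 dir1 dir2
4 $ echo "$files"
5 dir1
6 dir2
Unquoted, the newlines in the output are squashed
into spaces by word splitting; quoted, they are
preserved.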
5.6 Arithmetic Expansion
Arithmetic expansion allows the evaluation of an
arithmetic expression and the substitution of the
result. The format for arithmetic expansion is:
1 $(( expression ))
This is the reason we cannot directly nest subshells
without a command between the first and the sec-
ond subshell.
1 $ cat /dev/random | head -c$((1024*1024)) > random.txt
2 $ ls -lh random.txt
3 -rw-r--r-- 1 sayan sayan 1.0M Jul 4 15:04 random.txt
Here we are using arithmetic expansion to calculate
the number of bytes in 1MiB 11 and then using it as
an argument to the head command. This results in
the creation of a file random.txt of size 1MiB.
11: 1MiB = 1024KiB = 1024 × 1024 bytes - this is called a mebibyte. 1MB = 1000KB = 1000 × 1000 bytes - this is called a megabyte. This is a very common confusion. Kilo, Mega, Giga are SI prefixes, while Kibi, Mebi, Gibi are IEC prefixes.
5.6.1 Using variables in arithmetic
expansion
We can also use variables in arithmetic expansion.
We don't have to use the $ operator to access the
value of the variable inside the arithmetic
expansion.
1 $ a=10
2 $ b=20
3 $ echo $((a+b))
4 30
There are other ways to do arithmetic in bash, such
as using the expr command, or using the let com-
mand. However, the $(()) syntax is the most recom-
mended way to do simple arithmetic with variables
in bash.
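For reference, minimal sketches of the two alternatives
mentioned (expr needs spaces around its
operators, and let is a bash builtin):
1 $ expr 10 + 20
2 30
3 $ let c=10+20
4 $ echo $c
5 30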
5.7 Process Substitution
Process substitution is a way to provide the output
of a process as a file. This is done by using the
<(command) syntax.
Some commands do not accept standard input. They
only accept a filename as an argument. This is the
exact opposite of the issue we had with the read
command, which accepted only standard input and
not a filename. There we used the < operator to
redirect the standard input from a file.
The diff command is a command that compares
two files and prints out differences. It does not
accept standard input. If we want to compare dif-
ferences between the output of two processes, we
can use process substitution.
Imagine you have two directories and you want
to compare the files in the two directories. You
can use the diff command to compare the two
directories.
1 $ ls
2 dir1 dir2
3 $ ls dir1
4 file1 file2 file3
5 $ ls dir2
6 file2 file3 file4
We can see that the two directories have some
common files (file2 and file3) and some different
files (file1 in dir1 and file4 in dir2).
However, if we have a lot of files, it is difficult to see
manually which files are different.
Let us first try to save the output of the two ls
commands to files and then compare the files using
diff.
1 $ ls
2 dir1 dir2
3 $ ls dir1 > dir1.txt
4 $ ls dir2 > dir2.txt
5 $ diff dir1.txt dir2.txt
6 1d0
7 < file1
8 3a3
9 > file4
Great! We can see that the file file1 is present only
in dir1 and the file file4 is present only in dir2. All
other files are common.
However observe that we had to create two files
dir1.txt and dir2.txt to store the output of the ls
commands. This is not efficient. If the directories
contained a million files, then we would have to
store tens or hundreds of megabytes of data in the
files.
It sounds like a job for the named pipes we learnt
earlier. Let's see whether that is easier or harder.
1 $ ls
2 dir1 dir1.txt dir2 dir2.txt
3 $ rm dir1.txt dir2.txt
4 $ mkfifo dir1.fifo dir2.fifo
5 $ ls dir1 > dir1.fifo &
6 $ ls dir2 > dir2.fifo &
7 $ diff dir1.fifo dir2.fifo
8 1d0
9 < file1
10 3a3
11 > file4
Et voilà! We have the same output as before, but without the data actually being stored in the filesystem. The data is simply held in memory until the diff command reads it. However, observe that we had to create two named pipes, and also run the ls processes in the background, as they would otherwise block. Also, we have to remember to delete the named pipes after using them. This is still too much hassle.
Let us remove the named pipes.
$ rm *fifo
Now let us see how easy it is using process substi-
tution.
$ diff <(ls dir1) <(ls dir2)
1d0
< file1
3a3
> file4
Amazing! We have the same output as before, but
without having to initialize anything. The process
substitution does all the magic of creating tempo-
rary named pipes and running the processes with
the correct redirections concurrently. It then substi-
tutes the filenames of the named pipes in the place
of the process substitution.
Process substitution is also extremely useful when comparing expected output against actual output, for example in the evaluation of a student's scripts. (Try to find if this is used in the evaluation scripts of the VM Tasks!)
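A minimal sketch of such a check might look like this; student.sh and expected_output.txt are hypothetical names used only for illustration.

$ diff <(bash student.sh) expected_output.txt    # student.sh and expected_output.txt are hypothetical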
We can also use process substitution to provide input to a process running in a subshell, using the >(command) syntax.

tar cf >(bzip2 -c > file.tar.bz2) folder1

This calls tar cf /dev/fd/?? folder1, and bzip2 -c > file.tar.bz2. Because of the /dev/fd/<n> system feature, the pipe between both commands does not need to be named. This can be emulated as

mkfifo pipe
bzip2 -c < pipe > file.tar.bz2 &
tar cf pipe folder1
rm pipe

(This example is lifted from https://tldp.org/LDP/abs/html/process-sub.html. If you are interested in more examples of process substitution, refer to the same.)
Remark 5.7.1 tar is a command that is used to
create archives. It simply puts all the files and
directories in a single file. It does not perform
any compression. The c option is used to create
an archive. The f option is used to mention the
name of the archive. The bzip2 command is used
to compress files. The -c option is used to write
the compressed data to the standard output.
The > operator is used to redirect the standard
output to a file. We will cover tar and zips in
more detail later.
That is pretty much all you need to know about pipes and redirections. To really understand and appreciate the power of pipes and redirections, you have to stop thinking imperatively (like C or Python) and start thinking in streams, like in a functional programming language. Once this paradigm shift happens, you will start to see the power of pipes and redirections and will be able to tackle any kind of task in the command line.
5.8 Summary
Let us quickly summarize the important syntax and
commands we learnt in this chapter.
Table 5.1: Pipes, Streams, and Redirection syntax

Syntax            Name                    Description
;                 Command Separator       Run multiple commands in a single line
&&                Logical AND             Run the second command only if the first command succeeds
||                Logical OR              Run the second command only if the first command fails
>                 Output Redirection      Redirect the stdout of the process to a file
>>                Output Redirection      Append the stdout of the process to a file
<                 Input Redirection       Redirect the stdin of the process from a file
2>                Error Redirection       Redirect the stderr of the process to a file
2>&1              Error Redirection       Redirect the stderr of the process to the stdout
<<EOF             Here Document           Redirect the stdin of the process from a block of text
<<<               Here String             Redirect the stdin of the process from a string
|                 Pipe                    Connect the stdout of one process to the stdin of another process
|&                Pipe Stderr             Connect the stdout and stderr of one process to the stdin of another process
$(command)        Command Substitution    Run a command and use the output in the current shell
$((expression))   Arithmetic Expansion    Evaluate an arithmetic expression
<(command)        Process Substitution    Provide the output of a process as a file
>(command)        Process Substitution    Provide the input to a process from a file
6 Pattern Matching
6.1 Introduction
We have been creating files and directories for a while now, and we have often needed to search for files or directories in a directory. Until now, we used the ls command to list out all the files in a directory and then checked if the file we were looking for was present in the list. This works fine when the number of files is small, but when the number of files is large, this method becomes cumbersome. This is where we can use pattern matching to search for files or directories, or even for text in a file.
You would have also used the popular Ctrl+F short-
cut on most text editors or browsers to search for
text in a file or on a webpage. This is also an example
of pattern matching.
6.2 Globs and Wildcards
The simplest form of pattern matching is using
globs for filename expansion. Globs are used to
match filenames in the shell.
Definition 6.2.1 (Glob) A glob is a pattern-
matching mechanism used for filename expan-
sion in the shell. The term "glob" represents
the concept of matching patterns globally or
expansively across multiple filenames or paths.
In bash, we can use the following wildcards to
match filenames:
▶ * - Matches zero or more characters.
▶ ? - Matches exactly one character.
▶ [abc] - Matches any one of the characters
within the square brackets.
▶ [a-z] - Matches any one of the characters in
the range.
▶ [!abc] - Matches any character except the
ones within the square brackets.
Let us explore these in detail.
Try to guess the output of each command before seeing the output. If you get an output different from what you expected, try to understand why.

$ touch abc bbc zbc aac ab
$ ls -1
aac
ab
abc
bbc
zbc
$ echo a*
aac ab abc
$ echo a?
ab
$ echo ?bc
abc bbc zbc
$ echo [ab]bc
abc bbc
$ echo [az]bc
abc zbc
$ echo [a-z]bc
abc bbc zbc
$ echo [!ab]bc
zbc
$ echo [!z]bc
abc bbc
$ echo [!x-z]?c
aac abc bbc
Shell globs only work with files and directories in the current directory. The expansion of a glob to a sorted list of matching files in the current directory is done by the shell, not by the command itself, and it is done before the command is executed. The command thus does not even know that a glob was used to expand the filenames. To the command, it looks like the user directly typed the filenames.
A glob always expands to a sorted list of filenames, which are passed to the command as separate arguments. However, how the command interprets this list of filenames is up to the command. Some commands such as ls -1 will print each filename on a new line, whereas some commands such as echo will print all filenames on the same line separated by a space. echo does not care if the arguments passed to it are filenames or not; it just prints them as is.
$ echo a*
aac ab abc
$ ls -1 a*
aac
ab
abc
$ ls a*
aac ab abc
$ wc a*
0 0 0 aac
0 0 0 ab
0 0 0 abc
0 0 0 total
$ stat a*
  File: aac
  Size: 0    Blocks: 0    IO Block: 4096    regular empty file
Device: 8,2    Inode: 4389746    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ sayan)   Gid: ( 1001/ sayan)
Access: 2024-07-12 19:10:27.542322238 +0530
Modify: 2024-07-12 19:10:27.542322238 +0530
Change: 2024-07-12 19:10:27.542322238 +0530
 Birth: 2024-07-12 19:10:27.542322238 +0530
  File: ab
  Size: 0    Blocks: 0    IO Block: 4096    regular empty file
Device: 8,2    Inode: 4389748    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ sayan)   Gid: ( 1001/ sayan)
Access: 2024-07-12 19:16:10.707221331 +0530
Modify: 2024-07-12 19:16:10.707221331 +0530
Change: 2024-07-12 19:16:10.707221331 +0530
 Birth: 2024-07-12 19:16:10.707221331 +0530
  File: abc
  Size: 0    Blocks: 0    IO Block: 4096    regular empty file
Device: 8,2    Inode: 4389684    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ sayan)   Gid: ( 1001/ sayan)
Access: 2024-07-12 19:10:22.055523865 +0530
Modify: 2024-07-12 19:10:22.055523865 +0530
Change: 2024-07-12 19:10:22.055523865 +0530
 Birth: 2024-07-12 19:10:22.055523865 +0530
As seen above, globs simply expand to the matching filenames in the current path and pass them as arguments to the command. The output then depends on what the command does. The wc command counts the number of lines, words, and characters in a file. The stat command prints out the metadata of the files. Similarly, we can use the file command to print the type of the files. Try it out.
Exercise 6.2.1 Go to an empty directory, and run the following command.

$ expr 5 * 5

Now run the following command.

$ touch +
$ expr 5 * 5

Observe the output of the commands. Can you explain why the output is different?
When using globs in the shell, if a glob does not
match any files, it is passed as is to the command.
The command then interprets the glob as a normal
string.
$ touch abc bbc
$ ls
abc bbc
$ echo ?bc
abc bbc
$ echo ?bd
?bd
As echo does not care if the arguments passed to it
are filenames or not, it simply prints the arguments
as is. The ls command, however, will not print the
filenames if they do not exist and will instead print
to the standard error that the file does not exist. Use
this knowledge to decipher why the above exercise
behaves the way it does.
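As an aside, here is a minimal sketch of bash's nullglob option, which changes this behavior so that an unmatched glob expands to nothing instead of being passed literally:

$ shopt -s nullglob
$ echo ?bd        # ?bd matches nothing, so echo receives no arguments

$ shopt -u nullglob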
6.3 Regular Expressions
Globs are good enough when we simply want to run
some command and pass it a list of files matching
some pattern. However, when we want to do more
complex pattern matching, we need to use regular
expressions.
Definition 6.3.1 (Regular Expression) A regu-
lar expression (shortened as regex or regexp),
sometimes referred to as rational expression, is
a sequence of characters that specifies a match
pattern in text. Usually such patterns are used by
string-searching algorithms for "find" or "find
and replace" operations on strings, or for input
validation. Regular expression techniques are
developed in theoretical computer science and
formal language theory.
Due to the power of regular expressions, almost all programming languages and text processing tools support them directly. This makes text processing very easy and powerful, as well as cross-platform.
However, there are multiple flavors of regular expressions, and the syntax of each flavor may differ slightly. The most common flavors are:

▶ Basic Regular Expressions (BRE)
▶ Extended Regular Expressions (ERE)
▶ Perl-Compatible Regular Expressions (PCRE)

The first two are supported by most text processing utilities in Unix-like systems and follow the POSIX standard for regular expressions. There are also Perl-syntax regular expressions, which are more powerful and flexible, and are supported by the Perl programming language. (PCRE and Perl Regex are not the same: PCRE is a library that implements Perl-like regular expressions, but in C, while Perl Regex is the regular expression engine used in the Perl programming language. More details can be found online.)

We will focus on BRE and ERE in this chapter, as these are the most commonly used flavors in Unix-like systems.
6.3.1 Basic Regular Expressions

Basic Regular Expressions (BRE) are the simplest form of regular expressions. They are supported by most Unix-like systems and are the default regular expressions used by most text processing utilities such as grep and sed. (Note that awk uses ERE by default, not BRE.)
BRE syntax is similar to the glob syntax, but with
more power and flexibility. There are some subtle
differences between the two.
The following are the basic regular expressions that
can be used in BRE:
▶ a - Matches the character a.
▶ . - Matches any single character exactly once.
▶ * - Matches zero or more occurrences of the previous character.
▶ ^ - Matches the null string at the start of a line.
▶ $ - Matches the null string at the end of a line.
▶ [abc] - Matches any one of the characters within the square brackets.
▶ [a-z] - Matches any one of the characters in the range, both ends inclusive.
▶ [^abc] - Matches any character except the ones within the square brackets; the caret symbol has a different meaning when inside the brackets.
▶ \+ - Matches one or more occurrences of the previous character.
▶ \? - Matches zero or one occurrence of the previous character.
▶ \{n\} - Matches exactly n occurrences of the previous character.
▶ \{n,\} - Matches n or more occurrences of the previous character.
▶ \{n,m\} - Matches n to m occurrences of the previous character.
▶ \ - Escapes a special character such as *, ., [, \, $, or ^.
▶ regex1\|regex2 - Matches either regex1 or regex2.
▶ \(regex\) - Groups the regex.
▶ \2 - Matches the 2nd parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( left-to-right.
▶ \n - Matches a newline character.
6.3.2 Character Classes
A bracket expression is a list of characters enclosed
by ’[’ and ’]’. It matches any single character in that
list; if the first character of the list is the caret ’^’,
then it matches any character not in the list. For
example, the following regex matches the words
’gray’ or ’grey’.
gr[ae]y
Let’s create a small script to test this regex.
1 $ cat regex1.sh
2 #!/bin/bash
3 read -r -p "Enter color: " color
4 if [[ $color =~ gr[ae]y ]]; then
5 echo "The color is gray or grey."
6 else
7 echo "The color is not gray or grey."
8 fi
9 $ ./regex1.sh
10 Enter color: gray
11 The color is gray or grey.
12 $ ./regex1.sh
13 Enter color: grey
14 The color is gray or grey.
15 $ ./regex1.sh
16 Enter color: green
17 The color is not gray or grey.
Ranges

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive. In the default C locale, the sorting sequence is the native character order; for example, '[a-d]' is equivalent to '[abcd]'. (There are locales other than the default C locale, such as en_US.UTF-8, which sort and collate characters differently. In the en_US.UTF-8 locale, the sorting sequence is based on the Unicode code points of the characters, and accented characters collate along with their unaccented counterparts.)

For example, the following regex matches any lowercase letter.

[a-z]

Let's create a small script to test this regex.

$ cat regex2.sh
#!/bin/bash
read -r -p "Enter a letter: " letter
if [[ $letter =~ [a-z] ]]; then
    echo "The letter is a lowercase letter."
else
    echo "The letter is not a lowercase letter."
fi
$ ./regex2.sh
Enter a letter: a
The letter is a lowercase letter.
$ ./regex2.sh
Enter a letter: A
The letter is not a lowercase letter.
Named Character Classes
There are some predefined character classes which
are used often, that can be used in regular expres-
sions. These classes contain a pair of brackets, and
should be present inside a bracket expression. Some
of the common character classes are:
▶ [[:alnum:]] - Alphanumeric characters: [[:alpha:]] and [[:digit:]]; in the 'C' locale and ASCII character encoding, this is the same as [0-9A-Za-z].
▶ [[:alpha:]] - Alphabetic characters: [[:lower:]] and [[:upper:]]; in the 'C' locale and ASCII character encoding, this is the same as [A-Za-z].
▶ [[:blank:]] - Blank characters: space and tab.
▶ [[:cntrl:]] - Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.
▶ [[:digit:]] - Digits: 0 1 2 3 4 5 6 7 8 9.
▶ [[:graph:]] - Graphical characters: [[:alnum:]] and [[:punct:]].
▶ [[:lower:]] - Lower-case letters; in the 'C' locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.
▶ [[:print:]] - Printable characters: [[:alnum:]], [[:punct:]], and space.
▶ [[:punct:]] - Punctuation characters.
▶ [[:space:]] - Space characters: in the 'C' locale, this is tab, newline, vertical tab, form feed, carriage return, and space.
▶ [[:upper:]] - Upper-case letters: in the 'C' locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.
▶ [[:xdigit:]] - Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
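For instance, a quick sketch using a named class with a BRE quantifier to pull out runs of digits:

$ echo "room 101, floor 2" | grep -o '[[:digit:]]\+'
101
2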
The expansion of these named character classes depends on the locale. For example, in the en_US.UTF-8 locale, the [[:lower:]] class will match all lowercase letters in the Unicode character set, not just the ASCII character set.
It is important to note that these named character classes must appear inside two pairs of square brackets, not just one. If we use only one pair of square brackets, each character inside the square brackets is interpreted as a separate character in the list: [:digit:] will match any of the characters d, g, i, t, and :. (If a character is repeated in the list, it has no additional effect.)
Some characters have a different meaning inside the list depending on their position. For example, the caret symbol ^ negates the entire list if it is the first character in the list, but is matched literally if it is not the first character in the list.

▶ ] is used to end the list of characters, unless it is the first character in the list, in which case it is matched literally.
▶ - is used to specify a range of characters, unless it is the first or last character in the list, in which case it is matched literally.
▶ ^ is used to negate the list of characters if it is the first character in the list, else it is matched literally.

From this example onwards, we have started using the grep command to match regexes quickly instead of creating a script. We will discuss the grep command in more detail later in this chapter. The -o flag is used to print only the matching part of the line, instead of the entire line. If you omit it and run the command, you will see the entire line that has the matching part being printed, with the matching part possibly highlighted using color. As it is not possible to highlight the matching part in this book, we are using the -o flag to print only the matching parts.

Example for ]:

$ echo "match square brackets [ and ]" | grep -o '[mb]'
m
b
$ echo "match square brackets [ and ]" | grep -o '[]mb]'
m
b
]

Example for -:

$ echo "ranges are separated by hyphens like -" | grep -o '[a-c]'
a
a
a
a
b
$ echo "ranges are separated by hyphens like -" | grep -o '[-a-c]'
a
a
a
a
b
-

Observe how putting the hyphen at the start of the list makes it match the hyphen literally, whereas putting it in the middle makes it part of a range of characters.

Example for ^:

$ echo "this is a ^line^" | grep -o '[^line]'
t
h
s

s

a

^
^
$ echo "this is a ^line^" | grep -o '[line^]'
i
i
^
l
i
n
e
^

The first case matches any character except l, i, n, and e (including the spaces), while the second case matches only the characters l, i, n, e and the caret symbol ^.
Collating Symbols

A collating symbol is a single-character collating element enclosed in '[.' and '.]'. It stands for a collating element that collates with a single character, as if the character were a separate character in the POSIX locale's collation order. Collating symbols are typically used when a digraph is treated like a single character in a language. They are an element of the POSIX regular expression specification, and are not widely supported.

For example, the Welsh alphabet (read more about the Welsh alphabet here) has a number of digraphs that are treated as a single letter, marked with a * below:

a b c ch* d dd* e f ff* g ng* h i j l ll* m n o p ph* r rh* s t th* u w y

Assuming the locale file defines it (a collating symbol will only work if it is defined in the current locale), the collating symbol [[.ng.]] is treated like a single character. Likewise, a single character expression like . or [^a] will also match "ff" or "th". This also affects sorting, so that [p-t] will include the digraphs "ph" and "rh" in addition to the expected single letters.

In short, a collating symbol represents a set of characters which are considered as a single unit for collating (sorting) purposes; for example, "ch"/"Ch" or "ss" (these are only valid in locales which define them).
Equivalence Classes
An equivalence class groups characters which are
equivalent for collating purposes; for example, "a"
and "à" (and other accented variants).
[[=a=]] is an equivalence class that matches the
character "a" and all its accented variants, such as
aªáàâãäå, etc.
Collating symbols and equivalence classes are used
in locale definitions to encode complex ordering
information and are not implemented in some reg-
ular expression engines. We will not discuss these
in depth.
Escape Sequences

Along with the named character classes, there are some escape sequences that can be used in regular expressions. These are:

▶ \b - Matches a word boundary; that is, it matches if the character to the left is a "word" character and the character to the right is a "non-word" character, or vice-versa. It does not match any character, but matches the empty string that marks the word boundary.
▶ \B - Matches a non-word boundary; that is, it matches if the characters on both sides are either both "word" characters or both "non-word" characters.
▶ \< - Matches the start of a word only.
▶ \> - Matches the end of a word only.
▶ \d - Matches a digit character. Equivalent to [0-9].
▶ \D - Matches a non-digit character. Equivalent to [^0-9].
▶ \s - Matches a whitespace character. Equivalent to [[:space:]].
▶ \S - Matches a non-whitespace character. Equivalent to [^[:space:]].
▶ \w - Matches a word character. Equivalent to [[:alnum:]_].
▶ \W - Matches a non-word character. Equivalent to [^[:alnum:]_].
▶ \` - Matches the start of the pattern space if multiline mode is enabled.
▶ \' - Matches the end of the pattern space if multiline mode is enabled.

Other than these, other escape sequences exist to match special non-graphical characters such as newline, tab, etc. These are GNU extensions and are not defined in the original POSIX standard.

▶ \a - Matches the alert character (ASCII 7).
▶ \f - Matches the form feed character (ASCII 12).
▶ \n - Matches the newline character (ASCII 10).
▶ \r - Matches the carriage return character (ASCII 13).
▶ \t - Matches the tab character (ASCII 9).
▶ \v - Matches the vertical tab character (ASCII 11).
▶ \0 - Matches the null character (ASCII 0).
▶ \cx - Matches the control character Ctrl-x. For example, \cM matches the carriage return character. (A lowercase x is first converted to uppercase, and then bit 6 of the character is flipped.)
▶ \xhh - Matches the character with the hexadecimal value hh.
▶ \oxxx - Matches the character with the octal value xxx.
▶ \dxxx - Matches the character with the decimal value xxx.

You can also read more about the locale issues in regex here.
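A short sketch of \b and \w in action with grep:

$ echo "cat catalog concat" | grep -o '\bcat\b'
cat
$ echo "cat catalog concat" | grep -o '\wcat\b'
ncat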
6.3.3 Anchors

Anchors are used to match a position in the text, rather than a character. The following are the anchors that can be used in regular expressions:

▶ ^ - Matches the start of a line.
▶ $ - Matches the end of a line.
▶ \b - Matches a word boundary.
▶ \B - Matches a non-word boundary.
▶ \< - Matches the start of a word.
▶ \> - Matches the end of a word.
▶ \` - Matches the start of the pattern space if multiline mode is enabled.
▶ \' - Matches the end of the pattern space if multiline mode is enabled.

These anchors match a position in the text, rather than a character. For example, the regex ^a will match the character a only if it is at the start of the line; it does not match anything other than that a. If there is no a at the start of the line, then nothing is matched at all.

$ echo "apple" | grep -o '^a'
a
$ echo "apple" | grep -o '^a' | xxd
00000000: 610a                                     a.
$ echo "banana" | grep -o '^a'
$ echo "banana" | grep -o '^a' | xxd

xxd is a command that converts its input to a hex dump; it is used here to show the output in a more readable format. 61 is the hex value of the character a, and 0a is the hex value of the newline character. The lack of output in the second case means that nothing is matched, and no bytes are output.
The anchors ^, $, and \b are very useful in most text processing tasks. The ^ and $ when used together can match the entire line, meaning that the pattern between them is not matched if it is a substring of the line; it only matches if the pattern is the entire line. The \b is used to surround the pattern when we want to match the pattern as a word, and not as a substring of a word.
1 $ echo "apple" | grep -o ’^apple$’
2 apple
3 $ echo "apple is great" | grep -o ’^apple$’
4 $ echo "apple is great" | grep -o ’\bapple\b’
5 apple
6 $ echo "i like pineapple" | grep -o ’\bapple\b
’
Observe that even though we are using the -o flag,
the entire word is printed in a single line. This is
because the -o flag prints only the matches and
prints them on separate lines. However, unlike the
previous cases where we were using character lists,
here the entire word is a single match, and thus is
printed on a single line.
6.3.4 Quantifiers
Quantifiers are used to match a character or a group
of characters multiple times. This is useful if we do
not know the exact number of times a character or
group of characters will be repeated or its length.
Paired with a character list, it makes regex very
powerful and able to match any arbitrary pattern.
The following are the quantifiers that can be used
in regular expressions:
▶ * - Matches zero or more occurrences of the
previous character.
▶ \+ - Matches one or more occurrences of the
previous character.
▶ \? - Matches zero or one occurrence of the
previous character.
▶ {n} - Matches exactly n occurrences of the previous character.
▶ {n,} - Matches n or more occurrences of the previous character.
▶ {,n} - Matches n or fewer occurrences of the previous character.
▶ {n,m} - Matches n to m occurrences of the previous character, both ends inclusive.
Note that + and ? are not part of the BRE standard, but are part of the ERE standard. However, most text processing utilities support them in BRE mode as well if escaped (\+, \?); the same goes for the {} interval quantifiers, which must be written as \{ and \} in BRE.
$ echo -n "aaaaaaaaaaaaaa" | wc -c    # there are 14 a's
14
$ echo "aaaaaaaaaaaaaa" | grep -E "a{4}" -o    # the first 12 a's are matched in groups of four
aaaa
aaaa
aaaa
$ echo "aaaaaaaaaaaaaa" | grep -E "a{4,}" -o    # the entire string is matched
aaaaaaaaaaaaaa
$ echo "aaaaaaaaaaaaaa" | grep -E "a{4,5}" -o    # maximal matching: the first 10 a's are matched as groups of 5, then the last 4 a's as a group of four
aaaaa
aaaaa
aaaa
$ echo "aaaaaaaaaaaaaa" | grep -E "a{,5}" -o
aaaaa
aaaaa
aaaa
Here we are using the -E flag to enable ERE mode in grep. If we are using the default BRE mode, we need to escape the +, ?, {, }, (, ), and | characters to use them.

Let us also see how the *, + and ? quantifiers work.

$ echo "There are there main quantifiers, which are asterisk (*), plus (+), and eroteme (?)." | grep "[^aeiou][aeiou]*" -o
T
he
re
a
re

t
he
re

mai
n

qua
n
ti
fie
r
s
,

w
hi
c
h
a
re
a
s
te
ri
s
k

(
*
)
,

p
lu
s

(
+
)
,
a
n
d
e
ro
te
me

(
?
)
.
This shows that the asterisk quantifier matches zero or more occurrences of the previous character. Here we are matching any pattern which starts with a non-vowel and has zero or more vowels after it, so a match keeps growing as long as there are consecutive vowels. As soon as a non-vowel is encountered, the previous match ends and a new match starts.

Now compare and contrast the previous output with the next output, which uses the plus quantifier. The lines with only a single character (a non-vowel) will no longer be present.
1 $ echo "There are there main quantifiers,
which are asterisk (*), plus (+), and
eroteme (?)." | grep "[^aeiou][aeiou]\+" -
o
2 he
3 re
4 a
5 re
6 he
6.3 Regular Expressions 279
7 re
8 mai
9 qua
10 ti
11 fie
12 hi
13 a
14 re
15 a
16 te
17 ri
18 lu
19 a
20 e
21 ro
22 te
23 me
Finally, observe how using the eroteme quantifier brings back the single character lines, but removes the lines with more than one vowel.
1 $ echo "There are there main quantifiers,
which are asterisk (*), plus (+), and
eroteme (?)." | grep "[^aeiou][aeiou]\?" -
o
2 T
3 he
4 re
5 a
6 re
7
8 t
9 he
10 re
11
12 ma
13 n
14
15 qu
16 n
280 6 Pattern Matching
17 ti
18 fi
19 r
20 s
21 ,
22
23 w
24 hi
25 c
26 h
27 a
28 re
29 a
30 s
31 te
32 ri
33 s
34 k
35
36 (
37 *
38 )
39 ,
40
41 p
42 lu
43 s
44
45 (
46 +
47 )
48 ,
49 a
50 n
51 d
52 e
53 ro
54 te
55 me
56
57 (
6.3 Regular Expressions 281
58 ?
59 )
60 .
When mixed with character lists, quantifiers can be
used to match any arbitrary pattern. This makes
regular expressions very powerful and flexible.
1 $ echo "sometimes (not always) we use
parentheses (round brackets) to clarify
some part of a sentence (or phrase)." |
grep "([^)]\+)" -o
2 (not always)
3 (round brackets)
4 (or phrase)
Observe how ([^)]\+) matches any pattern that starts with an opening parenthesis, followed by one or more characters that are not a closing parenthesis, and ends with a closing parenthesis. This lets us match all of the bracketed parts of a sentence, without knowing how many such brackets exist, or what the length of each expression is. This is pretty powerful, and can be used in similar situations, such as extracting text from HTML tags, JSON strings, etc. (Regular expressions can only match regular languages, and not context-free languages, context-sensitive languages, or unrestricted languages. This means they cannot be used to parse HTML or XML files. However, for simple tasks such as extracting text from tags, regular expressions can be used. To explore community lore on this topic, see this stackoverflow answer. To learn more about the theoretical aspects of regular expressions, see the Chomsky Hierarchy in Theory of Computation.)

6.3.5 Alternation

Alternation is used to match one of multiple patterns in a single regex. The syntax for alternation is regex1|regex2. The regex will match if either regex1 or regex2 matches.

Alternation in BRE needs to be escaped (\|), as it is not part of the BRE standard. However, most text processing utilities support it in BRE mode if escaped.
1 $ echo -e "this line starts with t\nand this
starts with a\nwhereas this line starts
with w" | grep ’^t’
2 this line starts with t
3 $ echo -e "this line starts with t\nand this
starts with a\nwhereas this line starts
with w" | grep ’^t\|^a’
4 this line starts with t
5 and this starts with a
As seen above, the regex ^t\|^a matches any line that starts with either t or a. This is very useful when we want to match multiple patterns in a single regex. Note that we have to mention the start of line anchor both times; this is because alternation has the lowest precedence, and thus the start of line anchor is not shared between the two patterns.

Let us now see a more complex example of alternation, similar to the previous example with brackets.
1 $ echo "sometimes (not always) we use
parentheses (round brackets) or brackets [
square brackets] to clarify some part of a
sentence (or phrase)." | grep "([^)]\+)
\|\[[^\]\+\]" -o
2 (not always)
3 (round brackets)
4 [square brackets]
5 (or phrase)
Here we are matching phrases inside round OR square brackets. Observe a few things here:

1. We need to escape the alternation operator | as it is not part of the BRE standard.
2. We need to escape the square brackets [] when we want them to match literally, as they have special meaning in regex.
3. We need to escape the plus quantifier + as it is not part of the BRE standard.
6.3.6 Grouping

Grouping is used to group multiple characters or patterns together. This is useful when we want to apply a quantifier to multiple characters or patterns. The syntax for grouping is (regex). The regex will match if the pattern inside the parentheses is matched; the parentheses themselves are not matched. However, grouping is not available unescaped in BRE: if we want to match literal parentheses we use (regex), and if we want to group the regex without matching the parentheses, we use \(regex\).
Let’s revisit one of the earlier examples of alter-
nation, and group the patterns inside the alterna-
tion.
1 $ echo -e "this line starts with t\nand this
starts with a\nwhereas this line starts
with w" | grep ’^t\|^a’
2 this line starts with t
3 and this starts with a
4 $ echo -e "this line starts with t\nand this
starts with a\nwhereas this line starts
with w" | grep ’^\(t\|a\)’
5 this line starts with t
6 and this starts with a
As evident from above, both the grouped and ungrouped regexes match the same lines. But in the grouped version, we do not have to repeat the start of line anchor, and the regex is more readable. Also, grouping is useful when we want to apply a quantifier to the entire group.

$ grep -E "([b-d]|[f-h]|[j-n]|[p-t]|[v-z]){2}" -o <<< "this is a sentence"
th
nt
nc

Notice the subtly different way of providing the string to the stdin of the grep command; we have covered here-strings earlier.
In this example, we are matching any two consecutive characters that are consonants. Here we are not matching non-vowels; rather, we are explicitly matching lowercase consonants. Thus this will not match spaces, digits, or punctuation. There is no direct way to match consonants, so we have to list them explicitly and chain them using alternations. However, if we want to match two consonants consecutively, we do not have to list the entire pattern again; we can simply group it and apply the {n} quantifier to it.
The biggest use-case of grouping is to refer to the
matched group later in the regex. This is called
backreferencing, and is very useful when we want
to match a pattern that is repeated later in the text.
1 $ grep -E "([b-d]|[f-h]|[j-n]|[p-t]|[v-z]){2}"
-o <<< "this is an attached sentence"
2 th
3 tt
4 ch
5 nt
6 nc
Observe this similar example, where the input string now has the word attached in it. One of the matched patterns is tt. But what if we want to list only those consonant groups that repeat the same consonant, like tt? Then we need to use backreferencing.
6.3.7 Backreferences

Backreferences are used to refer to a previously matched group in the regex. This is useful when we want to match a pattern that is repeated later in the text. The syntax for backreferencing is \n, where n is the number of the group that we want to refer to. Groups are implicitly numbered by counting occurrences of (...) left-to-right.

To accomplish the previous example, we use the backreference \1 to match only the same consonant, instead of repeating the match with {n}.
1 $ grep -E "([b-d]|[f-h]|[j-n]|[p-t]|[v-z])\1"
-o <<< "this is an attached sentence"
2 tt
Backreferences are also useful in tools such as sed and awk, where we can replace a matched string with another string and use the matched group in the replacement string.
For example, if we want to make the first letter of
either apple or banana uppercase, we can use the
following command.
1 $ echo "apple & banana" | sed -E ’s/\<([ab])/\
U\1/g’
2 Apple & Banana
Here we are using the sed command to replace
the matched pattern with the uppercase version
of the first character. The \1 is used to refer to
the first matched group, which is the first charac-
ter of the word. The syntax of sed is s/pattern/
replacement/flags, where pattern is the regex to
match, replacement is the string to replace the
matched pattern with, and flags are the flags to
apply to the regex. The replacement string has to
be a string, and not a regex, however, we can use
backreferences in the replacement string. The \U is
used to convert the matched group to uppercase.
The g flag is used to replace all occurrences of the
pattern in the line. The \< is used to match the start of a word; otherwise the a inside banana would also be capitalized. We will cover sed in detail in later chapters.
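As another small sketch of backreferences in a replacement, here is a sed command that swaps two words:

$ echo "john smith" | sed -E 's/(\w+) (\w+)/\2 \1/'
smith john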
Let us use backreferences to find three letter palindromes in a string. You should have a dictionary of words in /usr/share/dict/words. (If your distribution does not have the /usr/share/dict/words file, you can download it from here.)

$ grep -E "^([a-z])[a-z]\1$" <(tr 'A-Z' 'a-z' < /usr/share/dict/words) | tail -n50
rsr
rtr
sas
sbs
scs
sds
ses
sis
sls
sms
sos
sps
srs
sss
sts
sus
svs
sws
sxs
tat
tct
tet
tft
tgt
tit
tyt
tkt
tnt
tot
tpt
trt
tst
tut
twt
txt
ulu
umu
upu
uru
usu
utu
vav
viv
waw
wnw
wow
wsw
xix
xxx
zzz

Here we are preprocessing the dictionary to convert all the words to lowercase using the tr command. Then we are passing the output of tr to grep to find the lines that match the pattern ^([a-z])[a-z]\1$. This pattern matches any three letter palindrome. The ^ and $ are used to match the start and end of the line respectively. The ([a-z]) matches any lowercase letter, and the \1 matches the same letter as the first matched group, which is what makes the match a palindrome. Finally, as the output is too large, we are using the tail command to print only the last 50 lines. We have used process substitution as well as pipes here, so the flow of data is not strictly left to right. Revise the previous chapters and try to understand how the data is flowing. What difference would it make if we replaced <(tr with < <(tr, in the internal workings and the output of the command?
6.4 Extended Regular Expressions

Throughout the chapter, we have noticed that some regex syntax is not supported in the BRE standard, and to use it in most text processing applications when using BRE, we have to escape it. Explicitly, the characters that are not supported in BRE but are supported in ERE are as follows.

▶ + - In BRE, it matches a literal plus sign if not escaped; otherwise it is a quantifier making the previous character or group match one or more times.
▶ ? - In BRE, it matches a literal question mark (eroteme) if not escaped; otherwise it is a quantifier making the previous character or group match zero or one time, not more.
▶ ( and ) - The parentheses match literal parentheses in BRE if unescaped; otherwise they are used to group regular expressions.
▶ { and } - The curly braces match literal curly braces in BRE if unescaped; otherwise they are used to specify the number of times the previous character or group is repeated.
▶ | - The pipe symbol matches a literal pipe symbol in BRE if unescaped; otherwise it is used for alternation.

To use these seven characters with their special meaning directly, without escaping, we can use extended regular expressions.
Definition 6.4.1 (POSIX-Extended Regular Ex-
pressions) Extended regular expressions (EREs)
are a variant of regular expressions that sup-
port additional features and syntax. EREs are
supported by several command line utilities in
Linux, including grep, sed, and awk.
In cases where we want to use these symbols for their special meaning (as defined by the POSIX-ERE standard) instead of as literal characters, we can use the -E flag in grep to enable ERE mode. This allows us to use these symbols without escaping them, which makes the regular expression easier to read and understand. This is useful, since it is less likely that we want to match these symbols literally and more likely that we want to use them for their special meaning.

However, in cases where we want to match these symbols literally, we can escape them using the backslash \ when using Extended Regular Expressions. Thus the action of escaping switches between the two modes, and the -E flag is used to enable ERE mode in grep.
When we want to only match the symbols literally,
it might thus be better to use BRE, as it is more strict
and less likely to match unintended patterns.
The following table (Table 6.1) shows when to escape the character in each mode, using + as an example.

Table 6.1: Differences between BRE and ERE

Mode   Literal Symbol   Special Syntax
BRE    +                \+
ERE    \+               +
Let us also demonstrate this using an example.

$ echo -e "a+b\naapple\nbob" > demo.txt
$ cat demo.txt
a+b
aapple
bob
$ grep 'a+' demo.txt # matches literally
a+b
$ grep 'a\+' demo.txt # uses special meaning
a+b
aapple
$ grep -E 'a+' demo.txt # uses special meaning
a+b
aapple
$ grep -E 'a\+' demo.txt # matches literally
a+b

When we use grep without the -E flag, it uses BRE by default, and we have to escape the + symbol to use its special meaning. When we use the -E flag, we can use the + symbol directly without escaping it as a quantifier; if we want to match the symbol literally, we need to escape it in ERE but not in BRE. In this example, when matching the + symbol literally, we get only one line of output, which contains the literal symbol +. When using + as a quantifier, we get both lines, since it means one or more a, and both lines have one or more a. The line bob is never printed, as it does not contain any a characters.
6.5 Perl-Compatible Regular Expressions

While POSIX has defined the BRE and ERE standards, Perl has its own regular expression engine that is more powerful and flexible. POSIX specifications are meant to be portable amongst different flavors of Unix and other languages, thus most programming languages also support BRE or ERE, or a similar superset of them. (The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system and user-level application programming interfaces (APIs), along with command line shells and utility interfaces, for software compatibility (portability) with variants of Unix and other operating systems. POSIX is also a trademark of the IEEE. POSIX is intended to be used by both application and system developers.)

Perl-Compatible Regular Expressions (PCRE) is a project written in C, inspired by the Perl regex engine. Although PCRE originally aimed at feature equivalence with Perl Regex, the two are not fully equivalent. To study the nuanced differences between Perl Regex and PCRE, you can go through the Wikipedia page.

PCRE is far more powerful than ERE, with additional syntax and features. It is supported by several programming languages, including Perl, PHP, Python, and Ruby (Python and Ruby support PCRE through external libraries). It is also supported by some text processing utilities, like grep (via the -P flag), but not by sed and awk.

We will not dive deep into PCRE, as it is a vast topic and is not supported by most text processing utilities. However, feel free to explore PCRE online.
Some of the features of PCRE are:
6.5.1 Minimal Matching (a.k.a. "ungreedy")

A ? may be placed after any repetition quantifier to indicate that the shortest match should be used. The default is to attempt the longest match first and backtrack through shorter matches: e.g. a.*?b would match just "ab" in "ababab", whereas a.*b would match the entire string.

If the U flag is set, then quantifiers are ungreedy (lazy) by default, while ? makes them greedy.
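A quick sketch of greedy versus lazy matching, using grep in PCRE mode (the -P flag):

$ echo "ababab" | grep -oP 'a.*?b'
ab
ab
ab
$ echo "ababab" | grep -oP 'a.*b'
ababab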
6.5.2 Multiline matching
^ and $ can match at the beginning and end of a
string only, or at the start and end of each "line"
within the string, depending on what options are
set.
6.5.3 Named subpatterns

A sub-pattern (surrounded by parentheses, like (...)) may be named by including a leading ?P<name> after the opening parenthesis. Named subpatterns are a feature that PCRE adopted from Python regular expressions.

This feature was subsequently adopted by Perl, so now named groups can also be defined using (?<name>...) or (?'name'...), as well as (?P<name>...).

Named groups can be backreferenced with, for example: (?P=name) (Python syntax) or \k'name' (Perl syntax).
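For example, a minimal sketch using a named group and its backreference to find a repeated word:

$ echo "hello hello world" | grep -oP '(?P<word>\w+) (?P=word)'
hello hello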
6.5.4 Look-ahead and look-behind assertions

This is one of the most useful features of PCRE. Patterns may assert that previous text or subsequent text contains a pattern without consuming matched text (zero-width assertion). For example, /\w+(?=\t)/ matches a word followed by a tab, without including the tab itself.

Look-behind assertions cannot be of uncertain length, though (unlike Perl) each branch can be a different fixed length. \K can be used in a pattern to reset the start of the current whole match; this provides a flexible alternative approach to look-behind assertions, because the discarded part of the match (the part that precedes \K) need not be fixed in length.

So, the word boundary match \b can be emulated using look-ahead and look-behind assertions: (?<=\W)(?=\w)|(?<=\w)(?=\W)|^|$. This regex matches either the left bound of a word (a non-word character followed by a word character), the right bound of a word (a word character followed by a non-word character), the start of line anchor (^), or the end of line anchor ($).

[Figure 6.1: Positive and Negative Look-ahead and look-behind assertions]
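A short sketch of the look-ahead example from above; printf is used so the string actually contains a tab character:

$ printf 'word\tmore\n' | grep -oP '\w+(?=\t)'
word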
6.5.5 Comments
A comment begins with (?# and ends at the next
closing parenthesis.
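For instance, a tiny sketch of a PCRE comment, which does not affect the match:

$ echo "abc123" | grep -oP '[a-z]+(?#letters)[0-9]+'
abc123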
6.5.6 Recursive patterns
A pattern can refer back to itself recursively or to
any subpattern. For example, the pattern \((a*|(?
R))*\) will match any combination of balanced
parentheses and "a"s.
6.6 Other Text Processing Tools

Now that we have discussed the basics of regular expressions, let us see how we can use them in some text processing utilities. We will discuss the following text processing utilities:

▶ tr - Translate characters.
▶ cut - Cut out fields (columns) from a line.
▶ grep - Search for patterns in a file.
▶ sed - Stream editor - search and replace, insert, select, delete, translate.
▶ awk - A programming language for text processing.
However, these are not all the text processing tools
that exist. There are many other text processing
utilities that are used in Unix-like systems. Some of
them are:
▶ rg - ripgrep - A search tool that combines the usability of The Silver Searcher (ag) with the raw speed of grep. Useful for searching text recursively in a directory or git repository.
▶ fzf - Fuzzy Finder - A command-line fuzzy
finder. Useful to search for files and directo-
ries even when you might not know a valid
substring of the text. It works by searching for
subsequences instead of substrings and other
fuzzy search logic. It is extremely powerful
when paired with other applications as an
interactive select menu.
▶ csvlens - A tool to view and query CSV files in a tabular format. It can search for data in a CSV using regex.
▶ pdfgrep - A tool to search for text in PDF files using regex. It also supports PCRE2.
There are other useful utilities which are not part of GNU coreutils, but are very useful in text processing. Feel free to find such tools, install them, and play around with them. However, we won't be able to discuss those in detail here.
6.6.1 tr
tr is a command that is used to translate characters.
It is used to replace characters in a string with other
characters. It is useful when we want to replace
a list of characters with another list of characters,
or remove a character from a string. It is trivial to
create rotation ciphers using tr.
Without any flags, tr takes two arguments, LIST1 and LIST2, which are two lists of characters. LIST1 is the source map and LIST2 is the destination map of the translation. A character can only map to a single character in a translation; however, multiple characters can map to the same character. That is, the mapping can be many-to-one, but not one-to-many.
A simple example of tr is to convert a single character to another character.

$ echo "hello how are you" | tr 'o' 'e'
helle hew are yeu

[Figure 6.2: Many-to-one mapping]

However, the true power of tr shows when we use character lists. We can use character lists to replace multiple characters with other characters.

$ echo "hello how are you" | tr 'a-z' 'A-Z'
HELLO HOW ARE YOU
tr can also toggle the case of characters; that is, LIST1 and LIST2 can have common characters.

$ echo "Hello How Are You" | tr 'a-zA-Z' 'A-Za-z'
hELLO hOW aRE yOU

We do not need to surround the character ranges in brackets.
Ciphers

Definition 6.6.1 (Cipher) A cipher is an algorithm for performing encryption or decryption: a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. To encipher or encode is to convert information into cipher or code. In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decipher a ciphertext back to plaintext and access the original information.

[Figure 6.3: Caesar Cipher]

One of the most common ciphers is the rotation cipher, where each character is replaced by another character that is a fixed number of positions down the alphabet. This is also known as the Caesar cipher, named after Julius Caesar, who is said to have used it to communicate with his generals.

If the shift is 13, then the cipher is called ROT13. It is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the alphabet. ROT13 is a special case of the Caesar cipher, which was developed in ancient Rome. (The shift of 13 is special because it is the same for both encoding and decoding: the English alphabet has 26 characters, and 13 is half of 26, so shifting by 13 twice brings back the original character. ROT13 is thus an involution; that is, applying it twice gives back the original text.)
Let us try to implement ROT13 using tr.

$ echo "hello how are you?" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
uryyb ubj ner lbh?
$ echo "uryyb ubj ner lbh?" | tr 'a-zA-Z' 'n-za-mN-ZA-M'
hello how are you?

Observe how running the output of the cipher through the same cipher gives us back the original plain-text. This is because ROT13 is an involution cipher.
We can concatenate multiple ranges of characters in the character lists, as seen above; tr simply converts each character in LIST1 to its corresponding character in LIST2. The lengths of both lists should thus be the same. However, if LIST2 is shorter than LIST1, tr will simply repeat the last character of LIST2 as many times as required to make both lists the same length.

$ echo "abcdefghijklmnopqrstuvwxyz" | tr 'a-z' '1'
11111111111111111111111111
$ echo "abcdefghijklmnopqrstuvwxyz" | tr 'a-z' '12'
12222222222222222222222222
$ echo "abcdefghijklmnopqrstuvwxyz" | tr 'a-z' '123'
12333333333333333333333333
$ echo "abcdefghijklmnopqrstuvwxyz" | tr 'a-z' '1234-9'
12345678999999999999999999
The characters in the lists are treated as characters, and not digits; thus we cannot replace a character with a multi-digit number. The range 1-26 does not produce the 26 numbers from 1 to 26; rather, it produces three characters: 1, 2, and 6 (the range 1-2, followed by the character 6). Similarly, 1-72 means the eight characters 1, 2, 3, 4, 5, 6, 7, 2.

$ echo "abcdefghijklmnopqrstuvwxyz" | tr 'a-z' '1-26'
12666666666666666666666666
$ tr 'a-z' '1-72' <<< "abcdefghijklmnopqrstuvwxyz"
12345672222222222222222222
We can also repeat a character an arbitrary number of times by using the * character in LIST2 inside square brackets.

$ echo "abcdefghijklmnopqrstuvwxyz" | tr 'a-z' '1-4[5*10]7'
12345555555555777777777777

Here the repeat count is actually treated as a number, and thus multi-digit counts (as shown above) can be used to repeat the character any number of times.
tr can perform any such substitution cipher that does not depend on additional memory or state.
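For instance, a sketch of the Atbash cipher, which maps the alphabet to its reverse:

$ echo "hello" | tr 'a-z' 'zyxwvutsrqponmlkjihgfedcba'
svool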
Deletion

tr can also delete or drop characters from a string of characters. The -d flag is used to delete characters from the input string. The syntax is tr -d 'LIST1', where LIST1 is the list of characters to delete.

$ echo "hi! hello how are you?" | tr -d 'aeiou'
h! hll hw r y?

Here we are deleting all the vowels from the input string.
We can also use ranges to delete characters.

$ echo "hi! hello how are you?" | tr -d 'a-m'
! o ow r you?

Here we are deleting all the characters from a to m.
Sometimes there are a lot of characters we want to delete, and it is easier to specify the characters we want to keep. We can use the -c flag to complement the character list; combined with -d, this keeps only the characters that are in the list.

$ echo "hi! hello how are you?" | tr -cd 'a-m'
hihellhae

Here we are keeping only the characters from a to m. Observe that it also deletes the punctuation, the spaces, and the newline character.
This is useful if we want to filter out only some characters from a stream of random characters, for example when trying to generate a random password.

$ tr -cd 'a-zA-Z0-9' < /dev/urandom | head -c 20 ; echo
8JOzmr4BUbho6wDPaipT

This uses the /dev/urandom file to generate random characters, and then filters out only the alphanumeric characters. The head -c 20 is used to print only the first 20 characters, and the echo is used to print a newline after the password.

(It is not recommended to use /dev/random instead of /dev/urandom, as it will block if there is not enough entropy. We should also totally avoid using the RANDOM shell variable for cryptographic work, since it is not cryptographically secure. Computers cannot really generate random numbers, since they are deterministic; most random number generators simply use external variables like voltage, temperature, and microphone noise to simulate randomness. This may seem random to humans, but is not cryptographically secure for use in generating passwords. There are more secure algorithms to generate passwords; read more about it here.)

Squeeze

Finally, tr can also be used to squeeze characters, that is, to replace multiple consecutive occurrences of a character with a single occurrence. The -s flag is used to squeeze characters. It also takes a single argument, which is the list of characters to squeeze.
Use single quotes for this example, and not double quotes, as !! has special meaning in double quotes: it expands to the last run command. (This is actually useful: when you write a long command and forget to use sudo, instead of typing the entire thing again, or using arrow keys, you can simply type sudo !! to expand the !! into the entire previous command.)

$ echo 'Hello!!!!! Using multiple punctuation marks is not only grammatically incorrect but also obnoxious!' | tr -s '!'
Hello! Using multiple punctuation marks is not only grammatically incorrect but also obnoxious!

6.6.2 cut
If you want to extract a certain column, or a cer-
tain range of columns from a structured text file,
doing so using regular expressions can be a bit
cumbersome.
$ cat data.csv
name,age,gender
Alice,18,F
Bob,32,M
Carla,23,F
$ grep -P '[^,]*,\K[^,]*(?=,[^,]*)' data.csv -o
age
18
32
23
As seen above, we can use regular expressions to extract the second column from a CSV file. However, this is not very readable, and can be cumbersome for large files with a lot of columns. It also requires PCRE, for the \K and the lookahead assertion. This can be done in a much easier manner; this is where the cut command comes in.

Using cut, this operation becomes trivial.
$ cat data.csv
name,age,gender
Alice,18,F
Bob,32,M
Carla,23,F
$ cut -d, -f2 data.csv
age
18
32
23

The -d flag is used to specify the delimiter, and the -f flag is used to specify the field. The delimiter is the character that separates the fields, and the field is the column that we want to extract. The fields are numbered starting from 1.

To see why the PCRE regex [^,]*,\K[^,]*(?=,[^,]*) was so cumbersome, let us parse it. The first part, [^,]*, matches the first column, and the \K resets the start of the match; this ensures that the first column is present, but is not part of the match and thus not printed. The second part, [^,]*, matches the second column, taking as many non-comma characters as possible. The (?=,[^,]*) is a lookahead assertion matching the third column; it ensures a third column is present, but does not consume it, so it is not printed either. We had to use \K and could not use a lookbehind assertion, as lookbehind assertions are fixed length, and we do not know the length of the first column. We could have used a lookbehind assertion if we knew the length of the first column.

cut can also extract a range of columns.

$ cat data.csv
name,age,gender
Alice,18,F
Bob,32,M
Carla,23,F
$ cut -d, -f1,3 data.csv
name,gender
Alice,F
Bob,M
Carla,F
$ cut -d, -f2-3 data.csv
age,gender
18,F
32,M
23,F
$ cut -d, -f2- data.csv
age,gender
18,F
32,M
23,F
We can also mention disjoint sets of columns or
ranges of columns separated by commas.
1 $ cut -d: -f1,5-7 /etc/passwd | tail -n5
2 dhcpcd:dhcpcd privilege separation:/:/usr/bin/nologin
3 redis:Redis in-memory data structure store:/var/lib/redis:/usr/bin/nologin
4 saned:SANE daemon user:/:/usr/bin/nologin
5 tor::/var/lib/tor:/usr/bin/nologin
6 test1::/home/test1:/usr/bin/bash

The range is inclusive, that is, it includes the start
and end columns. If the start of the range is absent,
it is assumed to be the first column. If the end of
the range is absent, it is assumed to be the last
column. This lets us extract columns even if we do not
know the number of columns in the file.

Here we are using the /etc/passwd file as an example.
The /etc/passwd file is a text file that contains
information about the users on the system, such as the
username, user ID, group ID, home directory, and
shell. It is a colon-separated file, where each line
contains information about a single user. The fields
are the username, password (it is not stored here, so
it is always x), user ID, group ID, user information
(this is usually used by modern distributions to store
the user's full name), home directory, and shell. The
file is readable by all users, but only writable by
the root user. It is used by the system to
authenticate users and to store information about the
users on the system.

Exercise 6.6.1 Now that you are familiar with the
/etc/passwd file, try to extract the usernames and the
home directories of the users. The usernames are the
first field, and the home directories are the sixth
field. Use the cut command to extract both.
ID, home directory, and shell.
The file is a colon-separated
file, where each line contains
The default delimiter for cut is the tab character.
information about a single However, we can specify the delimiter using the
user. The fields are separated -d flag. Although it is not required to quote the
by colons, and the fields are delimiter, certain characters might be apprehended
the username, password (it
is not stored, so it is always
by the shell and not passed as-is to the command,
x), user ID, group ID, user hence it is always best practice to quote the delimiter
information (this is usually using single-quotes. The delimiter has to be a single
used by modern distrbutions
character, and cannot be more than one character.
to store the user’s full name),
home directory, and shell.
The input delimiter can only be speficied if splitting
The file is readable by all
users, but only writable by the file by fields, that is, when we are using -f flag
the root user. The file is used with a field or range of fields.
by the system to authenticate
users and to store informa-
tion about the users on the
Output Delimiter
system.
The output delimiter by default is the same as
the input delimiter. However, we can specify the
output delimiter using the --output-delimiter flag.
The output delimiter can be a string, and not just
a single character. This is useful when we want to
change the delimiter of the output file.
cut, like most coreutils, will read the data from
standard input (stdin) if no file is specified. This
is useful when we want to pipe the output of another
command to cut.

1 $ head -n1 /etc/passwd | cut -d: -f5- --output-delimiter=,
2 root,/root,/usr/bin/bash
3 $ head -n1 /etc/passwd | cut -d: -f5- --output-delimiter=,,
4 root,,/root,,/usr/bin/bash
Character Range
The -b flag is used to extract bytes from a file. The
bytes are numbered starting from 1. The byte range
is inclusive, that is, it includes the start and end
bytes. If the start of the range is absent, it is assumed
to be the first byte. If the end of the range is absent,
it is assumed to be the last byte.
If we are working with a file with multi-byte char-
acters, the -b flag will not work as expected, as it
will extract bytes, and not characters. Then we can
use the -c flag to extract characters from a file. The
characters are numbered starting from 1. However,
in GNU cut, this feature is not yet implemented, and
-c currently behaves the same as -b. If you are using
FreeBSD cut, the difference can be observed.
1 $ head -n1 /etc/passwd | cut -b5-10
2 :x:0:0
3 $ head -n1 /etc/passwd | cut -c5-10
4 :x:0:0
Complement
Sometimes it is easier to specify the fields we want
to drop, rather than the fields we want to keep. We
can use the --complement flag to drop the fields we
specify.
Recall that the second field of the /etc/passwd file is
the password field, which is always x. We can drop
this field using the --complement flag.
1 $ head -n1 /etc/passwd
2 root:x:0:0:root:/root:/usr/bin/bash
3 $ head -n1 /etc/passwd | cut -d: --complement
-f2
4 root:0:0:root:/root:/usr/bin/bash
Only Delimited Lines
Sometimes we have a semi-structured file, where
some lines are delimited by a character, and some
are not. We can use the --only-delimited flag to
print only the lines that are delimited by the delim-
iter.
Note that if we mention a field number that is not
present in the line, cut will simply print nothing for
it. If we print fields 1 to 3, and there are only 2
fields, it will print only the first two fields, and
so on.

1 $ cat data.csv
2 name,age,gender
3 Alice,18,F
4 Bob,32,M
5 Carla,23,F
6 # This is a comment
7 $ cut -d, -f1,3 data.csv
8 name,gender
9 Alice,F
10 Bob,M
11 Carla,F
12 # This is a comment
13 $ cut -d, -f1,3 data.csv --only-delimited
14 name,gender
15 Alice,F
16 Bob,M
17 Carla,F
cut is a very handy tool for text processing; we will
be using it extensively in the upcoming chapters.
6.6.3 paste
paste is a very simple command that is used to merge
lines of files horizontally, that is, it combines the
corresponding lines from multiple files into a single
line, separated by a delimiter. It can also be used to
join all the lines of one file into a single line
separated by a delimiter.
1 $ cat file1.txt
2 hello world
3 this is file1
4 $ cat file2.txt
5 this is file2
6 and it has more lines
7 than file1
8 $ paste file1.txt file2.txt
9 hello world this is file2
10 this is file1 and it has more lines
11 than file1
The default delimiter is the tab character, however
we can specify the delimiter using the -d flag.
1 $ paste -d: file1.txt file2.txt
2 hello world:this is file2
3 this is file1:and it has more lines
4 :than file1
Paste can also be used to merge lines from a single
file into a single line. The -s flag is used to merge
lines from a single file into a single line. We can
specify the delimiter for this as well using the -d
flag.
1 $ paste -d: -s file1.txt
2 hello world:this is file1
This is better than using tr ’\n’ ’:’ to replace the
newline character with a delimiter, as tr will also
replace the last newline character of the last line,
giving a trailing delimiter.
1 $ tr ’\n’ ’:’ < file1.txt
2 hello world:this is file1:
This is very helpful when we want to find the sum
of numbers in a file.
1 $ cat numbers.txt
2 1
3 4
4 2
5 6
6 7
7 2
8 $ paste -sd+ numbers.txt
9 1+4+2+6+7+2
10 $ paste -sd+ numbers.txt | bc
11 22
Here we are first using paste to merge the lines of
the file into a single line, separated by the + character.
We then pipe this to bc, which is a command line
calculator, to calculate the sum of the numbers.
6.6.4 fold
Just like we can use paste to merge lines of a file,
we can use fold to split lines of a file. fold is a
command that wraps the lines of a file to a specified
maximum width.
1 $ cat data.txt
2 123456789
3 $ fold -w1 data.txt
4 1
5 2
6 3
7 4
8 5
9 6
10 7
11 8
12 9
13 $ fold -w2 data.txt
14 12
15 34
16 56
17 78
18 9
We can also force fold to break lines at spaces only,
and not in the middle of a word. However, if it is not
possible to maintain the specified maximum width when
breaking solely on spaces, it will break on non-spaces
as well.
1 $ cat text.txt
2 This is a big block of text
3 some of these lines can break easily
4 Whereas_some_are_to_long_to_break
5 $ fold -sw10 text.txt
6 This is a
7 big block
8 of text
9 some of
10 these
11 lines can
12 break
13 easily
14 Whereas_so
15 me_are_to_
16 long_to_br
17 eak
This is useful if you want to undo the operation of
tr -d ’\n’ performed on lines of equal width.
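As a quick sketch: join three two-character lines with
tr -d '\n', then split them back with fold -w2 (the
echo adds back the final newline that tr removed).

1 $ printf 'ab\ncd\nef\n' | tr -d '\n' ; echo
2 abcdef
3 $ printf 'ab\ncd\nef\n' | tr -d '\n' | fold -w2
4 ab
5 cd
6 ef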
6.6.5 grep
grep is a command that is used to search for patterns
in a file: it searches for a pattern and prints the
lines that match it.
Grep has a lot of flags and features, and is a very
powerful tool for searching for patterns in a file. It
can search using BRE, ERE, and even PCRE.
We will discuss grep in detail in the next chapter.
6.6.6 sed
sed is a stream editor that is used to perform basic
text transformations on an input stream. It is used
to search and replace, insert, select, delete, and
translate text.
Sed is a sysadmin's go-to tool for performing quick
text transformations on files. It can also perform
the changes directly on the file, without needing to
write the changes to a new file.
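As a small preview (a sketch of sed's substitute
command, which we cover properly later):

1 $ echo 'hello world' | sed 's/world/there/'
2 hello there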
We cover sed in detail in later chapters.
6.6.7 awk
awk is a programming language that is used for text
processing and data extraction. It is a very powerful
tool for text processing, and is used to extract and
manipulate data from files.
It has its own programming language, although
with very few keywords, and is very easy to learn.
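As a small preview (a sketch: awk splits each line
into fields, and $2 refers to the second field):

1 $ echo 'alpha beta gamma' | awk '{ print $2 }'
2 beta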
We cover awk in detail in later chapters.
Grep 7
We have already seen and used grep in the previous
chapter while discussing regex. grep is a command that
is used to search for patterns in a file: it searches
for a pattern and prints the lines that match it.

The name grep comes from the g/re/p command
(global/regular expression/print) in the ed editor.
The ed editor is a line editor, and its g/re/p command
searches for a pattern in a file and prints the lines
that match the pattern. The standalone grep command
does the same job.
grep can use BRE or ERE, or even PCRE if the -P
flag is used. By default, grep matches using the BRE
engine.
1 $ grep "[aeiou]" -o <<< "hello how are you?"
2 e
3 o
4 o
5 a
6 e
7 o
8 u
7.1 Regex Engine

However, grep can also use the ERE engine using the
-E flag. This is useful when we want to use the
special characters like +, ?, (, ), {, }, and |
without escaping them. 1

1: There is also an executable called egrep, which is
the same as grep -E. It is provided for compatibility
with older Unix systems. It is not recommended to use
egrep, as it is deprecated and not present in all
systems. You should use grep -E instead.
1 $ grep -E "p+" -o <<< "apple and pineapples"
2 pp
3 p
4 pp
Sometimes, it is required to not use regex at all, and
simply search for a string. We can use the -F flag
to search for a fixed string, and not a regex. This
will search for any line that has the substring. Any
symbol that has special meaning in regex will be
treated as a literal character.
1 $ grep -F "a+b*c=?" -o <<< "What is a+b*c=?"
2 a+b*c=?
This mode is useful if you want to match an arbitrary
string in a file and do not know in advance what the
string is going to be, thus not allowing you to escape
the special characters.
1 $ cat data.txt
2 Hello, this is a file
3 with a lot of equations
4 1. a^2 + b^2 = c^2 for right angle triangles
5 2. E = mc^2
6 3. F = ma
7 4. The meaning of life, the universe, and
everything = 42
8 $ read -r -p "What to search for? " pattern
9 What to search for? ^2
10 $ grep "$pattern" data.txt
11 2. E = mc^2
12 $ grep -F "$pattern" data.txt
13 1. a^2 + b^2 = c^2 for right angle triangles
14 2. E = mc^2
In the above example, you can see a file full of
equations, thus special symbols. If we want to dy-
namically input what to search for, we can use the
read command to read the input from the user, and
then use grep to search for the pattern. If we use
the -F flag, we can search for the pattern as a fixed
string, and not as a regex.
Here we wanted to find all the equations containing
squares, and thus searched for ^2. However, observe
that grep, without the -F flag, did not match the
first equation, which has multiple ^2 terms in it,
because it treats ^ with its special meaning of a
start-of-line anchor; the second line matched only
because it happens to begin with the character 2.
If we were statically providing the search string, we
could simply escape the special characters and use
grep without the -F flag. But in cases like this, it
is easier to simply use -F to bypass regular
expressions and search for plain substrings.
7.2 PCRE
Similarly, there are situations where the ERE engine
is not powerful enough and we need to use the
PCRE engine. We can use the -P flag to use the
PCRE engine.
1 $ cat data.txt
2 Hello, this is a file
3 with a lot of equations
4 1. a^2 + b^2 = c^2 for right angle triangles
5 2. E = mc^2
6 3. F = ma
7 4. The meaning of life, the universe, and
everything = 42
8 $ grep -P "c\^2" data.txt
9 1. a^2 + b^2 = c^2 for right angle triangles
10 2. E = mc^2
11 $ grep -P "c\^2(?=.*triangle)" data.txt
12 1. a^2 + b^2 = c^2 for right angle triangles
Here, if we want to find all the equations with c^2,
but only if that equation also mentions "triangle"
somewhere after the c^2, then we can use lookahead
assertions of PCRE to accomplish that. Here, the
.* matches any character zero or more times, and
the triangle matches the string "triangle". Putting
it inside a lookahead assertion ((?= )) ensures that
the pattern is present, but not part of the match.
This can be confirmed by using the -o flag to print
only the matched part of the line.
1 $ grep -P "c\^2(?=.*triangle)" -o data.txt
2 c^2
7.3 Print Only Matching Part
We have been using the -o flag extensively to print
only the matching part of the line. This is useful
when we want to extract only the part of the line
that matches the pattern and not the entire line.
This is also very useful for debugging regex, as we
can see what part of the line is actually matching
the pattern.
If we do not use -o, then any line having one or more
matches will be printed entirely, and the matches will
be colored red. 2 However, if two consecutive matches
are present, it becomes hard to distinguish between
the two matches.

2: grep will color the match only if you pass the flag
--color=always, or if the flag is set to --color=auto
and the terminal supports color output. Otherwise no
ANSI escape code is printed by grep. You can also
change the color of the match by setting the
GREP_COLORS environment variable. The default color is
red, but you can change it to any color you want.

If we use the -o flag, then only the matching part of
the line is printed, and each match is printed on a
new line, making it easy to see exactly which parts
are matched.

1 $ grep -E "o{,3}" <<< "hellooooo"
2 hellooooo
3 $ grep -Eo "o{,3}" <<< "hellooooo"
4 ooo
5 oo
In the above example, we ask grep to match the
pattern o{,3}, that is, match the letter o zero to
three times. If we do not use the -o flag, then the
entire line is printed, and the matches are colored
red. This however creates confusion: even though all
5 o's are matched, they obviously cannot be a single
match, since a single match can be of a maximum of 3
o's. So how are the matches grouped? Is it 3 o's,
followed by 1 o, followed by another o? Is it 2 o's,
followed by 2 o's, followed by one o? There is no way
to tell from the first output.

However, if we use the -o flag, then only the matching
part of the line is printed, and each match is printed
on a new line. It then becomes clear that grep will
greedily match as much as possible first, so the first
three o's form one match, and the remaining two are
grouped as a single match.
7.4 Matching Multiple Patterns
7.4.1 Disjunction
If we want to match multiple patterns, we can use the
-e flag to specify multiple patterns. Any line
containing one or more of the patterns we are
searching for will be printed. This is like using an
OR clause.

In this example, we want to find lines that match the
word "file" or the word "life". We can use the -e flag
to specify multiple patterns. The -e flag is
automatically implied if we are searching for a single
pattern; however, if we are searching for more than
one pattern, we have to specify it for all of the
patterns, including the first one.

1 $ cat data.txt
2 Hello, this is a file
3 with a lot of equations
4 1. a^2 + b^2 = c^2 for right angle triangles
5 2. E = mc^2
6 3. F = ma
7 4. The meaning of life, the universe, and
everything = 42
8 $ grep "file" data.txt
9 Hello, this is a file
10 $ grep -e "file" -e "life" data.txt
11 Hello, this is a file
12 4. The meaning of life, the universe, and
everything = 42
7.4.2 Conjunction
If we want to match lines that contain all of the
patterns we are searching for, we can pipe the output
of one grep to another, to do iterative filtering.
If we want to find lines which start with a number,
and also contain the word "a", then we can use two
greps to accomplish this.

In this example, first we show each of the patterns we
are matching, and that they each output multiple
lines. Then we finally combine both the patterns using
a pipe to find only the common line in both outputs.
The patterns can be specified in either order. We need
to provide the file only in the first grep, and we
should not provide a file name to any of the other
greps, otherwise they will not use the output of the
previous grep as input and will just filter the file
instead.

Here the regex ^[0-9] matches any line that starts
with a number, and the regex \ba\b matches any line
that contains the word "a". The \b is a word boundary,
and matches the start or end of a word. Thus it will
match the word "a" and not the letter "a" inside a
word.

1 $ cat data.txt
2 Hello, this is a file
3 with a lot of equations
4 1. a^2 + b^2 = c^2 for right angle triangles
5 2. E = mc^2
6 3. F = ma
7 4. The meaning of life, the universe, and
everything = 42
8 $ grep '\ba\b' data.txt
9 Hello, this is a file
10 with a lot of equations
11 1. a^2 + b^2 = c^2 for right angle triangles
12 $ grep '^[0-9]' data.txt
13 1. a^2 + b^2 = c^2 for right angle triangles
14 2. E = mc^2
15 3. F = ma
16 4. The meaning of life, the universe, and
everything = 42
17 $ grep '\ba\b' data.txt | grep '^[0-9]'
18 1. a^2 + b^2 = c^2 for right angle triangles
7.5 Read Patterns from File
If we have a lot of patterns to search for, we can
put them in a file, and use the -f flag to read the
patterns from the file. Each line of the file is treated
as a separate pattern, and any line that matches any
of the patterns will be printed. The type of regex
engine used depends on whether -E, -P, or -F is
used.
1 $ cat data.txt
2 p+q=r
3 apple
4 e*f=g
5 $ cat pattern
6 p+
7 e*
8 $ grep -G -f pattern data.txt -o
9 p+
10 e
11 e
12 $ grep -F -f pattern data.txt -o
13 p+
14 e*
15 $ grep -E -f pattern data.txt -o
16 p
17 pp
18 e
19 e
7.6 Ignore Case
If we want to ignore the case of the pattern, we
can use the -i flag. This is useful when we want to
match a pattern, but do not care about the case of
the pattern, or we are not sure what the case is.
1 $ grep ’apple’ /usr/share/dict/words | head
2 appleberry
3 appleblossom
4 applecart
5 apple-cheeked
6 appled
7 appledrane
8 appledrone
9 apple-eating
10 apple-faced
11 apple-fallow
12 $ grep -i ’apple’ /usr/share/dict/words | head
13 Apple
14 appleberry
15 Appleby
16 appleblossom
17 applecart
18 apple-cheeked
19 appled
20 Appledorf
21 appledrane
22 appledrone
As seen above, the first grep matches only the lines
that contain the word "apple", and not "Apple".
However, the second grep matches both "apple"
and "Apple".
7.7 Invert Match
Sometimes it is easier to specify the patterns we do
not want to match, rather than the patterns we want
to match. We can use the -v flag to invert the match,
that is, to print only the lines that do not match the
pattern.
1 $ cat data.txt
2 apple
3 banana
4 blueberry
5 blackberry
6 raspberry
7 strawberry
8 $ grep -v ’berry’ data.txt
9 apple
10 banana
This is useful when we want to filter some arbitrary
stream of data for some patterns.

In this example, we are filtering out all the users
that have the shell set to nologin. These are users
which cannot be logged into. We can use the -v flag to
invert the match, and print only the users that do not
have the shell set to nologin, effectively printing
all the accounts on the current system that can be
logged into. ntp uses the /bin/false shell, which is
used to prevent the user from logging in, but is not
the same as /usr/bin/nologin, which prevents the user
from logging in and also prints a message to the user.

1 $ grep -v 'nologin$' /etc/passwd
2 root:x:0:0:root:/root:/usr/bin/bash
3 git:x:971:971:git daemon user:/:/usr/bin/git-shell
4 ntp:x:87:87:Network Time Protocol:/var/lib/ntp:/bin/false
5 sayan:x:1000:1001:Sayan:/home/sayan:/bin/bash
6 test1:x:1001:1002::/home/test1:/usr/bin/bash

7.8 Anchoring

If we want to match a pattern only at the start of the
line, or at the end of the line, we can use the ^ and
$ anchors respectively.
However, in grep, if we want to match a pattern
that is the entire line, we can use the -x flag. This is
useful when we want to match the entire line, and
not just a part of the line. This is the same as
wrapping the entire pattern in ^ and $, but is more
readable.
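For example, using a small made-up file fruits.txt
(a sketch):

1 $ cat fruits.txt
2 apple
3 pineapple
4 apple pie
5 $ grep 'apple' fruits.txt
6 apple
7 pineapple
8 apple pie
9 $ grep -x 'apple' fruits.txt
10 apple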
Similarly, if we want to match a pattern that is a
word, we can use the -w flag. This is useful when
we want to match a word, and not a part of a word.
This is the same as wrapping the entire pattern in \b
on both sides, but is more readable.

Observe in this example that if we do not use the -w
flag, then words that have the substring "apple" will
also be matched. However, when we use the -w flag,
only the word "apple" is matched as a whole word.

1 $ grep 'apple' /usr/share/dict/words | tail
2 stapple
3 star-apple
4 strapple
5 thorn-apple
6 thrapple
7 toffee-apple
8 undappled
9 ungrapple
10 ungrappled
11 ungrappler
12 $ grep -w ’apple’ /usr/share/dict/words | tail
13 may-apple
14 oak-apple
15 pine-apple
16 pond-apple
17 rose-apple
18 snap-apple
19 sorb-apple
20 star-apple
21 thorn-apple
22 toffee-apple
7.9 Counting Matches
Sometimes all we want is to see how many lines
match the pattern, and not the lines themselves. We
can use the -c flag to count the number of lines that
match the pattern. This is exactly the same as piping
the output of grep to wc -l, but is more readable.
1 $ grep -c ’apple’ /usr/share/dict/words
2 101
3 $ grep -ic ’apple’ /usr/share/dict/words
4 107
From this we can quickly see that there are 107 −
101 = 6 lines that contain the word "Apple" in the
file.
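Note that -c counts matching lines, not individual
matches. If we want to count every match, one way (a
sketch) is to combine -o with wc -l:

1 $ grep -c 'o' <<< 'foo boo'
2 1
3 $ grep -o 'o' <<< 'foo boo' | wc -l
4 4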
We can also print those lines using the diff com-
mand.
1 $ diff <(grep apple /usr/share/dict/words) <(
grep -i apple /usr/share/dict/words)
2 0a1
3 > Apple
4 1a3
5 > Appleby
6 5a8
7 > Appledorf
8 10a14
9 > Applegate
10 28a33
11 > Appleseed
12 31a37
13 > Appleton
Or by using the comm command.
The comm command is used to compare two sorted files
line by line. It is used to find lines that are
common, or different, between two files. The comm
command requires the input files to be sorted, and it
will not work if the files are not sorted. The comm
command has three numbered columns: the first column
is the lines that are unique to the first file, the
second column is the lines that are unique to the
second file, and the third column is the lines that
are common to both files. The comm command is useful
when we want to compare two files and find the
differences between them, or find the lines common
between them.

Figure 7.1: The comm command

Observe that we had to sort the files before using
comm, as comm requires the files to be sorted. We can
use the <(command) syntax to pass the output of a
command as a file to another command. This is called
process substitution.

1 $ comm -13 <(grep apple /usr/share/dict/words | sort) <(grep -i apple /usr/share/dict/words | sort)
2 Apple
3 Appleby
4 Appledorf
5 Applegate
6 Appleseed
7 Appleton
7.10 Print Filename
Sometimes, we may want to search for a pattern
in multiple files, and we want to know which file
contains the pattern. We can use the -H flag to print
the filename along with the matched line. This
is however the default behaviour in GNU grep if
multiple files are passed.
In this example, we are passing all the .txt files in
the current directory to grep, and searching for
patterns. The -H flag is implicit and is used to print
the filename along with the matched line. This is
useful when we are searching for a pattern in multiple
files, and we want to know which file contains the
pattern.

1 $ cat hello.txt
2 hello world
3 hello universe
4 $ cat linux.txt
5 this is linux
6 $ grep hello *txt
7 hello.txt:hello world
8 hello.txt:hello universe
9 $ grep this *txt
10 linux.txt:this is linux
11 $ grep i *txt
12 hello.txt:hello universe
13 linux.txt:this is linux
But if we want to suppress the printing of the
filename, we can use the -h flag. This is useful when
we are searching for a pattern in multiple files, and
we do not want to know which file contains the
pattern, just the line.
1 $ cat hello.txt
2 hello world
3 hello universe
4 $ cat linux.txt
5 this is linux
6 $ grep -h hello *txt
7 hello world
8 hello universe
9 $ grep -h this *txt
10 this is linux
11 $ grep -h i *txt
12 hello universe
13 this is linux
Similarly, if we want to print only the names of the
matching files, we can use the -l flag. This will
print only the name of each file that has one or more
matches. This mode does not print a filename multiple
times even if the file has multiple matches on the
same or different lines. Thus it is not the same as
grep pattern files... | cut -d: -f1. Rather, it is the
same as grep pattern files... | cut -d: -f1 | uniq.
7.11 Limiting Output
Although we can use head and tail to limit the
number of lines of output of grep, grep also has the
-m flag to limit the number of matches. This is useful
when we want to see only the first few matches, and
not all the matches.

The benefit of using -m instead of head is that the
output will remain colored if coloring is supported
(although that is not visible here); try running both
to observe the difference.

1 $ grep 'nologin' /etc/passwd
2 bin:x:1:1::/:/usr/bin/nologin
5 ftp:x:14:11::/srv/ftp:/usr/bin/nologin
6 http:x:33:33::/srv/http:/usr/bin/nologin
7 nobody:x:65534:65534:Kernel Overflow User:/:/
usr/bin/nologin
8 dbus:x:81:81:System Message Bus:/:/usr/bin/
nologin
9 systemd-coredump:x:984:984:systemd Core Dumper
:/:/usr/bin/nologin
10 systemd-network:x:982:982:systemd Network
Management:/:/usr/bin/nologin
11 systemd-oom:x:981:981:systemd Userspace OOM
Killer:/:/usr/bin/nologin
12 systemd-journal-remote:x:980:980:systemd
Journal Remote:/:/usr/bin/nologin
13 systemd-journal-upload:x:979:979:systemd
Journal Upload:/:/usr/bin/nologin
14 systemd-resolve:x:978:978:systemd Resolver:/:/
usr/bin/nologin
15 systemd-timesync:x:977:977:systemd Time
Synchronization:/:/usr/bin/nologin
16 tss:x:976:976:tss user for tpm2:/:/usr/bin/
nologin
17 uuidd:x:68:68::/:/usr/bin/nologin
18 avahi:x:974:974:Avahi mDNS/DNS-SD daemon:/:/
usr/bin/nologin
19 named:x:40:40:BIND DNS Server:/:/usr/bin/
nologin
20 dnsmasq:x:973:973:dnsmasq daemon:/:/usr/bin/
nologin
21 geoclue:x:972:972:Geoinformation service:/var/
lib/geoclue:/usr/bin/nologin
22 _talkd:x:970:970:User for legacy talkd server
:/:/usr/bin/nologin
23 nbd:x:969:969:Network Block Device:/var/empty
:/usr/bin/nologin
24 nm-openconnect:x:968:968:NetworkManager
OpenConnect:/:/usr/bin/nologin
25 nm-openvpn:x:967:967:NetworkManager OpenVPN
:/:/usr/bin/nologin
26 nvidia-persistenced:x:143:143:NVIDIA
Persistence Daemon:/:/usr/bin/nologin
27 openvpn:x:965:965:OpenVPN:/:/usr/bin/nologin
28 partimag:x:110:110:Partimage user:/:/usr/bin/
nologin
29 polkitd:x:102:102:PolicyKit daemon:/:/usr/bin/
nologin
30 rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/
usr/bin/nologin
31 rpcuser:x:34:34:RPC Service User:/var/lib/nfs
:/usr/bin/nologin
32 rtkit:x:133:133:RealtimeKit:/proc:/usr/bin/
nologin
33 sddm:x:964:964:SDDM Greeter Account:/var/lib/
sddm:/usr/bin/nologin
34 usbmux:x:140:140:usbmux user:/:/usr/bin/
nologin
35 qemu:x:962:962:QEMU user:/:/usr/bin/nologin
36 cups:x:209:209:cups helper user:/:/usr/bin/
nologin
37 dhcpcd:x:959:959:dhcpcd privilege separation
:/:/usr/bin/nologin
38 redis:x:958:958:Redis in-memory data structure
store:/var/lib/redis:/usr/bin/nologin
39 saned:x:957:957:SANE daemon user:/:/usr/bin/
nologin
40 tor:x:43:43::/var/lib/tor:/usr/bin/nologin
41 $ grep ’nologin’ /etc/passwd | head -n5
42 bin:x:1:1::/:/usr/bin/nologin
43 daemon:x:2:2::/:/usr/bin/nologin
44 mail:x:8:12::/var/spool/mail:/usr/bin/nologin
45 ftp:x:14:11::/srv/ftp:/usr/bin/nologin
46 http:x:33:33::/srv/http:/usr/bin/nologin
47 $ grep ’nologin’ /etc/passwd -m5
48 bin:x:1:1::/:/usr/bin/nologin
49 daemon:x:2:2::/:/usr/bin/nologin
50 mail:x:8:12::/var/spool/mail:/usr/bin/nologin
51 ftp:x:14:11::/srv/ftp:/usr/bin/nologin
52 http:x:33:33::/srv/http:/usr/bin/nologin
Returning to the -l flag from the previous section:

1 $ cat hello.txt
2 hello world
3 hello universe
4 $ cat linux.txt
5 this is linux
6 $ grep -l hello *txt
7 hello.txt
8 $ grep -l this *txt
9 linux.txt
10 $ grep -l i *txt
11 hello.txt
12 linux.txt
7.12 Quiet Quitting
No, we are not talking about the recent trend of
doing the bare minimum in a job.
If we want to suppress the output of grep, and only
see if the pattern was found or not, we can use the
-q flag. This is useful when we want to use grep in
a script, and we do not want to see the output of
grep, just the exit status. This also implies -m1, as
we can determine the exit status of grep as soon as
we find the first match.

In this example we use concepts such as if and then to
check the exit status of grep. If the exit status is
0, then the pattern was found, and we print "world
mentioned". If the exit status is 1, then the pattern
was not found, and we do not print anything. $? is a
special variable that stores the exit status of the
last command: an exit status of 0 means the command
was successful, and a non-zero exit status means it
failed. We will cover these in later chapters.

1 $ cat hello.txt
2 hello world
3 hello universe
4 $ grep -q 'world' hello.txt
5 $ echo $?
6 0
7 $ if grep -q 'world' hello.txt ; then echo "world mentioned" ; fi
8 world mentioned
9 $ if grep -q 'galaxy' hello.txt ; then echo "galaxy mentioned" ; fi
7.13 Numbering Lines
If we want to number the lines that match the
pattern, we can use the -n flag. This is useful when
we want to see the line number of the lines that
match the pattern.
1 $ cat hello.txt
2 hello world
3 hello universe
4 $ grep -n ’hello’ hello.txt
5 1:hello world
6 2:hello universe
7.14 Recursive Search
grep can also search for patterns in directories and
subdirectories recursively. We can use the -r flag to
achieve this. This is useful when we want to search
for a pattern in multiple files, and we do not know
which files contain the pattern, or if the files are
nested deeply inside directories.
By default it starts searching from the current di-
rectory, but we can specify the directory to start
searching from as the second argument.
1 $ mkdir -p path/to/some/deeply/nested/folders/
like/this
2 mkdir: created directory ’path’
3 mkdir: created directory ’path/to’
4 mkdir: created directory ’path/to/some’
5 mkdir: created directory ’path/to/some/deeply’
6 mkdir: created directory ’path/to/some/deeply/
nested’
7 mkdir: created directory ’path/to/some/deeply/
nested/folders’
8 mkdir: created directory ’path/to/some/deeply/
nested/folders/like’
9 mkdir: created directory ’path/to/some/deeply/
nested/folders/like/this’
10 $ echo hello > path/to/some/deeply/nested/
folders/like/this/hello.txt
11 $ echo "hello world" > path/world.txt
12 $ grep -r hello
13 path/world.txt:hello world
14 path/to/some/deeply/nested/folders/like/this/
hello.txt:hello
15 $ grep -r hello path/to/
16 path/to/some/deeply/nested/folders/like/this/
hello.txt:hello
7.15 Context Line Control
Sometimes it is useful to see a few lines before
or after the actual line that contains the matched
pattern. We can use the -A, -B, and -C flags to control
the number of lines to print after, before, and around
the matched line respectively. 3
3: The -A flag is for printing lines after the match,
the -B flag is for printing lines before the match,
and the -C flag is for printing lines both before and
after the match.

1 $ grep -n sayan /etc/passwd
2 37:sayan:x:1000:1001:Sayan:/home/sayan:/bin/bash
3 $ grep -n sayan /etc/passwd -A2
4 37:sayan:x:1000:1001:Sayan:/home/sayan:/bin/
bash
5 38-qemu:x:962:962:QEMU user:/:/usr/bin/nologin
6 39-cups:x:209:209:cups helper user:/:/usr/bin/
nologin
7 $ grep -n sayan /etc/passwd -B2
8 35-sddm:x:964:964:SDDM Greeter Account:/var/
lib/sddm:/usr/bin/nologin
9 36-usbmux:x:140:140:usbmux user:/:/usr/bin/
nologin
10 37:sayan:x:1000:1001:Sayan:/home/sayan:/bin/
bash
11 $ grep -n sayan /etc/passwd -C2
12 35-sddm:x:964:964:SDDM Greeter Account:/var/
lib/sddm:/usr/bin/nologin
13 36-usbmux:x:140:140:usbmux user:/:/usr/bin/
nologin
14 37:sayan:x:1000:1001:Sayan:/home/sayan:/bin/
bash
15 38-qemu:x:962:962:QEMU user:/:/usr/bin/nologin
16 39-cups:x:209:209:cups helper user:/:/usr/bin/
nologin
These were some of the flags that can be used with
grep to control the output. There are many more
flags that can be used with grep, and you can see
them by running man grep.
Now that we have learnt some of the flags on their
own, let us try to accomplish certain tasks using
multiple flags together.
7.16 Finding Lines Common in
Two Files
If we have two files which have some lines of text,
and our job is to find the lines that are common to
both, we can use comm if the files are sorted, or we
can sort the files before passing them to comm if we
are allowed to have the output sorted.
But if we are not allowed to sort the files, we can
use grep to accomplish this.
1 $ cat file1.txt
2 this is file1
3 this.*
4 a common line
5 is to check if it misinterprets regex
6 apple
7 $ cat file2.txt
8 this is file2
9 this.*
10 a common line
11 is to check if we are checking fixed strings
12 pineapple
Ideally the lines that should be detected as being
common are the lines
1 this.*
2 a common line
Remember that grep allows using a file as the pat-
tern, and it will match any line that matches any
of the patterns in the file. Let us try to use it to
provide one file as a pattern, and the other file as
the input.
1 $ grep -f file2.txt file1.txt
2 this is file1
3 this.*
4 a common line
Observe that it is also matching the line this is
file1, which is not common in both the files. This
is because the .* in the pattern this.* is being
interpreted as a regex, and not as a literal character.
We can use the -F flag to treat the pattern as a fixed
string, and not as a regex.
1 $ grep -Ff file2.txt file1.txt
2 this.*
3 a common line
So is that it? It looks like we are getting the correct
output. Not yet. If we reverse the order of files, we
see that we are not getting the correct output.
1 $ grep -Ff file1.txt file2.txt
2 this.*
3 a common line
4 pineapple
This is because the apple in file1.txt is matching
the word pineapple in file2.txt as a substring.
If we want to print only lines that match the entire
line, we can use the -x flag.
1 $ grep -Fxf file1.txt file2.txt
2 this.*
3 a common line
Now we are getting the correct output, in the origi-
nal file order.
If we were to use comm, we would have to sort the
files first, and then use comm to find the common
lines.
1 $ comm -12 file1.txt file2.txt
2 comm: file 1 is not in sorted order
3 comm: file 2 is not in sorted order
4 comm: input is not in sorted order
5 $ comm -12 <( sort file1.txt) <(sort file2.
txt)
6 a common line
7 this.*
Observe how the order of output is different.
The trick to mastering grep is to understand the
flags, and how they can be combined to accomplish
the task at hand. The more you practice, the more
you will understand how to use grep effectively.
Refer to the practice questions in the VM.
Shell Variables 8
We have seen how to execute commands in the Linux
shell, and also how to combine those commands
to perform complex tasks. However, to make really
powerful scripts, we need to store the output of
commands, and also store intermediate results. This
is where variables come in. In this chapter, we will
learn how to create, manipulate, and use variables
in the shell.
8.1 Creating Variables
There are two types of variables in the shell: En-
vironment Variables and Shell Variables. Envi-
ronment variables are accessible to all processes
running in the environment, while shell variables
are only accessible to the shell in which they are
created.
Definition 8.1.1 (Environment Variables) An
environment variable is a variable that is accessible
to all processes running in the environment.
It is a key-value pair. It is created using the
export command. It can be accessed using the
$ followed by the name of the variable (e.g.
$HOME) or using the printenv command. They
are part of the environment in which a process
runs. For example, a running process can query
the value of the TEMP/TMPDIR environment
variable to discover a suitable location to store
temporary files, or the HOME or USER variable
to find the directory structure owned by the
user running the process.
Definition 8.1.2 (Shell Variables) A shell vari-
able is a variable that is only accessible to the
shell in which it is created. It is a key-value
pair. It is created using the = operator. It can
be accessed using the $ followed by the name
of the variable (e.g. $var). They are local to the
shell in which they are created.
Let us see how to create a shell variable.
1 $ var="Hello World"
This creates a shell variable var with value Hello
World. If the value of our variable contains spaces,
we need to enclose it in quotes. It is important to
note that there should be no spaces around the =
operator.
Variable names can contain letters, numbers, and
underscores, but they cannot start with a number.
Similarly, for an environment variable, we use the
export command.
1 $ export var="Hello World"
This creates an environment variable var with value
Hello World. The difference between a shell variable
and an environment variable is that the environ-
ment variable is accessible to all processes running
in the environment, while the shell variable is only
accessible to the shell in which it is created.
A variable can also be exported after it is created.
1 $ var="Hello World"
2 $ export var
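We can observe the difference by asking a child shell
to print the variable; it sees the value only after
the variable is exported (a quick sketch):

1 $ var="Hello World"
2 $ bash -c 'echo "child sees: $var"'
3 child sees:
4 $ export var
5 $ bash -c 'echo "child sees: $var"'
6 child sees: Hello World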
8.2 Printing Variables to the
Terminal
To access the value of a variable, we use the $
operator followed by the name of the variable. It
is only used when we want to get the value of the
variable, and not for setting the value.
1 $ var="Hello World"
2 $ echo $var
3 Hello World
However, it is often required to enclose the variable
name in braces to avoid ambiguity when concatenating
with other alphanumeric characters.

Here we want to concatenate the values of the
variables date and time with an underscore in between.
However, the shell thinks that the first variable we
are accessing is date_, and the second variable is
time. This gives an empty first variable. To fix this
ambiguity, we enclose the variable name in braces.

1 $ date="2024_07_30"
2 $ time="08:30:00"
3 $ echo "$date_$time"
4 08:30:00
5 $ echo "${date}_${time}"
6 2024_07_30_08:30:00
Remark 8.2.1 If we want to print the dollar
symbol literally, we need to escape it using the
backslash character, or surround it in single
quotes, not double quotes.
1 $ echo ’$USER is’ "$USER"
2 $USER is sayan
3 $ echo "\$HOSTNAME is $HOSTNAME"
4 $HOSTNAME is rex
Here we are using the echo command to print the
value of the variable var to the terminal.
8.2.1 Echo Command
The echo command displays a line of text on the
terminal. It is commonly used in shell scripts to
display a message or output of a command. It is
also used to print the value of a variable.
On most shells echo is actually a built-in command,
not an external program. This means that the shell
has a built-in implementation of the echo command,
which is faster than calling an external program.
Although the echo binary might be present on your
system, it is not the one being called when executing
echo. This is usually not an issue for most use cases,
however, you should be aware which echo you
are running, so that you can refer to the correct
documentation.
When using man echo, we get the documentation
of echo binary distributed through GNU core utils,
whereas when we use help echo, we get the doc-
umentation of the echo which is built-in into the
bash shell.
1 $ man echo | sed ’/^$/d’ | head -n15
2 ECHO(1) User Commands ECHO(1)
3 NAME
4 echo - display a line of text
5 SYNOPSIS
6 echo [SHORT-OPTION]... [STRING]...
7 echo LONG-OPTION
8 DESCRIPTION
9 Echo the STRING(s) to standard output.
10 -n do not output the trailing
newline
11 -e enable interpretation of
backslash escapes
12 -E disable interpretation of
backslash escapes (default)
13 --help display this help and exit
14 --version
15 output version information and
exit
16 If -e is in effect, the following
sequences are recognized:
1 $ help echo | head -n20 | sed '/^ *$/d'
2 echo: echo [-neE] [arg ...]
3 Write arguments to the standard output.
4 Display the ARGs, separated by a single
space character and followed by a
5 newline, on the standard output.
6 Options:
7 -n do not append a newline
8 -e enable interpretation of the
following backslash escapes
9 -E explicitly suppress interpretation
of backslash escapes
10 ‘echo’ interprets the following backslash-
escaped characters:
11 \a alert (bell)
12 \b backspace
13 \c suppress further output
14 \e escape character
15 \E escape character
16 \f form feed
17 \n new line
18 \r carriage return
The options of both the echo commands look similar,
however, GNU core-utils echo also has the support
for two long options. These are options with two
dashes, and are more descriptive than the short
options. However, these are not present in the built-
in echo command.
Thus, if we are reading the man page of echo, we
might think that the long options will work with
the default echo, but they will not.
When we call the echo executable with its path, the
executable gets executed instead of the built-in. This
supports the long options. The built-in version simply
prints the long option as text.

1 $ type -a echo
2 echo is a shell builtin
3 echo is /sbin/echo
4 echo is /bin/echo
5 echo is /usr/bin/echo
6 echo is /usr/sbin/echo
1 $ echo --version
2 --version
1 $ /bin/echo --version
2 echo (GNU coreutils) 9.5
3 Copyright (C) 2024 Free Software Foundation,
Inc.
4 License GPLv3+: GNU GPL version 3 or later <
https://gnu.org/licenses/gpl.html>.
5 This is free software: you are free to change
and redistribute it.
6 There is NO WARRANTY, to the extent permitted
by law.
7
8 Written by Brian Fox and Chet Ramey.
Escape Characters
The echo command also supports escape characters.
These are special characters that are used to format
the output of the echo command.
However, to use these escape characters, we need
to use the -e option. The following list of escape
characters supported by the echo command is taken
from the help echo output as-is.
1 ‘echo’ interprets the following backslash-
escaped characters:
2 \a alert (bell)
3 \b backspace
4 \c suppress further output
5 \e escape character
6 \E escape character
7 \f form feed
8 \n new line
9 \r carriage return
10 \t horizontal tab
11 \v vertical tab
12 \\ backslash
13 \0nnn the character whose ASCII code
is NNN (octal). NNN can be
14 0 to 3 octal digits
15 \xHH the eight-bit character whose
value is HH (hexadecimal). HH
16 can be one or two hex digits
17 \uHHHH the Unicode character whose
value is the hexadecimal value HHHH.
18 HHHH can be one to four hex
digits.
19 \UHHHHHHHH the Unicode character whose
value is the hexadecimal value
20 HHHHHHHH. HHHHHHHH can be one
to eight hex digits.

The backspace character \b moves the cursor one
character to the left. This is useful to overwrite a
previously typed character.

1 $ echo -e "abc\bd"
2 abd

The suppress further output character \c suppresses
the output of any text present after it.

1 $ echo -e "abc\cde"
2 abc

The escape character \e is used to escape the
character after it, but continues printing after that.

1 $ echo -e "abc\ede"
2 abce

The form feed character \f and the vertical tab
character \v move the cursor to the next line, but do
not move the cursor to the start of the line,
remaining where it was in the previous line.

1 $ echo -e "abc\fde"
2 abc
3 de

1 $ echo -e "abc\vde"
2 abc
3 de

The carriage return character \r moves the cursor to
the start of the line. This can be used to overwrite
some parts of the previously written line. Characters
not overwritten remain intact. Here the de overwrites
the ab but the c remains intact.

1 $ echo -e "abc\rde"
2 dec

The newline character \n moves the cursor to the next
line and to the start of the line. This is the same as
performing \f\r.

1 $ echo -e "abc\nde"
2 abc
3 de

The horizontal tab character \t moves the cursor to
the next tab stop.

1 $ echo -e "abc\tde"
2 abc de

The octal and hexadecimal escapes can be used to print
the character with the given ASCII code. Here, \x41 is
the ASCII code for A and \0132 is the ASCII code for
Z.

1 $ echo -e "\0132"
2 Z
3 $ echo -e "\x41"
4 A
8.2.2 Accessing and Updating Numeric
Variables
If the variable is a number, we can perform arithmetic
operations on it in a mathematical context. 1

1: There are no data types in bash. A variable can
store a number, a string, or any other data type.
Every variable is treated as a string, unless inside a
mathematical context.

Basic Arithmetic
Basic Arithmetic operations such as addition, sub-
traction, multiplication, division, and modulo can
be performed using the (( )) construct.
Exponentiation can be performed using the ** op-
erator, similar to Python.
The ^ operator is reserved for bitwise XOR.
1 $ var=5
2 $ echo $var
3 5
4 $ echo $((var+5))
5 10
6 $ echo $((var-5))
7 0
8 $ echo $((var*5))
9 25
10 $ echo $((var/5))
11 1
12 $ echo $((var%5))
13 0
14 $ echo $((var**2))
15 25
Bitwise Operations
Bash also supports bitwise operations. The bitwise
NOT operator is ~, the bitwise AND operator is
&, the bitwise OR operator is |, and the bitwise
XOR operator is ^. They operate on the binary
representation of the number.
AND operation: The result is 1 if both bits are 1,
otherwise 0.
OR operation: The result is 1 if either of the bits is 1,
otherwise 0.
XOR operation: The result is 1 if the bits are different,
otherwise 0.
NOT operation: The result is the complement of the
number.

To understand why the result of the bitwise NOT
operation is -6, we need to understand how numbers are
stored in the computer. The number 5 is stored as
00000101 in binary. The bitwise NOT operation inverts
the bits, so 00000101 becomes 11111010. This is the
two's complement representation of -6. Two's
complement is how computers store negative numbers.
This is better than one's complement (just flipping
the bits of the positive number) as it gives a single
value of zero, which is its own additive inverse. Read
more about two's complement here.

1 $ echo $((var^2))
2 7
3 $ echo $((var&3))
4 1
5 $ echo $((var|3))
6 7
7 $ echo $((~var))
8 -6

Comparison Operations

The comparison operators return 1 if the condition is
true, and 0 if the condition is false. This is the
opposite of how the exit codes work in bash, where
0 means success and 1 means failure. However, we
will see later that the mathematical environment
actually exits with a zero exit code if the value is
non-zero, fixing this issue.
1 $ echo $((var>5))
2 0
3 $ echo $((var>=5))
4 1
5 $ echo $((var<5))
6 0
7 $ echo $((var<=5))
8 1
9 $ echo $((var==5))
10 1
11 $ echo $((var!=5))
12 0
Increment and Decrement
Bash supports both pre-increment and post-increment,
as well as pre-decrement and post-decrement opera-
tors. The difference between pre and post is that the
pre operator increments the variable before using
it, while the post operator increments the variable
after using it.
Although var++ increases the value, the updated value
is not returned, thus the output of echo remains the
same as earlier. We can reprint the variable to
confirm the change.

1 $ echo $((++var))
2 6
3 $ echo $((var++))
4 6
5 $ echo $var
6 7
7 $ echo $((--var))
8 6
9 $ echo $((var--))
10 6
11 $ echo $var
12 5
Assignment and Multiple Clauses
We can also perform multiple arithmetic operations
in a single line. The result of the last operation is
returned.
1 $ echo $((a=5, b=10, a+b))
2 15
3 $ echo $((a=5, b=10, a+b, a*b))
4 50
The evaluation of an assignment operation is the
value of the right-hand side of the assignment. This
behaviour helps in chaining multiple assignment
operations.
1 $ echo $((a=5, b=10))
2 10
3 $ echo $((a=b=7, a*b))
4 49
Floating Point Arithmetic
Bash does not support floating point arithmetic.
1 $ echo $((10/3))
2 3
3 $ echo $((40/71))
4 0
However, we can use the bc command to perform
floating point arithmetic. Other commands that
support floating point arithmetic are awk and perl.
We can also use the printf command to format
the output of the floating point arithmetic after
performing integer arithmetic in bash.
1 $ printf "%.2f\n" "$((10**4 * 10/3))e-4"
2 3.33
3 $ printf "%.2f%%\n" "$((10**4 * 40/71))e-2"
4 56.33%
1 $ awk ’BEGIN { printf "%.2f%%", (40/71*100) }’
2 56.34%
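The bc calculator mentioned above can also do the
division directly; its scale variable sets the number
of digits after the decimal point (note that bc
truncates rather than rounds):

1 $ echo "scale=2; 40/71" | bc
2 .56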
Working with different bases
Bash supports working with different bases. The
bases can be in the range [2 , 64].
The syntax to refer to a number in a non-decimal
base is:
1 base#number
We can use the 0x prefix shortcut for hexadecimal
numbers, and the 0 prefix shortcut for octal num-
bers.
We can also mix bases in the same expression.

31 in octal is the same as 25 in decimal. This leads
to the joke that programmers get confused between
Halloween (31 Oct) and Christmas (25 Dec).

1 $ echo $((2#1000 + 2#101))
2 13
3 $ echo $((8#52 + 8#21))
4 59
5 $ echo $((052 + 021))
6 59
7 $ echo $((16#1a + 16#2a))
8 68
9 $ echo $((0x1a + 0x2a))
10 68
11 $ echo $((8#31 - 25))
12 0
If we only want to evaluate the expression and not
print it, we can use the (( )) construct without the
echo command.
1 $ var=5
2 $ ((var++))
3 $ echo $var
4 6
There are also other ways to perform arithmetic
operations in bash, such as using the expr command,
or using the let command.
The expr command is an external command, and not a
shell built-in. Thus, it does not have access to the
shell variables. If we want to use the values of the
variables, we have to expand them first. The * symbol
is a glob that matches all files in the current
directory; to prevent this, we need to escape the *
operator.

1 $ var=5
2 $ expr $var + 5
3 10
4 $ expr $var \* 5
5 25

The let command is a shell built-in, and has access to
the shell variables. Thus it can be used to not only
access, but also alter the variables.

1 $ a=5
2 $ echo $a
3 5
4 $ let a++
5 $ echo $a
6 6
8.3 Removing Variables
To remove a variable, we use the unset command.
1 $ var="Hello World"
2 $ echo $var
3 Hello World
4 $ unset var
5 $ echo $var
This can also be used to remove multiple vari-
ables.
1 $ var1="Hello"
2 $ var2="World"
3 $ echo $var1 $var2
4 Hello World
5 $ unset var1 var2
6 $ echo $var1 $var2
Variables can also be unset by setting them to an
empty string.
1 $ var="Hello World"
2 $ echo $var
3 Hello World
4 $ var=""
5 $ echo $var
However, this does not remove the variable, but
only sets it to an empty string.
The difference between unsetting a variable and
setting it to an empty string is that the unset variable
is not present in the shell, while the variable set to
an empty string is present in the shell. This can be
observed using the test shell built-in.
The shell built-in version of the test command can
detect if a variable is set or not using the -v flag;
this flag is not present in the executable version,
since an external executable cannot read the shell
variables.
1 $ var="Hello World"
2 $ echo $var
3 Hello World
4 $ test -v var ; echo $?
5 0
6 $ var=""
7 $ echo $var
8
9 $ test -v var ; echo $?
10 0
11 $ unset var
12 $ echo $var
13
14 $ test -v var ; echo $?
15 1
8.4 Listing Variables
8.4.1 set
To list all the variables in the shell, we use the
set command. This displays all the variables and
functions defined in the shell.

The output of the set command is very long, and we can
use the less command to scroll through it. Here we are
showing 10 random lines from the output.

1 $ set | head -n120 | shuf | head
2 BASH=/bin/bash
3 LC_NUMERIC=en_US.UTF-8
4 LESS_TERMCAP_so=$’\E[01;44;33m’
5 BASH_REMATCH=()
6 PATH=/opt/google-cloud-cli/bin:/sbin:/bin:/usr
/local/sbin:/usr/local/bin:/usr/bin:/usr/
sbin:/opt/android-sdk/cmdline-tools/latest
/bin:/opt/android-sdk/platform-tools:/opt/
android-sdk/tools:/opt/android-sdk/tools/
bin:/usr/lib/jvm/default/bin:/usr/bin/
site_perl:/usr/bin/vendor_perl:/usr/bin/
core_perl:/usr/lib/rustup/bin:/home/sayan/
scripts:/home/sayan/.android/sdk:/home/
sayan/.android/sdk/tools:/home/sayan/.
android/sdk/platform-tools:/home/sayan/
scripts:/home/sayan/.local/bin:/home/sayan
/.pub-cache/bin:/usr/lib/jvm/default/bin:/
home/sayan/.fzf/bin
7 HOSTTYPE=x86_64
8 COLUMNS=187
9 SHELL=/bin/bash
10 BROWSER=thorium-browser
set also lists the user-defined variables defined in
the shell.
1 $ var1=5
2 $ set | grep var1
3 var1=5
8.4.2 declare
Another way to list the variables is to use the declare
command. It lists all the environment variables,
shell variables, and functions defined in the shell.
1 $ declare | head -n10
2 ANDROID_HOME=/home/sayan/.android/sdk
3 ANDROID_SDK_ROOT=/home/sayan/.android/sdk
4 AWT_TOOLKIT=MToolkit
5 BASH=/bin/bash
6 BASHOPTS=autocd:cdspell:checkjobs:checkwinsize
:cmdhist:complete_fullquote:execfail:
expand_aliases:extglob:extquote:
force_fignore:globasciiranges:globskipdots
:histappend:interactive_comments:
patsub_replacement:progcomp:promptvars:
sourcepath
7 BASH_ALIASES=()
8 BASH_ARGC=([0]="0")
9 BASH_ARGV=()
10 BASH_CMDS=()
11 BASH_COMPLETION_VERSINFO=([0]="2" [1]="11")
It can also be used to list the user-defined vari-
ables.
1 $ var1=5
2 $ declare | grep var1
3 var1=5
8.4.3 env
Although the env command is used to run a com-
mand in a modified environment, it can also be used
to list all the environment variables if no arguments
are supplied to it.
1 $ env | head -n10
2 SHELL=/bin/bash
3 WINDOWID=100663324
4 COLORTERM=truecolor
5 LANGUAGE=
6 LC_ADDRESS=en_US.UTF-8
7 JAVA_HOME=/usr/lib/jvm/default
8 LC_NAME=en_US.UTF-8
9 SSH_AUTH_SOCK=/run/user/1000/ssh-agent.socket
10 SHELL_SESSION_ID=91
c0e4dcd4b644e8bfa2a25613b60f60
11 XDG_CONFIG_HOME=/home/sayan/.config
This only lists the environment variables, and not
the shell variables. Only if a shell variable is
exported will it be listed in the output of the env
command.
1 $ var1=5
2 $ env | grep var1
3 $ export var1
4 $ env | grep var1
5 var1=5
env is an external command and not a shell built-in,
thus it does not have the access to unexported shell
variables at all.
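The primary use of env, running a command with extra
variables in its environment, looks like this (a
sketch; greeting is an arbitrary variable name):

1 $ env greeting=hi bash -c 'echo $greeting'
2 hi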
8.4.4 printenv
The printenv command is used to print all the
environment variables. Similar to the env command,
it only lists the environment variables, and not the
shell variables since it is an executable and not a
shell built-in.
1 $ printenv | shuf | head
2 KONSOLE_VERSION=240202
3 FZF_DEFAULT_COMMAND=fd --type f -H
4 HOME=/home/sayan
5 LANGUAGE=
6 KONSOLE_DBUS_WINDOW=/Windows/1
7 USER=sayan
8 CLOUDSDK_PYTHON=/usr/bin/python
9 COLORFGBG=15;0
10 VISUAL=nvim
11 SSH_AUTH_SOCK=/run/user/1000/ssh-agent.socket
8.5 Special Variables
The bash shell exports some special variables whose
values are set by the shell itself. These are useful
to refer to in scripts, and also to understand the
environment in which the script is running. They
let the script be more dynamic and adapt to the
environment.
▶ USER stores the currently logged in user. 2
▶ HOME stores the home directory of the user.
▶ PWD stores the current working directory.
▶ SHELL stores the path of the shell being used.
▶ PATH stores the paths to search for commands.
▶ PS1 stores the prompt string for the shell.
▶ PS2 stores the secondary prompt string for the shell.
▶ HOSTNAME stores the network name of the system.
▶ OSTYPE stores the type of operating system.
▶ TERM stores the terminal type.
2: Originally, the System V distributions exported
the LOGNAME variable, while the BSD distributions
exported the USER variable. Modern distros export
both, but USER is more commonly used. The zsh
shell exports USERNAME as well.
The shell also sets some special variables that are
useful in scripts. These are not exported, but set for
every shell or child process accordingly.
▶ $0 stores the name of the script or shell.
▶ $1, $2, $3, ... store the arguments to the
script.
▶ $# stores the number of arguments to the
script.
▶ $* stores all the arguments to the script as a
single string.
▶ $@ stores all the arguments to the script as
array of strings.
▶ $? stores the exit status of the last command.
▶ $$ stores the process id of the current shell.
▶ $! stores the process id of the last background
command.
▶ $- stores the current options set for the shell.
▶ $IFS stores the Internal Field Separator.
▶ $LINENO stores the current line number of the
script.
▶ $RANDOM stores a random number.
▶ $SECONDS stores the number of seconds the
script has been running.
These variables are automatically set and updated
by the shell whenever required.
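For instance, here is a short transcript exercising a
few of these; the PID, job number, and timing values
shown are illustrative and will differ on your system.
1 $ echo $$
2 3600021
3 $ sleep 50 &
4 [1] 3600101
5 $ echo $!
6 3600101
7 $ false ; echo $?
8 1
9 $ echo $SECONDS
10 42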
8.5.1 PWD
The PWD variable is updated whenever the current
working directory changes.
1 $ echo $PWD
2 /home/sayan
3 $ cd /tmp
4 $ echo $PWD
5 /tmp
8.5.2 RANDOM
The RANDOM variable stores a random number
between 0 and 32767, and is constantly changed.
1 $ echo $RANDOM
2 11670
3 $ echo $RANDOM
4 29897
Remark 8.5.1 The RANDOM variable is not truly
random, but is a pseudo-random number gener-
ated by the shell. It is generated using the Linear
Congruential Generator algorithm. The seed for
the random number is the process id of the shell.
Thus, the same sequence of random numbers
will be generated if the shell is restarted. To get a
more cryptographically secure random number,
we can use the openssl command or read from
the /dev/urandom file. Read more here.
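For example, both of the following give better
randomness than $RANDOM; the outputs shown are,
of course, just samples.
1 $ openssl rand -hex 4
2 1f8a3c2e
3 $ od -An -N2 -i /dev/urandom
4 47383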
8.5.3 PATH
The PATH variable stores the paths to search for
commands. Whenever a command is executed in
the shell, the shell searches for the command in the
directories listed in the PATH variable if it is not a shell
keyword or a shell built-in. If an executable with
that name with execute permissions is not found in
any of the paths mentioned in PATH variable, then
the command fails.
The PATH variable is colon separated. It is not set
by the shell automatically; rather, it has to be set
by the system. It is usually set in the /etc/profile
file and the ~/.bashrc file.
1 $ echo "echo hello" > sayhello
2 $ chmod 764 sayhello
3 mode of ’sayhello’ changed from 0644 (rw-r--r--) to 0764 (rwxrw-r--)
4 $ sayhello
5 bash: sayhello: command not found
6 $ PATH=$PATH:$PWD
7 $ sayhello
8 hello
Here we create an executable that prints hello to
the terminal. It is present in our current directory,
but the shell does not know where to find it, as
the current directory is not present in the PATH
variable. Once we add the current directory to the
PATH variable, the shell is able to find the executable
and execute it.
8.5.4 PS1
The PS1 variable stores the primary prompt string
for the shell. It can be used to customize the prompt
of the shell.
1 [sayan@rex ~] $ PS1="[\u@\h \w] \$ "
2 [sayan@rex ~] $ PS1="hello "
3 hello PS1="enter command> "
4 enter command> PS1="User: \u> "
5 User: sayan> PS1="User: \u, Computer: \h> "
6 User: sayan, Computer: rex> PS1="Date: \d> "
7 Date: Thu Jul 25> PS1="Time: \t> "
8 Time: 17:35:44> PS1="Jobs: \j> "
9 Jobs: 0> sleep 50 &
10 [1] 3600280
11 Jobs: 1>
12 Jobs: 1> PS1="Shell: \s> "
13 Shell: bash> PS1="History Number: \!> "
14 [1]+ Done sleep 50
15 History Number: 563>
16 History Number: 563> echo hello
17 hello
18 History Number: 564> PS1="Command Number: \#>
"
19 Command Number: 65>
20 Command Number: 65> echo hello
21 hello
22 Command Number: 66> PS1="Ring a bell \a> "
23 Ring a bell >
24 Ring a bell > PS1="[\u@\h \w] \$ "
25 [sayan@rex ~] $
These changes are temporary and are only valid for
the current shell session. To make the changes per-
manent, we need to add the PS1 variable assignment
to the ~/.bashrc file.
The PS1 variable gives us some customization, how-
ever it is limited. To run any arbitrary command to
determine the prompt, we can use the PROMPT_COMMAND
variable.
For example, if you want to display the exit code of
the last command in the prompt, you can use the
following command. Notice how the exit code of
the last command is displayed in the prompt and is
updated whenever a new command is run.
1 $ prompt(){
2 > PS1="($?)[\u@\h \w]\$ "
3 > }
4 $ PROMPT_COMMAND=prompt
5 (0)[sayan@rex ~]$ ls /home
6 sayan
7 (0)[sayan@rex ~]$ ls /random
8 ls: cannot access ’/random’: No such file or directory
9 (2)[sayan@rex ~]$
We can also show the exit code of each process in
the prompt if piped commands are used.
1 $ export PROMPT_COMMAND="
2 _RES=\${PIPESTATUS[*]};
3 _RES_STR=’’;
4 for res in \$_RES; do
5 if [[ ( \$res > 0 ) ]]; then
6 _RES_STR=\" [\$_RES]\";
7 fi;
8 done"
9 $ export PS1="\u@\h \w\$_RES_STR\\$ "
10 sayan@rex ~$ echo hello
11 hello
12 sayan@rex ~$ exit 1 | exit 2 | exit 3
13 sayan@rex ~ [1 2 3]$
Read more here.
We can also color the prompt using ANSI escape
codes. We will see these in detail in later chapters.
8.6 Variable Manipulation
8.6.1 Default Values
The :- operator is used to substitute a default value
if the variable is not set or is empty. This simply
returns the value, and the variable still remains
unset.
1 $ unset var
2 $ echo ${var:-default}
3 default
4 $ echo $var
5
6 $ var=hello
7 $ echo ${var:-default}
8 hello
9 $ echo $var
10 hello
The :+ operator is used to substitute a replacement
value if the variable is set, and does nothing if it is
not.
1 $ unset var
2 $ echo ${var:+default}
3
4 $ var=hello
5 $ echo ${var:+default}
6 default
7 $ echo $var
8 hello
The := operator is used to substitute a default value
if the variable is not set or is empty, and also set the
variable to the default value. 3 This is similar to the
:- operator, but also sets the variable to the default
value. It does nothing if the variable is already set.
3: This operator is also present in modern Python
and is called the walrus operator.
1 $ unset var
2 $ echo ${var:=default}
3 default
4 $ echo $var
5 default
6 $ var=hello
7 $ echo ${var:=default}
8 hello
9 $ echo $var
10 hello
These operations consider an empty variable as un-
set. However, sometimes we may need to consider
an empty variable as set, but empty. In those cases,
we can drop the colon from the operator.
Remark 8.6.1 echo ${var:-default} will print
default if var is absent or empty.
echo ${var-default} will print default if var is
absent, but not if var is empty.
echo ${var:+default} will print default if var
is present and not empty.
echo ${var+default} will print default if var is
present, regardless of whether var is empty or
not.
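A short transcript contrasting the colon and
colon-less forms on a variable that is set but empty:
1 $ var=""
2 $ echo ${var:-default}
3 default
4 $ echo ${var-default}
5
6 $ echo ${var+default}
7 default
8 $ echo ${var:+default}
9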
8.6.2 Error if Unset
Sometimes we may want to throw an error if a vari-
able is unset. This is useful in scripts to ensure that
all the required variables are set when performing
critical operations.
Imagine the following line of code.
1 $ rm -rf "$STEAMROOT/"
This looks like a simple line to remove the Steam
root directory. However, if the STEAMROOT variable
is unset, this will expand to rm -rf /, which will
delete the root directory of the system if run with
privileges, or at the very least, the entire home
directory of the user. This is a very dangerous
operation, and can lead to loss of data.
To prevent this, we can use the :? operator to throw
an error if the variable is unset.
The reason for the specific example is that this is a
bug that was present in the Steam installer script,
and was fixed by Valve after it was reported.
1 $ unset STEAMROOT
2 $ rm -rf "${STEAMROOT:?Variable not set}/"
3 bash: STEAMROOT: Variable not set
The shell will not even attempt to run the command
if the variable is unset, and will throw an error
immediately, saving the day.
8.6.3 Length of Variable
The length of a variable can be found using the #
operator.
1 $ var="Hello World"
2 $ echo ${#var}
3 11
8.6.4 Substring of Variable
The substring of a variable can be found using the
: operator.
The syntax is ${var:start:length}. The start is the
index of the first character of the substring, and the
length is the number of characters to include in the
substring. The index starts from zero. If the length
exceeds the end of the string, it will print till the
end and not throw any error. The index can also be
negative, in which case it is counted from the end
of the string, similar to Python. 4
4: In case of a negative index, the space between
the colon and the negative index is important. If
there is no space, it will be considered as a default
substitution.
1 $ var="Hello World"
2 $ echo ${var:0:5}
3 Hello
4 $ echo ${var:6:5}
5 World
6 $ echo ${var:6:50}
7 World
8 $ echo ${var: -5:5}
9 World
10 $ echo ${var: -11:5}
11 Hello
8.6.5 Prefix and Suffix Removal
The prefix and suffix of a variable can be removed
using the # and % operators respectively. Any glob-
like pattern can be matched, not just fixed strings.
The match can be made greedy (longest match)
by using ## and %% respectively. This applies to
wildcard matching.
▶ ${var%pattern} will delete the shortest match
of pattern from the end of var.
▶ ${var%%pattern} will delete the longest match
of pattern from the end of var.
▶ ${var#pattern} will delete the shortest match
of pattern from the start of var.
▶ ${var##pattern} will delete the longest match
of pattern from the start of var.
If we have a variable with value "abc.def.ghi.xyz":
▶ echo ${var%.*} will print "abc.def.ghi".
▶ echo ${var%%.*} will print "abc".
▶ echo ${var#*.} will print "def.ghi.xyz".
▶ echo ${var##*.} will print "xyz".
Here the dot is used as a separator, and we want to
extract the first and last part of the string. The dot
does not signify a wildcard, but a literal dot, since
this is not regex.
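These can be verified directly in the shell:
1 $ var="abc.def.ghi.xyz"
2 $ echo ${var%.*}
3 abc.def.ghi
4 $ echo ${var%%.*}
5 abc
6 $ echo ${var#*.}
7 def.ghi.xyz
8 $ echo ${var##*.}
9 xyz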
8.6.6 Replace Substring
The substring of a variable can be replaced using
the / operator. The syntax is ${var/pattern/string}.
This will replace the first occurrence of pattern with
string. The search pattern can be a glob pattern,
not just a fixed string. The replacement has to be a
fixed string, and not a glob pattern.
1 $ var="Hello World"
2 $ echo ${var/ */ Universe}
3 Hello Universe
We can also replace all occurrences of the pattern
using the // operator.
1 $ var="Hello World"
2 $ echo ${var//o/O}
3 HellO WOrld
8.6.7 Anchoring Matches
We can also anchor the match to the start or end of
the string using the # and % operators respectively.
Observe how the # operator anchors the match to
the start of the string. Even though the second
variable also had an s in the middle, it was not
replaced.
1 $ var="system commands"
2 $ echo ${var/#s/S}
3 System commands
4 $ var="linux commands"
5 $ echo ${var/#s/S}
6 linux commands
Similarly, we can anchor the match to the end of the
string using the % operator.
1 $ var="Hello World"
2 $ echo ${var/%?/&!}
3 Hello World!
Here we are using the ? wildcard to match any
character, and replace it with &!. The & is used to
refer to the matched string and is not interpreted
as a literal ampersand. (This use of & relies on the
patsub_replacement shell option, which is enabled
by default since bash 5.2.)
8.6.8 Deleting the match
We can also delete the match by keeping the re-
placement string empty.
1 $ var="Hello World"
2 $ echo ${var/% */}
3 Hello
This matches all the words after the first word, and
deletes them. Here the match is always greedy, and
will match the longest possible string.
8.6.9 Lowercase and Uppercase
The case of a variable can be changed using the ,
and ^ operators. This changes only the first character
of the variable. To change the entire variable, we
can use the ,, and ^^ operators.
1 $ var="sayan"
2 $ echo ${var^}
3 Sayan
Similarly, we can change the entire variable.
1 $ var="SAYAN"
2 $ echo ${var,,}
3 sayan
This is useful if you want to approximate the user’s
name from the username.
1 $ echo "Hello ${USER^}!"
2 Hello Sayan!
8.6.10 Sentence Case
To convert the first letter of a variable to uppercase,
and the rest to lowercase, we can use the following
command.
1 var="hELLO wORLD"
2 lower=${var,,}
3 echo ${lower^}
4 Hello world
Here we are simply using the two operators in
sequence to achieve the desired result.
8.7 Restrictions on Variables
Since bash variables are untyped, they can be set
to any value. However, sometimes we may want
to restrict the type of value that can be stored in a
variable.
This can be done using the declare command.
8.7.1 Integer Only
To restrict a variable to only store integers, we can
use the -i flag.
1 $ declare -i var
2 $ var=5
3 $ echo "$var * $var = $((var**2))"
4 5 * 5 = 25
5 $ var=hello
6 $ echo "$var * $var = $((var**2))"
7 0 * 0 = 0
If we assign any non-integer value to the variable, it
will be treated as zero. This will not throw an error,
but will silently set the variable to zero.
8.7.2 No Upper Case
To automatically convert all the characters of a
variable to lowercase, we can use the -l flag. This
does not change non-alphabetic characters.
1 $ declare -l var
2 $ var="HELLO WORLD!"
3 $ echo $var
4 hello world!
8.7.3 No Lower Case
Similarly, we can use the -u flag to convert all the
characters of a variable to uppercase, while retain-
ing non-alphabetic characters as-is.
1 $ declare -u var
2 $ var="hello world!"
3 $ echo $var
4 HELLO WORLD!
8.7.4 Read Only
To make a variable read only, we can use the -r
flag. This means we cannot change the value of the
variable once it is set. Thus the value also has to be
set at the time of declaration. 5
5: This way of assigning the value is also possible
for the other flags, but is not necessary.
1 $ declare -r PI=3.14159
2 $ echo "PI = $PI"
3 PI = 3.14159
4 $ PI=3.1416
5 -bash: PI: readonly variable
8.7.5 Removing Restrictions
We can remove the restrictions from a variable using
the + flag. This cannot be done for the read only
flag.
1 $ declare -i var
2 $ var=hello
3 $ echo $var
4 0
5 $ declare +i var
6 $ var=hello
7 $ echo $var
8 hello
8.8 Bash Flags
There are some flags that can be set in the bash
shell to change the behaviour of the shell. The
currently set flags can be viewed using the echo $-
command.
1 $ echo $-
2 himBHs
These can be set or unset using the set command.
1 $ echo $-
2 himBHs
3 $ set +m
4 $ echo $-
5 hiBHs
The same convention as the declare command is
used, with + to unset the flag, and - to set the flag.
The default flags are:
▶ h: locate and remember (hash) commands as
they are looked up.
▶ i: interactive shell
▶ m: monitor the jobs and report changes
▶ B: braceexpand - expand the expression in
braces
▶ H: histexpand - expand the history command
▶ s: Read commands from the standard input.
We can see the default flags change if we start a
non-interactive shell.
1 $ bash -c ’echo $-’
2 hBc
The c flag means that bash is reading the command
from the argument and it assigns $0 to the first
non-option argument.
Some other important flags are:
▶ e: Exit immediately if a command exits with
a non-zero status.
▶ u: Treat unset variables as an error when
substituting.
▶ x: Print commands and their arguments as
they are executed.
▶ v: Print shell input lines as they are read.
▶ n: Read commands but do not execute them.
This is useful for testing syntax of a command.
▶ f: Disable file name generation (globbing).
▶ C: Also called noclobber. Prevent overwriting
of files using redirection.
If we set the -f flag, the shell will not expand the
glob patterns which we discussed in Chapter 6.
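A small illustration of a few of these flags follows.
The unbound-variable example runs in a throwaway
bash -c subshell, the exact error wording varies
between bash versions, and out.txt is assumed not
to exist beforehand.
1 $ bash -c 'set -u; echo $undefined'
2 bash: line 1: undefined: unbound variable
3 $ set -f
4 $ echo *.txt
5 *.txt
6 $ set +f
7 $ set -C
8 $ echo hello > out.txt
9 $ echo world > out.txt
10 bash: out.txt: cannot overwrite existing file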
8.9 Signals
Remark 8.9.1 If you press Ctrl+S in the terminal,
some terminals might stop responding. You can
resume it by pressing Ctrl+Q. This is because
XON/XOFF flow control (the ixon setting) is en-
abled. You can disable it using stty -ixon. This is
a common problem with some terminals, thus the
command can be placed inside the ~/.bashrc file.
Other Ctrl+Key combinations:
▶ Ctrl+C: Interrupt the current process, this
sends the SIGINT signal.
▶ Ctrl+D: End of input, this sends an end-of-file
(EOF), not a signal.
▶ Ctrl+L: Clear the terminal screen.
▶ Ctrl+Z: Suspend the current process, this
sends the SIGTSTP signal.
▶ Ctrl+R: Search the history of commands using
reverse-i-search.
▶ Ctrl+T: Swap the last two characters
▶ Ctrl+U: Cut the line before the cursor
▶ Ctrl+V: Insert the next character literally
▶ Ctrl+W: Cut the word before the cursor
▶ Ctrl+Y: Paste the last cut text
8.10 Brace Expansion
As the B flag is set by default, brace expansion is
enabled in the shell.
Definition 8.10.1 (Brace Expansion) Brace ex-
pansion is a mechanism by which arbitrary
strings can be generated using a concise syntax.
It is similar to pathname expansion, but can be
used to expand to non-existing patterns too.
Brace expansion is used to generate lists of strings,
and is useful in generating sequences of strings.
8.10.1 Range Expansion
For generating a sequence of numbers, we can use
the range expansion.
Syntax:
1 {start..end}
1 $ echo {1..5}
2 1 2 3 4 5
We can also specify the increment value.
1 $ echo {1..11..2}
2 1 3 5 7 9 11
The start and end values are both inclusive.
This can also be used for letters of the alphabet.
1 $ echo {a..f}
2 a b c d e f
8.10.2 List Expansion
For generating a list of strings, we can use the list
expansion.
Syntax:
1 {string1,string2,string3}
This can be used to generate a list of strings.
1 $ echo {apple,banana,cherry}
2 apple banana cherry
8.10.3 Combining Expansions
The real power of brace expansion comes when we
combine the expansion with a static part.
1 $ echo file{1..5}.txt
2 file1.txt file2.txt file3.txt file4.txt file5.
txt
Brace expansion automatically expands to the Carte-
sian product of the strings if multiple expansions
are present in a single token.
1 $ echo {a,b}{1,2}
2 a1 a2 b1 b2
We can also combine multiple tokens with space in
between by escaping the space.
1 $ echo {a,b}\ {1,2}
2 a 1 a 2 b 1 b 2
The expansion is done from left to right, and the
order of the tokens is preserved.
This can be used to create a list of files following
some pattern. Here we are using ascii encoded
output of the tree command for compatibility with
the book. Feel free to drop the --charset ascii
when trying this out in your terminal.
1 $ mkdir -p test/{a,b,c}/{1,2,3}
2 $ touch test/{a,b,c}/{1,2,3}/file{1..5}.txt
3 $ tree --charset ascii test
4 test
5 |-- a
6 | |-- 1
7 | | |-- file1.txt
8 | | |-- file2.txt
9 | | |-- file3.txt
10 | | |-- file4.txt
11 | | ‘-- file5.txt
12 | |-- 2
13 | | |-- file1.txt
14 | | |-- file2.txt
15 | | |-- file3.txt
16 | | |-- file4.txt
17 | | ‘-- file5.txt
18 | ‘-- 3
19 | |-- file1.txt
20 | |-- file2.txt
21 | |-- file3.txt
22 | |-- file4.txt
23 | ‘-- file5.txt
24 |-- b
25 | |-- 1
26 | | |-- file1.txt
27 | | |-- file2.txt
28 | | |-- file3.txt
29 | | |-- file4.txt
30 | | ‘-- file5.txt
31 | |-- 2
32 | | |-- file1.txt
33 | | |-- file2.txt
34 | | |-- file3.txt
35 | | |-- file4.txt
36 | | ‘-- file5.txt
37 | ‘-- 3
38 | |-- file1.txt
39 | |-- file2.txt
40 | |-- file3.txt
41 | |-- file4.txt
42 | ‘-- file5.txt
43 ‘-- c
44 |-- 1
45 | |-- file1.txt
46 | |-- file2.txt
47 | |-- file3.txt
48 | |-- file4.txt
49 | ‘-- file5.txt
50 |-- 2
51 | |-- file1.txt
52 | |-- file2.txt
53 | |-- file3.txt
54 | |-- file4.txt
55 | ‘-- file5.txt
56 ‘-- 3
57 |-- file1.txt
58 |-- file2.txt
59 |-- file3.txt
60 |-- file4.txt
61 ‘-- file5.txt
62
63 13 directories, 45 files
8.11 History Expansion
Definition 8.11.1 (History Expansion) History
expansion is a mechanism by which we can
refer to previous commands in the history list.
It is enabled using the H flag.
We can run history to see the list of commands in
the history. To re-run a command, we can use the !
operator along with the history number.
1 $ echo hello
2 hello
3 $ history | tail
4 499 man set
5 500 man set
6 501 man tree
7 502 tree --charset unicode
8 503 tree --charset ascii
9 504 history
10 505 tree --charset unicode
11 506 clear
12 507 echo hello
13 508 history | tail
14 $ !507
15 echo hello
16 hello
The command itself is output to the terminal before
it is executed.
We can also refer to the last command using the !!
operator.
1 $ touch file1 file2 .hidden
2 $ ls
3 file1 file2
4 $ !! -a
5 ls -a
6 . .. .hidden file1 file2
One frequent use of history expansion is to run a
command as root. If we forget to run a command
as root, we can use the sudo command to run the
last command as root.
1 $ touch /etc/test
2 touch: cannot touch ’/etc/test’: Permission
denied
3 $ sudo !!
4 sudo touch /etc/test
8.12 Arrays
Definition 8.12.1 (Array) An array is a collec-
tion of elements, each identified by an index.
We can declare an array using the declare com-
mand.
1 $ declare -a arr
However, this is not necessary, as bash automatically
creates an array when we assign multiple values to
a variable.
1 $ arr=(1 2 3 4 5)
2 $ echo ${arr[2]}
3 3
We can set the value of each element of the array
using the index.
1 $ arr[2]=6
2 $ echo ${arr[2]}
3 6
If we only access the variable without the index, it
will return the first element of the array.
1 $ arr=(1 2 3 4 5)
2 $ echo $arr
3 1
8.12.1 Length of Array
The length of the array can be found using the #
operator.
1 $ arr=(1 2 3 4 5)
2 $ echo ${#arr[@]}
3 5
8.12.2 Indices of Array
Although it looks like a continuous sequence of
numbers, the indices of the array are not necessarily
continuous. Bash arrays are actually dictionaries or
hash-maps and the index is the key. Thus we may
also need to get the indices of the array.
1 $ arr=(1 2 3 4 5)
2 $ arr[10]=6
3 $ echo ${!arr[@]}
4 0 1 2 3 4 10
8.12.3 Printing all elements of Array
To print all the elements of the array, we can use
the @ operator.
By default, the indices start from zero and are
incremented by one.
1 $ arr=(1 2 3 4 5)
2 $ echo ${arr[@]}
3 1 2 3 4 5
In indexed arrays, the indices are always integers.
If we use a string as the index, bash evaluates it
arithmetically; since hello is not a defined variable,
it evaluates to zero, so the element at index 0 is
overwritten.
1 $ arr=(1 2 3 4 5)
2 $ arr["hello"]=6
3 $ echo ${arr[@]}
4 6 2 3 4 5
8.12.4 Deleting an Element
To delete an element of an array, we can use the
unset command.
1 $ arr=(1 2 3 4 5)
2 $ arr[10]=6
3 $ echo ${arr[@]}
4 1 2 3 4 5 6
5 $ echo ${!arr[@]}
6 0 1 2 3 4 10
7 $ unset arr[10]
8 $ echo ${arr[@]}
9 1 2 3 4 5
10 $ echo ${!arr[@]}
11 0 1 2 3 4
8.12.5 Appending an Element
If we want to simply append an element to the
array without specifying the index, we can use the
+= operator. The index of the new element will be
the next integer after the last element.
1 $ arr=(1 2 3 4 5)
2 $ arr[10]=6
3 $ arr+=(7)
4 $ echo ${arr[@]}
5 1 2 3 4 5 6 7
6 $ echo ${!arr[@]}
7 0 1 2 3 4 10 11
We can also append multiple elements at once by
separating them with spaces.
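For example:
1 $ arr=(1 2 3)
2 $ arr+=(4 5)
3 $ echo ${arr[@]}
4 1 2 3 4 5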
8.12.6 Storing output of a command in
an Array
We can store the output of a command in a normal
variable using the = operator.
1 $ var=$(ls)
2 $ echo $var
3 file1 file2 file3
But if we want to iterate over the output of a com-
mand, we can store it in an array by surrounding
the command substitution with parentheses.
1 $ arr=($(ls))
2 $ echo ${arr[@]}
3 file1 file2 file3
4 $ echo ${!arr[@]}
5 0 1 2
6 $ echo ${arr[1]}
7 file2
8.12.7 Iterating over an Array
We can iterate over an array using a for loop.
1 $ arr=(1 2 3 4 5)
2 $ for i in ${arr[@]}; do
3 > echo $i
4 > done
5 1
6 2
7 3
8 4
9 5
We will cover loops in more detail in Chapter 9.
Different Ways of Iterating
There are three ways to iterate over an array de-
pending on how we break the array.
Treat entire array as a single element
1 $ arr=("Some" "elements" "are" "multi word")
2 $ for i in "${arr[*]}"; do
3 > echo $i
4 > done
5 Some elements are multi word
Here, as we are using the * operator, the array
is expanded as a single string, and not as array
elements. Further, as we have quoted the variable,
the for loop does not break the string by the spaces.
Break on each word
This can be done in two ways, either by using the @
operator, or by using the * operator. In either case
we do not quote the variable.
1 $ arr=("Some" "elements" "are" "multi word")
2 $ for i in ${arr[*]}; do echo $i; done
3 Some
4 elements
5 are
6 multi
7 word
1 $ arr=("Some" "elements" "are" "multi word")
2 $ for i in ${arr[@]}; do echo $i; done
3 Some
4 elements
5 are
6 multi
7 word
Break on each element
Finally, the last way is to break on each element of
the array. This is often the desired way to iterate
over an array. We use the @ operator, and quote the
variable to prevent word splitting.
1 $ arr=("Some" "elements" "are" "multi word")
2 $ for i in "${arr[@]}"; do echo $i; done
3 Some
4 elements
5 are
6 multi word
8.13 Associative Arrays
If we want to store key-value pairs, we can use
associative arrays. In this, the index is not an integer,
but a string. It also does not automatically assign
the next index, but we have to specify the index.
We use the declare -A command to declare an as-
sociative array. Unlike with indexed arrays, this
declaration is necessary; without it, bash would
treat the subscripts as arithmetic expressions on an
indexed array.
1 $ declare -A arr
2 $ arr=(["name"]="Sayan" ["age"]=22)
3 $ echo ${arr[name]}
4 Sayan
5 $ arr["age"]=23
6 $ echo ${#arr[@]}
7 2
8 $ echo ${!arr[@]}
9 age name
10 $ echo ${arr[@]}
11 23 Sayan
12 $ unset arr[name]
In bash, the order of the elements is not preserved,
and the elements are not sorted. This was how
Python dictionaries worked before Python 3.7.
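A common pattern is to loop over the keys using
${!arr[@]} and look up each value; a minimal sketch
(the key order, as noted above, is not guaranteed):
1 $ declare -A arr=(["name"]="Sayan" ["age"]=22)
2 $ for key in "${!arr[@]}"; do
3 > echo "$key -> ${arr[$key]}"
4 > done
5 age -> 22
6 name -> Sayan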
9 Shell Scripting
Now that we have learnt the basics of the bash shell
and the core utils, we will see how we can combine
them to create shell scripts which let us do even
more complex tasks in just one command.
9.1 What is a shell script?
A shell script is a file that contains a sequence
of shell commands. When you run a shell script,
the commands in the script are executed in the
order they are written. Shell scripts are used to
automate tasks that would otherwise require a lot of
manual work. They are also used to create complex
programs that can be run from the command line.
9.2 Shebang
Definition 9.2.1 (Shebang) The shebang is a
special line at the beginning of a shell script that
tells the operating system which interpreter to
use to run the script. The shebang is written as
#! followed by the path to the interpreter.
For example, the shebang for a bash script is #!/
bin/bash. However, as /bin is a symbolic link to
/usr/bin in most systems, we can also specify the
path as #!/usr/bin/bash
The shebang is only needed if we run the script
directly from the shell, without specifying the in-
terpreter. This also requires the script to have the
execute permission set.
1 $ cat script.sh
2 #!/bin/bash
3 echo "Hello, World!"
4 $ bash script.sh
5 Hello, World!
6 $ ./script.sh
7 bash: ./script.sh: Permission denied
8 $ chmod +x script.sh
9 $ ./script.sh
10 Hello, World!
Here we are first running the script with the bash
command, which is why the shebang and execute
permission is not needed. However, when we try to
run the script directly, we get a permission denied
error as the permission is not set. We then set the
execute permission and run the script again, which
works.
We can also use the shebang for non-bash scripts,
for example, a python script can have the shebang
#!/usr/bin/python3.
1 $ cat script.py
2 #!/usr/bin/python3
3 print("Hello, World!")
4 $ python3 script.py
5 Hello, World!
6 $ chmod u+x script.py
7 $ ./script.py
8 Hello, World!
Even though we called the script without specifying
the interpreter, the shebang tells the shell to use
the python3 interpreter to run the script. This only
works if the file has the execute permission set.
If the shebang is absent, and a file is executed
without specifying the interpreter, the shell will try
to execute it with itself as the interpreter.
9.3 Comments
In shell scripts, comments are lines that are not
executed by the shell. Comments are used to doc-
ument the script and explain what the script does.
Comments in shell scripts start with a # character
and continue to the end of the line.
Although it may not be required when running a
command from the terminal, comments are sup-
ported there as well. Comments are most useful in
scripts, where they can be used to explain what the
script does and how it works.
1 $ echo "hello" # This is a comment
2 $ echo "world" # we are using the echo command
to print the word "world"
Here we use the # to start a line comment. Any
character after the pound sign is ignored by the
interpreter.
Remark 9.3.1 The shebang also starts with a #
character, so it is also ignored by the interpreter
when executing, however, if a file is executed
without mentioning the interpreter, then the
shell reads its first line and finds the path to the
interpreter using the shebang.
9.3.1 Multiline Comments
Although bash does not have a built-in syntax for
multiline comments, we can use a trick to create
multiline comments. We can use the NOP command
: to create a multiline comment by passing the
multiline string to it.
1 $ : ’
2 This is a multiline comment
3 This is the second line
4 This is the third line
5 ’
9.4 Variables
We have already seen the various ways to create and
use variables in bash. We can use these variables in
scripts as well.
1 $ cat variables.sh
2 #!/bin/bash
3 name="alice"
4 declare -i dob=2001
5 echo "Hello, ${name^}! You are $((2024 - dob))
years old."
6 $ chmod u+x variables.sh
7 $ ./variables.sh
8 Hello, Alice! You are 23 years old.
The variables defined inside a script are not avail-
able outside the script if the script is executed in a
new environment. However, if we want to actually
retain the variables, we can use the source command
to run the script in the current environment.
This reads the script and executes it in the current
shell, so the variables defined in the script are
available in the current shell. The PID of the script
remains the same as that of the executing shell, and no new
process is created. Since this only reads the file, the
execute permission is not needed.
This is useful when we want to set environment
variables or aliases using a script.
1 $ cat variables.sh
2 name="alice"
3 dob=2001
4 $ source variables.sh
5 $ echo "Hello, ${name^}! You are $((2024 - dob
)) years old."
6 Hello, Alice! You are 23 years old.
Similarly, if we want to run a script in a new envi-
ronment, but we still want it to have read access to
the variables declared in the current environment,
we can export the variables needed. In this example
we can see that if a variable is not defined in the
script and we access the variable in the script, it will
take an empty string. If we define the variable in
the parent shell and then call the script, it will still
not have access to it. However, if we export the
variable, then calling the script will provide it with
the variable.
1 $ cat variables.sh
2 #!/bin/bash
3 echo "Hello, ${name^}! You are $((2024 - dob)) years old."
4 $ chmod u+x variables.sh
5 $ ./variables.sh
6 Hello, ! You are 2024 years old.
7 $ name="alice"
8 $ dob=2001
9 $ ./variables.sh
10 Hello, ! You are 2024 years old.
11 $ export name dob
12 $ ./variables.sh
13 Hello, Alice! You are 23 years old.
This is because export makes the variable into an
environment variable, which is available to all child
processes of the current shell. When a new envi-
ronment is created, the environment variables are
copied to the new environment, so the script has
access to the exported variables.
However, if we want to add some variable to the
environment of the script, but we do not want to
add it to our current shell’s environment, we can
directly specify the variable and the value before
running the script, in the same line. This sets the
variables for the script’s environment, but not for
the shell’s environment.
1 $ cat variables.sh
2 #!/bin/bash
3 echo "Hello, ${name^}! You are $((2024 - dob))
years old."
4 $ ./variables.sh
5 Hello, ! You are 2024 years old.
6 $ name=Alice dob=2001 ./variables.sh
7 Hello, Alice! You are 23 years old.
9.5 Arguments
Just like we can provide arguments to a command,
we can also provide arguments to a script. These
arguments are stored in special variables that can
be accessed inside the script.
▶ $0 - The path of the script/interpreter by
which it is called.
▶ $1 - The first argument
▶ $2 - The second argument, and so on
▶ $@ - All arguments stored as an array
▶ $* - All arguments stored as a space separated
string
▶ $# - The number of arguments
1 $ cat arguments.sh
2 echo $0
3 $ ./arguments.sh
4 ./arguments.sh
5 $ bash arguments.sh
6 arguments.sh
7 $ source arguments.sh
8 /bin/bash
9 $ echo $0
10 /bin/bash
If we know the number of arguments that will be
passed to the script, we can directly access them
using the $n syntax.
1 $ cat arguments.sh
2 echo $1 $2
3 echo $3
4 $ bash arguments.sh Hello World "This is a
multiword single argument"
5 Hello World
6 This is a multiword single argument
Here we can also see how to pass multiple words
as a single argument, by quoting it.
Sometimes, we may want to pass the value of a
variable as an argument. In those cases, we should
always quote the variable expansion: if the variable
contains multiple words, bash will otherwise expand
the variable and treat each word as a separate argu-
ment, instead of treating the entire value of the
variable as a single argument.
Quoting the variable prevents word splitting.
1 $ cat split.sh
2 echo $1
3 $ var="Hello World"
4 $ bash split.sh $var
5 Hello
6 $ bash split.sh "$var"
7 Hello World
The first attempt prints only Hello as the second
word is stored in $2. But if we quote the variable
expansion, then the entire string is taken as one
parameter.
Double vs Single Quotes
There are some subtle differences between dou-
ble and single quotes. Although both are used to
prevent word splitting, single quotes also prevent
variable expansion, while double quotes do not.
1 $ echo "Hello $USER"
2 Hello sayan
3 $ echo ’Hello $USER’
4 Hello $USER
However, if we do not know the number of argu-
ments that will be passed to the script, we can use
the $@ and $* variables to access all the arguments.
1 $ cat all-arguments.sh
2 echo "$@"
3 $ bash all-arguments.sh Hello World "This is a
multiword single argument"
4 Hello World This is a multiword single
argument
Here we can see that $@ expands to all the arguments
as an array, while $* expands to all the arguments as
a single string. In this case, the output of both looks
the same, as echo prints its arguments side by side,
separated by a space.
We can observe the difference between $@ and $*
when we use them in a loop, or pass them to any
command that shows output for each argument on
separate lines.
1 $ cat multiword.sh
2 touch "$@"
3 $ bash multiword.sh Hello World "This is a
multiword single argument"
4 $ ls -1
5 Hello
6 multiword.sh
7 ’This is a multiword single argument’
8 World
In this case, since we are using "$@" (the quoting is
important), the touch command is called with each
argument as a separate argument, so it creates a
file for each argument. The last array element has
multiple words, but it creates a single file with the
entire sentence as the file name.
Whereas if we use $*, the arguments are stored
as a string, and it is split by spaces, so the touch
command will create a file for each word.
1 $ cat multiword.sh
2 touch $*
3 $ bash multiword.sh Hello World "This is a
multiword single argument"
4 $ ls -1
5 a
6 argument
7 Hello
8 is
9 multiword
10 multiword.sh
11 single
12 This
13 World
In this case, the touch command is called with each
word as a separate argument, so it creates a file for
each word.
Exercise 9.5.1 Similarly, try out the other way of
iterating over the arguments ("$*") and observe
the difference.
We can also iterate over the indices and access each
argument using a for loop. To find the number of
arguments, we can use $#.
1 $ cat args.sh
2 args=("$@")
3 echo $#
4 for ((i=0; i < $#; i++)); do
5 echo "${args[i]}"
6 done
7 $ bash args.sh hello how are you
8 4
9 hello
10 how
11 are
12 you
Here we are using $# to get the number of argu-
ments the script has received and then dynamically
iterating that many times to access each element of
the args array.
9.5.1 Shifting Arguments
Sometimes we may want to remove the first few
arguments from the list of arguments. We can do
this using the shift command.
shift is a shell builtin that shifts the arguments to
the left by one. The first argument is removed, and
the second argument becomes the first argument,
the third argument becomes the second argument,
and so on.
1 $ cat shift.sh
2 echo "First argument is $1"
3 shift
4 echo "Rest of the arguments are $*"
5 $ bash shift.sh one two three four
6 First argument is one
7 Rest of the arguments are two three four
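shift also accepts an optional count, so we can drop
several arguments at once; a small sketch:
1 $ cat shift2.sh
2 shift 2
3 echo "Remaining: $*"
4 $ bash shift2.sh one two three four
5 Remaining: three four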
9.6 Input and Output
Now that we have seen how to pass arguments to a
script, we will see how to take input from the user
and display output to the user.
This is called the standard streams in bash. There
are three standard streams in bash:
▶ Standard Input (stdin) - This is the input
stream that is used to read input from the
user. By default, this is the keyboard.
▶ Standard Output (stdout) - This is the output
stream that is used to display output to the
user. By default, this is the terminal.
▶ Standard Error (stderr) - This is the error
stream that is used to display error messages
to the user. By default, this is the terminal.
The streams can also be redirected to/from files or
other commands, as we have seen in Chapter 5.
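For instance, a script can write to stderr explicitly
using the >&2 redirection; redirecting stdout to a
file then separates the two streams. A minimal
sketch:
1 $ cat streams.sh
2 echo "normal output"
3 echo "error output" >&2
4 $ bash streams.sh > out.txt
5 error output
6 $ cat out.txt
7 normal output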
9.6.1 Reading Input
To read input from the user, we can use the read
command. The read command reads a line of input
from the stdin and stores it in a variable.
Optionally, we can also provide a prompt to the
user, which is displayed before the user enters the
input.
1 $ read -p "Enter your name: " name
2 Enter your name: Alice
3 $ echo "Hello, $name!"
4 Hello, Alice!
This reads the input from the user and stores it in
the name variable, which is then used to display a
greeting message.
We can use the same command inside a script to
read input from the user, or take in any data passed
in standard input.
1 $ cat input.sh
2 read line
3 echo "STDIN: $line"
4 $ echo "Hello, World!" | bash input.sh
5 STDIN: Hello, World!
However, the read command reads only one line of
input. If we want to read multiple lines of input, we
can call read in a loop, which keeps reading until
the input ends.
1 $ cat multiline.sh
2 while read line; do
3 echo "STDIN: $line"
4 done
5 $ cat file.txt
6 Hello World
7 This is a multiline file
8 We have three lines in it
9 $ cat file.txt | bash multiline.sh
10 STDIN: Hello World
11 STDIN: This is a multiline file
12 STDIN: We have three lines in it
Here we are reading input from the file file.txt
and displaying each line of the file.
The redirection can be done more succinctly by
using the input redirection operator <. The pipe was
used to demonstrate that the input can come from
the output of any arbitrary command, not just the
content of a file.
1 $ bash multiline.sh < file.txt
2 STDIN: Hello World
3 STDIN: This is a multiline file
4 STDIN: We have three lines in it
9.7 Conditionals
Often, in scripts, we need to decide between two
blocks of code to execute, depending on some
non-static condition. This can be based on a value
input by the user, the contents of a file on the
filesystem, or some other value fetched from sensors
or over the internet. To facilitate branching in
scripts, bash provides multiple keywords and
commands.
9.7.1 Test command
The test command checks the value of an expres-
sion and either exits with an exit code of 0 1 if the
expression is true, or exits with a non-zero exit code
if the expression is false.
1: Exit code 0 denotes success in POSIX.
It has a lot of unary and binary operators that can
be used to check various conditions. Here we
evaluate two conditions, 1 = 1 and 5 > 7. The first
condition is true, so the test command exits with
a 0 exit code, while the second condition is false,
so the test command exits with a 1 exit code.
1 $ test 1 -eq 1
2 $ echo $?
3 0
4 $ test 5 -gt 7
5 $ echo $?
6 1
String conditions
test can check for unary and binary conditions on
strings.
▶ -z - True if the string is empty
▶ -n - True if the string is not empty
▶ = - True if the strings are equal
▶ != - True if the strings are not equal
▶ < - True if the first string is lexicographically
less than the second string
▶ > - True if the first string is lexicographically
greater than the second string
Unary Operators
The -z and -n flags of test check if a string is empty
or not.
1 $ var="apple"
2 $ test -n "$var" ; echo $?
3 0
4 $ test -z "$var" ; echo $?
5 1
1 $ var=""
2 $ test -n "$var" ; echo $?
3 1
4 $ test -z "$var" ; echo $?
5 0
Binary Operators
The =, !=, <, and > flags of test check if two strings
are equal, not equal, less than, or greater than each
other.
1 $ var="apple"
2 $ test "$var" = "apple" ; echo $?
3 0
4 $ test "$var" != "apple" ; echo $?
5 1
1 $ var="apple"
2 $ test "$var" = "banana" ; echo $?
3 1
4 $ test "$var" != "banana" ; echo $?
5 0
We are escaping the > and < characters as they
have special meaning in the shell. If we do not
escape them, the shell will try to redirect the input
or output of the command to/from a file. We can
also quote the symbols instead of escaping them.
1 $ var="apple"
2 $ test "$var" \< "banana" ; echo $?
3 0
4 $ test "$var" \> "banana" ; echo $?
5 1
Numeric conditions
test can also check for unary and binary conditions
on numbers.
▶ -eq - True if the numbers are equal
▶ -ne - True if the numbers are not equal
▶ -lt - True if the first number is less than the
second number
▶ -le - True if the first number is less than or
equal to the second number
▶ -gt - True if the first number is greater than
the second number
▶ -ge - True if the first number is greater than
or equal to the second number
1 $ test 5 -eq 5 ; echo $?
2 0
3 $ test 5 -ne 5 ; echo $?
4 1
5 $ test 5 -ge 5 ; echo $?
6 0
7 $ test 5 -lt 7 ; echo $?
8 0
9 $ test 5 -le 7 ; echo $?
10 0
11 $ test 5 -gt 7 ; echo $?
12 1
13 $ test 5 -ge 7 ; echo $?
14 1
File conditions
Most of the checks in test are for files. We
can check if a file exists, if it is a directory, if it is a
regular file, if it is readable, writable, or executable,
and many more.
File Types
▶ -e and -a - if the file exists
▶ -f - if the file is a regular file
▶ -d - if the file is a directory
▶ -b - if the file is a block device
▶ -c - if the file is a character device
▶ -h and -L - if the file is a symbolic link
▶ -p - if the file is a named pipe
▶ -S - if the file is a socket
File Permissions
▶ -r - if the file is readable
▶ -w - if the file is writable
▶ -x - if the file is executable
▶ -u - if the file has the setuid bit set
▶ -g - if the file has the setgid bit set
▶ -k - if the file has the sticky bit set
▶ -O - if the file is effectively owned by the user
▶ -G - if the file is effectively owned by the user’s
group
Binary Operators
▶ -nt - if the first file is newer than the second
file
▶ -ot - if the first file is older than the second
file
▶ -ef - if the first file is a hard link to the second
file
Other conditions
▶ -o - Logical OR (When used in binary opera-
tion)
▶ -a - Logical AND
▶ ! - Logical NOT
▶ -v - if the variable is set
▶ -o - if the shopt option is set (when used as
unary operator)
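A few of these in action; /etc/passwd and /tmp exist
on most Linux systems, but the results may differ
on yours:
1 $ test -f /etc/passwd ; echo $?
2 0
3 $ test -d /etc/passwd ; echo $?
4 1
5 $ test -d /tmp ; echo $?
6 0
7 $ test -x /usr/bin/test ; echo $?
8 0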
The test built-in can be invoked in two ways, either
by using the test word or by using the [ built-in.
The [ is a synonym for the test command, and is
used to make the code more readable. If we use the [
command, the last argument must be ]. This makes
it look like a shell syntax rather than a command,
but we should keep in mind that the test command
(even the [ shortcut) is a shell built-in command,
and not a shell keyword, thus we need to use a
space after the command name.
1 $ [ 1 -eq 1 ] ; echo $?
2 0
3 $ var=apple
4 $ [ "$var" = "apple" ] ; echo $?
5 0
Remark 9.7.1 The test and [ exist both as
executables stored in the filesystem and as shell
built-ins in bash. However, the built-in takes
preference when we simply type the name.
Almost all operators are the same in both, however
the executable version cannot check if a variable
is set using the -v option, since executables don't
have access to the shell variables.
9.7.2 Test Keyword
Although we already have the test executable and
its shorthand [ as built-in, bash also provides a
keyword [[ which is a more powerful version of
the test command. This is present in bash 2 but not
present in the POSIX standard.
2: And other popular shells like zsh.
1 $ type -a test
2 test is a shell builtin
3 test is /usr/bin/test
4 $ type -a [
5 [ is a shell builtin
6 [ is /usr/bin/[
7 $ type -a [[
8 [[ is a shell keyword
This is an improvement over the test command, as
it has more features and is more reliable.
▶ It does not require us to quote variables
▶ It supports logical operators like && and ||
▶ It supports regular expressions
▶ It supports globbing
Quoting and Empty Variables
1 $ var=""
2 $ [ $var = "apple" ] ; echo $?
3 -bash: [: =: unary operator expected
4 2
5 $ [ "$var" = "apple" ] ; echo $?
6 1
As we can see, the test command requires us to
quote the variables, otherwise it will throw an error
if the variable is empty. However, the [[ keyword
does not require us to quote the variables.
1 $ var=""
2 $ [[ $var = "apple" ]] ; echo $?
3 1
Logical Operators
The logical operators to combine multiple condi-
tions are && and ||. In the test command, we can
use the -a and -o operators, or we can use && and ||
outside by combining multiple test commands.
1 $ var="apple"
2 $ [ "$var" = "apple" -a 1 -eq 1 ] ; echo $?
3 0
4 $ [ "$var" = "apple" ] && [ 1 -eq 1 ] ; echo $
?
5 0
But in the [[ keyword, we can use the && and ||
operators directly.
1 $ var="apple"
2 $ [[ "$var" = "apple" && 1 -eq 1 ]] ; echo $?
3 0
Comparison Operators
For string comparison, we had to escape the > and
< characters in the test command, but in the [[
keyword, we can use them directly.
1 $ var="apple"
2 $ [[ "$var" < "banana" ]] ; echo $?
3 0
Regular Expressions
The [[ keyword also supports regular expressions
using the =~ operator.
If we want to check whether the variable response
is y or yes, in test we would do the following.
1 $ response="yes"
2 $ [ "$response" = "y" -o "$response" = "yes" ]
; echo $?
3 0
But in [[ we can use the =~ operator to check if the
variable matches a regular expression.
1 $ response="yes"
2 $ [[ "$response" =~ ^y(es)?$ ]] ; echo $?
3 0
It also uses ERE (Extended Regular Expressions)
by default, so we can use the ? operator without
escaping it to match 0 or 1 occurrence of the previous
character.
Warning 9.7.1 When using regular expressions
in bash scripts, we should never quote the reg-
ular expression, as it will be treated as a string
and not a regular expression.
Globbing
We can also simply match for any response starting
with y using globbing.
1 $ response="yes"
2 $ [[ "$response" == y* ]] ; echo $?
3 0
Double Equals
In the [[ keyword, we can use the == operator to
check for string equality, which is a widely known
construct from most languages.
9.8 If-elif-else
9.8.1 If
The if statement is used to execute a block of code
if a condition is true. If the condition is false, the
block of code is skipped.
The syntax of the if statement is as follows. The
end of the if statement is denoted by the fi key-
word, which is the reverse of the if keyword.
1 if command ; then
2 # code to execute if the command is successful
3 else
4 # code to execute if the command is not successful
5 fi
Unlike most programming languages, where if
takes a boolean expression, the if statement in bash
takes a command and executes it. If the command
exits with a 0 exit code, the block of code after the
then keyword is executed, otherwise the block of
code after the else keyword is executed.
This allows us to use any command that we have
seen so far in the if statement. The most used
command is the test command, which is used to
check conditions.
1 $ var="apple"
2 $ if [[ "$var" =~ ^a.* ]] ; then
3 > echo "The variable starts with a"
4 > fi
5 The variable starts with a
However, we can also use other commands, like
the grep command, to check if a file contains a
pattern.
1 $ cat file.txt
2 Hello how are you
3 $ if grep -q "how" file.txt ; then
4 > echo "The file contains the word how"
5 > fi
6 The file contains the word how
7 $ if grep -q "world" file.txt ; then
8 > echo "The file contains the word world"
9 > fi
The first if statement checks if the file contains
the word how, which is present, and thus prints the
message to the screen. The second if statement
checks if the file contains the word world, which is
not present, and thus does not print anything.
9.8.2 Else
The else keyword is used to execute a block of
code if the command in the if statement is not
successful.
1 $ var="apple"
2 $ if [ "$var" = "banana" ] ; then
3 > echo "The variable is banana"
4 > else
5 > echo "The variable is not banana"
6 > fi
7 The variable is not banana
The combination of the if and else keywords allows
us to execute different blocks of code depending on
the condition; this is one of the most used constructs
in scripting.
1 $ read -p "Enter a number: " num
2 Enter a number: 5
3 $ if [[ $num -gt 0 ]] ; then
4 > echo "The number is positive"
5 > else
6 > echo "The number is negative"
7 > fi
8 The number is positive
However, notice that this allows us only two branches,
one for the condition being true, and one for the
condition being false. Due to this, we have mis-
classified the number 0 as positive, which is not
correct.
To fix this, we can use the elif keyword, which is
short for else if.
Figure 9.1: Flowchart of the if, elif, and else
construct.
9.8.3 Elif
If the command in the if statement is not success-
ful, we can run another command using the elif
keyword and decide the branching based on its exit
status.
1 $ read -p "Enter a number: " num
2 Enter a number: 0
3 $ if [[ $num -gt 0 ]] ; then
4 > echo "The number is positive"
5 > elif [[ $num -lt 0 ]] ; then
6 > echo "The number is negative"
7 > else
8 > echo "The number is zero"
9 > fi
10 The number is zero
Here we have added an elif statement to check if
the number is less than 0, and if it is, we print that
the number is negative, finally if both the commands
are unsuccessful, we execute the statements in the
else block.
9.9 Exit code inversion
We can also invert the condition using the ! opera-
tor.
1 $ var=apple
2 $ if ! [ "$var" = "banana" ] ; then
3 > echo "The variable is not banana"
4 > fi
5 The variable is not banana
9.10 Mathematical Expressions as
if command
We can also use the (( keyword to evaluate mathe-
matical expressions in the if statement. This envi-
ronment does not print any output to the standard
output, rather, it exits with a zero exit code if the
result of the mathematical evaluation is non-zero,
and exits with a non-zero exit code if the evalua-
tion results in zero. This is useful as 0 is false in
mathematics but 0 is the success exit code in the shell.
1 $ if (( 5 > 3 )) ; then
2 > echo "5 is greater than 3"
3 > fi
4 5 is greater than 3
However, this environment only supports integers,
and not floating point numbers.
9.11 Command Substitution in if
We can also run a command in the if statement,
and compare its output to a value using the test
command and the $( ) command substitution
construct.
1 $ if [ $(wc -l < file.txt) -gt 100 ] ; then
2 > echo "The file is big"
3 > fi
4 The file is big
9.12 Switch
Definition 9.12.1 (Switch Case) A switch case is
a syntactical sugar that helps match a variable’s
value against a list of options (or patterns) and
executes the block of code corresponding to the
match.
The branching achieved by switch case can also
be achieved by using multiple if-elif-else state-
ments, but the switch case is more readable and
concise.
Syntax The case block is ended with the keyword
esac, which is the reverse of the case keyword.
1 case var in
2 pattern1)
3 # code to execute if var matches pattern1
4 ;;
5 pattern2)
6 # code to execute if var matches pattern2
7 ;;
8 *)
9 # code to execute if var does not match
any pattern
10 ;;
11 esac
The case keyword is followed by the variable to
match against, and the in keyword. Then we have
multiple patterns, each followed by a block of code
to execute if the variable matches the pattern.
The ;; keyword is used to denote the end of a
block of code, and the start of the next pattern. This
ensures that the script does not continue running
the statements of the other blocks after it as well.
The * pattern is a wildcard pattern, and matches
any value that does not match any of the other
patterns.
The patterns in the switch case are similar to globs
rather than regular expressions, so we cannot use
regular expressions in the switch case.
Sometimes, though, we actually want to execute the
blocks after the match as well. This is called fall
through.
9.12.1 Fall Through
If we replace the ;; with ;& or ;;& then the control
will flow into the next block. In this example, the
script will print both the statements "Multiple of
10" and "Multiple of 5" for numbers ending with
zero because of the fall through ;& instead of ;;.
1 $ cat switch.sh
2 read -p "Enter number: " num
3 case "$num" in
4 *0) echo "Multiple of 10" ;&
5 *5) echo "Multiple of 5" ;;
6 *) echo "Not Multiple of 5" ;;
7 esac
8 $ bash switch.sh
9 Enter number: 60
10 Multiple of 10
11 Multiple of 5
9.12.2 Multiple Patterns
We can also have multiple patterns for a single
block of code by separating the patterns with the |
operator.
1 $ cat switch.sh
2 read -p "Enter digit: " num
3 case "$num" in
4 1|2|3) echo "Small number" ;;
5 4|5|6) echo "Medium number" ;;
6 7|8|9) echo "Large number" ;;
7 *) echo "Invalid number" ;;
8 esac
9 $ bash switch.sh
10 Enter digit: 5
11 Medium number
9.13 Select Loop
Definition 9.13.1 (Select Loop) A select loop is
a construct that is used to create a menu in the
shell script. It takes a list of items and displays
them to the user, and then waits for the user to
select an item. The selected item is stored in a
variable, and the block of code corresponding
to the selected item is executed. This repeats
infinitely, until the user stops it.
1 $ cat select.sh
2 select choice in dog cat bird stop
3 do
4 case $choice in
5 dog) echo "A dog barks" ;;
6 cat) echo "A cat meows" ;;
7 bird) echo "A bird chirps" ;;
8 stop) break ;;
9 *) echo "Invalid choice" ;;
10 esac
11 done
12 $ bash select.sh
13 1) dog
14 2) cat
15 3) bird
16 4) stop
17 #? 1
18 A dog barks
19 #? 2
20 A cat meows
21 #? 3
22 A bird chirps
23 #? 4
A select loop is simply syntactic sugar for a
while loop that displays a menu to the user, waits
for the user to select an option, and reads it
using read. The conditional branching is done using
a case statement.
The menu output is actually displayed on the stan-
dard error stream, so that it is easy to split the
conditional output from the menu output.
1 $ bash select.sh > output.txt
2 1) dog
3 2) cat
4 3) bird
5 4) stop
6 #? 1
7 #? 2
8 #? 3
9 #? 4
10 $ cat output.txt
11 A dog barks
12 A cat meows
13 A bird chirps
To stop the select loop, we can use the break key-
word. We will see more about loops in the next
section.
9.14 Loops
If we want to execute a block of code multiple times,
we can use loops. There are three types of loops in
bash:
▶ For loop - Used to iterate over a list of items
or a fixed number of times
▶ While loop - Used to execute a block of code
as long as a condition is true
▶ Until loop - Used to execute a block of code
as long as a condition is false
9.14.1 For loop

In bash, we can use the for loop to iterate over a list of items, called a for-each loop (similar to how the for loop in Python works), or over a range of numbers (similar to how the for loop in C and C-like languages works).
For-each loop
1 $ cat foreach.sh
2 for item in mango banana strawberry; do
3 echo "$item shake"
4 done
5 $ bash foreach.sh
6 mango shake
7 banana shake
8 strawberry shake
As we saw in Chapter 8, there are three ways to
iterate over an array in bash. We can either treat
the entire array as a single element ("${arr[*]}"),
we can break the elements by spaces (${arr[@]}), or
we can split as per the array elements, preserving
multi-word elements ("${arr[@]}").
Treating entire array as a single element:
1 $ cat forsplit.sh
2 name=("Sayan" "Alice" "John Doe")
3 for name in "${name[*]}"; do
4 echo "Name: $name"
5 done
6 $ bash forsplit.sh
7 Name: Sayan Alice John Doe
Splitting by spaces:
1 $ cat forsplit.sh
2 name=("Sayan" "Alice" "John Doe")
3 for name in ${name[@]}; do
4 echo "Name: $name"
5 done
6 $ bash forsplit.sh
7 Name: Sayan
8 Name: Alice
9 Name: John
10 Name: Doe
Preserving multi-word elements:
1 $ cat forsplit.sh
2 name=("Sayan" "Alice" "John Doe")
3 for name in "${name[@]}"; do
4 echo "Name: $name"
5 done
6 $ bash forsplit.sh
7 Name: Sayan
8 Name: Alice
9 Name: John Doe
We can also dynamically generate a list of numbers
in a range using the {start..end} syntax or the seq
command.
1 $ cat range.sh
2 for i in {1..5}; do
3 echo "Number: $i"
4 done
5 $ bash range.sh
6 1
7 2
8 3
9 4
10 5
1 $ cat range.sh
2 for i in $(seq 1 5); do
3 echo "Number: $i"
4 done
5 $ bash range.sh
6 1
7 2
8 3
9 4
10 5
Differences between range expansion and seq
command
Although both seq and the range expansion bash
syntax have similar functionality, they have slightly
different behaviour, as seen in Table 9.1.
Table 9.1: Differences between range expansion and seq command

Range Expansion                        | Seq Command
It is a bash feature, hence faster     | It is an external command
Only integer step size                 | Fractional step size is allowed
Works on letters                       | Works only on numbers
Step size is the third argument        | Step size is the second argument
Output is space separated              | Output is newline separated
Start range cannot be omitted          | Start range is 1 by default
Letters:
1 $ echo {a..e}
2 a b c d e
3 $ seq a e
4 seq: invalid floating point argument: 'a'
5 Try 'seq --help' for more information.
Position of step size:
1 $ echo {1..10..2}
2 1 3 5 7 9
3 $ seq 1 2 10
4 1
5 3
6 5
7 7
8 9
Fractional step size:

1 $ seq 1 0.5 2
2 1.0
3 1.5
4 2.0
5 $ echo {1..2..0.5}
6 {1..2..0.5}

Note that if the shell is not able to expand a range expansion due to invalid syntax, it will not throw an error; rather, it will simply leave it unexpanded. This is similar to how path expansion works.
Default start range:
1 $ seq 5
2 1
3 2
4 3
5 4
6 5
Differences between Python's range and the seq command:

The seq command is similar to the range function in Python, but there are some differences between the two, as seen in Table 9.2.
Table 9.2: Differences between seq and Python's range function

Python's Range                         | Seq Command
Start of range is 0 by default         | Start range is 1 by default
Order of parameters is start,end,step  | Order of parameters is start,step,end
End range is exclusive                 | End range is inclusive
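For instance, the inclusive end means seq 0 2 10 emits 10, while Python's range(0, 10, 2) stops at 8. A quick comparison sketch (the python3 one-liner is only for illustration):

1 $ seq 0 2 10 | tr '\n' ' '
2 0 2 4 6 8 10
3 $ python3 -c 'print(*range(0, 10, 2))'
4 0 2 4 6 8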
9.14.2 C style for loop
Bash also supports C-style for-loops, where we
declare a variable, a condition for the loop, and an
increment command.
1 $ cat cstyle.sh
2 for ((i=0; i<5; i++)); do
3 echo "Number: $i"
4 done
5 $ bash cstyle.sh
6 Number: 0
7 Number: 1
8 Number: 2
9 Number: 3
10 Number: 4
We can also have multiple variables in the C-style
for loop.
1 $ cat cstyle.sh
2 begin1=1
3 begin2=10
4 finish=10
5 for (( a=$begin1, b=$begin2; a < $finish; a++,
b-- )); do
6 echo $a $b
7 done
8 $ bash cstyle.sh
9 1 10
10 2 9
11 3 8
12 4 7
13 5 6
14 6 5
15 7 4
16 8 3
17 9 2
9.14.3 IFS
By default the for loop splits the input by tabs,
spaces, and newlines. We can change the delimiter
by changing the IFS variable.
Definition 9.14.1 (IFS) The IFS variable is the
Internal Field Separator, and is used to split the
input into fields.
It is used by the for loop and other word-splitting operations in bash.
The default value of IFS is $' \t\n' (space, tab, newline). If set to multiple characters, it will split the input by any of the characters. (The $'' syntax is used to denote ANSI escape sequences in strings in bash.)

In this example we are reading the $PATH variable, which is a colon-separated list of directories. We change the IFS variable to a colon, so that the for loop splits the input by colons.

1 $ cat ifs.sh
2 IFS=:
3 for i in $PATH; do
4     echo $i
5 done
6 $ bash ifs.sh
7 /usr/local/sbin
8 /usr/local/bin
9 /usr/bin
We should remember to reset the IFS variable after
using it, as it can cause unexpected behaviour in
other parts of the script.
1 $ cat unsetifs.sh
2 IFS=:
3 # some code
4
5 var="a b c:d"
6 for i in $var; do
7 echo $i
8 done
9 $ bash unsetifs.sh
10 a b c
11 d
Although we wanted to iterate over the elements by splitting by space, we ended up splitting by colon because we forgot to reset the IFS variable.
To reset the IFS variable, we can simply unset it.
1 $ cat unsetifs.sh
2 IFS=:
3 # some code
4
5 unset IFS
6 var="a b c:d"
7 for i in $var; do
8 echo $i
9 done
10 $ bash unsetifs.sh
11 a
12 b
13 c:d
Unsetting the IFS variable will reset it to the default
value of $’ \t\n’.
However, setting and resetting the IFS variable can
be cumbersome, so we can use a subshell to change
the IFS variable only for the for loop.
1 $ cat subshellifs.sh
2 var="a b c:d"
3 (
4 IFS=:
5 for i in $var; do
6 echo $i
7 done
8 )
9
10 for i in $var; do
11 echo $i
12 done
13 $ bash subshellifs.sh
14 a b c
15 d
16 a
17 b
18 c:d
9.14.4 While loop
The while loop is used to execute a block of code as long as a condition is true. This is useful if we do not know the number of iterations beforehand, but we do know the condition that should be satisfied.

Just like the if statement, the while loop takes a command, and executes the block of code if the command is successful. This means we can run any arbitrary command as the while loop condition.
1 $ cat while.sh
2 i=5
3 while [ $i -gt 0 ]; do
4 echo "i is $i"
5 ((i--))
6 done
7 $ bash while.sh
8 i is 5
9 i is 4
10 i is 3
11 i is 2
12 i is 1
Here we are using the test command to check if
the variable i is greater than 0, and if it is, we print
the value of i and decrement it by 1.
However, test is not the only command that we can use in the while loop condition.

Here we are using the grep command to check whether the password contains any special character, and looping until the user enters a strong password. Ideally, when reading passwords, we should use read -s to hide the input from the user; however, it is omitted from the example so that the input can be seen by the reader.

1 $ cat pass.sh
2 read -p "Enter password: " pass
3
4 while ! grep -q '[[:punct:]]' <<< "$pass" ; do
5     echo "Password must contain at least one special character"
6     read -p "Enter password: " pass
7 done
8 echo "Password is set"
9 $ bash pass.sh
10 Enter password: 123
11 Password must contain at least one special character
12 Enter password: abc
13 Password must contain at least one special character
14 Enter password: sayan
15 Password must contain at least one special character
16 Enter password: $ayan
17 Password is set
9.14.5 Until loop

The until loop is used to execute a block of code as long as a condition is false. It is simply a negation of the while loop, a piece of syntactic sugar. The last example we saw in the while loop can be rewritten using the until loop to omit the ! symbol.
1 $ cat pass.sh
2 read -p "Enter password: " pass
3
4 until grep -q '[[:punct:]]' <<< "$pass" ; do
5 echo "Password must contain at least one
special character"
6 read -p "Enter password: " pass
7 done
8 echo "Password is set"
9 $ bash pass.sh
10 Enter password: 123
11 Password must contain at least one special
character
12 Enter password: abc
13 Password must contain at least one special
character
14 Enter password: sayan
15 Password must contain at least one special
character
16 Enter password: $ayan
17 Password is set
9.14.6 Read in while

The read command is used to read one line of input from the user. However, if we want to read multiple lines of input, and we do not know the number of lines beforehand, we can use the while loop to read input until the user enters a specific value, or the input stops. The end of input is denoted by the EOF character; if we are reading input from standard input, we can press Ctrl+D to send the EOF character and mark the end of input.

1 $ cat read.sh
2 while read line; do
3     echo "Line is $line"
4 done
5 $ bash read.sh
6 hello
7 Line is hello
8 this is typed input
9 Line is this is typed input
10 now press ctrl+D
11 Line is now press ctrl+D
We can also read from a file using the < operator. Since the entire while loop is a command, we can use the < operator to redirect the input to the while loop after the ending done keyword.

Here we are reading the /etc/passwd file line by line and printing the username. The username is the first field in the /etc/passwd file, which is separated by a colon. The username is extracted by removing everything after the first colon using shell variable manipulation.

1 $ cat read.sh
2 while read line; do
3     echo ${line%%:*}
4 done < /etc/passwd
5 $ bash read.sh
6 root
7 bin
8 daemon
9 mail
10 ftp
11 http
12 nobody
13 dbus
14 systemd-coredump
15 systemd-network
We can also use the IFS variable to split each row by a colon, and extract the username.

In this example, we are setting the IFS only for the read command in the while loop, so it gets reset after the loop. If we provide multiple variables to the read command, it will split the input by the IFS variable and assign the split values to the variables.

1 $ cat read.sh
2 while IFS=: read username pass uid gid gecos home shell; do
3     echo $username - ${gecos:-$username}
4 done < /etc/passwd
5 $ bash read.sh
6 root - root
7 bin - bin
8 daemon - daemon
9 mail - mail
10 ftp - ftp
11 http - http
12 nobody - Kernel Overflow User
13 dbus - System Message Bus
14 systemd-coredump - systemd Core Dumper
15 systemd-network - systemd Network Management
If the number of variables provided to the read command is less than the number of fields in the input, the remaining fields are stored in the last variable. This can be utilized to read only the first field of the input, and discard the rest.

In this example, we are reading only the first field of the input, and discarding the rest. In bash (and many other languages) the underscore variable is used to denote a variable that is not meant to be used.

1 $ cat read.sh
2 while IFS=: read username _; do
3     echo $username
4 done < /etc/passwd
5 $ bash read.sh
6 root
7 bin
8 daemon
9 mail
10 ftp
11 http
12 nobody
13 dbus
14 systemd-coredump
15 systemd-network
9.14.7 Break and Continue
The break and continue keywords are used to con-
trol the flow of the loop.
▶ Break - The break keyword is used to exit
the loop immediately. This skips the current
iteration, and also all the next iterations.
▶ Continue - The continue keyword is used to
skip the rest of the code in the loop and go to
the next iteration.
Break:
1 $ cat pat.sh
2 for i in {1..5}; do
3 for j in {1..5}; do
4 if [[ "$j" -eq 3 ]]; then
5 break;
6 fi
7 echo -n $j
8 done
9 echo
10 done
11 $ bash pat.sh
12 12
13 12
14 12
15 12
16 12
Continue:
1 $ cat pat.sh
2 for i in {1..5}; do
3 for j in {1..5}; do
4 if [[ "$j" -eq 3 ]]; then
5 continue;
6 fi
7 echo -n $j
8 done
9 echo
10 done
11 $ bash pat.sh
12 1245
13 1245
14 1245
15 1245
16 1245
Unlike most languages, the break and continue keywords in bash also take an optional argument. This argument is the number of nested loop levels to break out of or continue. If we change the break keyword to break 2, it will break out of the outer loop, and not just the inner loop.
1 $ cat pat.sh
2 for i in {1..5}; do
3 for j in {1..5}; do
4 if [[ "$j" -eq 3 ]]; then
5 break 2;
6 fi
7 echo -n $j
8 done
9 echo
10 done
11 $ bash pat.sh
12 12
The same works for the continue keyword as well.
1 $ cat pat.sh
2 for i in {1..5}; do
3 for j in {1..5}; do
4 if [[ "$j" -eq 3 ]]; then
5 continue 2;
6 fi
7 echo -n $j
8 done
9 echo
10 done
11 $ bash pat.sh
12 1212121212
9.15 Functions

If the script is too long, it is usually better to split the script into multiple concise functions that each do only one task.
There are three ways to define a function in bash:
Without the function keyword:
1 abc(){
2 commands
3 }
function keyword with parentheses
1 function abc(){
2 commands
3 }
function keyword without parentheses
1 function abc {
2 commands
3 }
Example:
1 $ cat functions.sh
2 sayhi(){
3 echo "Hello, $1"
4 }
5
6 sayhi "John"
7 $ bash functions.sh
8 Hello, John
Arguments in Functions
Just like we can use $1 inside a script to refer to the
first argument, we can use $1 inside a function to
refer to the first argument passed to the function.
The arguments passed to the script are not available inside the function; instead, the arguments passed to the function call are available inside the function.
1 $ cat fun.sh
2 fun(){
3 echo $1 $2
4 }
5 echo "Full name:" $1 $2
6 fun "Hello" "$1"
7 $ bash fun.sh John Appleseed
8 Full name: John Appleseed
9 Hello John
Return value of a function

Functions in bash do not return a value; rather, they exit with a status code. However, functions can print a value to the standard output, and the caller can capture the output of the function using command substitution.

In this example, we are creating a simple calculator script that takes two operands and an operator, and prints the result. The operator selection is done using a select loop, and the operands are read using read. We have modularized the script by creating two functions, add and mul, that take two operands and print the result.

1 $ cat calc.sh
2 add(){
3     echo $(($1 + $2))
4 }
5
6 mul(){
7     echo $(($1 * $2))
8 }
9
10 select operator in plus multiply ; do
11     read -p "Operand 1: " op1
12     read -p "Operand 2: " op2
13 case "$operator" in
14 plus) answer=$(add $op1 $op2) ;;
15 multiply) answer=$(mul $op1 $op2) ;;
16 *) echo "Invalid option" ;;
17 esac
18 echo "Answer is $answer"
19 done
20 $ bash calc.sh
21 1) plus
22 2) multiply
23 #? 1
24 Operand 1: 5
25 Operand 2: 4
26 Answer is 9
27 #? 2
28 Operand 1: 6
29 Operand 2: 3
30 Answer is 18
31 #?
However, sometimes we may want to return from a function as a means to exit the function early. We may also want to signal the success or failure of the function. Both of these can be done using the return shell built-in.

In this example, we have created a function that prints the numbers from 0 to the number passed to the function, but if the number passed is negative, the function returns early with a status code of 1. If the number is greater than 9, it only prints till 9, and then returns.

1 $ cat return.sh
2 fun(){
3     if [[ $1 -lt 0 ]]; then
4         return 1
5     fi
6     local i=0
7     while [[ $i -lt $1 ]]; do
8         if [[ $i -gt 9 ]]; then
9             echo
10             return
11         fi
12         echo -n $i
13         ((i++))
14     done
15     echo
16 }
17
18 fun 5
19 echo return value: $?
20 fun 15
21 echo return value: $?
22 fun -5
23 echo return value: $?
24 $ bash return.sh
25 01234
26 return value: 0
27 0123456789
28 return value: 0
29 return value: 1
Remark 9.15.1 If no value is provided to the
return command, it returns the exit status of the
last command executed in the function.
Local variables in functions
By default, all variables in bash are global, and are
available throughout the script. Even variables de-
fined inside a function are global, and are available
outside the function. This might cause confusion,
as the variable might be modified by the function,
and the modification might affect the rest of the
script.
To make a variable local to a function, we can use
the local keyword.
1 $ cat local.sh
2 fun(){
3 a=5
4 local b=10
5 }
6
7 fun
8 echo "A is $a"
9 echo "B is $b"
10 $ bash local.sh
11 A is 5
12 B is
As seen in the example, the variable a is available outside the function, but the variable b is not, since it is defined using the local shell built-in.

If a variable is declared using the declare keyword, it is global if defined outside a function and local if defined inside a function.
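A minimal sketch of the declare behaviour described above (the script name declare.sh is illustrative):

1 $ cat declare.sh
2 fun(){
3     declare c=15
4 }
5 fun
6 echo "C is $c"
7 $ bash declare.sh
8 C is

Since declare is used inside the function, c stays local and is empty outside it.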
9.16 Debugging
Sometimes, especially when writing and debugging
scripts, we may want to print out each line that the
interpreter is executing, so that we can trace the
control flow and find out any logical errors.
This can be done by setting the x flag of bash using
set -x.
1 $ cat debug.sh
2 fun(){
3 echo $(($1 + $2))
4 }
5
6 fun $(fun $(fun 1 2) 3) 4
7 $ bash debug.sh
8 10
In the above script, we are calling the fun function
multiple times, and it is difficult to trace the control
flow.
To trace the control flow, we can set the x flag using
the set command.
1 $ cat debug.sh
2 set -x
3 fun(){
4 echo $(($1 + $2))
5 }
6
7 fun $(fun $(fun 1 2) 3) 4
8 $ bash debug.sh
9 +++ fun 1 2
10 +++ echo 3
11 ++ fun 3 3
12 ++ echo 6
13 + fun 6 4
14 + echo 10
15 10
Now we can see the control flow and trace the execution of the script. The + is the PS4 prompt, which denotes that the printed line is a trace line and not real output of the script.
Remark 9.16.1 The trace output of a script is
printed to the standard error stream, so that it
does not interfere with the standard output of
the script.
The PS4 prompt character is repeated once for each level of nesting of the call. This helps us visualize the call stack and the order of execution of the script.
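The PS4 prompt itself is a shell variable and can be customized; a small sketch (assuming GNU bash, where $LINENO expands to the line being traced):

1 $ cat debug.sh
2 PS4='+${LINENO}: '
3 set -x
4 echo hello
5 $ bash debug.sh
6 +4: echo hello
7 hello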
9.17 Recursion

Just like other programming languages, bash also supports recursion. However, since functions in bash do not return a value, it becomes cumbersome to use recursion in bash, as every intermediate result has to be captured through command substitution.
1 fibo(){
2 if [[ $1 -le 2 ]]; then
3 echo 1
4 else
5 echo $(($(fibo $(($1-1))) + $(fibo $(($1
-2)))))
6 fi
7 }
8
9 for i in {1..10}; do
10 fibo $i
11 done
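Running this (assuming the function and loop above are saved as fibo.sh) prints the first ten Fibonacci numbers:

1 $ bash fibo.sh
2 1
3 1
4 2
5 3
6 5
7 8
8 13
9 21
10 34
11 55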
In the above example, we are calculating the first ten
elements of the fibonacci series using recursion. We
need to use command substitution to capture the
output of the function, and then use mathematical
evaluation environment to add the two values.
However, most recursive solutions, including this one, are not efficient, and are not recommended for use in bash as they will be very slow. If we time the function with an argument of 20, we see that it takes a lot of time.
1 $ time fibo 20
2 6765
3
4 real 0m13.640s
5 user 0m10.010s
6 sys 0m3.187s
An iterative solution is much faster and more efficient.
1 $ cat fibo.sh
2 fibo(){
3 declare -i a=0
4 declare -i b=1
5 for (( i=1; i<=$1; i++ )); do
6 declare -i c=$a+$b
7 a=$b
8 b=$c
9 done
10 echo $a
11 }
12
13 time fibo 40
14 $ bash fibo.sh
15 102334155
16
17 real 0m0.001s
18 user 0m0.001s
19 sys 0m0.000s
As the timings show, the iterative solution is far faster and more efficient than the recursive one.
9.18 Shell Arithmetic
We have already seen the mathematical evaluation
environment in bash, which is used to evaluate
mathematical expressions.
1 $ echo $((1 + 2))
2 3
However, the mathematical evaluation environ-
ment is limited to integer arithmetic, and does
not support floating point arithmetic.
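For example, integer division truncates towards zero, and a floating point literal is rejected outright (the exact error wording may vary between bash versions):

1 $ echo $((10/3))
2 3
3 $ echo $((1.5 + 2))
4 bash: 1.5 + 2: syntax error: invalid arithmetic operator (error token is ".5 + 2")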
9.18.1 bc
A more powerful way to do arithmetic in bash is to
use the bc command.
Definition 9.18.1 (BC) bc is an arbitrary pre-
cision calculator language, and is used to do
floating point arithmetic in bash.
1 $ bc <<< "1.5 + 2.5"
2 4.0
However, bc by default will set the scale to 0, and
will truncate the result to an integer.
1 $ bc <<< "10/3"
2 3
This can be changed by setting the scale variable
in bc.
1 $ bc <<< "scale=3; 10/3"
2 3.333
We can also use the -l flag to load the math library
in bc, which provides more mathematical functions
and also sets the scale to 20.
1 $ bc -l <<< "10/3"
2 3.33333333333333333333
bc can also be started in an interactive mode, which is a REPL (Read, Evaluate, Print, Loop: an interactive interpreter of the language).

bc is not just a calculator; rather, it is a full-fledged programming language, and can be used to write scripts. bc supports all the basic arithmetic operations, and also supports the if statement, for loop, while loop, and other programming constructs. It is similar to C in syntax, and is a powerful language.

1 $ cat factorial.bc
2 define factorial(n) {
3     if(n==1){
4         return 1;
5     }
6     return factorial(n-1) * n;
7 }
8 $ bc factorial.bc <<< "factorial(5)"
9 120
Read the man page of bc to know more about the
language.
1 $ man bc
9.18.2 expr
Definition 9.18.2 (EXPR) expr is a command
line utility that is used to evaluate expressions.
As expr is a command line utility, it cannot access the shell variables directly; rather, we need to use the $ symbol to expand a variable to its value before passing the arguments to expr.
1 $ expr 1 + 2
2 3
The spaces around the operator are necessary, as expr is a command line utility and each operand and operator must reach it as a separate argument.
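Without the spaces, the whole expression is passed as a single argument and expr simply echoes it back as a string:

1 $ expr 1+2
2 1+2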
The operand needs to be escaped or quoted if it has a special meaning in the shell.

The * symbol is a special character in the shell, and is used for path expansion. If the directory is empty, then it remains as *, but if the directory has files, then * expands to the sorted list of all the files. In this example we show this by creating a file named +, so that 10 * 3 actually expands to 10 + 3 and gives the output of 13.

1 $ expr 10 \* 3
2 30
3 $ expr 10 '*' 3
4 30
5 $ ls
6 $ expr 10 * 3
7 30
8 $ touch '+'
9 $ expr 10 * 3
10 13
Like the mathematical evaluation environment, expr exits with a zero exit code if the expression evaluates to a non-zero value, and exits with a non-zero exit code if the expression evaluates to zero.
This inversion is useful in if and while loops.
1 $ expr 5 '>' 6
2 0
3 $ echo $?
4 1
5 $ expr 5 '<' 6
6 1
7 $ echo $?
8 0
expr can also be used to match regex patterns. Unlike most other commands, the regex pattern is always anchored to the start of the string. The matching is done greedily, and the maximum possible match is taken. expr prints the length of the match, and not the match itself.

1 $ expr hello : h
2 1
3 $ expr hello : e
4 0
5 $ expr hello : .*e
6 2
7 $ expr hello : .*l
8 4
However, if we want to actually print the match
instead of the length, we can enclose the regex
pattern in escaped parentheses.
1 $ expr hello : '\(.*l\)'
2 hell
Other string operations that expr supports are:
▶ length - Returns the length of the string.
▶ index - Returns the index of the first occur-
rence of the substring.
▶ substr - Returns the substring of the string.
1 $ expr length hello
2 5
1 $ expr index hello e
2 2
1 $ expr substr hello 2 3
2 ell
The index is 1-based in expr, and not 0-based. For the substr command, the first argument is the string, the second argument is the starting index, and the third argument is the length of the substring.
9.19 Running arbitrary commands using source, eval and exec

Just like we can use the source command to run a script file in the current shell itself, we can also run any arbitrary command directly using eval without needing a file.
1 $ eval date
2 Wed Jul 31 11:00:42 PM IST 2024
3 $ cat script.sh
4 echo "Hello"
5 $ source script.sh
6 Hello
7 $ eval ./script.sh
8 Hello
As it can run any command, it can also run a script
file. But the source command cannot run a com-
mand without a file. However, this can be circum-
vented by using the /dev/stdin file.
1 $ source /dev/stdin <<< "echo Hello"
2 Hello
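Because eval re-evaluates its arguments, a command stored in a shell variable, including any expansions inside it, can be executed. A minimal sketch (the variable cmd and the home path shown are illustrative):

1 $ cmd='echo $HOME'
2 $ $cmd
3 $HOME
4 $ eval $cmd
5 /home/sayan

Plain expansion of $cmd does not re-expand the embedded $HOME, but eval does.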
Warning 9.19.1 We should be careful when using the eval command, as it can run any arbitrary command in the current shell, and can be a security risk.
9.19.1 exec

Similar to the eval command, the exec command can also be used to run arbitrary commands in the same environment without creating a new one. However, the shell process gets replaced by the command being run, so when the command exits, the terminal closes. If the command fails to run, the shell is preserved.
Exercise 9.19.1 Open a new terminal and run
the command exec sleep 2. Observe that the
terminal closes after 2 seconds.
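The same behaviour can be observed non-interactively; in this sketch the final echo never runs because the shell process has been replaced by date (the date output is illustrative):

1 $ bash -c 'echo before; exec date; echo after'
2 before
3 Wed Jul 31 11:05:00 PM IST 2024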
9.20 Getopts
Getopts is a built-in command in bash that is used
to parse command line arguments. It is a syntactical
sugar that helps us parse command line arguments
easily.
The first argument to the getopts command is a string that contains the options that the script accepts. The second argument is the name of the variable that will store the option that is parsed. getopts reads the arguments passed to the script one by one, stores the option in the variable, and stores the argument to the option in the $OPTARG variable. It needs to be executed as many times as the number of options passed to the script. As this number is often unknown, it is usually executed in a while loop, since the getopts command returns a non-zero exit code once all the passed options have been consumed.
1 $ cat optarg.sh
2 while getopts ":a:bc:" flag; do
3 echo "flag -$flag, Argument $OPTARG";
4 done
5 $ bash optarg.sh -a 1 -b -c 2
6 flag -a, Argument 1
7 flag -b, Argument
8 flag -c, Argument 2
The colon after the option denotes that the option
requires an additional argument. The colon at the
start of the string denotes that the script should
not print an error message if an invalid option is
passed.
Without the leading colon:
1 $ cat optarg.sh
2 while getopts "a:" flag; do
3 echo "flag -$flag, Argument $OPTARG";
4 done
5 $ bash optarg.sh -a 1 -b
6 flag -a, Argument 1
7 optarg: illegal option -- b
8 flag -?, Argument
9 $ bash optarg.sh -a
10 optarg: option requires an argument -- a
11 flag -?, Argument
With the leading colon:
1 $ cat optarg.sh
2 while getopts ":a:" flag; do
3 echo "flag -$flag, Argument $OPTARG";
4 done
5 $ bash optarg.sh -a 1 -b
6 flag -a, Argument 1
7 flag -?, Argument b
8 $ bash optarg.sh -a
9 flag -:, Argument a
If an option that requires an argument is passed without one, the $OPTARG variable is set to the option itself, and the flag variable is set to :.
If an illegal option is passed, the flag variable is
set to ?, and the $OPTARG variable is set to the illegal
option.
These let the user print a custom error message if an
illegal option is passed or if an option that requires
an argument is passed without an argument.
9.20.1 With case statement
Usually the getopts command is used with a case
statement to execute the code for each option.
1 $ cat optarg.sh
2 time="Day"
3 while getopts ":n:mae" opt; do
4 case $opt in
5 n) name=$OPTARG ;;
6 m) time="Morning" ;;
7 a) time="Afternoon" ;;
8 e) time="Evening" ;;
9 \?) echo "Invalid option: $OPTARG" >&2 ;;
10 esac
11 done
12 echo -n "Good $time"
13 if [ -n "$name" ]; then
14 echo ", $name!"
15 else
16 echo "!"
17 fi
18 $ bash optarg
19 Good Day!
20 $ bash optarg -a
21 Good Afternoon!
22 $ bash optarg -e
23 Good Evening!
24 $ bash optarg -m
25 Good Morning!
26 $ bash optarg -an Sayan
27 Good Afternoon, Sayan!
28 $ bash optarg -mn Sayan
29 Good Morning, Sayan!
30 $ bash optarg -en Sayan
31 Good Evening, Sayan!
32 $ bash optarg -n Sayan
33 Good Day, Sayan!
34 $ bash optarg -n Sayan -a
35 Good Afternoon, Sayan!
The error printing can also be suppressed by setting the OPTERR shell variable to 0, as sketched below.
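A minimal sketch of OPTERR suppressing the default error message (note that there is no leading colon in the option string here):

1 $ cat optarg.sh
2 OPTERR=0
3 while getopts "a:" flag; do
4     echo "flag -$flag, Argument $OPTARG";
5 done
6 $ bash optarg.sh -z
7 flag -?, Argument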
9.21 Profile and RC files
There are two kinds of bash environments:
▶ Login shell - A login shell is a shell that is started when a user logs in. It is used to set up the environment for the user.
▶ Non-login shell - Any other shell, such as one started by opening a new terminal window, is a non-login shell. It is used to run commands.
When a non-login shell is started, it reads the ~/.bashrc file and the /etc/bash.bashrc file. (The rc in bashrc stands for run command. This is a common naming convention in Unix-like systems, where configuration files are named with the extension rc.)

When a login shell is started, along with the run command files, it reads the /etc/profile file, and then the ~/.bash_profile and ~/.profile files.
We make most of the configuration changes in the
~/.bashrc file, as it is read by both login and non-
login shells, and is the most common configuration
file. Make sure to backup the ~/.bashrc file before
making any changes, as a misconfiguration can
cause the shell to not start.
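For example, a common customization is to append an alias to ~/.bashrc and re-read the file in the current shell (a quick sketch; the alias chosen is arbitrary):

1 $ echo "alias ll='ls -l'" >> ~/.bashrc
2 $ source ~/.bashrc
3 $ ll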
9.22 Summary
In this chapter, we learned about the control flow
constructs in bash, and how to use them to author
scripts to automate your work.
Table 9.3: Summary of the bash constructs

Construct | Description
if        | Execute a block of code based on a condition.
case      | Execute a block of code based on a pattern.
for       | Iterate over a list of elements.
while     | Execute a block of code as long as a condition is true.
until     | Execute a block of code as long as a condition is false.
break     | Exit the loop immediately.
continue  | Skip the rest of the code in the loop and go to the next iteration.
read      | Read input from the user.
unset     | Unset a variable.
local     | Declare a variable as local to a function.
return    | Return from a function.
source    | Run a script file in the current shell.
eval      | Run arbitrary commands in the current shell.
exec      | Replace the current shell with the command being run.
getopts   | Parse command line arguments.
10 Stream Editor
Stream Editor (sed) is a powerful text stream editor. It is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

This means that sed can be used in a pipeline of other commands to filter, refine, and transform the data in the stream. This is illustrated in Figure 10.1.

[Figure 10.1: Filtering Streams]
Remark 10.0.1 sed is short for Stream EDitor.
10.1 Basic Usage

[Figure 10.2: The different interfaces to sed]

Sed needs an input (stream or file) and a script to work. The script is a series of commands that sed will execute on the input. The script can be passed as a command line argument or as a file as well.

For example, we can provide the stream to manipulate as standard input, and provide a single command as the script directly through the command line.

Here we are using the s command of sed to perform a find-and-replace style substitution. This is similar to the %s/regex/string/ command in vim. The first argument can be a regex, but the second argument has to be a string; however, it may contain backreferences to the matched pattern.

1 $ sed 's/World/Universe/' <<< "Hello World"
2 Hello Universe
As sed can work on standard input if no file is provided, it is very useful when paired with other tools; it can transform or filter the output of commands.

For example, the default output of date does not zero-pad the day of the month. If we want to do that using sed, we can do it as follows.

This does not add an extra zero if the date is already two digits long. As we have already covered regular expressions in depth, it should be easy to understand the regex used here: we group parts of the pattern so that we can use them in the backreference, and add an extra zero if the date is a single digit.

1 $ date
2 Tue Aug 6 05:00:23 PM IST 2024
3 $ date | sed -E 's/^([[:alpha:]]*[[:space:]]*[[:alpha:]]*)[[:space:]]*([[:digit:]])[[:space:]]/\1 0\2 /'
4 Tue Aug 06 05:00:32 PM IST 2024
5 $ date -d "20241225"
6 Wed Dec 25 12:00:00 AM IST 2024
7 $ date -d "20241225" | sed -E 's/^([[:alpha:]]*[[:space:]]*[[:alpha:]]*)[[:space:]]*([[:digit:]])[[:space:]]/\1 0\2 /'
8 Wed Dec 25 12:00:00 AM IST 2024
If the pattern being searched does not match on the
input, sed will not make any changes to the input.
10.2 Addressing
All of the commands in sed run on all the lines by default. However, we can specify the lines to run on by providing the optional address field before the command.
1 $ cat data
2 hello world
3 hello universe
4 $ sed '1s/hello/hi/' data
5 hi world
6 hello universe
The address can be the line number of a particular line, a range of line numbers, or a regex pattern to match. We can also use $ to match the last line and 1 to match the first line.
1 $ seq 10 20 | sed '5,10d'
2 10
3 11
4 12
5 13
6 20
1 $ seq 10 20 | sed '/15/,/20/d'
2 10
3 11
4 12
5 13
6 14
1 $ seq 10 20 | sed '5d'
2 10
3 11
4 12
5 13
6 15
7 16
8 17
9 18
10 19
11 20
Furthermore, we can also specify the range of ad-
dresses to run the command on, using regex as the
start, end, or both the ends.
1 $ seq 10 20 | sed '/13/,$d'
2 10
3 11
4 12
The range usually uses the start and end addresses
separated by a comma. But we can also use +n to
include n lines starting from the start range.
1 $ seq 10 20 | sed '/13/,+5d'
2 10
3 11
4 12
5 19
6 20
To match every nth line, we can use the first~step address syntax of GNU sed.
1 $ seq 10 20 | sed '0~2d'
2 10
3 12
4 14
5 16
6 18
7 20
10.2.1 Negation
We can negate the address by using the ! operator
before the address.
1 $ seq 8 5 32 | sed '/1./!d'
2 13
3 18
In this example we first use the seq command to
generate a sequence of numbers from 8 to 32 with
a step of 5. Then we use sed to delete all the lines
that do not match the pattern 1., printing only the
numbers that start with 1.
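Negation works with any address type; for instance, negating a line-number range prints only the lines outside it (a quick sketch):

1 $ seq 10 | sed -n '3,7!p'
2 1
3 2
4 8
5 9
6 10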
10.3 Commands
10.3.1 Command Syntax
The general syntax of the sed command is as fol-
lows:
1 [:label] [address] command [arguments]
10.3.2 Available Commands
Sed has a lot of commands that can be used to
manipulate the input stream. Here are some of the
most commonly used commands:
▶ p - Print the pattern space.
▶ d - Delete the pattern space.
▶ s - Substitute the pattern using regex match
with string.
[address]s/search regex/replace string/[flags
]
▶ = - Print the current line number.
▶ # - Comment.
▶ i - Insert text before the current line.
▶ a - Insert text after the current line.
▶ c - Change the current line.
▶ y - Transliterate the characters in the pattern
space.
▶ q [exit code] - Quit the script.
10.3.3 Branching and Flow Control
Other than these, there are other control flow com-
mands like b, t, :, which let us create more complex
and powerful transformations with state informa-
tion.
▶ b label- Branch unconditionally to label.
▶ :label - Define Label for branching.
▶ n - Read the next line to the pattern space.
▶ N - Append the next line to the pattern space.
▶ t label - Branch to label on successful substi-
tution.
▶ T label - Branch to label on failed substitu-
tion.
▶ w file - Write the pattern space to file.
▶ r file - Append the contents of file to the
pattern space.
▶ h - Copy the pattern space to the hold space.
▶ H - Append the pattern space to the hold
space.
▶ g - Copy the hold space to the pattern space.
▶ G - Append the hold space to the pattern
space.
▶ x - Exchange the pattern space and the hold
space.
▶ D - Delete the first line of the pattern space.
▶ P - Print the first line of the pattern space.
More details on these commands can be found in
the man and info pages of sed.
10.3.4 Printing
The p command prints the pattern space. By default, sed also prints the pattern space at the end of every cycle; this automatic printing can be suppressed using the -n flag. Thus, when the p command is used along with the -n flag, only the lines that are explicitly printed are shown.
1 $ seq 10 20 | sed -n '5p'
2 14
10.3.5 Deleting

The d command deletes the line in the pattern space. This is useful if the -n flag is not used to suppress the default printing.

Here we are deleting two ranges of lines from the input stream. Multiple commands can be separated by a semicolon.

1 $ seq 10 20 | sed '1,4d;6,$d'
2 14
Thus there are two ways of filtering a stream of data
as seen in the last two examples:
▶ Print only the lines that match a pattern.
▶ Delete the lines that do not match a pattern.
Remark 10.3.1 If we use the p command without
using the -n flag, the matched lines will be
printed twice, and the other lines will be printed
once.
1 $ seq 1 5 | sed '2p'
2 1
3 2
4 2
5 3
6 4
7 5
10.3.6 Substitution

This is one of the most widely used commands in sed. The substitute command is used to replace a pattern with a string. The search pattern can be a fixed string, or a regex pattern.
Remark 10.3.2 sed supports two types of regex:
Basic and Extended. Perl Compatible Regular
Expressions (PCRE) are not supported by sed.
Like the other commands, the substitution com-
mand can be used with an address to specify the
lines to run on, or on all the lines.
1 $ seq 8 12 | sed 's/1/one/'
2 8
3 9
4 one0
5 one1
6 one2
Observe that although the pattern 1 is present twice
in 11, it is only replaced once as the substitution
command only replaces the first occurrence of the
pattern in the line.
To substitute all the occurrences of the pattern in the
line, we can use the g option to the s command.
1 $ seq 8 12 | sed 's/1/one/g'
2 8
3 9
4 one0
5 oneone
6 one2
The g argument stands for global, and it replaces
all the occurrences of the pattern in the line.
If we want to replace only the nth occurrence of the
pattern, we can mention the number instead of g.
1 $ seq 8 12 | sed 's/1/one/2'
2 8
3 9
4 10
5 1one
6 12
Here the 2 flag replaces the second occurrence of
the pattern in the line.
Observe that the lines with only one occurrence of
the pattern are not changed at all.
Remark 10.3.3 If we ask sed to replace the nth occurrence of the pattern, and the pattern does not occur n times in the line, then the line is not changed at all.
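GNU sed also allows combining a number with the g flag, which replaces the nth occurrence and every occurrence after it; a quick sketch:

1 $ sed 's/o/0/2g' <<< "foo boo zoo"
2 fo0 b00 z00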
Case Insensitive Substitution
If we want to perform a case insensitive substitution, we can use the i flag of the s command.
1 $ sed 's/Hello/Hi/i' <<< "hello world"
2 Hi world
Backreferences
Just like in grep we can use groups and backrefer-
ences in sed as well.
1 $ sed 's/\([^,]*\),/\1\n/g' <<< "1,2,3,4,5"
2 1
3 2
4 3
5 4
6 5
Here we are matching each of the comma separated values and replacing them with the same value followed by a newline. The group needs to be escaped with a backslash in BRE. However, we can avoid this by using Extended Regular Expressions (ERE) with the -E flag.

In these regexes, we are greedily matching as many non-comma characters as possible, and replacing the match with the same pattern followed by a newline.

1 $ sed -E 's/([^,]*),/\1\n/g' <<< "apple,banana,cherry,donut,eclairs"
2 apple
3 banana
4 cherry
5 donut
6 eclairs
We used groups and backreferences in this case because we were dropping the comma and replacing it with a newline. However, if we want to preserve the entire match and add some text before or after it, we can use the & backreference without explicitly grouping the entire match.
1 $ sed 's/[^,]*,/&\n/g' <<< "apple,banana,
cherry,donut,eclairs"
2 apple,
3 banana,
4 cherry,
5 donut,
6 eclairs
Remark 10.3.4 The & is a backreference to the
entire match.
Uppercase and Lowercase

We can use the \u and \l backreferences to convert the first character of what follows (usually a dynamically fetched backreference) to uppercase or lowercase.

Similarly, we can use the \U and \L backreferences to convert the entire string to uppercase or lowercase. (The uppercasing or lowercasing is done till the end of the string or till the next \E backreference.)

1 $ sed 's/.*/\u&/' <<< hello
2 Hello

1 $ sed 's/.*/\U&/' <<< hello
2 HELLO

To reset the effect of the \U or \L backreference, we can use the \E backreference.

1 $ sed 's/\([^ ]*\) *\([^ ]*\)/\U\1 \E\u\2/' <<< "hello world"
2 HELLO World
Here we are matching two groups of non-space characters (words) using the [^ ]* regex, separated by zero or more spaces. Thus the first word is referred to by \1 and the second by \2. Then we are converting the first word to fully uppercase using \U\1, and the first letter of the second word to uppercase using \u\2. To reset the effect of the \U backreference, we use the \E backreference.
10.3.7 Print Line Numbers
The = command prints the line number of the cur-
rent line.
1 $ seq 7 3 19 | sed -n '='
2 1
3 2
4 3
5 4
6 5
Here we are using the seq command to generate a
sequence of numbers from 7 to 19 with a step of 3.
Then we are using sed to print the line number of
each line. We use the -n flag to suppress the default
printing of the line.
wc emulation
Since we can print the line number of any line, we
can use this to emulate the wc command by printing
only the line number of the last line.
1 $ seq 7 3 19 | sed -n '$='
2 5
3 $ seq 7 3 19 | wc -l
4 5
10.3.8 Inserting and Appending Text
The i command is used to insert text before the
current line. The a command is used to append text
after the current line.
1 $ seq 1 5 | sed '2ihello'
2 1
3 hello
4 2
5 3
6 4
7 5
1 $ seq 1 5 | sed '2ahello'
2 1
3 2
4 hello
5 3
6 4
7 5
Here we are using the seq command to generate a
sequence of numbers from 1 to 5. Then we are using
sed to insert the text hello before the second line.
If we drop the address, the text is inserted before
every line.
1 $ seq 1 5 | sed 'ihello'
2 hello
3 1
4 hello
5 2
6 hello
7 3
8 hello
9 4
10 hello
11 5
1 $ seq 1 5 | sed 'ahello'
2 1
3 hello
4 2
5 hello
6 3
7 hello
8 4
9 hello
10 5
11 hello
We can also insert multiple lines by escaping the
newline character.
1 $ seq 1 5 | sed '2i\
2 hello\
3 how are you'
4 1
5 hello
6 how are you
7 2
8 3
9 4
10 5
10.3.9 Changing Lines

The c command is used to change the current line. This is sometimes more convenient than substituting the entire line.
Let's say we want to remove all the lines that contain 8 or more consecutive digits, as these may be confidential information, and replace them with "REDACTED".
1 $ cat data
2 Hello
3 This is my phone number:
4 9999999998
5 and
6 this is my aadhaar card number:
7 9999999999999998
8 my bank cvv is
9 123
10 $ sed -E '/[0-9]{8}/cREDACTED' data
11 Hello
12 This is my phone number:
13 REDACTED
14 and
15 this is my aadhaar card number:
16 REDACTED
17 my bank cvv is
18 123
Here we are addressing all the lines that contain 8
or more digits consecutively, and replacing them
with "REDACTED". The [0-9]{8} regex matches
any sequence of 8 digits.
10.3.10 Transliteration
The y command is used to transliterate the charac-
ters in the pattern space. This is similar to the tr
command. However, it is not as powerful as tr as it
does not support ranges.
1 $ sed 'y/aeiou/12345/' <<< "hello world"
2 h2ll4 w4rld
Unlike the tr command, the y command does not
support unequal length strings.
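If the two strings differ in length, sed refuses to run the command; a quick sketch (the exact error wording and character position may vary between sed versions):

1 $ sed 'y/abc/12/' <<< "hello"
2 sed: -e expression #1, char 9: strings for `y' command are different lengths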
10.4 Combining Commands
We can combine multiple commands in a single
script by separating them with a semicolon.
1 $ seq 10 20 | sed '1,4d;6,$d'
2 14
Here we are deleting two ranges of lines from the
input stream.
However, sometimes we may want to perform multiple commands on the same address range; this can be done by enclosing the commands in braces.

Here we are addressing the line by matching the regex #, which marks the start of a comment. Then on that address we are performing a compound command using braces. The first command (=) prints the line number, and the second command (s) deletes the # and anything before it.

1 $ cat data
2 from sys import argv, exit
3
4 def main():
5     if len(argv) != 3+1: # TODO: move magic number to variable
6         print("Incorrect Usage")
7         exit(1)
8
9     kloc,a,b = list(map(float, argv[1:]))
10     # TODO: add adjustment factor
11     return a * (kloc ** b)
12
13
14 if __name__ == "__main__":
15     print(main())
16 $ sed -n '/#/{=;s/^.*# //p}' data
17 4
18 TODO: move magic number to variable
19 9
20 TODO: add adjustment factor

Here we are performing two actions on the matching lines:

▶ Printing the line number of the matching line.
▶ Printing only the TODO comment, not the code before it.

This is a handy operation to extract TODO comments from a codebase.

Instead of using a semicolon to separate the commands and addressing each command, we are enclosing them in braces and addressing the group only once.
10.5 In-place Editing
As we saw in Figure 10.2, sed can be used to filter the lines of a file as well; we have seen this in the examples so far. However, we have always printed the output to the terminal, leaving the file unchanged.

We can use the -i flag to edit the file in place.
1 $ cat data
2 hello world
3 $ sed -i 's/\b[[:alpha:]]/\u&/g' data
4 $ cat data
5 Hello World
6 $ sed -i 's/\b[[:alpha:]]/\l&/g' data
7 $ cat data
8 hello world

This is useful when working with large files, as we do not have to write the output to a temporary file and then replace the original file with it. It is also widely used to modify configuration files in place without resorting to programming languages.

In this example we are searching for a line in the configuration file that starts with LAUNCH_ICBM and then running a compound command on it. The command first tries to substitute FALSE with TRUE. If that was successful, it branches to the end of the script to avoid changing it back. If the substitution was not successful, it replaces TRUE with FALSE. Thus the command toggles the boolean variable between TRUE and FALSE. We will cover branching in detail later in the chapter.

1 $ cat usa.conf
2 LAUNCH_ICBM=FALSE
3 $ sed -i '/^LAUNCH_ICBM/{s/FALSE/TRUE/;tx;s/TRUE/FALSE/}; :x' usa.conf
4 $ cat usa.conf
5 LAUNCH_ICBM=TRUE
6 $ sed -i '/^LAUNCH_ICBM/{s/FALSE/TRUE/;tx;s/TRUE/FALSE/}; :x' usa.conf
7 $ cat usa.conf
8 LAUNCH_ICBM=FALSE
We can also preserve the original file as a backup
by providing a suffix to the -i flag.
1 $ cat data
2 hello
3 $ sed -i.backup 's/hello/bye/' data
4 $ cat data
5 bye
6 $ cat data.backup
7 hello
10.6 Sed Scripts
Instead of listing out the commands as a string in
the command line, we can also provide a file of sed
commands, called a sed script, where each com-
mand is delimited by a newline character instead
of the semicolon character.
1 $ cat script.sed
2 s/hello/hi/
3 $ sed -f script.sed <<< "hello world"
4 hi world
Different commands should be present in separate
lines in the sed script.
1 $ cat script.sed
2 5p
3 10p
4 $ seq 101 120 | sed -n -f script.sed
5 105
6 110
10.6.1 Order of Execution
The sed script is executed for each line in the input stream. The order of checking follows the order of the stream first, and then the order of the script; that is, for each line in the stream from top to bottom, each line of the script is executed (if its address matches) from top to bottom.
1 $ cat script.sed
2 10p
3 5p
4 $ seq 10 | sed -n -f script.sed
5 5
6 10
Even though the script has the command to print
the 10th line before the command to print the 5th
line, the output is in the order of the stream.
However, if for a line in the stream, multiple com-
mands in the script match, then the order of the
script is followed.
1 $ cat script.sed
2 /10/{
3 p
4 a Multiple of 10
5 }
6 /5\|10/{
7 p
8 a Multiple of 5
9 }
10 $ seq 10 | sed -n -f script.sed
11 5
12 Multiple of 5
13 10
14 10
15 Multiple of 10
16 Multiple of 5
In this example, we are printing the line if it is a
multiple of 5 or 10. The case of 5 comes first as the
order of the stream is in increasing order.
However, for the number 10, both the conditions
are satisfied, and the order of the script is followed,
first the text "Multiple of 10" is printed, and then
"Multiple of 5".
Remark 10.6.1 Observe that the p command prints the line, and the a command appends text after the line. Even though the first a is present before the second p, still both the p outputs are printed first and then the a outputs. This is because the p command prints the line immediately, but the a command queues the text, which is only emitted when the current cycle ends and the next line is read.
10.6.2 Shebang
If we are running a sed script directly from the shell
without specifying the interpreter, we need to add
the shebang in the first line of the script.
1 #!/bin/sed -f
or
1 #!/usr/bin/sed -f
This indicates to the shell that the interpreter to use to run this script is sed. The -f flag is required to make sed read the script from the script file.
Remark 10.6.2 The bash shell simply invokes the command in the script's shebang along with all of its arguments, and appends the path of the script file as the last argument of the parameter list.
Note that the script file has to be made executable using the chmod command. This is a security feature to prevent accidental execution of random untrusted scripts. Note that we have also specified the -n flag to suppress the default printing of the line.

1 $ cat script.sed
2 #!/bin/sed -nf
3 a Hello
4 bx
5 a World
6 :x
7 a Universe
8 $ chmod u+x script.sed
9 $ ./script.sed <<< ""
10 Hello
11 Universe
10.7 Branching and Flow Control
Sed supports branching and flow control commands to create more complex transformations using loops and if-else conditions. However, sed does not have dedicated if-else, for, or while syntax; to perform these control flow operations, we have to use the branching commands. This is similar to how high-level languages are compiled down to GOTO-style jumps in assembly, and then to machine code.
10.7.1 Labels
Labels are used to mark a point in the script to
branch to. They themselves do not perform any
action, but are used as a reference point for branch-
ing.
1 $ cat script.sed
2 :label1
3 5p
4 :label2
5 10p
6 $ seq 10 | sed -n -f script.sed
7 5
8 10
10.7.2 Branching
Now that we have defined labels in our script, we can use branching to instruct sed to move the control flow to any arbitrary label.
Remark 10.7.1 The label can be defined before,
or after the branching call
There are two kinds of branching in sed,
▶ Unconditional Branching - the branching
does not depend on success or failure of pre-
vious command.
▶ Conditional Branching - the branching is
conditioned on the success or failure of the
substitution done before the branching call.
Unconditional Branching
The b command is used to branch unconditionally to a label. This always branches to the label, skipping any commands between the branch and the label.
1 $ cat script.sed
2 a Hello
3 bx
4 a World
5 :x
6 a Universe
7 $ sed -nf script.sed <<< "test"
8 Hello
9 Universe
In this example the command a Hello is executed,
and then the bx command is executed. So, the com-
mand a World is skipped, and the control flow is
moved to the label x, and the command a Universe
is executed.
If the label is after the branch, it can be used to skip
certain parts of the script. However, if the label is
before the branch, it can be used to loop over the
same part of the script multiple times. But if done
using an unconditional branch, it will result in an
infinite loop.
1 $ cat script.sed
2 :x
3 p
4 bx
5 $ sed -nf script.sed <<< "hello"
6 hello
7 hello
8 hello
...
The bx command branches to the label x uncon-
ditionally, and thus the command p is executed
multiple times. To stop the infinite loop, we can
press Ctrl+C.
However, we can use a regex address to limit the
number of times the loop is executed.
1 $ cat script.sed
2 :x
3 s/\b[a-z]/\u&/
4 /\b[a-z]/bx
5 $ cat data
6 this is a long line with lots of words, we
want to capitalize each first letter of a
word.
7 $ sed -f script.sed data
8 This Is A Long Line With Lots Of Words, We
Want To Capitalize Each First Letter Of A
Word.
In this example, we are first declaring a branch label
x. Then we are using the substitution command to
capitalize the first letter of the first word, observe
that we are not using the g flag to capitalize all
the words. Then we are using a regex address to
check if there are any more words in the line. Only
if there are more words, we branch to the label x. Thus, although b itself is an unconditional branch, we are conditionally branching to the label by guarding it with the regex address.
This can be simplified using the t command.
Conditional Branching
The t command is used to branch to a label if the
previous substitution was successful.
1 $ cat script.sed
2 :x
3 s/\b[a-z]/\u&/
4 tx
5 $ cat data
6 this is a long line with lots of words, we
want to capitalize each first letter of a
word.
7 $ sed -f script.sed data
8 This Is A Long Line With Lots Of Words, We
Want To Capitalize Each First Letter Of A
Word.
Here we are using the t command to branch to the
label x if the substitution was successful. This will
terminate once all the words are capitalized as in
the next iteration the substitution will fail.
Thus this implementation will run one iteration
more than the previous implementation since it
terminates when the substitution fails inside the
iteration.
10.7.3 Appending Lines

We can use the N command to append the next line to the pattern space. This keeps the same iteration cycle, but simply appends the next line to the current line. The next line is delimited from the current line by a newline character. After processing the pattern space, the iteration cycle moves directly to the line after the appended line.
A line read normally by sed does not include the
newline character at the end of the line, however, if
the next line is appended to pattern space using N,
then the newline character is included in the pattern
space.
1 $ cat script.sed
2 1~2N
3 =
4 $ seq 10 | sed -n -f script.sed
5 2
6 4
7 6
8 8
9 10
In this example, we are appending the next line to
the pattern space for every odd line. Then we are
printing the line number of the current line.
Since we are appending the next line to the pat-
tern space, the line number is printed for the even
lines.
1 $ cat script.sed
2 N
3 N
4 s/\n/;/g
5 $ seq 10 | sed -f script.sed
6 1;2;3
7 4;5;6
8 7;8;9
9 10
In this example, we are appending the next two
lines to the pattern space. Then we are replacing
the newline characters with a semicolon. Observe
the g flag in the substitution command, this is used
to replace all the newline characters in the pattern
space.
Since sed does not re-process the appended lines
in the next cycle, the next line starts from 4.
Remark 10.7.2 When the N command is issued on the last line, GNU sed prints the pattern space and exits; most other sed implementations exit without printing, so this behaviour should not be relied upon in portable scripts.
Exercise 10.7.1 Try the previous example with-
out the g flag in the substitution command.
Predict the output before running it, observe
and justify the output after running it. How
many lines are printed? How many iteration
cycles does sed go through in this?
10.7.4 Join Multiline Strings

If we have a file where long lines are hard-wrapped with a \ character at the end of each broken line, we can use the N command to join the lines and recreate the real file.
1 $ cat data
2 Hello, this is a ver\
3 y long line, but I h\
4 ave broken it into s\
5 everal lines of fixe\
6 d width and marked f\
7 orced breaks with a \
8 backwards slash befo\
9 re the newline chara\
10 cter.
11 Real newlines do not\
12 have the backslash \
13 though.
14 Like this.
15 $ cat script.sed
16 :x
17 /\\$/{
18 N
19 s/\\\n//
20 bx
21 }
22 $ sed -f script.sed data
23 Hello, this is a very long line, but I have
broken it into several lines of fixed
width and marked forced breaks with a
backwards slash before the newline
character.
24 Real newlines do not have the backslash though
.
25 Like this.
In this script, we first define a label x. Then we check if the line ends with a \ character. If it does, we append the next line to the pattern space. Then we substitute the \ and the newline character with an empty string; this joins the word-wrapped lines back. Finally, we branch back to the label x to check if the current pattern space still has a \ at the end.
such as using t and P.
Exercise 10.7.2 Try to implement the same
script using the t and P commands.
10.7.5 If-Else

We can also use labels and branching to emulate the if-else syntax.

We need to use Extended Regular Expressions (ERE) here because we are using the parenthesis grouping and pipe-or syntax of ERE without escaping them; this is why we have used the -E flag. We could also drop it and escape the parentheses and pipes. The substitution appends the text "is an odd/even number" to the end of each number, since substituting $ simply appends to the end of the line.

1 $ cat script.sed
2 /(1|3|5|7|9)$/{
3 s/$/ is an odd number/
4 bx
5 }
6 s/$/ is an even number/
7 :x
8 $ seq 10 | sed -Ef script.sed
9 1 is an odd number
10 2 is an even number
11 3 is an odd number
12 4 is an even number
13 5 is an odd number
14 6 is an even number
15 7 is an odd number
16 8 is an even number
17 9 is an odd number
18 10 is an even number
Here, the first command group is addressed explicitly for only those lines which end in odd digits. If the number is odd, it will enter this block, append the note that the number is odd, and then branch to the end of the script and move on to the next line. However, if the number is not odd, it will not enter this command group, and thus not execute the branch; instead it will execute the second substitution.
10.7.6 If-elif-else
Similarly we can construct a if-elif-else ladder.
1 $ cat script.sed
2 /(1|3|5|7|9)$/{
3 s/$/ is an odd number/
4 bx
5 }
6 /0$/{
7 s/$/ is zero/
8 bx
9 }
10 s/$/ is an even number/
11 :x
12 $ seq 0 5 | sed -Ef script.sed
13 0 is zero
14 1 is an odd number
15 2 is an even number
16 3 is an odd number
17 4 is an even number
18 5 is an odd number