cli_text_processing_coreutils_sample
Preface 4
Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Feedback and Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Author info . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Book version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Introduction 6
Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
tr 24
Transliteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Different length sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Escape sequences and character sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Deleting characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Squeeze . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
cut 29
Individual field selections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Field ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Input field delimiter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Output field delimiter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Suppress lines without delimiters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Character selections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
NUL separator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
seq 35
Integer sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Floating-point sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Customizing separator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Leading zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
printf style formatting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
shuf 40
Randomize input lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Limit output lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Repeated lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Specify input lines as arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Generate random numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Specifying output file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
NUL separator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Preface
You might already be aware of popular coreutils commands like head, tail, tr, sort
and so on. This book will teach you more than twenty such specialized text processing tools
provided by the GNU coreutils package.
My Command Line Text Processing repo includes chapters on some of these coreutils commands.
Those chapters have been significantly edited for this book and new chapters have been added
to cover more commands.
Prerequisites
You should be familiar with command line usage in a Unix-like environment. You should be
comfortable with concepts like file redirection and command pipelines.
If you are new to the world of command line, check out my Computing from the Command Line
ebook and curated resources on Linux CLI and Shell scripting before starting this book.
Conventions
• The examples presented here have been tested on the GNU bash shell and version 9.1
of the GNU coreutils package.
• Code snippets shown are copy-pasted from the bash shell and modified for presentation
purposes: some commands are preceded by comments to provide context and explanations,
blank lines have been added to improve readability, only real time is shown for
speed comparisons, and so on.
• Unless otherwise noted, all examples and explanations are meant for ASCII characters.
• External links are provided throughout the book for you to explore certain topics in more
depth.
• The cli_text_processing_coreutils repo has all the code snippets and files used in examples,
exercises and other details related to the book. If you are not familiar with the git
command, click the Code button on the webpage to get the files.
Acknowledgements
• GNU coreutils documentation — manual and examples
• /r/commandline/, /r/linux4noobs/ and /r/linux/ — helpful forums
• stackoverflow and unix.stackexchange — for getting answers to pertinent questions on
coreutils and related commands
• tex.stackexchange — for help on pandoc and tex related questions
• canva — cover image
• Warning and Info icons by Amada44 under public domain
• oxipng, pngquant and svgcleaner — optimizing images
• SpikePy — for spotting an ambiguous explanation
Author info
Sundeep Agarwal is a lazy being who prefers to work just enough to support his modest lifestyle.
He accumulated vast wealth working as a Design Engineer at Analog Devices and retired from the
corporate world at the ripe age of twenty-eight. Unfortunately, he squandered his savings within
a few years and had to scramble trying to earn a living. Against all odds, selling programming
ebooks saved his lazy self from having to look for a job again. He can now afford all the fantasy
ebooks he wants to read and spends an unhealthy amount of time browsing the internet.
When the creative muse strikes, he can be found working on yet another programming ebook
(which invariably ends up having at least one example with regular expressions). Researching
materials for his ebooks and everyday social media usage drowned his bookmarks, so he
maintains curated resource lists for sanity's sake. He is thankful for free learning resources
and open source tools. His own contributions can be found at https://github.com/learnbyexample.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0
International License.
Resources mentioned in the Acknowledgements section above are available under their original licenses.
Book version
2.0
Introduction
I’ve been using Linux since 2007, but it took me ten more years to really explore coreutils when
I wrote tutorials for the Command Line Text Processing repository.
Any beginner learning Linux command line tools would come across the cat command within
the first week. Sooner or later, they’ll come to know popular text processing tools like grep
, head , tail , tr , sort , etc. If you were like me, you’d come across sed and awk ,
shudder at their complexity and prefer to use a scripting language like Perl and text editors like
Vim instead (don’t worry, I’ve already corrected that mistake).
Knowing power tools like grep , sed and awk can help solve most of your text processing
needs. So, why would you want to learn text processing tools from the coreutils package? The
biggest motivation would be faster execution since these tools are optimized for the use cases
they solve. And there’s always the advantage of not having to write code (and test that solution)
if there’s an existing tool to solve the problem.
This book will teach you more than twenty such specialized text processing tools provided by
the GNU coreutils package. Plenty of examples and exercises are provided to make it easier
to understand a particular tool and its various features.
Writing a book always has a few pleasant surprises for me. For this one, it was discovering a
sort option for calendar months, regular expressions in the tac and nl commands, etc.
Installation
On a GNU/Linux based OS, you are most likely to already have GNU coreutils installed. This
book covers the version 9.1 of the coreutils package. To install a newer/particular version, see
the coreutils download section for details.
If you are not using a Linux distribution, you may be able to access coreutils using these options:
• Windows Subsystem for Linux — compatibility layer for running Linux binary executables
natively on Windows
• brew — Package Manager for macOS (or Linux)
Documentation
It is always a good idea to know where to find the documentation. From the command line, you
can use the man and info commands for brief manuals and full documentation respectively.
I prefer using the online GNU coreutils manual which feels much easier to use and navigate.
cat and tac
cat derives its name from concatenation and provides other nifty options too.
tac helps you to reverse the input line wise, usually used for further text processing.
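For instance, here's one simple way the greeting.txt file used below could have been created
(press Ctrl+d on a new line to signal the end of interactive input):
# the lines typed after the command become the file contents
$ cat > greeting.txt
Hi there
Have a nice day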
In the above example, the output of cat is redirected to a file named greeting.txt . If you
don’t redirect the stdout data, each line will be echoed as you type. You can check the contents
of the file you just created by using cat again.
$ cat greeting.txt
Hi there
Have a nice day
Here Documents is another popular way to create such files. In this case, the termination
condition is a line matching a predefined string which is specified after the << redirection
operator. This is especially helpful for automation, since pressing Ctrl+d interactively isn’t
desirable. Here’s an example:
# > and a space at the start of lines represents the secondary prompt PS2
# don't type them in a shell script
# EOF is typically used as the identifier
$ cat << 'EOF' > fruits.txt
> banana
> papaya
> mango
> EOF
$ cat fruits.txt
banana
papaya
mango
The termination string is enclosed in single quotes to prevent parameter expansion, command
substitution, etc. You can also use \string for this purpose. If you use <<- instead of << ,
you can use leading tab characters for indentation purposes. See bash manual: Here Documents
and stackoverflow: here-documents for more examples and details.
Note that creating files as shown above isn't restricted to cat. The same technique can be
applied to any command waiting for stdin.
# 'tr' converts lowercase alphabets to uppercase in this example
$ tr 'a-z' 'A-Z' << 'end' > op.txt
> hi there
> have a nice day
> end
$ cat op.txt
HI THERE
HAVE A NICE DAY
Concatenate files
Here are some examples to showcase cat ’s main utility. One or more files can be passed as
arguments.
$ cat greeting.txt fruits.txt nums.txt
Hi there
Have a nice day
banana
papaya
mango
3.14
42
1000
Visit the cli_text_processing_coreutils repo to get all the example files used in this
book.
# concatenate the input files and save the output to op.txt
$ cat greeting.txt fruits.txt nums.txt > op.txt
$ cat op.txt
Hi there
Have a nice day
banana
papaya
mango
3.14
42
1000
# only stdin (- is optional in this case)
$ echo 'apple banana cherry' | cat
apple banana cherry
# here's an example without a newline character at the end of the first input
$ printf 'Some\nNumbers' | cat - nums.txt
Some
Numbers3.14
42
1000
You can use the -s option to squeeze consecutive empty lines to a single empty line. If present,
leading and trailing empty lines will also be squeezed (won’t be completely removed). You can
modify the below example to test it out.
$ printf 'hello\n\n\nworld\n\nhave a nice day\n\n\n\n\n\napple\n' | cat -s
hello

world

have a nice day

apple
Prefix line numbers
The -n option will prefix line numbers and a tab character to each input line. The line numbers
are right justified to occupy a minimum of 6 characters, with space as the filler.
$ cat -n greeting.txt fruits.txt nums.txt
     1  Hi there
     2  Have a nice day
     3  banana
     4  papaya
     5  mango
     6  3.14
     7  42
     8  1000
Use the -b option instead of -n if you don’t want empty lines to be numbered.
# -n option numbers all the input lines
$ printf 'apple\n\nbanana\n\ncherry\n' | cat -n
     1  apple
     2
     3  banana
     4
     5  cherry

# -b option numbers only the non-empty lines
$ printf 'apple\n\nbanana\n\ncherry\n' | cat -b
     1  apple

     2  banana

     3  cherry
Use the nl command if you want more customization options like number formatting,
separator string, regular expression based filtering and so on.
The -v option helps you spot non-printing characters by displaying them using special notations. For example, the NUL character is shown as ^@ below.
# NUL character
$ printf 'car\0jeep\0bus\0' | cat -v
car^@jeep^@bus^@
The -v option doesn’t cover the newline and tab characters. You can use the -T option to
spot tab characters.
$ printf 'good food\tnice dice\napple\tbanana\tcherry\n' | cat -T
good food^Inice dice
apple^Ibanana^Icherry
The -E option adds a $ marker at the end of input lines. This is useful to spot trailing
whitespace characters.
$ printf 'ice \nwater\n cool \n chill\n' | cat -E
ice $
water$
 cool $
 chill$
Most commands that you’ll see in this book can directly work with file arguments, so you
shouldn’t use cat to pipe the contents for such cases. Here’s a single file example:
# useless use of cat
$ cat greeting.txt | sed -E 's/\w+/\L\u&/g'
Hi There
Have A Nice Day
# sed can handle file arguments
$ sed -E 's/\w+/\L\u&/g' greeting.txt
Hi There
Have A Nice Day
If you prefer having the file argument before the command, you can use the shell’s redirection
feature to supply input data instead of cat . This also applies to commands like tr that do
not accept file arguments.
# useless use of cat
$ cat greeting.txt | tr 'a-z' 'A-Z'
HI THERE
HAVE A NICE DAY
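Here's the same result using shell redirection instead (this form shows up again in the tr chapter):
# no cat needed
$ tr 'a-z' 'A-Z' <greeting.txt
HI THERE
HAVE A NICE DAY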
Such useless use of cat might not have a noticeable negative impact in most cases, but it
matters when you are dealing with large input files. This is especially true for commands like
tac and tail, which would have to wait for all the data to arrive through the pipe instead of
directly processing the file from the end, as they can when the file is passed as an argument
(or supplied via shell redirection).
If you are dealing with multiple files, then the use of cat will depend upon the desired result.
Here are some examples:
# match lines containing 'o' or '0'
# -n option adds line number prefix
$ cat greeting.txt fruits.txt nums.txt | grep -n '[o0]'
5:mango
8:1000
$ grep -n '[o0]' greeting.txt fruits.txt nums.txt
fruits.txt:3:mango
nums.txt:3:1000
For some use cases like in-place editing with sed , you can’t use cat or shell redirection at
all. The files have to be passed as arguments only. To conclude, don’t use cat just to pass the
input as stdin to another command, unless necessary.
tac
tac will reverse the order of the input lines. If you pass multiple input files, each file content
will be reversed separately. Here are some examples:
# won't be the same as: cat greeting.txt fruits.txt | tac
$ tac greeting.txt fruits.txt
Have a nice day
Hi there
mango
papaya
banana
If the last input line doesn’t end with a newline, the output will also not have that
newline character.
$ printf 'apple\nbanana\ncherry' | tac
cherrybanana
apple
Reversing input lines makes some of the text processing tasks easier. For example, if there are
multiple matches but you want only the last one. See my ebooks on GNU sed and GNU awk for
more such use cases.
$ cat log.txt
--> warning 1
a,b,c,d
42
--> warning 2
x,y,z
--> warning 3
4,3,1
In the above example, log.txt has multiple lines containing warning . The task is to fetch
lines based on the last match, which isn’t usually supported by CLI tools. Matching the first
occurrence is easy with tools like grep and sed . Hence, tac is helpful to reverse the
condition from the last match to the first match. After processing with tools like sed , the
result is then reversed again to get back the original order of input lines. Another benefit is that
the first tac command will stop reading the input contents after the match is found.
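Here's a sketch of that workflow for the log.txt file shown above:
# lines from the last match of 'warning' to the end of the file
$ tac log.txt | sed '/warning/q' | tac
--> warning 3
4,3,1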
Use the rev command if you want each input line to be reversed character wise.
Customize line separator for tac
By default, the newline character is used to split the input content into lines. You can use the
-s option to specify a different string to be used as the separator.
# use NUL as the line separator
# -s $'\0' can also be used instead of -s '' if ANSI-C quoting is supported
$ printf 'car\0jeep\0bus\0' | tac -s '' | cat -v
bus^@jeep^@car^@
# as seen before, the last entry should also have the separator
# otherwise it won't be present in the output
$ printf 'apple banana cherry' | tac -s ' ' | cat -e
cherrybanana apple $
$ printf 'apple banana cherry ' | tac -s ' ' | cat -e
cherry banana apple $
When the custom separator occurs before the content of interest, use the -b option to print
those separators before the content in the output as well.
$ cat body_sep.txt
%=%=
apple
banana
%=%=
teal
green
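For instance, here's a sketch using the file shown above, with %=%= passed as the separator:
# the %=%= separators are printed before their group contents
$ tac -b -s '%=%=' body_sep.txt
%=%=
teal
green
%=%=
apple
banana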
The separator will be treated as a regular expression if you use the -r option as well.
$ cat shopping.txt
apple 50
toys 5
Pizza 2
mango 25
Banana 10
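For instance, here's a sketch where lines starting with a lowercase letter are treated as the
beginning of a group (the regexp used here is an illustrative choice):
$ tac -b -rs '^[a-z]' shopping.txt
mango 25
Banana 10
toys 5
Pizza 2
apple 50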
# alternate solution for: tac log.txt | sed '/warning/q' | tac
# separator is zero or more characters from the start of a line till 'warning'
$ tac -b -rs '^.*warning' log.txt | awk '/warning/ && ++c==2{exit} 1'
--> warning 3
4,3,1
See Regular Expressions chapter from my GNU grep ebook if you want to learn about
regexp syntax and features.
Exercises
All the exercises are also collated together in one place at Exercises.md. For solutions,
see Exercise_solutions.md.
The exercises directory has all the files used in this section.
1) The given sample data has empty lines at the start and end of the input. Also, there are
multiple empty lines between the paragraphs. How would you get the output shown below?
# note that there's an empty line at the end of the output
$ printf '\n\n\ndragon\n\n\n\nunicorn\nbee\n\n\n' | ##### add your solution here
1 dragon
2 unicorn
3 bee
2) Pass appropriate arguments to the cat command to get the output shown below.
$ cat greeting.txt
Hi there
Have a nice day
$ echo '42 apples and 100 bananas' | cat ##### add your solution here
42 apples and 100 bananas
Hi there
Have a nice day
• -e option is equivalent to
• -t option is equivalent to
• -A option is equivalent to
5) Will the two commands shown below produce the same output? If not, why not?
$ tac fruits.txt ip.txt

$ cat fruits.txt ip.txt | tac
6) Reverse the contents of blocks.txt file as shown below, considering ---- as the separator.
$ cat blocks.txt
----
apple--banana
mango---fig
----
3.14
-42
1000
----
sky blue
dark green
----
hi hello
7) For the blocks.txt file, write solutions to display only the last such group and last two
groups.
##### add your solution here
----
hi hello
8) Reverse the contents of items.txt as shown below. Consider digits at the start of lines as
the separator.
$ cat items.txt
1) fruits
apple 5
banana 10
2) colors
green
sky blue
3) magical beasts
dragon 3
unicorn 42
head and tail
cat is useful to view the entire contents of files. Pagers like less can be used if you are working
with large files ( man pages for example). Sometimes though, you just want a peek at the starting
or ending lines of input files. Or, you know the line numbers for the information you are looking
for. In such cases, you can use head or tail or a combination of both these commands to
extract the content you want.
By default, head and tail will display the first and last 10 lines respectively.
$ head sample.txt
1) Hello World
2)
3) Hi there
4) How are you
5)
6) Just do-it
7) Believe it
8)
9) banana
10) papaya
$ tail sample.txt
6) Just do-it
7) Believe it
8)
9) banana
10) papaya
11) mango
12)
13) Much ado about nothing
14) He he he
15) Adios amigo
If there are fewer than 10 lines in the input, only those lines will be displayed.
# seq command will be discussed in detail later, generates 1 to 3 here
# same as: seq 3 | tail
$ seq 3 | head
1
2
3
You can use the -nN option to customize the number of lines ( N ) needed.
# first three lines
# space between -n and N is optional
$ head -n3 sample.txt
1) Hello World
2)
3) Hi there
Multiple input files
If you pass multiple input files to the head and tail commands, each file will be processed
separately. By default, the output is nicely formatted with filename headers and empty line
separators.
$ seq 2 | head -n1 greeting.txt -
==> greeting.txt <==
Hi there

==> standard input <==
1
You can use the -q option to avoid filename headers and empty line separators.
$ tail -q -n2 sample.txt nums.txt
14) He he he
15) Adios amigo
42
1000
Byte selection
The -c option works similarly to the -n option, but with bytes instead of lines. In the examples
below, the shell prompt that would appear right after the output (due to the missing final newline)
isn't shown for illustration purposes.
# first three characters
$ printf 'apple pie' | head -c3
app
Since -c works byte wise, it may not be suitable for multibyte characters:
# all input characters in this example occupy two bytes each
$ printf 'αλεπού' | head -c2
α
# g̈ requires three bytes
$ printf 'cag̈e' | tail -c4
g̈e
Range of lines
You can select a range of lines by combining both the head and tail commands.
# 9th to 11th lines
# same as: head -n11 sample.txt | tail -n +9
$ tail -n +9 sample.txt | head -n3
9) banana
10) papaya
11) mango
NUL separator
The -z option sets the NUL character as the line separator instead of the newline character.
$ printf 'car\0jeep\0bus\0' | head -z -n2 | cat -v
car^@jeep^@
Further Reading
• wikipedia: File monitoring with tail -f and -F options
∘ toolong — terminal application to view, tail, merge, and search log files
• unix.stackexchange: How does the tail -f option work?
• How to deal with output buffering?
Exercises
The exercises directory has all the files used in this section.
1) Use appropriate commands and shell features to get the output shown below.
$ printf 'carpet\njeep\nbus\n'
carpet
jeep
bus
$ c=##### add your solution here
$ echo "$c"
car
2) How would you display all the input lines except the first one?
$ printf 'apple\nfig\ncarpet\njeep\nbus\n' | ##### add your solution here
fig
carpet
jeep
bus
3) Which command would you use to get the output shown below?
$ cat fruits.txt
banana
papaya
mango
$ cat blocks.txt
----
apple--banana
mango---fig
----
3.14
-42
1000
----
sky blue
dark green
----
hi hello
4) Use a combination of head and tail commands to get the 11th to 14th characters from
the given input.
$ printf 'apple\nfig\ncarpet\njeep\nbus\n' | ##### add your solution here
carp
5) Extract the starting six bytes from the input files ip.txt and fruits.txt .
##### add your solution here
it is banana
6) Extract the last six bytes from the input files fruits.txt and ip.txt .
##### add your solution here
mango
erish
7) For the input file ip.txt, display all lines except the last 5 lines.
##### add your solution here
it is a warm and cozy day
listen to what I say
go play in the park
come back before the sky turns dark
8) Display the third line from the given stdin data. Consider the NUL character as the line
separator.
$ printf 'apple\0fig\0carpet\0jeep\0bus\0' | ##### add your solution here
carpet
tr
tr helps you to map one set of characters to another set of characters. Features like ranges,
repeats, character sets, squeeze, complement, etc. make it a must-know text processing tool.
To be precise, tr can handle only bytes. Multibyte character processing isn't supported yet.
Transliteration
Here are some examples that map one set of characters to another. As a good practice, always
enclose the sets in single quotes to avoid issues due to shell metacharacters.
# 'l' maps to '1', 'e' to '3', 't' to '7' and 's' to '5'
$ echo 'leet speak' | tr 'lets' '1375'
1337 5p3ak
You can use - between two characters to construct a range (ascending order only).
# uppercase to lowercase
$ echo 'HELLO WORLD' | tr 'A-Z' 'a-z'
hello world
# swap case
$ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z'
hELLO wORLD
# rot13
$ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
Uryyb Jbeyq
$ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M'
Hello World
tr works only on stdin data, so use shell input redirection for file inputs.
$ tr 'a-z' 'A-Z' <greeting.txt
HI THERE
HAVE A NICE DAY
If the second set is shorter than the first set, the last character of the second set is repeated
to cover the rest of the first set.
# c-z will be converted to C
$ echo 'apple banana cherry' | tr 'a-z' 'ABC'
ACCCC BACACA CCCCCC
You can use the -t option to truncate the first set so that it matches the length of the second
set.
# d-z won't be converted
$ echo 'apple banana cherry' | tr -t 'a-z' 'ABC'
Apple BAnAnA Cherry
You can also use [c*n] notation to repeat a character c by n times. You can specify n in
decimal format or octal format (starts with 0 ). If n is omitted, the character c is repeated
as many times as needed to equalize the length of the sets.
# a-e will be translated to A
# f-z will be uppercased
$ echo 'apple banana cherry' | tr 'a-z' '[A*5]F-Z'
APPLA AANANA AHARRY
Certain commonly useful groups of characters like alphabets, digits, punctuation, etc. have
named character sets that you can use instead of manually creating the sets. Only [:lower:]
and [:upper:] can be used by default; others will require the -d or -s options.
# same as: tr 'a-z' 'A-Z' <greeting.txt
$ tr '[:lower:]' '[:upper:]' <greeting.txt
HI THERE
HAVE A NICE DAY
To override the special meaning for - and \ characters, you can escape them using the \
character. You can also place the - character at the end of a set to represent it literally. Can
you reason out why placing the - character at the start of a set can cause issues?
$ echo '/python-projects/programs' | tr '/-' '\\_'
\python_projects\programs
See the tr manual for more details and a list of all the escape sequences and character
sets.
Deleting characters
Use the -d option to specify a set of characters to be deleted.
$ echo '2024-08-12' | tr -d '-'
20240812
Complement
The -c option will invert the first set of characters. This is often used in combination with the
-d option.
$ s='"Hi", there! How *are* you? All fine here.'
If you use -c for transliteration, you can only provide a single character for the second set.
In other words, all the characters except those provided by the first set will be mapped to the
character specified by the second set.
$ s='"Hi", there! How *are* you? All fine here.'
Squeeze
The -s option changes consecutive repeated characters to a single copy of that character.
# squeeze lowercase alphabets
$ echo 'HELLO... hhoowwww aaaaaareeeeee yyouuuu!!' | tr -s 'a-z'
HELLO... how are you!!
# delete and squeeze
$ echo 'hhoowwww aaaaaareeeeee yyouuuu!!' | tr -sd '!' 'a-z'
how are you
Exercises
The exercises directory has all the files used in this section.
3) Similar to rot13, figure out a way to shift digits such that the same logic can be used both
ways.
$ echo '4780 89073' | ##### add your solution here
9235 34528
4) Figure out the logic based on the given input and output data. Hint: use two ranges for the
first set and only 6 characters in the second set.
$ echo 'apple banana cherry damson etrog' | ##### add your solution here
1XXl5 21n1n1 3h5XXX 41mXon 5XXog
5) Which option would you use to truncate the first set so that it matches the length of the second
set?
$ echo '""hi..."", good morning!!!!' | ##### add your solution here
"hi.", good morning!
10) Figure out the logic based on the given input and output data.
$ echo 'Aapple noon banana!!!!!' | ##### add your solution here
:apple:noon:banana:
11) The books.txt file has items separated by one or more : characters. Change this
separator to a single newline character as shown below.
$ cat books.txt
Cradle:::Mage Errant::The Weirkey Chronicles
Mother of Learning::Eight:::::Dear Spellbook:Ascendant
Mark of the Fool:Super Powereds:::Ends of Magic
cut
cut is a handy tool for many field processing use cases. The features are limited compared to
awk and perl commands, but the reduced scope also leads to faster processing.
cut will always display the selected fields in ascending order. And you cannot display a field
more than once.
# same as: cut -f1,3
$ printf 'apple\tbanana\tcherry\n' | cut -f3,1
apple cherry
By default, cut uses the tab character as the field delimiter and the newline character as the
line separator. cut will add a newline character to the output even if the last input line doesn't
end with a newline.
$ printf 'good\tfood\ntip\ttap' | cut -f2
food
tap
Field ranges
You can use the - character to specify field ranges. You can skip the starting or ending range,
but not both.
# 2nd, 3rd and 4th fields
$ printf 'apple\tbanana\tcherry\tfig\tmango\n' | cut -f2-4
banana cherry fig
Input field delimiter
Use the -d option to change the input delimiter. Only a single byte character is allowed. By
default, the output delimiter will be the same as the input delimiter.
$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
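For instance, here's a sketch selecting a couple of columns from this file (the choice of fields
is arbitrary):
# 1st and 4th fields, ',' acts as both the input and output delimiter
$ cut -d, -f1,4 scores.csv
Name,Chemistry
Ith,100
Cy,95
Lin,80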
Complement
The --complement option allows you to invert the field selections.
# except the second field
$ printf 'apple ball cat\n1 2 3 4 5' | cut --complement -d' ' -f2
apple cat
1 3 4 5
If a line contains the specified delimiter but doesn’t have the field number requested,
you’ll get a blank line. The -s option has no effect on such lines.
$ printf 'apple ball cat\n1 2 3 4 5' | cut -d' ' -f4

4
Character selections
You can use the -b or -c options to select specific bytes from each input line. The syntax is
same as the -f option. The -c option is intended for multibyte character selection, but for
now it works exactly as the -b option. Character selection is useful for working with fixed-width
fields.
$ printf 'apple\tbanana\tcherry\n' | cut -c2,8,11
pan
NUL separator
Use the -z option if you want to use the NUL character as the line separator. In this scenario,
cut will ensure that the output ends with a final NUL character even if it is not present in the input.
$ printf 'good-food\0tip-tap\0' | cut -zd- -f2 | cat -v
food^@tap^@
Alternatives
Here are some alternate commands you can explore if cut isn’t enough to solve your task.
• hck — supports regexp delimiters, field reordering, header based selection, etc
• choose — negative indexing, regexp based delimiters, etc
• xsv — fast CSV command line toolkit
• rcut — my bash+awk script, supports regexp delimiters, field reordering, negative indexing, etc
• awk — my ebook on GNU awk one-liners
• perl — my ebook on Perl one-liners
Exercises
The exercises directory has all the files used in this section.
2) Display the second and fifth fields. Consider , as the field separator.
$ echo 'tea,coffee,chocolate,ice cream,fruit' | ##### add your solution here
coffee,fruit
3) Why does the below command not work as expected? What other tools can you use in such
cases?
# not working as expected
$ echo 'apple,banana,cherry,fig' | cut -d, -f3,1,3
apple,cherry
# expected output
$ echo 'apple,banana,cherry,fig' | ##### add your solution here
cherry,apple,cherry
4) Display all fields except the second one in the format shown below. Can you construct two different
solutions?
# solution 1
$ echo 'apple,banana,cherry,fig' | ##### add your solution here
apple cherry fig
# solution 2
$ echo '2,3,4,5,6,7,8' | ##### add your solution here
2 4 5 6 7 8
5) Extract the first three characters from the input lines as shown below. Can you also use the
head command for this purpose? If not, why not?
$ printf 'apple\nbanana\ncherry\nfig\n' | ##### add your solution here
app
ban
che
fig
6) Display only the first and third fields of the scores.csv input file, with tab as the output
field separator.
$ cat scores.csv
Name,Maths,Physics,Chemistry
Ith,100,100,100
Cy,97,98,95
Lin,78,83,80
7) The given input data uses one or more : characters as the field separator. Assume that no
field content will have the : character. Display all fields except the second one, with : as the
output field separator.
$ cat books.txt
Cradle:::Mage Errant::The Weirkey Chronicles
Mother of Learning::Eight:::::Dear Spellbook:Ascendant
Mark of the Fool:Super Powereds:::Ends of Magic
##### add your solution here
Cradle : The Weirkey Chronicles
Mother of Learning : Dear Spellbook : Ascendant
Mark of the Fool : Ends of Magic
8) Which option would you use to not display lines that do not contain the input delimiter character?
10) Figure out the logic based on the given input and output data.
$ printf 'apple\0fig\0carpet\0jeep\0' | ##### add your solution here | cat -v
ple^@g^@rpet^@ep^@
seq
The seq command is a handy tool to generate a sequence of numbers in ascending or descending
order. Both integer and floating-point numbers are supported. You can also customize the
formatting for numbers and the separator between them.
Integer sequences
You need three numbers to generate an arithmetic progression — start, step and stop. When
you pass only a single number as the stop value, the default start and step values are assumed
to be 1 .
# start=1, step=1 and stop=3
$ seq 3
1
2
3
When you pass two numbers, they are considered as the start and stop values (in that order).
# start=25434, step=1 and stop=25437
$ seq 25434 25437
25434
25435
25436
25437
When you want to specify all three numbers, the order is start, step and stop.
# start=1000, step=5 and stop=1010
$ seq 1000 5 1010
1000
1005
1010
By using a negative step value, you can generate sequences in descending order.
# no output
$ seq 3 1
$ seq 5 -5 -10
5
0
-5
-10
Floating-point sequences
Since the default start and step values are 1, you need to change at least one of them to get
floating-point sequences.
$ seq 0.5 3
0.5
1.5
2.5
Customizing separator
You can use the -s option to change the separator between the numbers of a sequence. Multiple
characters are allowed. Depending on your shell you can use ANSI-C quoting to use escapes like
\t instead of a literal tab character. A newline is always added at the end of the output.
$ seq -s' ' 4
1 2 3 4
$ seq -s$'\n\n' 3
1

2

3
Leading zeros
By default, the output will not have leading zeros, even if they are part of the numbers passed
to the command.
$ seq 008 010
8
9
10
The -w option will equalize the width of the output numbers using leading zeros. The largest
width between the start and stop values will be used.
$ seq -w 8 10
08
09
10
$ seq -w 0002
0001
0002
Limitations
As per the manual:
On most systems, seq can produce whole-number output for values up to at least 2^53
. Larger integers are approximated. The details differ depending on your floating-point
implementation.
However, when limited to non-negative whole numbers, an increment of less than 200 ,
and no format-specifying option, seq can print arbitrarily large numbers.
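For instance, with plain integer arguments and the default increment of 1, values beyond 2^53
are still printed exactly (a quick check, assuming a reasonably recent coreutils version):
$ seq 18446744073709551614 18446744073709551616
18446744073709551614
18446744073709551615
18446744073709551616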
Exercises
The exercises directory has all the files used in this section.
# expected output
##### add your solution here
45
44
43
42
4) Is the sequence shown below possible to generate with seq ? If so, how?
##### add your solution here
01.5,02.5,03.5,04.5,05.5
5) Modify the command shown below to customize the output numbering format.
$ seq 30.14 3.36 40.72
30.14
33.50
36.86
40.22
shuf
The shuf command helps you randomize the input lines. And there are features to limit the
number of output lines, repeat lines and even generate random positive integers.
$ shuf purchases.txt
tea
coffee
tea
toothpaste
soap
coffee
washing powder
tea
You can use the --random-source=FILE option to provide your own source for ran-
domness. With this option, the output will be the same across multiple runs. See Sources
of random data for more details.
shuf doesn’t accept multiple input files. Use cat for such cases.
As seen in the example above, shuf will add a newline character even if it is not
present for the last input line.
Repeated lines
The -r option helps if you want to allow input lines to be repeated. This option is usually paired
with -n to limit the number of lines in the output.
$ cat fruits.txt
banana
papaya
mango
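For instance (sample output shown below, the lines you get will differ):
# allow repeats and limit the output to 5 lines
$ shuf -r -n5 fruits.txt
mango
banana
banana
papaya
mango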
If a limit using -n is not specified, shuf -r will produce output lines indefinitely.
The shell will expand unquoted glob patterns (provided there are files that match the given
expression). You can thus pair the -e option, which treats each argument as an input line, with
such patterns to easily get a random selection of matching files.
$ echo *.csv
marks.csv mixed_fields.csv report_1.csv report_2.csv scores.csv
$ shuf -n2 -e *.csv
scores.csv
marks.csv
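The -i option generates random integers for a given LO-HI range (sample output shown below,
yours will differ):
$ shuf -i 100-105 -n3
102
105
100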
# very large values aren't supported
$ shuf -i 18446744073709551612-18446744073709551616
shuf: invalid input range: ‘18446744073709551616’:
Value too large for defined data type
seq can also help when you need negative and floating-point numbers.
$ seq -10 -8 | shuf
-9
-10
-8
$ seq -f'%.4f' 100 0.25 3000 | shuf -n3
1627.7500
1303.5000
2466.2500
See unix.stackexchange: generate random strings if numbers aren’t enough for you.
NUL separator
Use the -z option if you want to use the NUL character as the line separator. In this scenario,
shuf will ensure that the output ends with a final NUL character even if it is not present in the input.
$ printf 'apple\0banana\0cherry\0fig\0mango' | shuf -z -n3 | cat -v
banana^@mango^@cherry^@
Exercises
The exercises directory has all the files used in this section.
# expected output
##### add your solution here
banana
Cradle:::Mage Errant::The Weirkey Chronicles
Mother of Learning::Eight:::::Dear Spellbook:Ascendant
papaya
Mark of the Fool:Super Powereds:::Ends of Magic
mango
2) What do the -r and -n options do? Why are they often used together?
4) Which option would you use to generate random numbers? Give an example.
5) How would you generate 5 random numbers between 0.125 and 0.789 with a step value
of 0.023 ?
# output shown below is a sample, might differ for you
##### add your solution here
0.378
0.631
0.447
0.746
0.723