
Linux Lecture-6 Finale


Unit - 1_Lecture -6

Filters and Regular Expressions


By
Dr. Virabhadrasinh A. Gohil (Ph.D.) (Visiting Faculty)
Department of Physics
Maharaja Krishnakumarsinhji Bhavnagar University
Bhavnagar
Outline
1. INTRODUCTION
2. USING wc, head, tail AND cut COMMANDS
3. TRANSLATING CHARACTERS (tr)
4. SPECIFYING FILE AND SEARCHING FOR A PATTERN (grep)
5. PERFORMING SUBSTITUTION (sed)
6. ENHANCING POWER OF grep AND sed WITH REGULAR
EXPRESSIONS
7. HOMEWORK
8. Answer key for the last HOMEWORK

24-12-2020 vag@mkbhavuni.edu.in 2
INTRODUCTION

• Filters are UNIX commands that take input from the standard input
(by default, the keyboard), perform some operation on the data and then send the
result to the standard output (by default, the monitor). For example, the commands cat,
sort, cut, more, pg, tail and wc are all filters. In this chapter we will
learn some commonly used filters.
• Regular expressions are sequences of characters that match a set of
strings in a text. They are used to search for, and replace, strings that
match a particular pattern. Some of the symbols used in regular expressions are
* (matches zero or more of the preceding character), [] (matches any one of the
enclosed characters), [^] (matches any one character that is not enclosed) and
. (matches any single character except newline). In this chapter, we will learn
regular expressions as a powerful tool that UNIX provides for data manipulation.
2. USING wc, head, tail AND cut COMMANDS
• In order to learn and test the commands discussed in this chapter, let
us use the file called book.lst. The file book.lst contains details of a few books. Each
record holds the book-id, subject, publisher name, date of purchase and author’s
name. The contents of the file are given in Figure 6.1.
• The command wc is used to count the number of lines, words and characters in a
given text or file.
• When given without any options, wc gives an output as shown in Figure 6.2. The
output indicates that there are 10 lines, 126 words
and 702 characters in the file book.lst. If multiple filenames are given as input, wc
produces a line for each file, as shown in Figure 6.2. Note that it also
produces a total count of lines, words and characters in all the given files.
• One can also use wc to list only the line count, only the word count, only the character count
or a combination of them. Table 6.1 lists the options that wc provides for
such specific counts.

Figure 6.1 Contents of file book.lst


Option Used to
-l Count only the number of lines in a file
-w Count only the number of words in a file
-c Count only the number of characters (bytes) in a file
Table 6.1 Options of wc
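
The options in Table 6.1 can be tried on a small file. The sketch below is only an illustration: the three records are hypothetical stand-ins for book.lst (whose real contents are in Figure 6.1), and the names tmp and sample.lst are made up for this example.

```shell
# Hypothetical three-record sample, NOT the real book.lst from Figure 6.1.
tmp=$(mktemp -d)
printf '%s\n' \
  '1001|Unix Shell|SSN|12/03/2001|A. Rao' \
  '2002|Operating Systems|SSN|05/11/1999|B. Shah' \
  '2003|Computer Organization|R & R Corp.|21/07/2002|C. Patel' > "$tmp/sample.lst"

# Reading from standard input keeps the filename out of wc's output.
# $(( ... )) strips the leading blanks that some wc versions print.
lines=$(( $(wc -l < "$tmp/sample.lst") ))   # -l: line count
words=$(( $(wc -w < "$tmp/sample.lst") ))   # -w: whitespace-separated words
chars=$(( $(wc -c < "$tmp/sample.lst") ))   # -c: character (byte) count

echo "lines=$lines words=$words chars=$chars"
rm -r "$tmp"
```

Note that wc treats any run of non-blank characters as one word, so a field such as "Shell|SSN|12/03/2001|A." counts as a single word.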

Figure 6.2a A Linux session illustrating use of wc and cut commands


• To count both the number of lines and the number of words in a file, give the
option -lw, and similarly for other combinations that you desire.
• The commands head and tail were explained earlier in Chapter 2.
• The command cut is used to filter particular columns or fields out of a file. To
extract columns (character positions), cut is used with the -c option, and to extract
fields, cut is used with the -f option.
• For example, to extract the columns containing subject name and author
name, we may give the following command (See Figure 6.2):
$ cut -c 7-27,58-72 book.lst
• Note that the column list must not contain whitespace, only ranges and commas.
We can also leave a range open at the beginning or end of the line. If we
write -27, it means character positions 1-27, and if we write 58-, it means
character position 58 to the end of the line.
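
A minimal sketch of column extraction with cut -c follows. It assumes a hypothetical fixed-width file (fixed.lst) whose layout is invented for this example: book-id in columns 1-4, subject in columns 6-15 and author from column 17 onwards. The column numbers therefore differ from those used for book.lst above.

```shell
tmp=$(mktemp -d)
# Hypothetical fixed-width records (layout is an assumption for this sketch):
# columns 1-4 = book-id, 6-15 = subject, 17 onwards = author.
printf '%s\n' \
  '1001 Unix Shell A. Rao' \
  '2002 Networking B. Shah' > "$tmp/fixed.lst"

ids=$(cut -c -4 "$tmp/fixed.lst")              # open start: columns 1-4
subjects=$(cut -c 6-15 "$tmp/fixed.lst")       # closed range
authors=$(cut -c 17- "$tmp/fixed.lst")         # open end: column 17 to end of line
id_author=$(cut -c 1-4,16- "$tmp/fixed.lst")   # comma-separated list, no spaces

printf '%s\n' "$id_author"
rm -r "$tmp"
```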

Figure 6.2b A Linux session illustrating use of wc and cut commands


• In the above example, using cut -c, we had to count the length of the
maximum value in each column so that we could give the range. This can be
avoided with the -f option. Here, we have to specify the delimiter of the
fields using the -d option. The default delimiter is the tab. In our file, it is the pipe
(|) sign. Therefore, to extract the second and last fields we give the command
(See Figure 6.2):
$ cut -d \| -f 2,5 book.lst
Note that we have escaped the pipe (|) with the backslash (\) so that the shell
does not interpret it as the pipe metacharacter.
We can redirect the output of the cut operation to another file, using the
output redirection operator (>) as follows:
$ cut -d \| -f 2,5 book.lst > subj-Auth.lst
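
The field-based form of cut can be sketched as below. The two records and the file names sample.lst and subj-auth.lst are hypothetical; only the five-field, pipe-delimited layout follows the description of book.lst.

```shell
tmp=$(mktemp -d)
# Hypothetical pipe-delimited records: id|subject|publisher|date|author.
printf '%s\n' \
  '1001|Unix Shell|SSN|12/03/2001|A. Rao' \
  '2002|Networking|R & R Corp.|05/11/1999|B. Shah' > "$tmp/sample.lst"

# Quoting the delimiter ('|') has the same effect as escaping it (\|).
cut -d '|' -f 2,5 "$tmp/sample.lst" > "$tmp/subj-auth.lst"

result=$(cat "$tmp/subj-auth.lst")
printf '%s\n' "$result"
rm -r "$tmp"
```

cut keeps the delimiter between the selected fields, so each output line has the form subject|author.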

Figure 6.2c A Linux session illustrating use of wc and cut commands

3. TRANSLATING CHARACTERS (tr)
• The filters that we have learned so far are useful for handling either entire lines or entire columns in a file.
However, the tr command works on individual characters. It takes two expressions as input and
translates each character of the first expression into its counterpart in the second expression. For example,
suppose we wish to translate every | in our book.lst file to , and every / to -. Then we have to give the following
tr command:
$ tr '|/' ',-' < book.lst
• Note that the two expressions are given in single quotes. The first character (|) in the first expression is
replaced by the first character (,) in the second expression, and the second character (/) in the first expression
is replaced by the second character (-) in the second expression. The output is given in Figure 6.3.
• tr takes its input only from the standard input; it does not accept a filename argument. Therefore, we had to
redirect the input (so that tr reads from the file) using the left chevron (<) operator in the command line. The
output of the tr command can be stored in a file using the right chevron (>) operator.
• tr can also work on ranges in the expression list. For example, to convert all the lowercase letters in the book.lst
file to uppercase, we can give a tr command as follows:
$ tr '[a-z]' '[A-Z]' < book.lst
• When used with the -d option, tr deletes every occurrence of the characters listed in the expression. For example,
to delete all pipes (|) from the file book.lst, give the following command:
$ tr -d '|' < book.lst
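
The three uses of tr described above can be tried on a single line of text. This is a sketch only: the record held in the variable line is a hypothetical example, not taken from book.lst.

```shell
# Hypothetical record used only for illustration.
line='1001|Unix Shell|SSN|12/03/2001'

# Character-by-character translation: | becomes , and / becomes -
translated=$(printf '%s\n' "$line" | tr '|/' ',-')

# Range translation: lowercase letters to uppercase
upper=$(printf '%s\n' "$line" | tr 'a-z' 'A-Z')

# Deletion: remove every | character
no_pipes=$(printf '%s\n' "$line" | tr -d '|')

printf '%s\n' "$translated" "$upper" "$no_pipes"
```

Here the input arrives through a pipe instead of the < redirection, which works equally well because tr simply reads its standard input.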

Figure 6.3a A Linux session illustrating use of tr command


4. SPECIFYING FILE AND SEARCHING FOR A
PATTERN (grep)
• The command grep is one of the most popular and most useful filters. It searches for a particular pattern of
characters that you specify and displays every line in the file(s) that matches this pattern.
• grep stands for “global regular expression print”, i.e. search through an entire file (global search) for
the specified regular expression (pattern to be matched) and then print the line or lines that contain the
pattern.
• The grep command reads from the file(s) specified in the command line; if no files are specified,
it reads from the standard input. The syntax for the grep command is:
grep [options] pattern filename
• The specified file is searched for occurrences of the pattern. You can also specify more than one file. The
first argument is always treated as the expression (pattern) and the rest are treated as filenames.
• The command to display lines containing “R & R Corp.” is:
$ grep 'R & R Corp.' book.lst
• Note that the pattern is quoted, because it contains spaces and the shell metacharacter &. The output is shown in Figure 6.3.
• The options that can be used with grep are given in Table 6.2.

Option Used to
-c Count the number of lines that contain the pattern in the
file(s)
-n Display line numbers along with the lines containing
the pattern
-v Select lines not containing the pattern
-l Display only the names of the files where a pattern
has been found
-i Ignore case for pattern matching
-h Omit filenames when handling multiple files
Table 6.2 Options of grep

• Suppose we need to find out the number of books published by
“SSN”. We can use the -c option and give the command as (See Figure
6.3):
$ grep -c 'SSN' book.lst
• Similarly, try out other grep options as well.
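
A few of the options from Table 6.2 can be tried as sketched below. The three records and the file name sample.lst are hypothetical, chosen only so that two of the lines mention the publisher SSN.

```shell
tmp=$(mktemp -d)
# Hypothetical records, not the real book.lst.
printf '%s\n' \
  '1001|Unix Shell|SSN|12/03/2001|A. Rao' \
  '2002|Operating Systems|SSN|05/11/1999|B. Shah' \
  '2003|Computer Organization|R & R Corp.|21/07/2002|C. Patel' > "$tmp/sample.lst"

count=$(grep -c 'SSN' "$tmp/sample.lst")            # -c: number of matching lines
numbered=$(grep -n 'R & R Corp' "$tmp/sample.lst")  # -n: prefix each match with its line number
others=$(grep -v 'SSN' "$tmp/sample.lst")           # -v: lines that do NOT match
nocase=$(grep -ci 'unix' "$tmp/sample.lst")         # -i: ignore case (combined with -c)

echo "count=$count nocase=$nocase"
printf '%s\n' "$numbered" "$others"
rm -r "$tmp"
```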

Figure 6.3b A Linux session illustrating use of grep command


5. PERFORMING SUBSTITUTION (sed)
• There are many expressions used with the sed command for performing a wide variety of tasks, but we will
focus on using sed to substitute one pattern for another in a file.
• sed stands for stream editor.
• The format of the sed command is sed followed by an expression in quotes and the name of the file to read
for input. For substitution, the expression will be the substitution command.
• The general format of the substitution command is s/old/new/flags, where old is the existing pattern to be
replaced by the new pattern, and s is the abbreviation for the substitute command. The two most helpful flags
are g (to replace all occurrences of the old pattern in each line) and a number n (to replace only the nth
occurrence of the old pattern in each line).
• For example, to replace “Organization” with “Architecture” in the book.lst file, you should
give the following command (See Figure 6.4):
$ sed 's/Organization/Architecture/' book.lst
• Note that the changes (substitutions) are made only in the displayed output. They are not made in
the actual file. To make the substitutions in the actual file, redirect the output (>) to a temporary file
(book.tmp here is just an example name) and then rename it over the original. Redirecting directly onto
book.lst does not work, because the shell empties the file before sed reads it:
$ sed 's/Organization/Architecture/' book.lst > book.tmp && mv book.tmp book.lst
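
Making a substitution permanent can be sketched as follows. The record and the file names sample.lst and sample.new are hypothetical. The sketch writes to a second file and renames it afterwards, since redirecting the output onto the same file that sed is reading would truncate that file before sed sees it.

```shell
tmp=$(mktemp -d)
# Hypothetical one-record file used only for illustration.
printf '%s\n' '2003|Computer Organization|R & R Corp.' > "$tmp/sample.lst"

# Without redirection, sed only prints the changed text; the file is untouched.
preview=$(sed 's/Organization/Architecture/' "$tmp/sample.lst")
unchanged=$(cat "$tmp/sample.lst")

# Write to a separate file first, then replace the original only if sed succeeded.
sed 's/Organization/Architecture/' "$tmp/sample.lst" > "$tmp/sample.new" &&
  mv "$tmp/sample.new" "$tmp/sample.lst"

updated=$(cat "$tmp/sample.lst")
printf '%s\n' "$updated"
rm -r "$tmp"
```

Some sed versions (GNU sed, for instance) also offer an -i option for editing a file in place; its exact syntax differs between implementations, so the temporary-file approach is the more portable one.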

Figure 6.4a A Linux session illustrating use of sed commands


• We can also provide a line address if we want to limit the substitution to certain lines only. For example,
suppose we need to replace “Organization” with “Architecture” in lines 7 to 10 only. We
should give the command as:
$ sed '7,10s/Organization/Architecture/' book.lst
• To substitute from line 7 to the end of the file, the range is specified as 7,$ where $
represents the last line.
• By default, the sed command replaces only the left-most occurrence of a pattern in a line. That is, had there
been two ‘Organization’ in a line, it would have replaced only the first ‘Organization’ and the second
occurrence would have been left unchanged. To make sed replace all occurrences of a pattern in a line, we have to
use the g (global) flag as follows (See Figure 6.4):
$ sed 's/9/8/g' book.lst | head -3
• The above command replaces all occurrences of 9 with 8. This is known as global substitution. The head -3 is
used (connected by the pipe |) to display only the top three lines of the output.
• Instead of line addresses, one can also give a pattern as the address, so that the substitution is applied only to
lines which contain a specific string. For example, suppose you need to change the book name from ‘Organization’ to
‘Architecture’ only for the publisher ‘R & R Corp.’. You can give the command as:
$ sed '/R & R Corp./s/Organization/Architecture/' book.lst
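
Line addresses, flags and pattern addresses can be compared side by side on a small file. The three records below are hypothetical and deliberately artificial (the third line repeats the word Organization) so that the effect of each form is visible.

```shell
tmp=$(mktemp -d)
# Hypothetical records; line 3 contains the word twice on purpose.
printf '%s\n' \
  '1|Computer Organization|R & R Corp.' \
  '2|Computer Organization|SSN' \
  '3|Organization and Organization|SSN' > "$tmp/sample.lst"

# Line address: substitute (first occurrence only) on lines 2 to 3.
line_range=$(sed '2,3s/Organization/Architecture/' "$tmp/sample.lst")

# g flag: every occurrence on each line; show the last line only.
global=$(sed 's/Organization/Architecture/g' "$tmp/sample.lst" | tail -n 1)

# Numeric flag: only the 2nd occurrence on each line.
second=$(sed 's/Organization/Architecture/2' "$tmp/sample.lst" | tail -n 1)

# Pattern address: substitute only on lines that mention the publisher.
by_pattern=$(sed '/R & R Corp\./s/Organization/Architecture/' "$tmp/sample.lst")

printf '%s\n' "$line_range" "$global" "$second" "$by_pattern"
rm -r "$tmp"
```

In the last command the dot is written as \. so that it matches a literal full stop; an unescaped dot would match any character.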

Figure 6.4b A Linux session illustrating use of sed commands


Figure 6.4c A Linux session illustrating use of sed commands


6. ENHANCING POWER OF grep AND sed
WITH REGULAR EXPRESSIONS
• Suppose you need to get a list of books that have been purchased in and after the year 2000. One way is to give four separate
grep commands, each time specifying one of the four patterns, namely 2000, 2001, 2002 and 2003. But this is a very tedious job.
UNIX provides the solution for this in the form of regular expressions.
• Regular expressions are like wild-cards. While wild-cards are used to match similar filenames with a single expression,
a regular expression is used to match a group of similar patterns with a single expression. The regular expression symbols are listed in Table
6.3.
• Some of the symbols listed in Table 6.3 are the same as the metacharacters of the shell, such as the * symbol. Hence, the pattern
should be quoted (or these symbols escaped using \) when you use them, so that the shell does not interpret them as its metacharacters.
• For example, to see the list of books that have been purchased in and after the year 2000, the grep command would be:
$ grep "200[0-3]" book.lst
• The pattern 200[0-3] matches 2000, 2001, 2002 and 2003. Note that grep looks for the pattern anywhere on the line, not only in the
date field. When such ranges are used, the character on the left side of the range must be lesser
(in ASCII) than the character on the right side. That is, the pattern "200[3-0]" or "[k-i]" would not work. Ranges can be used both for
letters and for digits. Thus the pattern "[a-zA-Z]" matches a single alphabetic character and the pattern "[a-zA-Z0-9]"
matches a single alphanumeric character. The pattern "[^a-zA-Z]" matches a single non-alphabetic character.
• Negation in a regular expression is done using the ^ symbol as the first character inside the square brackets.
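
Ranges and negation inside square brackets can be sketched as below. The records are hypothetical; the book-ids were chosen so that they do not themselves contain 2000 to 2003, because grep matches the pattern anywhere on the line.

```shell
tmp=$(mktemp -d)
# Hypothetical records: id|subject|date of purchase.
printf '%s\n' \
  '1001|Unix Shell|12/03/2001' \
  '1002|Networking|05/11/1999' \
  '1003|Compilers|21/07/2003' > "$tmp/sample.lst"

# Range: one pattern stands in for 2000, 2001, 2002 and 2003.
recent=$(grep -c '200[0-3]' "$tmp/sample.lst")

# Negated range: an 'a', then one non-alphabetic character, then a 'c'.
non_alpha=$(printf '%s\n' 'abc' 'a1c' 'a c' | grep 'a[^a-zA-Z]c')

echo "recent=$recent"
printf '%s\n' "$non_alpha"
rm -r "$tmp"
```

Single quotes are used around the patterns here; like the double quotes in the text, they stop the shell from interpreting the brackets.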

Symbol Description
* Matches zero or more occurrences of the previous
character
. Matches any single character
[xyz] Matches a single character x, y or z
[c1-c2] Matches a single character which is within the ascii
range represented by characters c1 and c2
[^xyz] Matches a single character which is not an x, y or z
^xyz Matches a pattern xyz at the beginning of line
xyz$ Matches a pattern xyz at the end of line
Table 6.3 Regular expression symbols

• Let us take some more examples. Suppose you want to see the list of the books on “Computer
Organization”, but you do not remember the exact spelling used in the file, i.e. whether it is
“Organization” or “Organisation”. This is solved by a regular expression as follows:
$ grep "Organi[sz]ation" book.lst
• The * symbol is used when you want to indicate that the previous character can occur any number
of times, including not at all. For example, the pattern "s*" matches the empty string as well as
“s”, “ss”, “sss”, “ssssssss” and so on.
• For example, the command:
$ grep "S*N" book.lst
• will match N preceded by any number of S’s, i.e. N, SN, SSN, SSSN and so on. Since zero S’s are allowed, every
line that contains an N is selected. To require at least one S, use "SS*N".
• The . symbol matches any single character. This is similar to the ? wild-card. For example, the
command:
$ grep "U..." book.lst
• will match a four-character pattern beginning with U.
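
The behaviour of * and . can be checked on a few short strings. The inputs below are made-up test words, not publisher names from book.lst.

```shell
# "S*N": zero or more S followed by N, so a bare N matches too.
star=$(printf '%s\n' 'N' 'SN' 'SSN' 'SSX' | grep -c 'S*N')

# "SS*N": one S, then zero or more S, then N, i.e. at least one S.
at_least_one=$(printf '%s\n' 'N' 'SN' 'SSN' 'SSX' | grep -c 'SS*N')

# "U...": a U followed by any three characters.
dots=$(printf '%s\n' 'Unix' 'UX' 'Ubuntu' | grep 'U...')

echo "star=$star at_least_one=$at_least_one"
printf '%s\n' "$dots"
```

Ubuntu is selected as well as Unix because grep only requires the four-character pattern to occur somewhere in the line.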

• The ^ symbol signifies that the pattern should be matched only at the beginning
of a line. This should not be confused with the ^ symbol used inside square brackets for negation.
The $ symbol signifies that the pattern should be matched only at the end of the
line. For example, suppose we need to extract the books where the book-id starts
with 2. The command for this is:
$ grep "^2" book.lst
• To select lines where the book-id does not begin with 2, the command is:
$ grep "^[^2]" book.lst
• These regular expressions can also be used with the sed command. For example, to
replace all occurrences of Organization (or Organisation) with
Architecture, the command is:
$ sed 's/Organi[sz]ation/Architecture/g' book.lst
• Similarly, other regular expressions can also be used with sed.
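
The anchors and the use of a regular expression in sed can be sketched together. The three records and the file name sample.lst are hypothetical; the two spellings of Organization are included on purpose.

```shell
tmp=$(mktemp -d)
# Hypothetical records: id|subject.
printf '%s\n' \
  '1001|Computer Organization' \
  '2002|Computer Organisation' \
  '2003|Unix Shell' > "$tmp/sample.lst"

starts_with_2=$(grep -c '^2' "$tmp/sample.lst")   # ^ outside brackets: start of line
not_2=$(grep '^[^2]' "$tmp/sample.lst")           # first character is anything but 2
ends_shell=$(grep -c 'Shell$' "$tmp/sample.lst")  # $: end of line

# The bracket expression lets one sed command handle both spellings.
replaced=$(sed 's/Organi[sz]ation/Architecture/g' "$tmp/sample.lst")
changed=$(printf '%s\n' "$replaced" | grep -c 'Architecture')

printf '%s\n' "$not_2" "$replaced"
rm -r "$tmp"
```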

7. HOMEWORK
• State True or False:
1. We can replace one character by another for all lines in a file using the tr command.
2. The * symbol in the regular expression h* matches one or more occurrences of
character h.
3. The wc command returns the number of characters in a word.
4. The tail command displays the last n lines of a file.
5. The grep command searches and prints lines containing a certain pattern.

8. Answer key for the last HOMEWORK
1. True
2. False
3. True
4. True
5. False

TEST – 6 Coming Soon…

LINKS AND REFERENCES
• Book:
• ‘A’ Level Made Simple – Basics of OS, UNIX and Shell Programming by Vineeta
Pillai and Satish Jain
• Images:
• www.google.com
• www.nasa.gov

Thank You

“I hear, I forget
I see, I remember
I do, I understand”

-CONFUCIUS
