CH 08
The Shell
This chapter introduces the agency that sits between the user and the UNIX
system. It is called the shell. All the wonderful things you can do with UNIX
are possible because this agency understands so much by seeing so little code. It’s like
an efficient secretary who understands your directives from your gestures, and carries
them out by specially devised means you don’t need to know. The shell is a command
processor; it processes the instructions you issue to the machine.
Conceptually, this is one of the most important chapters of the book, and it imple-
ments some of the brilliant ideas of the architects of UNIX. The concepts highlighted
here must be understood clearly. They are based on the Bourne shell, named after its
creator, Steve Bourne. It’s the earliest shell that came with the UNIX system. You probably
won’t be using this shell, but the Bourne shell is the lowest common denominator
of them all, and most of its features are available in modern shells as well.
Objectives
• Understand what the shell does to a command. (8.1)
• Use wild-card characters in matching filenames. (8.2)
• Use the \ to escape (remove) the meaning of a special character. (8.3)
• Use single and double quotes to protect a group of characters and understand the
difference between them. (8.4)
• Use the escape sequences used by the echo command. (8.5)
• Understand streams and how the shell treats them as files. (8.6)
• Redirect standard output to a file. (8.6.1)
• Redirect standard input to originate from a file. (8.6.2)
• Redirect standard error to a file. (8.6.3)
• Understand the significance of the files /dev/null and /dev/tty. (8.7)
• Learn the properties of a filter and how the | is used to set up a pipeline for con-
necting two or more commands. (8.8)
• Use command substitution to embed commands in command lines of other com-
mands. (8.10)
• Learn the properties of shell variables. (8.11)
• Learn how commands can be grouped together in a shell script. (8.12)
228 Your UNIX: The Ultimate Guide
You probably don’t want to know this right now, but you could also be using any one of
these widely used shells—the C shell, Korn shell and bash. Korn and bash are supersets of
Bourne, so anything that applies to Bourne also applies to them. However, just a few of the
shell’s features discussed in this chapter don’t apply to the C shell. These differences are
noted as and when they are encountered.
Note
To know the shell you are using, invoke the command echo $SHELL. The output could
show /bin/sh (Bourne shell), /bin/csh (C shell), /bin/ksh (Korn shell) or /bin/bash (bash
shell). It does pay to know the shell you are using at this stage.
[Figure: the shell’s cycle, from the command entered at the $ prompt, through the shell scanning the command and the kernel running the command, to the command completed]
Since the filenames here have a common string chap, the lengthy command line using
this string repeatedly looks rather wasteful. Why can’t we have a single pattern com-
prising the string chap, along with one or two special characters? Fortunately, the shell
does offer such a solution.
The shell recognizes some characters as special. You can use them to devise a
generalized pattern or model that can often match a group of similar filenames. In that
case, you can use this pattern as an argument to a command rather than supply a long
list of filenames which the pattern represents. The shell itself performs this expansion
on your behalf and supplies the expanded list to the command.
$ ls -x chap*
chap chap01 chap02 chap03 chap04 chap15 chap16 chap17 chapx chapy
chapz
When the shell encounters this command line, it immediately identifies the * as a meta-
character. It then creates a list of files from the current directory that match this pattern.
It reconstructs the command line as below, and passes it on to the kernel for execution:
ls -x chap chap01 chap02 chap03 chap04 chap15 chap16 chap17 chapx chapy chapz
$ echo *
array.pl back.sh calendar cent2fah.pl chap chap01 chap02 chap03 chap04 chap15 ch
ap16 chap17 chapx chapy chapz count.pl date_array.pl dept.lst desig.lst n2words.
pl name.pl name2.pl odfile operator.pl profile.sam rdbnew.lst rep1.pl
You simply see a list of files! The shell uses the * to match files in the current direc-
tory. All files match, so you see all of them in the output.
Note
Windows users may be surprised to know that the * may occur anywhere in a filename, and
not merely at the end. Thus, *chap* matches all the following filenames: chap, newchap,
chap03 and chap03.txt.
The next metacharacter is the ?. This matches a single character. When used with
the same string chap (as chap?), the shell matches all five-character filenames beginning
with chap. Place another ? at the end of this string, and you have the pattern chap??. Use
both these expressions separately, and the meaning of the ? becomes obvious:
$ ls -x chap?
chapx chapy chapz
$ ls -x chap??
chap01 chap02 chap03 chap04 chap15 chap16 chap17
These metacharacters relating to filenames are also known as wild cards (something like
the joker that can match any card). The complete list of the shell’s wild cards is shown
with examples in Table 8.1. We’ll now take up the significance of the other wild cards.
Note
The wild-card characters the shell uses to match patterns bear some resemblance to the ones
used by vi and emacs in their regular expressions. But make no mistake, the similarities are
only superficial. Regular expressions are understood and interpreted by the command (like
vi and emacs) and have nothing to do with the shell.
[Table 8.1: The shell’s wild cards, with examples]
The character class uses two more metacharacters represented by a pair of brack-
ets []. You can have multiple characters inside this enclosure, but matching takes place
for a single character in the class. For example, a single character expression that can
take one of the values 1, 2 or 4, can be represented by the expression
[124] Either 1, 2 or 4
This can be combined with any string or another wild-card expression, so selecting the
files chap01, chap02 and chap04 now becomes a simple matter:
$ ls -x chap0[124]
chap01 chap02 chap04
You can specify ranges inside the class with a - (hyphen); [a-h] is a character class
using a range. This is normally done with digits and letters because these are the
characters mostly used in filenames. So, to select the first four numbered chapters, you
have to use the range [1-4]:
$ ls -x chap0[1-4]
chap01 chap02 chap03 chap04
A valid range specification requires that the character on the left have a lower ASCII
value than the one on the right. Using this property, the files chapx, chapy and chapz
can also be listed in a similar manner:
$ ls -x chap[x-z]
chapx chapy chapz
Note
The expression [a-zA-Z]* matches all filenames beginning with a letter, irrespective of
case. You can match a word character by including digits and the underscore character
as well: [a-zA-Z0-9_].
[!a-zA-Z]*
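When the ! is the first character inside the brackets, it negates the class, so the pattern above matches filenames that don’t begin with a letter. A minimal sketch, assuming a scratch directory and a few made-up filenames (1chap, _notes, chap01 and Chap02 are purely illustrative):

```shell
# Scratch directory and filenames are hypothetical.
LC_ALL=C; export LC_ALL          # plain ASCII ordering for ranges and sorting
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1chap _notes chap01 Chap02

# The ! right after [ negates the class: only names whose first
# character is not a letter are matched.
nonalpha=$(echo [!a-zA-Z]*)
echo "$nonalpha"
```

Only 1chap and _notes should be listed, since the other two names begin with a letter.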
When organizing information in groups of files, you should choose the filenames with
care so that one, or at most two, metacharacter patterns can match all of them. If you
don’t do that, be prepared to specify them separately every time you use a command
that accesses all of them!
Note
[!!] matches a single-character filename that is not a !. This doesn’t work in the C shell and
bash, which use the ! for a different purpose.
Chapter 8: The Shell 233
$ ls -x .???*
.exrc .news_time .profile
However, if the filename contains a dot anywhere but at the beginning, it need not be
matched explicitly. For example, the expression emp*lst matches a dot embedded in
the filename:
$ ls -x emp*lst
emp.lst emp1.lst emp22lst emp2.lst empn.lst
Note
The * doesn’t match a dot at the beginning of a filename. Such files must be matched explicitly. One
*, however, can match any number of embedded dots. For instance, the pattern fw*gz
matches the filename fwtk2.1.tar.gz.
Suppose that, instead of typing
rm chap*
which removes all the chapters, you inadvertently introduce a space between chap and
*, so that the command line becomes rm chap * instead.
The error message here masks a disaster that has just occurred; the rm command has
removed all files in this directory! A singular * used with rm can be extremely danger-
ous as the shell treats it as a separate argument. In such situations, you should pause
and check the command line before you finally press [Enter].
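The effect of a stray space can be examined without deleting anything: put echo in front of rm, and the shell shows the argument list that rm would have received. This is only a sketch, run in a scratch directory with made-up filenames:

```shell
# Scratch directory and filenames are hypothetical.
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch chap01 chap02 notes.txt

# Intended command: the pattern expands to the chapter files only.
intended=$(echo rm chap*)
echo "$intended"

# With the stray space, chap is one argument and * is another,
# and the lone * expands to every file in the directory.
mistyped=$(echo rm chap *)
echo "$mistyped"
```

The second line of output includes notes.txt, which is exactly the file you never meant to remove. Previewing with echo is a cheap habit before running rm with any wild card.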
What if the shell fails to match a single file with the expression chap*? There’s
a surprise element here; the shell leaves the pattern unexpanded, so the command looks
for a file literally named chap*. You should avoid
using metacharacters when choosing filenames, but if you have to handle one, then you
have to turn off the meaning of the * so that the shell treats it literally. This deactivation
feature is taken up in the next section.
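What happens to a pattern that matches nothing can be checked in an empty scratch directory. The sketch below assumes the default behavior of a Bourne-type shell (some shells offer options that change it):

```shell
# An empty scratch directory: nothing here begins with chap.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# With no match, the shell passes the pattern on unchanged.
unmatched=$(echo chap*)
echo "$unmatched"

# ls therefore looks for a file literally named chap* and fails.
status=ok
ls chap* > /dev/null 2> /dev/null || status=failed
echo "$status"
```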
Wild cards represent a feature of the shell and not of the command using them.
The shell has to do the wild-card expansion because chap* means nothing to rm—nor
to any command that uses a filename as argument. The design of the UNIX system pre-
vents the execution of a command till the shell has expanded all wild-card expressions.
It’s not wholly true to suggest that wild cards mean nothing to a command, but only to the
shell. The find command (7.15) accepts wild cards (probably the only UNIX command hav-
ing this feature) as parameters to the -name keyword:
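As a hedged illustration of the point, the sketch below hands a quoted pattern to find, so that find rather than the shell does the matching. The directory progs and the filenames are invented for the example, and sort is used only to make the order of the output predictable:

```shell
# Scratch directory, subdirectory and filenames are hypothetical.
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir progs
touch progs/count.pl progs/name.pl progs/readme

# The quotes stop the shell from expanding *.pl itself,
# so find receives the pattern and does the matching.
found=$(find . -name "*.pl" -print | sort)
echo "$found"
```

Without the quotes, the shell would try to expand *.pl in the current directory before find ever saw it.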
Tip
To eliminate the danger of accidental deletion of your files, it makes sense to customize the
rm command so that it always invokes the rm -i command. You can then make a decision
on each file individually. This requires the use of an alias (17.4), which is supported by the
other shells. The alias definitions can be placed in a startup file (17.9), which the shell reads
every time a user logs in.
The silent return of the prompt suggests that the file has been created. A suitable wild-
card pattern used with ls confirms this:
$ ls -x chap*
chap chap* chap01 chap02 chap03 chap04 chap15 chap16 chap17 chapx
chapy chapz
There’s indeed a file with the name chap* in the current directory! The wild-card pat-
tern matched this file along with the others. This file can be a great nuisance and should
be removed immediately. But that won’t be easy. You can’t use rm chap* because that
would remove all files in this list, and not this one only.
How do you remove this file then, without deleting the other files? For this to be
possible, the shell has to treat the asterisk literally instead of interpreting it as a
metacharacter. The answer lies in the \ (backslash)—yet another metacharacter, but
one that removes the meaning of any metacharacter placed after it. Use the \ before the
* and it solves the problem:
The expression chap\* literally matches the string chap*. This is a necessary feature
provided by the shell, and you’ll see how this concept can be extended to other areas
also. The use of the \ in removing the magic from any special character is called escap-
ing or despecializing.
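The escaping described above can be tried safely in a scratch directory. In this sketch the troublesome file chap* is created deliberately, alongside two made-up chapter files:

```shell
# Scratch directory and filenames are hypothetical.
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch chap01 chap02
touch chap\*             # creates a file literally named chap*

# The backslash makes the shell pass the * through literally,
# so only the file named chap* is removed.
rm chap\*

remaining=$(echo chap*)
echo "$remaining"
```

The two chapter files survive; only the file with the asterisk in its name is gone.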
If you have the files chap01, chap02 and chap03 in your current directory, and
then create a file chap0[1-3] by using
then you should escape the two square brackets when accessing the file:
$ ls -x chap0\[1-3\]
chap0[1-3]
$ rm chap0\[1-3\] Deletes chap0[1-3]— one file
$ ls -x chap0\[1-3\]
chap0[1-3] not found File removed
Sometimes, you would need to escape the \ character itself. Since the shell treats this
as a special character, you need another \ to escape it:
$ echo \\
\
$ echo "The newline character is \\n"
The newline character is \n
Apart from the wild cards, there are other characters that the shell considers special.
Many of them will often need escaping. Here are five of them:
$ echo \|\<\>\'\"
|<>'"
The shell uses these characters for its interpretive work. The |, < and > are required for
handling command input and output. The ' and " also protect special characters, but they
protect a whole group of them at once. We’ll take a detailed look at these characters in this chapter.
This is the find command at work, a command often used with several arguments.
Escaping is the best way of imparting readability to these lengthy command lines. The
\ here escapes the meaning of the newline character generated by [Enter]. It also produces
the second prompt (which could be a > or a ?), which indicates that the command
line is incomplete.
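The same technique works with any command. The following sketch uses echo instead of find so that the result is easy to check; the words are arbitrary:

```shell
# The backslash before each newline tells the shell that the
# command continues on the next line.
message=$(echo one \
    two \
    three)
echo "$message"
```

The shell joins the three lines into a single command line before running echo, so the words appear on one line of output.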
Escaping is an unsatisfactory solution when you need to despecialize the mean-
ing of a group of characters instead of a single one. Quoting is a better alternative.
Note
The second prompt could be a > or a ?, but in either case it implies that the command is not
complete. The C shell uses the ?, but other shells use the >.
8.4 Quoting
There’s another way to turn off the meaning of a special character. When a command
argument is enclosed in quotes, the meanings of all enclosed special characters are
turned off:
The argument above is said to be quoted. Double quotes would also have served the
purpose, but in some cases, use of double quotes does permit interpretation of some of
the special characters (especially the $ and `). This will become clear as more features
of the shell are exposed. For a beginner, single quotes are the safest as they protect all
special characters (except the quotes themselves!).
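The difference between the two kinds of quotes can be sketched with a shell variable, a feature taken up later in this chapter. The variable name and its value are assumptions made for the illustration:

```shell
name=romeo      # hypothetical variable, used only to show the difference

# Single quotes protect everything, including the $ and the *.
single=$(echo 'Files of $name: *')

# Double quotes protect the * but still let the shell evaluate $name.
double=$(echo "Files of $name: *")

echo "$single"
echo "$double"
```

In both cases the * is left alone; only the treatment of the $ differs.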
The above arguments to echo could have been preserved by escaping the space char-
acter wherever it occurs at least twice:
When you have a large number of characters which you need to protect from the shell,
quoting is preferable to escaping:
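A short sketch of the point, using an arbitrary string in which every gap between words is two spaces wide:

```shell
# Unquoted, the shell treats runs of spaces as a single separator.
squeezed=$(echo The  shell  squeezes  spaces)

# Escaping works, but every space needs its own backslash.
escaped=$(echo The\ \ shell\ \ keeps\ \ spaces)

# One pair of double quotes protects all the spaces at once.
quoted=$(echo "The  shell  keeps  spaces")

echo "$squeezed"
echo "$escaped"
echo "$quoted"
```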
We used double quotes this time, and it worked just as well. Quotes also protect the \:
$ echo '\'
\
echo is an unusual command. So far, we used the \ to keep the shell out of the picture.
The same character can be used to make echo behave differently—while the shell con-
tinues to stay out. This feature follows next.
Observe that the prompt has been returned, not in the next line, but at the end of the
echoed string. The shell can’t interpret the \ this time because of the quotes. It’s echo
that interprets it and treats the character c as special. \c used here represents an escape
sequence, which positions the cursor immediately after the argument instead of the
next line.
echo also accepts other escape sequences that manipulate the cursor motion in a
number of ways:
\t—A tab
\f—A formfeed (page skip)
\n—A newline
This is how they are used:
echo also accepts ASCII octal values as arguments. For instance, [Ctrl-g] results in the
sounding of a beep. This key has the octal value 007. You can use this value as an argu-
ment to the command, but only after preceding it with a \:
This is the first time we see ASCII octal values used by a UNIX command. (Very few
commands use them.) Some people use echo to display the box-drawing characters on
their terminal using their ASCII values.
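Because echo treats these sequences differently from one shell to another, the sketch below uses printf as a portable stand-in; that substitution is an assumption of this example, not something the text prescribes. printf interprets the \t, \n and octal sequences in its first argument and adds no newline of its own:

```shell
# printf is used here as a portable stand-in for echo.
tabbed=$(printf 'name\tdesignation\n')
twolines=$(printf 'first line\nsecond line\n')
echo "$tabbed"
echo "$twolines"

printf 'Enter your name: '   # no newline is added, so the cursor stays put
printf '\n'
printf '\007'                # octal 007 is [Ctrl-g], the beep
```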
BASH Shell
The escape sequences described with echo won’t work in this form with the bash
shell used in Linux. To use them, echo must be invoked with the -e option as well:
echo -e "Enter your name:\c"
We’ll be using these escape sequences extensively in this text, so if you are a
Linux user, you must commit this option to memory. We’ll also be designing a script
(19.13) that checks the shell that is used and inserts this option automatically!
8.6 Redirection
Many of the commands that we used sent their output to the terminal. You’ve seen the
cat (6.14.1) and bc (3.12) commands also taking input from the keyboard. Were these
commands designed that way to accept only fixed sources and destinations? No, far
from it. They are actually designed to use a character stream without knowing its
source and destination. A stream is just a sequence of bytes that many commands see
as input and output.
UNIX treats these streams as files, and a group of UNIX commands reads from
and writes to these files. A command is usually not designed to send output to the
terminal—but to this file. Likewise, it is not designed to accept input from the keyboard
either—but only from a standard file which it sees as a stream. There’s a third stream
for all error messages thrown out by a program. This stream is the third file.
It’s here that the shell comes in. The shell sets up these three standard files (for
input, output and error) and attaches them to a user’s terminal at the time of logging in.
Any program that uses streams will find them open and available. The shell also closes
these files when the user logs out.
The standard file for input is known as standard input and that for output is
known as standard output. The error stream is known as standard error. By them-
selves, these standard files are not associated with any physical device, but the shell has
set some physical devices as defaults for them:
• Standard input—the default source is the keyboard.
• Standard output—the default destination is the terminal.
• Standard error—the default destination is the terminal.
It’s by design that both standard output and standard error share the same default
device, the terminal. In the ensuing topics, you’ll see how the shell reassigns (replaces)
any of these files with a physical file on the disk the moment it sees some special characters
in the command line. This means that instead of input coming from the keyboard
and output and error going to the terminal, they can be redirected to come from or go
to any disk file or some other device.
The shell looks at the >, understands that standard output has to be redirected, opens
the file newfile, writes the stream into it and then closes the file. And all this happens
without who knowing absolutely anything about it! newfile now contains a list of users
(the output of who). This is the way we save command output in files.
If the output file doesn’t exist, the shell creates it before executing the command.
If it exists, the shell overwrites it; so use this operator with caution. Alternatively, you
can append to a file using the >> (the right chevron used twice) symbols:
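Both operators can be tried with any command that writes to standard output. The sketch below uses echo in place of who so that the result is predictable; the scratch directory is an assumption of the example:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first run" > newfile      # creates newfile
echo "second run" > newfile     # overwrites it; the earlier contents are lost
echo "third run" >> newfile     # appended to what is already there

cat newfile
```

Only the second and third lines remain in the file, because the second > wiped out what the first had written.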
Redirection also becomes a useful feature when concatenating the contents of a
number of files. Using wild cards you can set up an abbreviated command line:
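A minimal sketch of such a command line, assuming two small chapter files and an output file named textbook (all three names are made up for the illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Chapter one" > chap01
echo "Chapter two" > chap02

# The shell expands chap* to the two filenames, and the single >
# collects cat's combined output in one file.
cat chap* > textbook
cat textbook
```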
You can also combine two or more commands and redirect their aggregate output to a
file. A pair of parentheses groups the commands, and a single > symbol can be used to
redirect both of them:
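One possible instance, sketched with echo and date as the two grouped commands (the filename report and the heading text are assumptions):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Both commands run inside the parentheses, and the single >
# collects the output of the whole group.
( echo "Report generated on:" ; date ) > report
cat report
```

Without the parentheses, the > would apply only to date, and the heading would still appear on the terminal.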
The previous chapters couldn’t prove convincingly that UNIX makes very little dis-
tinction between various types of files. Redirection often doesn’t care about file type.
It can work with a device name to echo a message on someone’s terminal. The follow-
ing command redirects a message to the terminal /dev/tty02, and a user working on
this terminal could see it (provided the terminal is set up accordingly).
The terminal is the first (and default) destination of standard output. The disk file is the
second. There’s a third destination—as input to another program, which we’ll take up
when discussing pipelines. The handling of the standard output stream using these
three destinations is shown in Fig. 8.2.
[Fig. 8.2: The three destinations of standard output: the terminal (default), a file and a pipe]
Note
When the output of a command is redirected to a file, the output file is created by the shell
before the command is executed. Any idea what cat foo > foo does?
$ wc No filename!
2 ^ 32 Spaces provided deliberately
25 * 50
30*25 + 15^2
[Ctrl-d]
3 9 39 No filename in output
You used cat (6.14.1) in this way too. Enter the three lines of text (a group of mathe-
matical expressions), signify the end of input with [Ctrl-d], and then press [Enter]. wc
immediately counts 3 lines, 9 words and 39 characters in its standard input.
This input can be similarly redirected to originate from a file (the second source).
First fill up the file calc.lst with the three expressions (using cat > calc.lst).
Now, when the metacharacter < (left chevron) is used in this way, the shell redirects
wc’s standard input to come from this file:
$ wc < calc.lst
3 9 39
This too is standard input, but in its second form. Note once more that wc didn’t open
the file. It can do so, but only when it uses a filename as an argument:
$ wc calc.lst
3 9 39 calc.lst
Note that wc this time shows the filename; it very well can because it opens the file itself.
You may have already framed your next question. Why bother to redirect the
standard input from a file if the command can read the file itself as above? The answer
is that there are times when you need to keep the command ignorant of the source of
its input. This aspect, representing one of the most deep-seated features of the system,
will gradually expose itself as you progress through these chapters.
To sum up, the standard input stream also has three sources:
• The keyboard, the default source.
• A file using redirection with <.
• Another program using a pipeline (to be taken up later).
The handling of the standard input stream from these three sources is shown in
Fig. 8.3.
[Fig. 8.3: The three sources of standard input: the keyboard (default), a file and a pipe]
Note
When the standard input is redirected to come from a file (with <), it’s the shell that opens
the file. The command here is totally ignorant of the shell’s activities. However, when the filename
is supplied as an argument to a command, it’s the command which opens the file and
not the shell.
Making Calculations in a Batch You have used the bc command (3.12) as a cal-
culator. This command still has a few surprises in store for us. Take, for instance, the
file calc.lst that contains some arithmetic expressions. You can redirect bc’s standard
input to come from this file:
Look what’s happened here. bc took each line from calc.lst and evaluated it. You
don’t need to perform your calculations “on-line” anymore. You can place them in a
file and run the whole job as a batch. You can also save the output in a separate file
(using >). It would be better still if we could have each expression in calc.lst beside
the computed result. We’ll do that too, but only after we learn how to use the paste
command.
Input Both from File and Standard Input When a command takes input from
multiple sources—say a file and standard input, the - symbol must be used to indicate
the sequence of taking the input. The meaning of the following sequences should be
quite obvious:
cat - foo First from standard input and then from foo
cat foo - bar First from foo, then standard input, and then bar
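The first form can be sketched without typing anything at the keyboard by redirecting standard input from a file. The filenames header, foo and combined, and their contents, are assumptions made for this example:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "line from standard input" > header
echo "line from foo" > foo

# cat reads standard input (the - argument) first, then the file foo.
# Standard input itself is redirected to come from header.
cat - foo < header > combined
cat combined
```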
There’s a fourth form of standard input which we have not considered here. It’s the here
document, which has application in shell programming and is hence discussed in Chapter 19.
$ cat bar
cat: cannot open bar: No such file or directory
The standard error stream can also be reassigned to a file. Using the symbol for stan-
dard output obviously won’t do:
You tried to “cat” a file that doesn’t exist, but the error message still shows up on the
terminal. Before we proceed any further, you should know that each of these three stan-
dard files has a number, called a file descriptor, which is used for identification:
0—Standard input < is same as 0<
1—Standard output > is same as 1>
2—Standard error Must be 2> only
These descriptors are implicitly prefixed to the redirection symbols. For instance, > and
1> mean the same thing to the shell, while < and 0< also are identical. You normally
don’t need to use the numbers 0 and 1 to prefix the redirect symbols because they are
the default values. However, we need to use the descriptor 2> for the standard error:
This works. You can also append diagnostic output in a manner similar to the one in
which you append standard output:
You can now save error messages in a separate file. This enables you to run long pro-
grams and save error output to be viewed at the end of the day.
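The redirections described above can be tried in a scratch directory where bar is known not to exist. In this sketch the || part only reports that cat failed; the filename errorfile is an assumption:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# bar doesn't exist here. 2> sends cat's complaint to errorfile
# instead of the terminal.
cat bar 2> errorfile || echo "cat failed; see errorfile"

# 2>> appends a second message instead of overwriting the first.
cat bar 2>> errorfile || echo "cat failed again"

cat errorfile
```

The exact wording of the messages in errorfile depends on the system, but there should be two of them.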
C Shell
The standard error is handled differently by the C shell, so the examples of this section
won’t work with it. In fact, the C shell merges the standard error with the standard
output; it has no separate symbol for handling standard error only.
You used the first command (6.14.1) to create a file by redirecting both input and out-
put. You can also combine the < and > operators; there’s no restriction imposed on their
sequence either:
The <, > and the >> operators are indifferent to the presence of spaces around them. In
all these cases, the shell keeps the command ignorant of both source and destination.
The last example illustrates a significant departure from a statement made previously
(2.4) that the first word in the command line is the command. In the last example, wc
is the last word in the command line.
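The freedom in ordering can be verified with a small sketch. It rebuilds a calc.lst holding three expressions and runs wc with the redirections in three different positions; the result filenames are assumptions:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "2 ^ 32" > calc.lst
echo "25 * 50" >> calc.lst
echo "30*25 + 15^2" >> calc.lst

wc < calc.lst > result1       # input first, then output
wc > result2 < calc.lst       # the order of the redirections doesn't matter
> result3 < calc.lst wc       # the command can even come last

cat result1 result2 result3
```

All three result files should hold the same counts, since the shell strips the redirections before wc ever runs.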
The standard output and error symbols can also be used in the same command line:
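A sketch of such a command line, assuming a file foo that exists and a file bar that doesn’t; outfile and errorfile are made-up names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "some text" > foo

# foo's contents go to outfile; the complaint about bar goes to errorfile.
cat foo bar > outfile 2> errorfile || echo "cat reported a failure"

cat outfile
cat errorfile
```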
Sometimes, you’ll need to direct both the standard output and standard error streams to
the same file. You’ll have to use some more special symbols, and you’ll learn about
them in Chapter 19.
The indifference of a command to the source of its input and destination of its
output is one of the most profound features of the UNIX system. It raises the possibil-
ity of commands “talking” to one another, so that the output of one command can be
used as the input to another. We’ll set up pipelines later that permit this communica-
tion. The handling of the three streams is shown in Fig. 8.4.
[Fig. 8.4: The three standard streams of a command: standard input, standard output and standard error]
The device file /dev/null simply incinerates all output directed towards it. Its size
always remains zero. This facility is useful in redirecting error messages away from the
terminal so that they don’t appear on the screen. The following sequence attempts to
“cat” a nonexistent file without cluttering the display:
/dev/null is actually a pseudo-device because, unlike all other device files, it’s not
associated with any physical device.
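A minimal sketch of this use of /dev/null, assuming a scratch directory in which nofile doesn’t exist:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The error message for the nonexistent file is discarded.
cat nofile 2> /dev/null || echo "cat failed, but printed no message"

# Ordinary output can be thrown away in the same manner.
echo "discarded text" > /dev/null
```

Only the line produced by the || branch appears on the terminal; nothing written to /dev/null can be recovered.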
The second special file in the UNIX system is the one indicating one’s terminal—
/dev/tty. Consider, for instance, that romeo is working on terminal /dev/tty01 and
juliet on /dev/tty02. However, both romeo and juliet can refer to their own terminals
with a single device file—/dev/tty. Thus, if romeo issues the command
who >/dev/tty
the list of current users is sent to the terminal he is currently using—/dev/tty01. Sim-
ilarly, juliet can use an identical command to see the output on her terminal
/dev/tty02. Like /dev/null, /dev/tty is another special file that can be accessed
independently by several users without conflict.
You may ask why one should need to specifically redirect any output to one’s
own terminal since the default output goes to the terminal anyway. Sometimes, you do
need to specify that explicitly. Apart from its use in redirection, this file can also be
used as an argument to some UNIX commands. Section 8.9 makes use of this feature,
while some situations are presented in Chapter 19 (featuring shell programming).
Tip
If you use find from an ordinary nonprivileged account to start its search from root, the
command will generate a lot of error messages on being unable to “cd” to a directory. Since
you might miss the selected file in an error-dominated list, the standard error of find should
be directed to /dev/null, as in find / -name typescript -print 2>/dev/null.
8.8 Pipes
To understand pipes, we’ll set ourselves the task of counting the number of users cur-
rently logged in. We’ll first attempt the task using the knowledge we possess already.
who produces a list of users—one user per line, and we’ll save this output in a file:
If we now redirect the standard input of the wc -l command (1.10.5) to come from
user.lst, we would have effectively counted the number of users:
$ wc -l < user.lst
3 The number of users
This method of using two commands in sequence has certain obvious disadvantages:
• The process is slow. The second command can’t act unless the first has com-
pleted its job.
• You require an intermediate file that has to be removed after the wc command has
completed its run.
• When handling large files, temporary files can build up easily and eat up disk
space in no time.
Here, who’s standard output was redirected, and so was wc’s standard input. You
may ask: Can’t the shell connect these streams together so that one command takes
input from the other? Yes, the shell can, using a special operator as the connector of
two commands—the | (pipe). You can make who and wc work in tandem so that one
takes input from the other:
$ who | wc -l
3
Here, who is said to be piped to wc. No intermediate files are created when they are
used. When a sequence of commands is combined together in this way, a pipeline is
formed. Here is one that counts the files in the current directory:
$ ls | wc -l
15
Note that no separate command was designed to tell you that, though the designers
could easily have provided another option to ls to perform this operation. And because
wc uses standard output, you can redirect this output to a file:
ls | wc -l > fkount
There’s no restriction on the number of commands you can use in a pipeline. But you
must know the behavioral properties of these commands to place them there. Consider
this generalized command line:
command1 | command2 | command3 | command4
It should be pretty obvious that command2 and command3 must support both
standard input and standard output. command1 needs to write to standard output only,
while command4 must be able to read from standard input. If you can ensure that, then
you can have a chain of these tools connected together as shown in Fig. 8.5. The commands
command2 and command3, which support both streams, are called filters. Filters
are the central tools of the tool kit, and are discussed later in four entire chapters.
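As a hedged illustration of such a chain, the sketch below counts the distinct names in a short list. sort and uniq are standard filters that this chapter hasn’t covered, and the names fed in by printf are invented:

```shell
# printf writes to standard output only; sort and uniq are filters
# that read standard input and write standard output; wc reads
# standard input at the end of the chain.
users=$(printf 'romeo\njuliet\nromeo\n' | sort | uniq | wc -l)
echo "$users"
```

Each command in the middle is ignorant of where its input came from and where its output goes, which is what lets the shell connect them.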
Printing the man Pages The online man pages of a command often show the key-
words in boldface. These pages contain a number of control characters which have to
be removed before you can print them. The col -b command can remove these char-
acters from its input, which means that the man output has to be piped to col -b:
[Figure: who, grep and lp, with >foo and >bar]
This sequence sends clear text to a text file, but we can pipe it again to print the page.
The lp command (6.16) prints a file, but also accepts standard input:
grep (15.2) locates lines containing a specified pattern in its input. Here it displays
lines containing the string print. Since grep is also a filter, it should be able to take input
from the standard input as well:
This also produces the same output, so what difference does it really make? This ques-
tion was posed before (8.6.2), but we’ll have to answer it this time. To know why we
sometimes need grep to handle a stream rather than a file, consider that grep also
accepts multiple filenames:
grep prints the filenames this time; it can since it opens the files and knows their
names. But sometimes you could be interested in only the content with the filenames
removed. You can have that if you make grep ignorant of the source of its input. Con-
catenate the files with cat and pipe the combined output to grep:
Since grep acts on a stream this time, the filenames vanish from the output. You’ll
come across similar situations as you work your way through.
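The contrast can be reproduced with two small files. The filenames and their contents below are assumptions made for the sketch:

```shell
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "print the report" > chap01
echo "save the report" >> chap01
echo "print the index" > chap02

# grep opens the files itself, so it can prefix each line with a filename.
with_names=$(grep print chap01 chap02)
echo "$with_names"

# Here grep sees only a stream, so there are no filenames to show.
without_names=$(cat chap01 chap02 | grep print)
echo "$without_names"
```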
Note
In a pipeline, the command on the left of the | must use standard output, and the one on
the right must use standard input.
You can crosscheck the display with the contents of the file user.lst:
$ cat user.lst
romeo tty01 May 18 09:32
juliet tty02 May 18 11:18
andrew tty03 May 18 13:21
How do you use tee to display both the list of users and its count on the terminal?
Since the terminal is also a file, you can use the device name /dev/tty as an argument
to tee:
The advantage of treating the terminal as a file is apparent from the above example. You
couldn’t have done so if tee (or for that matter, any UNIX command) had placed
restrictions on the type of file it could handle. Here the terminal is treated in the same
way as any disk file. tee also uses the -a (append) option which appends the output
rather than overwrites it.
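A minimal sketch of tee at work follows. It assumes a file who.out with made-up login lines standing in for the output of who, so that the result doesn't depend on who is actually logged in. The /dev/tty form is shown only as a comment because it needs a real terminal:

```shell
# Work in a scratch directory (the name is arbitrary).
dir=/tmp/teedemo.$$
mkdir -p "$dir" && cd "$dir" || exit 1

# Stand-in for the output of who (three hypothetical login lines).
echo "romeo tty01"  > who.out
echo "juliet tty02" >> who.out
echo "andrew tty03" >> who.out

# tee saves a copy in user.lst and passes the same lines on to wc.
cat who.out | tee user.lst | wc -l

# With -a, tee appends to user.lst instead of overwriting it.
echo "henry tty04" | tee -a user.lst

# On a real terminal, the list and its count can both be displayed with:
#   who | tee /dev/tty | wc -l
```

After these commands, user.lst holds four lines: the three saved by the first tee and the one appended by the second.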
Now the last part of the statement (beginning from “Wed”) represents the output of the
date command. How does one incorporate this date command into the echo state-
ment? With command substitution, it’s a simple matter. Use the expression `date` as
an argument to echo:
When scanning the command line, the ` (backquote or backtick) is another metacharacter
that the shell looks for. There is a special key on your keyboard (generally at the
top-left) that generates this character; it should not be confused with the single quote (').
The shell then executes the enclosed command, and replaces the enclosed command
text with the output of the command. For command substitution to work, the command
so “backquoted” must use standard output. date does; that’s why command substitu-
tion worked.
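As a minimal sketch, the substitution can be tried with any message of your choice; the wording used here and the variable name today are only assumptions made for illustration:

```shell
# The shell runs date first, then hands echo the resulting words as arguments.
echo The date today is `date`

# The output of a backquoted command can also be saved for later use.
today=`date`
echo "The date today is $today"
```

Both lines end with the current date and time, so the exact output differs every time the commands are run.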
You can use this feature to generate useful messages. For example, you can use
two commands in a pipeline, and then use the output as the argument to a third:
The command worked properly even though the arguments were double-quoted. It’s a
different story altogether when single quotes are used:
We encounter the first difference between the use of single and double quotes. The ` is
one of the few characters interpreted by the shell when placed within double quotes. If
you want to echo a literal `, you have to use single quotes.
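The contrast between the two kinds of quotes can be sketched with a pipeline placed inside backquotes. The scratch directory and the three empty files a, b and c are assumptions made only to give ls something to count:

```shell
# Work in a scratch directory holding exactly three files.
dir=/tmp/substdemo.$$
mkdir -p "$dir" && cd "$dir" || exit 1
touch a b c

# Double quotes: the backquoted pipeline is run and replaced by its output.
echo "Files in this directory: `ls | wc -l`"

# Single quotes: the backquotes are ordinary characters and nothing is run.
echo 'Files in this directory: `ls | wc -l`'
```

The first echo ends with the count 3 (some versions of wc pad it with leading spaces), while the second prints the backquoted text exactly as typed.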
Command substitution has interesting application possibilities. It speeds up work
by letting you combine a number of instructions in one. You’ll see more of this feature
in subsequent chapters.
Note: Command substitution is enabled when the backquotes and the enclosed command are
placed within double quotes. If you use single quotes, then it’s not.
KORN Shell / BASH Shell
The Korn shell and bash also offer a more readable synonym for the backquote. You
can place the command inside parentheses and precede the string with a $:
$ echo $(date)
Mon Sep 20 20:09:23 EST 1999
If you are using either of these shells, then you should adopt this form rather
than the unreadable and archaic form using backquotes. This is the form recommended
by POSIX as well.
A variable name comprises the letters of the alphabet, numerals and the underscore
character; the first character must be a letter or an underscore. Moreover, the shell is sensitive to case;
the variable x is different from X. To remove a variable, use unset:
$ unset x
$ echo $x Variable removed
$ _
All shell variables are initialized to null strings by default. Sometimes, you’ll need to
explicitly set them to null values:
x= x='' x=""
Caution: For assigning values to shell variables, make sure that there are no spaces on either side of
the =. If you provide them, the shell will treat the variable as a command and the = and value
as its arguments!
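These rules can be tried out in a short sketch. The names x and myvar are arbitrary, and it is assumed that no command called myvar exists on the system:

```shell
x=10                     # no spaces around the =, so x now holds 10
echo "x is $x"

unset x                  # x is removed and evaluates to a null string
echo "x is [$x]"

x=                       # explicit null assignment; x='' and x="" are equivalent
echo "x is [$x]"

# With spaces, the shell looks for a command named myvar and passes it = and 10.
# The attempt runs in a subshell with its error message discarded.
( myvar = 10 ) 2>/dev/null || echo "myvar = 10 was treated as a command and failed"
```

The last line prints the message because the shell could not find a command named myvar; no variable is set by it.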
To assign multiword strings to a variable, you can escape the space character, but
quoting is the preferred solution:
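A minimal sketch of both approaches follows; the variable names msg and msg2 and the string itself are chosen only for illustration:

```shell
# Escaping each space works, but becomes tedious for longer strings.
msg=You\ have\ mail
echo "$msg"

# Quoting the whole value is easier to read and stores the same string.
msg2='You have mail'
echo "$msg2"
```

Both variables hold the three-word string You have mail.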
Now that you have another special character to deal with ($) that is gobbled up by the
shell, you may still need to interpret it literally without it being evaluated. This can be
done by either single-quoting the expression containing the $ or by escaping the $:
The output is predictable enough, but when you enclose the arguments within double
quotes, you get a different result:
Here is the second difference between the use of single and double quotes. Like the
backquote, the $ is also evaluated by the shell when it is double-quoted. Here, the shell
evaluated a “variable” $1; it’s undefined, so a null string was output. $1 belongs to a
set of parameters called positional parameters (18.4) that signify the arguments you
pass to a script.
Note: Like command substitution, variable evaluation doesn’t take place within single quotes but
only within double quotes.
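The three cases can be sketched with a made-up message containing a dollar amount. The set -- statement is an addition that clears any positional parameters, so that $1 is undefined as described above:

```shell
set --                                  # make sure $1 is undefined

echo 'The average pay is $1000'         # single quotes: the $ stays literal
echo The average pay is \$1000          # escaping the $ has the same effect
echo "The average pay is $1000"         # double quotes: $1 is evaluated to null
```

The first two commands print the amount as $1000. The third prints 000, because the shell replaces $1 with a null string and leaves the remaining three zeroes.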
C Shell
The C shell uses the set statement to set variables. There either has to be whitespace
on both sides of the = or none at all:
set x = 10
set mydir=`pwd`
The evaluation is done in the normal manner by prefixing a $ to the variable name.
The C shell uses another statement, setenv, to set a different type of variable; you’ll
meet them in Chapter 17.
Setting a Pathname to a Variable In the command line, you can set a pathname
to a variable and then use its shorthand representation with the cd command:
$ mfile='/usr/spool/mail'
$ cd $mfile
$ pwd
/usr/spool/mail
Now, suppose you have to use this absolute pathname (/usr/spool/mail/) several
times in a script. You can assign it to a variable at the beginning of the script and then
use it everywhere—even in other scripts run by this script. Later, you may decide to
change the location of the mail directory to /var/spool/mail. For everything to work
as before, you need to just change the variable definition—nothing else.
The output you see is from the execution of the tar command, a UNIX utility used for
backing up files. Here again, you can appreciate the advantages of defining a variable
and using it everywhere. If the backup device changes, just replace fd0h1440 by the
new device name in the definition.
Using Command Substitution to Set Variables You can also use the feature of
command substitution to set variables. For instance, if you were to set the complete
pathname of the present directory to a variable mydir, you could use
$ mydir=`pwd`
$ echo $mydir
/usr/romeo
Variable usage isn’t restricted to the user alone. The UNIX system also uses a number
of variables to control its behavior. There are variables that tell you the type of termi-
nal you are using, the prompt string that you use, or the directory where the incoming
mail is kept. These variables are often called environment variables because they can
alter the operation of the environment in many ways. A detailed discussion of the sig-
nificance of these special shell variables will be taken up in Chapter 17.
The extension .sh is used only for the purpose of identification; the file can have any
extension, or even none. Try executing the file containing these commands by simply
invoking the filename:
$ script.sh
script.sh: execute permission denied
Executable permission is usually necessary for any shell procedure to run, and by
default, a file doesn’t have this permission on creation. Use chmod to first accord exe-
cutable status to the file before executing it:
The script executes the three statements in sequence. Even though we used the shell as
an interpreter, it is also a programming language. You can have all the standard con-
structs like if, while and for in a shell script. The behavior of the UNIX system is
controlled by many prewritten shell scripts that are executed during system startup and
those written by the system administrator. Two chapters in this text are reserved for
shell programming (Chapters 18 and 19).
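The whole sequence of creating a script, making it executable and running it can be sketched as follows. The three statements placed in script.sh are hypothetical, and the script is invoked as ./script.sh on the assumption that the current directory may not be in PATH:

```shell
# Work in a scratch directory (the name is arbitrary).
dir=/tmp/scriptdemo.$$
mkdir -p "$dir" && cd "$dir" || exit 1

# Create a three-line script; single quotes keep the backquotes intact.
echo 'echo "Today is `date`"'        > script.sh
echo 'echo "My directory is `pwd`"' >> script.sh
echo 'echo "Done"'                  >> script.sh

ls -l script.sh          # the permissions show no x at this stage
chmod +x script.sh       # accord executable status to the file
./script.sh              # the three statements run in sequence
```

The script prints three lines, ending with Done. Should the location or the statements change later, only script.sh needs to be edited.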
further. There are other characters which are acted upon, and you’ll come across them
as the shell’s features are gradually revealed. This revelation will be spread across sev-
eral chapters.
➤ GOING FURTHER
The comma here acts as the delimiter between the uncommon expressions placed
within the braces. There must not be any whitespace on either side. And here’s how you
can copy the .txt and .gz versions of the files README and INSTALL:
cp {INSTALL,README}.{gz,txt} ../doc
This feature shortens the command line considerably; with Bourne, you would have
had to specify all the four filenames separately. It also means that you can access mul-
tiple directories using a shortened syntax:
cp /home/romeo/{project,html,scripts}/* .
This copies all files from the three directories (project, html and scripts) to the cur-
rent directory. Isn’t this convenient? This feature is also available in the C shell, but not
the one that is discussed next.
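To see what the braces expand to without copying anything, the command can be prefixed with echo. This sketch assumes that bash is installed, and hands it the command line explicitly because the Bourne shell doesn't expand braces:

```shell
# echo shows the command line that cp would receive after expansion.
bash -c 'echo cp {INSTALL,README}.{gz,txt} ../doc'

# The * is escaped here so that only the brace expansion is shown.
bash -c 'echo cp /home/romeo/{project,html,scripts}/\* .'
```

The first command prints cp INSTALL.gz INSTALL.txt README.gz README.txt ../doc, confirming that four filenames are generated from the two pairs of braces.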
The Invert Selection Feature If you have used Windows Explorer, you would no
doubt have used the Invert Selection feature. This option reverses the selection you
make with your mouse and highlights the rest. bash and Korn also provide a similar
feature of matching all filenames except those in the expression. For instance, this
expression
!(*.exe)
matches all except the .exe files. If you want to include multiple expressions in the
exception list, then use the | as the delimiter:
cp !(*.jpg|*.jpeg|*.gif) ../text
This copies all except the graphic files in GIF or JPEG format to the text directory.
Note that the parentheses and | can be used to group filenames only if the ! precedes
the group.
Tip: The exclusion feature won’t work in bash unless you make the setting shopt -s extglob.
Even if you don’t understand what this means, simply place this statement in .bash_profile
or .profile, whichever is your startup file.
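The exclusion patterns can be tried safely with echo in place of cp. This sketch assumes that bash is installed and uses its -O extglob invocation option, which turns the setting on before the pattern is read. The five filenames are made up for illustration:

```shell
# Work in a scratch directory holding a few sample files.
dir=/tmp/extglobdemo.$$
mkdir -p "$dir" && cd "$dir" || exit 1
touch a.exe b.exe notes.txt pic.gif pic.jpg

# All filenames except those ending with .exe
bash -O extglob -c 'echo !(*.exe)'

# All filenames except the graphic files
bash -O extglob -c 'echo !(*.jpg|*.jpeg|*.gif)'
```

The first command lists notes.txt, pic.gif and pic.jpg; the second lists a.exe, b.exe and notes.txt.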
SUMMARY
The shell is a command that runs when a user logs in, and terminates when she logs
out. It waits for a command to be entered and scans it for special characters (metachar-
acters). It rebuilds the command line before turning it over to the kernel for execution.
The shell matches filenames with wild cards that must be expanded before the
command is executed. It can match any number of characters (*) or a single one (?). It
can also match a character class ([]) and negate the class (!). These characters mean
nothing to a command. However, find uses its own set of wild cards.
Any wild card or special character is escaped with a \ to be treated literally, and
if there are a number of them, then they should be placed within quotes. The \ also
escapes the [Enter] key, enabling you to split a lengthy command line into multiple
lines.
Sometimes, escaping is used by a command to attach (rather than remove) a spe-
cial meaning to a character. The echo command uses special escape sequences for
echoing the formfeed character (\f), newline (\n) and tab (\t). echo also uses octal
values. echo \007 produces a beep.
Many commands use data in the form of a character stream. They take input
from the standard input stream and direct output to the standard output stream. By
default, they are set to the keyboard and terminal, respectively. They can also be redi-
rected to come from or go to a disk file or pipeline.
The symbol > overwrites an existing file and >> appends to it by redirecting stan-
dard output. < redirects standard input. Commands using standard input and standard
output are called filters, a number of which are in the UNIX system.
The standard error represents error messages. Its default destination is the ter-
minal, but it can also be redirected with 2>.
The file /dev/null is a special file that never grows in size even when a stream
of data is directed to it. /dev/tty is a generic device name for every terminal which
every user can use to direct output to.
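The redirection symbols recapped above can be sketched together. The filenames foo, errors.log and nosuchfile are arbitrary, and nosuchfile is assumed not to exist so that cat has an error to report:

```shell
# Work in a scratch directory (the name is arbitrary).
dir=/tmp/redirdemo.$$
mkdir -p "$dir" && cd "$dir" || exit 1

echo "first line"   > foo       # > creates foo, or overwrites it if it exists
echo "second line" >> foo       # >> appends to foo
wc -l < foo                     # < makes wc read foo as its standard input

# cat writes its complaint to standard error; 2> saves it in a file.
cat nosuchfile 2> errors.log || echo "cat failed; the message is in errors.log"

# Directing the same message to /dev/null simply discards it.
cat nosuchfile 2> /dev/null || echo "cat failed; the message was discarded"
```

After this, foo contains two lines and errors.log holds the error message from cat, while nothing sent to /dev/null is retained.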
Using a pipeline, the standard output of one command can be connected to the
standard input of another. A combination of filters placed in pipelines can be used to
perform complex tasks which the commands can’t perform individually.
The tee command breaks the output into two streams. One stream goes to the
standard output, and the other is saved in a file. tee is an external UNIX command and
not a feature of the shell.
Command substitution enables a command’s output to become the arguments of
another command. It is specified within a pair of backquotes (``).
Shell variables are used to store values that can be used in script logic. They are
of the form variable=value (with no spaces around the =) but are evaluated by prefixing a $ to the variable name.
The variables that control the workings of the UNIX system are known as environment
variables.
Single quotes protect all special characters, while double quotes enable variable
evaluation and command substitution.
The shell is also a programming language with its own set of constructs like if,
for and while. These constructs can be used in combination with UNIX commands
and variables in a shell script. A shell script generally requires executable permission.
The Bourne shell (sh) is the universal shell, though the C shell (csh) also has a
significant user base. The Korn shell (ksh) and the bash shell (bash) are superior alter-
natives to the Bourne shell and C shell.
GOING FURTHER
The Korn shell and bash extend the wild-card matching features of the Bourne shell.
They use the symbols {} to group multiple patterns using the , as the delimiter of pat-
terns. The ! is used with the grouping operators () and the delimiter | for selecting all
files except those matching an expression.
SELF-TEST
8.1 Why does the shell have to expand the wild cards?
8.2 What does the shell do when it encounters the * as a single argument to a com-
mand?
8.3 Match the filenames chapa, chapb, chapc, chapx, chapy and chapz with one
expression.
8.4 Does rm * remove all files?
8.5 How do you list all filenames that have at least four characters?
8.6 Which UNIX command uses wild cards as part of its syntax?
8.7 When using cat > foo, what happens if foo already contains something?
8.8 What happens when you use who >> foo and foo doesn’t exist?
8.9 You have a long command sequence which you want to split into multiple lines.
What precautions do you need to take?
8.10 What is this command meant to do? Is it legitimate in the first place?
>foo <bar bc
8.11 What is the best method of ensuring that error messages are not seen on the ter-
minal?
8.12 Make this setting at the command prompt. Can you execute $x?
x='ls | more'
8.13 Enter the commands echo "$SHELL" and echo '$SHELL'. What difference do
you notice?
8.14 How do you find out the number of users logged in?
8.15 Attempt the variable assignment x = 10 (space on both sides of the =). Does it
work?
8.16 What is the difference between directory='pwd' and directory=`pwd`?
8.17 What is the standard shell used in Linux?
8.18 The command echo "Enter your name\c" didn’t put the cursor at the end of
the prompt in Linux. Why?
EXERCISES
8.1 Using wild cards, frame a pattern where the first character is alphabetic and the
last character is not numeric.
8.2 What is the significance of the command ls *.*? Does it match files that don’t
contain a dot?
8.3 Consider the pattern .*.*[!.] How many dots could there be in filenames that
match this pattern?
8.4 How do you remove only the hidden files of your directory?
8.5 How do you remove a file beginning with a hyphen in the foo directory if you
are not using the C shell?
8.6 Is the expression [3-h]* valid?
8.7 Match all filenames not beginning with a dot.
8.8 Will ls .*swp show the filename .ux.2.swp if it exists?
8.9 How do you mark the completion of a command with a beep?
8.10 When does cd * work?
8.11 What happens when you use cat foo > foo?
8.12 Execute the command ls > newlist. What interesting observation can you
make from the contents of newlist?
8.13 You want to concatenate two files, foo1 and foo2, but also insert some text in
between from the terminal. How will you do this?
3 20 103 infile
3 20 103
8.17 What is a filter? Where does a filter get its input from?
8.18 What are the two consequences of using double quotes?
8.19 Using command substitution, write a command sequence which always prints
the calendar of the current month.
8.20 For command substitution to work with a command, does the command have
to be a filter?
8.21 A shell script foo.sh contains just this line—who >/dev/tty. Since the out-
put of the command comes to the terminal, can you redirect the script by using
foo.sh > bar?
GOING FURTHER
8.22 Without using a script, can you copy all files not having the .bak extension to
a directory foobar? When will the command not work?
KEY TERMS
character class (8.2.2)          quoting (8.4)
character stream (8.6)           redirection (8.6)
command parsing (8.13)           shell (8.1)
command substitution (8.10)      shell script (8.12)
despecializing (8.3)             shell variable (8.11)
environment variable (8.11.1)    sleeping (8.1)
escape sequence (8.5)            standard error (8.6.3)
escaping (8.3)                   standard input (8.6.2)
file descriptor (8.6.3)          standard output (8.6.1)
filter (8.8)                     waiting (8.1)
metacharacter (8.2.1)            waking (8.1)
pipeline (8.8)                   wild card (8.2.1)