UNIX
NETWORK
PROGRAMMING
SECOND EDITION
Well-implemented interprocess communications (IPC) are key to the performance of
virtually every non-trivial UNIX program. In UNIX Network Programming,
Volume 2, Second Edition, legendary UNIX expert W. Richard Stevens presents
a comprehensive guide to every form of IPC, including message passing, synchronization,
shared memory, and Remote Procedure Calls (RPC).
If you've read Stevens' best-selling first edition of UNIX Network Programming,
this book expands its IPC coverage by a factor of five! You won't just learn about IPC
"from the outside." You'll actually create implementations of Posix message queues,
read-write locks, and semaphores, gaining an in-depth understanding of these
capabilities you simply can't get anywhere else.
The book contains extensive new source code, all carefully optimized and available
on the Web. You'll even find a complete guide to measuring IPC performance
with message passing bandwidth and latency programs, and thread and process
synchronization programs.
The better you understand IPC, the better your UNIX software will run. This book
contains all you need to know.
ABOUT THE AUTHOR
W. RICHARD STEVENS is author of UNIX Network Programming, First Edition,
widely recognized as the classic text in UNIX networking, and UNIX Network
Programming, Volume 1, Second Edition. He is also author of Advanced Programming
in the UNIX Environment and the TCP/IP Illustrated Series. Stevens
is an acknowledged UNIX and networking expert, sought-after
instructor, and occasional consultant.
PRENTICE HALL
Upper Saddle River, NJ 07458
Function prototype
bool_t clnt_control(CLIENT *cl, unsigned int request, char *ptr);
CLIENT *clnt_create(const char *host, unsigned long prognum,
                    unsigned long versnum, const char *protocol);
void clnt_destroy(CLIENT *cl);
int door_bind(int fd);
int door_call(int fd, door_arg_t *argp);
int door_create(Door_server_proc *proc, void *cookie, u_int attr);
int door_cred(door_cred_t *cred);
int door_info(int fd, door_info_t *info);
int door_return(char *dataptr, size_t datasize, door_desc_t *descptr, size_t ndesc);
int door_revoke(int fd);
Door_create_proc *door_server_create(Door_create_proc *proc);
int door_unbind(void);
void err_dump(const char *fmt, ...);
void err_msg(const char *fmt, ...);
void err_quit(const char *fmt, ...);
void err_ret(const char *fmt, ...);
void err_sys(const char *fmt, ...);
int fcntl(int fd, int cmd, ... /* struct flock *arg */ );
int fstat(int fd, struct stat *buf);
key_t ftok(const char *pathname, int id);
int ftruncate(int fd, off_t length);
int mq_close(mqd_t mqdes);
int mq_getattr(mqd_t mqdes, struct mq_attr *attr);
int mq_notify(mqd_t mqdes, const struct sigevent *notification);
mqd_t mq_open(const char *name, int oflag,
              ... /* mode_t mode, struct mq_attr *attr */ );
int mq_unlink(const char *name);
int msgctl(int msqid, int cmd, struct msqid_ds *buff);
int msgget(key_t key, int oflag);
FILE *popen(const char *command, const char *type);
Function prototype
int pthread_cancel(pthread_t tid);
void pthread_cleanup_pop(int execute);
void pthread_cleanup_push(void (*function)(void *), void *arg);
int pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                   void *(*func)(void *), void *arg);
int pthread_detach(pthread_t tid);
void pthread_exit(void *status);
int pthread_join(pthread_t tid, void **status);
pthread_t pthread_self(void);
int pthread_condattr_destroy(pthread_condattr_t *attr);
int pthread_condattr_getpshared(const pthread_condattr_t *attr, int *valptr);
int pthread_condattr_init(pthread_condattr_t *attr);
int pthread_condattr_setpshared(pthread_condattr_t *attr, int value);
int pthread_cond_broadcast(pthread_cond_t *cptr);
int pthread_cond_destroy(pthread_cond_t *cptr);
int pthread_cond_init(pthread_cond_t *cptr, const pthread_condattr_t *attr);
int pthread_cond_signal(pthread_cond_t *cptr);
[Figure: the three types of persistence for IPC objects (process, kernel, and filesystem)]
Figure 1.2 Persistence of IPC objects.
1. A process-persistent IPC object remains in existence until the last process that
holds the object open closes the object. Examples are pipes and FIFOs.
2. A kernel-persistent IPC object remains in existence until the kernel reboots or
until the object is explicitly deleted. Examples are System V message queues,
semaphores, and shared memory. Posix message queues, semaphores, and
shared memory must be at least kernel-persistent, but may be
filesystem-persistent, depending on the implementation.
3. A filesystem-persistent IPC object remains in existence until the object is explicitly
deleted. The object retains its value even if the kernel reboots. Posix message
queues, semaphores, and shared memory have this property, if they are
implemented using mapped files (not a requirement).
We must be careful when defining the persistence of an IPC object because it is not
always as it seems. For example, the data within a pipe is maintained within the kernel,
but pipes have process persistence and not kernel persistence: after the last process
that has the pipe open for reading closes the pipe, the kernel discards all the data and
removes the pipe. Similarly, even though FIFOs have names within the filesystem, they
also have process persistence because all the data in a FIFO is discarded after the last
process that has the FIFO open closes the FIFO.
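The FIFO case can be checked directly. The following is a minimal sketch, not code from the text: the function name fifo_bytes_after_reopen and the pathname handed to it are our own assumptions. It writes five bytes into a FIFO, closes every descriptor, and then reopens the FIFO to see how much of the data survived.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): put 5 bytes into the FIFO at
 * `path`, close every descriptor, reopen the FIFO, and return the number
 * of bytes that can still be read.  Returns -1 on any error.
 */
long
fifo_bytes_after_reopen(const char *path)
{
    char        buf[16];
    int         rfd, wfd;
    long        n = -1;
    struct stat st;

    unlink(path);                   /* OK if it does not exist */
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* A nonblocking read-only open returns at once, even with no writer. */
    if ( (rfd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
        goto done;
    /* A reader now exists, so the nonblocking write-only open succeeds. */
    if ( (wfd = open(path, O_WRONLY | O_NONBLOCK)) == -1) {
        close(rfd);
        goto done;
    }
    if (write(wfd, "hello", 5) != 5) {
        close(wfd);
        close(rfd);
        goto done;
    }
    close(wfd);
    close(rfd);                     /* last close: the kernel discards the data */

    /* The name is still in the filesystem ... */
    if (stat(path, &st) == -1 || !S_ISFIFO(st.st_mode))
        goto done;

    /* ... but the data is gone: with no writer, read returns 0 (end-of-file). */
    if ( (rfd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
        goto done;
    n = read(rfd, buf, sizeof(buf));
    close(rfd);

done:
    unlink(path);
    return n;
}
```

A return value of 0 means that the name outlived the data, which is what process persistence predicts for a FIFO.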
Figure 1.3 summarizes the persistence of the IPC objects that we describe in this
text.
Type of IPC                  | Persistence
Pipe                         | process
FIFO                         | process
Posix mutex                  | process
Posix condition variable     | process
Posix read-write lock        | process
fcntl record locking         | process
Posix message queue          | kernel
Posix named semaphore        | kernel
Posix memory-based semaphore | process
Posix shared memory          | kernel
System V message queue       | kernel
System V semaphore           | kernel
System V shared memory       | kernel
TCP socket                   | process
UDP socket                   | process
Unix domain socket           | process
Figure 1.3 Persistence of various types of IPC objects.
Note that no type of IPC has filesystem persistence, but we have mentioned that the
three types of Posix IPC may, depending on the implementation. Obviously, writing
data to a file provides filesystem persistence, but this is normally not used as a form of
IPC. Most forms of IPC are not intended to survive a system reboot, because the
processes do not survive the reboot. Requiring filesystem persistence would probably
degrade the performance for a given form of IPC, and a common design goal for IPC is
high performance.
1.4 Name Spaces
When two unrelated processes use some type of IPC to exchange information between
themselves, the IPC object must have a name or identifier of some form so that one
process (often a server) can create the IPC object and other processes (often one or more
clients) can specify that same IPC object.
Pipes do not have names (and therefore cannot be used between unrelated
processes), but FIFOs have a Unix pathname in the filesystem as their identifier (and can
therefore be used between unrelated processes). As we move to other forms of IPC in
the following chapters, we use additional naming conventions. The set of possible
names for a given type of IPC is called its name space. The name space is important,
because with all forms of IPC other than plain pipes, the name is how the client and
server connect with each other to exchange messages.
Figure 1.4 summarizes the naming conventions used by the different forms of IPC.
Type of IPC                  | Name space to open or create | Identification after IPC opened | Posix.1 1996 | Unix 98
Pipe                         | (no name)                    | descriptor                      | yes          | yes
FIFO                         | pathname                     | descriptor                      | yes          | yes
Posix mutex                  | (no name)                    | pthread_mutex_t ptr             | yes          | yes
Posix condition variable     | (no name)                    | pthread_cond_t ptr              | yes          | yes
Posix read-write lock        | (no name)                    | pthread_rwlock_t ptr            |              | yes
fcntl record locking         | pathname                     | descriptor                      | yes          | yes
Posix message queue          | Posix IPC name               | mqd_t value                     | yes          | yes
Posix named semaphore        | Posix IPC name               | sem_t pointer                   | yes          | yes
Posix memory-based semaphore | (no name)                    | sem_t pointer                   | yes          | yes
Posix shared memory          | Posix IPC name               | descriptor                      | yes          | yes
System V message queue       | key_t key                    | System V IPC identifier         |              | yes
System V semaphore           | key_t key                    | System V IPC identifier         |              | yes
System V shared memory       | key_t key                    | System V IPC identifier         |              | yes
Doors                        | pathname                     | descriptor                      |              |
Sun RPC                      | program/version              | RPC handle                      |              |
TCP socket                   | IP addr & TCP port           | descriptor                      | .1g          | yes
UDP socket                   | IP addr & UDP port           | descriptor                      | .1g          | yes
Unix domain socket           | pathname                     | descriptor                      | .1g          | yes
Figure 1.4 Name spaces for the various forms of IPC.
We also indicate which forms of IPC are standardized by the 1996 version of Posix.1 and
Unix 98, both of which we say more about in Section 1.7. For comparison purposes, we
include three types of sockets, which are described in detail in UNPv1. Note that the
sockets API (application program interface) is being standardized by the Posix.1g
working group and should eventually become part of a future Posix.1 standard.
Even though Posix.1 standardizes semaphores, they are an optional feature.
Figure 1.5 summarizes which features are specified by Posix.1 and Unix 98. Each feature is
mandatory, not defined, or optional. For the optional features, we specify the name of
the constant (e.g., _POSIX_THREADS) that is defined (normally in the <unistd.h>
header) if the feature is supported. Note that Unix 98 is a superset of Posix.1.
Type of IPC              | Posix.1 1996                                       | Unix 98
Pipe                     | mandatory                                          | mandatory
FIFO                     | mandatory                                          | mandatory
Posix mutex              | _POSIX_THREADS                                     | mandatory
Posix condition variable | _POSIX_THREADS                                     | mandatory
process-shared mutex/CV  | _POSIX_THREAD_PROCESS_SHARED                       | mandatory
Posix read-write lock    | (not defined)                                      | mandatory
fcntl record locking     | mandatory                                          | mandatory
Posix message queue      | _POSIX_MESSAGE_PASSING                             | _XOPEN_REALTIME
Posix semaphores         | _POSIX_SEMAPHORES                                  | _XOPEN_REALTIME
Posix shared memory      | _POSIX_SHARED_MEMORY_OBJECTS                       | _XOPEN_REALTIME
System V message queue   | (not defined)                                      | mandatory
System V semaphore       | (not defined)                                      | mandatory
System V shared memory   | (not defined)                                      | mandatory
Doors                    | (not defined)                                      | (not defined)
Sun RPC                  | (not defined)                                      | (not defined)
mmap                     | _POSIX_MAPPED_FILES or _POSIX_SHARED_MEMORY_OBJECTS | mandatory
Realtime signals         | _POSIX_REALTIME_SIGNALS                            | _XOPEN_REALTIME
Figure 1.5 Availability of the various forms of IPC.
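The optional features in Figure 1.5 can be tested from a program. The following is a minimal sketch under our own assumptions: the helper names are hypothetical, and the run-time check with sysconf is something we add on top of the compile-time constants that the figure lists. A feature is reported as available only if its constant was defined at compile time and sysconf reports a positive value.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): `compiled_in` says whether the
 * feature-test constant was defined when this file was compiled; if it
 * was, ask sysconf whether the option is also available at run time.
 */
static int
option_available(int compiled_in, int sc_name)
{
    if (!compiled_in)
        return 0;
    return sysconf(sc_name) > 0;    /* -1 means unsupported or unknown */
}

int
have_posix_threads(void)
{
#ifdef _POSIX_THREADS
    return option_available(1, _SC_THREADS);
#else
    return option_available(0, 0);
#endif
}

int
have_posix_message_passing(void)
{
#ifdef _POSIX_MESSAGE_PASSING
    return option_available(1, _SC_MESSAGE_PASSING);
#else
    return option_available(0, 0);
#endif
}

int
have_posix_semaphores(void)
{
#ifdef _POSIX_SEMAPHORES
    return option_available(1, _SC_SEMAPHORES);
#else
    return option_available(0, 0);
#endif
}

int
have_posix_shared_memory(void)
{
#ifdef _POSIX_SHARED_MEMORY_OBJECTS
    return option_available(1, _SC_SHARED_MEMORY_OBJECTS);
#else
    return option_available(0, 0);
#endif
}

/* Print one line per optional feature, e.g. for updating Figure 1.5. */
void
print_posix_ipc_options(FILE *fp)
{
    fprintf(fp, "Posix threads:         %s\n", have_posix_threads() ? "yes" : "no");
    fprintf(fp, "Posix message queues:  %s\n", have_posix_message_passing() ? "yes" : "no");
    fprintf(fp, "Posix semaphores:      %s\n", have_posix_semaphores() ? "yes" : "no");
    fprintf(fp, "Posix shared memory:   %s\n", have_posix_shared_memory() ? "yes" : "no");
}
```

Calling print_posix_ipc_options(stdout) on each system of interest gives a quick, if rough, way to compare implementations; the output naturally differs from system to system.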
1.5 Effect of fork, exec, and exit on IPC Objects
We need to understand the effect of the fork, exec, and _exit functions on the
various forms of IPC that we discuss. (The latter is called by the exit function.) We
summarize this in Figure 1.6.
Most of these features are described later in the text, but we need to make a few
points. First, the calling of fork from a multithreaded process becomes messy with
regard to unnamed synchronization variables (mutexes, condition variables, read-write
locks, and memory-based semaphores). Section 6.1 of [Butenhof 1997] provides the
details. We simply note in the table that if these variables reside in shared memory and
are created with the process-shared attribute, then they remain accessible to any thread
of any process with access to that shared memory. Second, the three forms of System V
IPC have no notion of being open or closed. We will see in Figure 6.8 and Exercises 11.1
and 14.1 that all we need to know to access these three forms of IPC is an identifier. So
these three forms of IPC are available to any process that knows the identifier, although
some special handling is indicated for semaphores and shared memory.
Type of IPC | fork | exec | _exit
Pipes and FIFOs | child gets copies of all parent's open descriptors | all open descriptors remain open unless descriptor's FD_CLOEXEC bit set | all open descriptors closed, all data removed from pipe or FIFO on last close
Posix message queues | child gets copies of all parent's open message queue descriptors | all open message queue descriptors are closed | all open message queue descriptors are closed
System V message queues | no effect | no effect | no effect
Posix mutexes and condition variables | shared if in shared memory and process-shared attribute | vanishes unless in shared memory that stays open and process-shared attribute | vanishes unless in shared memory that stays open and process-shared attribute
Posix read-write locks | shared if in shared memory and process-shared attribute | vanishes unless in shared memory that stays open and process-shared attribute | vanishes unless in shared memory that stays open and process-shared attribute
Posix memory-based semaphores | shared if in shared memory and process-shared attribute | vanishes unless in shared memory that stays open and process-shared attribute | vanishes unless in shared memory that stays open and process-shared attribute
Posix named semaphores | all open in parent remain open in child | any open are closed | any open are closed
System V semaphores | all semadj values in child are set to 0 | all semadj values carried over to new program | all semadj values are added to corresponding semaphore value
fcntl record locking | locks held by parent are not inherited by child | locks are unchanged as long as descriptor remains open | all outstanding locks owned by process are unlocked
mmap memory mappings | memory mappings in parent are retained by child | memory mappings are unmapped | memory mappings are unmapped
Posix shared memory | memory mappings in parent are retained by child | memory mappings are unmapped | memory mappings are unmapped
System V shared memory | attached shared memory segments remain attached by child | attached shared memory segments are detached | attached shared memory segments are detached
Doors | child gets copies of all parent's open descriptors but only parent is a server for door invocations on door descriptors | all door descriptors should be closed because they are created with FD_CLOEXEC bit set | all open descriptors closed
Figure 1.6 Effect of calling fork, exec, and _exit on IPC.
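The first row of Figure 1.6 is easy to observe. The following is a minimal sketch, not code from the text: the function name bytes_read_by_child is hypothetical, and passing the byte count back through the child's exit status is just a convenience for this illustration. The parent creates a pipe and then calls fork; the child reads through its copy of the descriptor and terminates with _exit, which closes its descriptors.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): the parent writes `msg` into a
 * pipe and the forked child reads it through its inherited descriptor.
 * Returns the number of bytes the child read, or -1 on error.
 */
int
bytes_read_by_child(const char *msg)
{
    char    buf[128];
    int     fd[2], status;
    pid_t   pid;
    size_t  len = strlen(msg);
    ssize_t n;

    if (len > 100)              /* keep the count small enough for an exit status */
        return -1;
    if (pipe(fd) == -1)
        return -1;

    if ( (pid = fork()) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {             /* child: fd[0] and fd[1] are copies */
        close(fd[1]);
        n = read(fd[0], buf, sizeof(buf));
        _exit(n < 0 ? 255 : (int) n);   /* _exit closes all open descriptors */
    }

    /* parent */
    close(fd[0]);
    n = write(fd[1], msg, len);
    close(fd[1]);               /* last write end closed: child sees EOF after the data */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (n != (ssize_t) len || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

For example, bytes_read_by_child("hello") is expected to return 5, because the five bytes are written atomically and the child's single read picks them all up.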
1.6 Error Handling: Wrapper Functions
In any real-world program, we must check every function call for an error return. Since
terminating on an error is the common case, we can shorten our programs by defining
a wrapper function that performs the actual function call, tests the return value, and
terminates on an error. The convention we use is to capitalize the name of the function, as in
    Sem_post(ptr);
Our wrapper function is shown in Figure 1.7.
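------------------------------------------------------------ lib/wrapunix.c
387 void
388 Sem_post(sem_t *sem)
389 {
390     if (sem_post(sem) == -1)
391         err_sys("sem_post error");
392 }
------------------------------------------------------------ lib/wrapunix.c
Figure 1.7 Our wrapper function for the sem_post function.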
Whenever you encounter a function name in the text that begins with a capital
letter, that is a wrapper function of our own. It calls a function whose name is the
same but begins with the lowercase letter. The wrapper function always terminates
with an error message if an error is encountered.
When describing the source code that is presented in the text, we always refer to the
lowest-level function being called (e.g., sem_post) and not the wrapper function
(e.g., Sem_post). Similarly, the index always refers to the lowest-level function
being called, and not the wrapper functions.
The format of the source code just shown is used throughout the text. Each nonblank line is
numbered. The text describing portions of the code begins with the starting and ending line
numbers in the left margin. Sometimes the paragraph is preceded by a short descriptive bold
heading, providing a summary statement of the code being described.
The horizontal rules at the beginning and end of the code fragment specify the source code
filename: the file wrapunix.c in the directory lib for this example. Since the source code for
all the examples in the text is freely available (see the Preface), you can locate the appropriate
source file. Compiling, running, and especially modifying these programs while reading this
text is an excellent way to learn the concepts of interprocess communications.
Although these wrapper functions might not seem like a big savings, when we
discuss threads in Chapter 7, we will find that the thread functions do not set the standard
Unix errno variable when an error occurs; instead the errno value is the return value
of the function. This means that every time we call one of the pthread functions, we
must allocate a variable, save the return value in that variable, and then set errno to
this value before calling our err_sys function (Figure C.4). To avoid cluttering the
code with braces, we can use C's comma operator to combine the assignment into
errno and the call of err_sys into a single statement, as in the following:
    if ( (n = pthread_mutex_lock(&ndone_mutex)) != 0)
        errno = n, err_sys("pthread_mutex_lock error");
Alternatively, we could define a new error function that takes the system's error number
as an argument. But we can make this piece of code much easier to read as just
    Pthread_mutex_lock(&ndone_mutex);
by defining our own wrapper function, shown in Figure 1.8.
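------------------------------------------------------------ lib/wrappthread.c
125 void
126 Pthread_mutex_lock(pthread_mutex_t *mptr)
127 {
128     int     n;
129     if ( (n = pthread_mutex_lock(mptr)) == 0)
130         return;
131     errno = n;
132     err_sys("pthread_mutex_lock error");
133 }
------------------------------------------------------------ lib/wrappthread.c
Figure 1.8 Our wrapper function for pthread_mutex_lock.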
With careful C coding, we could use macros instead of functions, providing a little run-time
efficiency, but these wrapper functions are rarely, if ever, the performance bottleneck of a
program.
Our choice of capitalizing the first character of the function name is a compromise. Many
other styles were considered: prefixing the function name with an e (as done on p. 182 of
[Kernighan and Pike 1984]), appending _e to the function name, and so on. Our style seems
the least distracting while still providing a visual indication that some other function is really
being called.
This technique has the side benefit of checking for errors from functions whose error returns
are often ignored: close and pthread_mutex_lock, for example.
Throughout the rest of this book, we use these wrapper functions unless we need to
check for an explicit error and handle it in some form other than terminating the
process. We do not show the source code for all our wrapper functions, but the code is
freely available (see the Preface).
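To make the two error conventions concrete, the following minimal sketch shows one wrapper of each kind in a form that compiles on its own. It is written under stated assumptions: the err_sys here is a simplified stand-in that we supply for illustration (it is not the version in Figure C.4), and the Close wrapper is our guess at what such a wrapper for close could look like.

```c
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Simplified stand-in for err_sys (an assumption, not the author's code):
 * print the caller's message followed by the string for the current errno
 * value, then terminate the process.
 */
static void
err_sys(const char *fmt, ...)
{
    int     errno_save = errno;     /* the stdio calls below may change errno */
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fprintf(stderr, ": %s\n", strerror(errno_save));
    exit(1);
}

/* Unix convention: the function returns -1 and sets errno. */
void
Close(int fd)
{
    if (close(fd) == -1)
        err_sys("close error");
}

/* Pthreads convention: the function returns the error number itself. */
void
Pthread_mutex_lock(pthread_mutex_t *mptr)
{
    int     n;

    if ( (n = pthread_mutex_lock(mptr)) == 0)
        return;
    errno = n;                      /* so that err_sys prints the right string */
    err_sys("pthread_mutex_lock error");
}
```

With these definitions, a caller writes Close(fd) or Pthread_mutex_lock(&mutex) with no error test at the call site; either call returns only if the underlying function succeeded.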
Unix errno Value
When an error occurs in a Unix function, the global variable errno is set to a positive
value, indicating the type of error, and the function normally returns -1. Our err_sys
function looks at the value of errno and prints the corresponding error message string
(e.g., “Resource temporarily unavailable” if errno equals EAGAIN).
The value of errno is set by a function only if an error occurs. Its value is
undefined if the function does not return an error. All the positive error values are constants
with an all-uppercase name beginning with E and are normally defined in the
<sys/errno.h> header. No error has the value of 0.
With multiple threads, each thread must have its own errno variable. Providing a
per-thread errno is handled automatically, although this normally requires telling the
compiler that the program being compiled must be reentrant. Specifying something
like -D_REENTRANT or -D_POSIX_C_SOURCE=199506L to the compiler is typically
required. Often the <errno.h> header defines errno as a macro that expands into a
function call when _REENTRANT is defined, referencing a per-thread copy of the error
variable.
Throughout the text, we use phrases of the form “the mq_send function returns
EMSGSIZE” as shorthand to mean that the function returns an error (typically a return
value of -1) with errno set to the specified constant.
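The convention can be seen with any function that fails. The following is a minimal sketch, not code from the text: the helper names open_errno and errno_message are hypothetical. The first helper returns the errno value left behind by a failed open, saving it at once because later calls may change it; the second maps a value to its message string, the same kind of string that err_sys is described as printing.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): try to open `path` read-only.
 * Returns 0 on success, or the errno value if open returned -1.
 */
int
open_errno(const char *path)
{
    int     fd, errno_save;

    if ( (fd = open(path, O_RDONLY)) == -1) {
        errno_save = errno;     /* save at once: later calls may change errno */
        return errno_save;
    }
    close(fd);
    return 0;                   /* no error has the value 0 */
}

/* Hypothetical helper: the message string for a given errno value. */
const char *
errno_message(int err)
{
    return strerror(err);
}
```

In the shorthand of the text, we would say that open “returns ENOENT” for a pathname that does not exist; here open_errno returns that constant, and errno_message(ENOENT) yields the matching string (its exact wording depends on the system).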
1.7 Unix Standards
Most activity these days with regard to Unix standardization is being done by Posix and
The Open Group.
Posix
Posix is an acronym for “Portable Operating System Interface.” Posix is not a single
standard, but a family of standards being developed by the Institute of Electrical and
Electronics Engineers, Inc., normally called the IEEE. The Posix standards are also
being adopted as international standards by ISO (the International Organization for
Standardization) and IEC (the International Electrotechnical Commission), called
ISO/IEC. The Posix standards have gone through the following iterations:
* IEEE Std 1003.1-1988 (317 pages) was the first of the Posix standards. It specified
the C language interface into a Unix-like kernel covering the following areas: process
primitives (fork, exec, signals, timers), the environment of a process (user IDs,
process groups), files and directories (all the I/O functions), terminal I/O, the system
databases (password file and group file), and the tar and cpio archive formats.
  The first Posix standard was a trial-use version in 1986 known as “IEEEIX.” The name Posix
  was suggested by Richard Stallman.
* IEEE Std 1003.1-1990 (356 pages) was next and it was also International Standard
ISO/IEC 9945-1: 1990. Minimal changes were made from the 1988 version to the
1990 version. Appended to the title was “Part 1: System Application Program
Interface (API) [C Language]” indicating that this standard was the C language API.
* IEEE Std 1003.2-1992 was published in two volumes, totaling about 1300 pages, and
its title contained “Part 2: Shell and Utilities.” This part defines the shell (based on
the System V Bourne shell) and about 100 utilities (programs normally executed
from a shell, from awk and basename to vi and yacc). Throughout this text, we
refer to this standard as Posix.2.
* IEEE Std 1003.1b-1993 (590 pages) was originally known as IEEE P1003.4. This was
an update to the 1003.1-1990 standard to include the realtime extensions developed
by the P1003.4 working group: file synchronization, asynchronous I/O, semaphores,
memory management (mmap and shared memory), execution scheduling, clocks and
timers, and message queues.
* IEEE Std 1003.1, 1996 Edition [IEEE 1996] (743 pages) includes 1003.1-1990 (the base
API), 1003.1b-1993 (realtime extensions), 1003.1c-1995 (Pthreads), and 1003.1i-1995
(technical corrections to 1003.1b). This standard is also called ISO/IEC 9945-1: 1996.
Three chapters on threads were added, along with additional sections on thread
synchronization (mutexes and condition variables), thread scheduling, and
synchronization scheduling. Throughout this text, we refer to this standard as Posix.1.
Over one-quarter of the 743 pages are an appendix titled “Rationale and Notes.” This
rationale contains historical information and reasons why certain features were included or
omitted. Often the rationale is as informative as the official standard.
Unfortunately, the IEEE standards are not freely available on the Internet. Ordering
information is given in the Bibliography entry for [IEEE 1996].
Note that semaphores were defined in the realtime standard, separately from mutexes and
condition variables (which were defined in the Pthreads standard), which accounts for some of
the differences that we see in their APIs.
Finally, note that read-write locks are not (yet) part of any Posix standard. We say more about
this in Chapter 8.
Sometime in the future, a new version of IEEE Std 1003.1 should be printed to
include the P1003.1g standard, the networking APIs (sockets and XTI), which are
described in UNPv1.
The Foreword of the 1996 Posix.1 standard states that ISO/IEC 9945 consists of the
following parts:
* Part 1: System application program interface (API) [C language],
* Part 2: Shell and utilities, and
* Part 3: System administration (under development).
Parts 1 and 2 are what we call Posix.1 and Posix.2.
Work on all of the Posix standards continues and it is a moving target for any book
that attempts to cover it. The current status of the various Posix standards is available
from http://www.pasc.org/standing/sd11.html.
The Open Group
The Open Group was formed in 1996 by the consolidation of the X/Open Company
(founded in 1984) and the Open Software Foundation (OSF, founded in 1988). It is an
international consortium of vendors and end-user customers from industry,
government, and academia. Their standards have gone through the following iterations:
* X/Open published the X/Open Portability Guide, Issue 3 (XPG3) in 1989.
* Issue 4 was published in 1992 followed by Issue 4, Version 2 in 1994. This latest
version was also known as “Spec 1170,” with the magic number 1170 being the sum of
the number of system interfaces (926), the number of headers (70), and the number
of commands (174). The latest name for this set of specifications is the “X/Open
Single Unix Specification,” although it is also called “Unix 95.”
* In March 1997, Version 2 of the Single Unix Specification was announced. Products
conforming to this specification can be called “Unix 98,” which is how we refer to
this specification throughout this text. The number of interfaces required by Unix 98
increases from 1170 to 1434, although for a workstation, this jumps to 3030, because
it includes the CDE (Common Desktop Environment), which in turn requires the X
Window System and the Motif user interface. Details are available in [Josey 1997]
and http://www.UNIX-systems.org/version2.
  Much of the Single Unix Specification is freely available on the Internet from this URL.
Unix Versions and Portability
Most Unix systems today conform to some version of Posix.1 and Posix.2. We use the
qualifier “some” because as updates to Posix occur (e.g., the realtime extensions in 1993
and the Pthreads addition in 1996), vendors take a year or two (sometimes more) to
incorporate these latest changes.
Historically, most Unix systems show either a Berkeley heritage or a System V
heritage, but these differences are slowly disappearing as most vendors adopt the Posix
standards. The main differences still existing deal with system administration, one area
that no Posix standard currently addresses.
Throughout this text, we use Solaris 2.6 and Digital Unix 4.0B for most examples.
The reason is that at the time of this writing (late 1997 to early 1998), these were the only
two Unix systems that supported System V IPC, Posix IPC, and Posix threads.
1.8 Road Map to IPC Examples in the Text
Three patterns of interaction are used predominantly throughout the text to illustrate
various features:
1. File server: a client-server application in which the client sends the server a
pathname and the server returns the contents of that file to the client.
2. Producer-consumer: one or more threads or processes (producers) place data
into a shared buffer, and one or more threads or processes (consumers) operate
on the data in the shared buffer.
3. Sequence-number-increment: one or more threads or processes increment a
shared sequence number. Sometimes the sequence number is in a shared file,
and sometimes it is in shared memory.
The first example illustrates the various forms of message passing, whereas the other
two examples illustrate the various types of synchronization and shared memory.
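To give a feel for the third pattern, the following is a minimal sketch of a sequence-number increment with the number kept in a shared file. It is our own illustration, not one of the programs developed in the text: the function name seq_increment, the file format (one decimal number), and the choice of fcntl record locking for synchronization are all assumptions made for this sketch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the text): lock the file at `path`, read
 * the sequence number stored in it (an empty or new file counts as 0),
 * add 1, write the new value back, and return it.  Returns -1 on error.
 */
long
seq_increment(const char *path)
{
    char         line[32];
    int          fd, len;
    long         seqno = -1;
    ssize_t      n;
    struct flock lock;

    if ( (fd = open(path, O_RDWR | O_CREAT, 0644)) == -1)
        return -1;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;          /* exclusive lock ... */
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;                 /* ... on the entire file */
    if (fcntl(fd, F_SETLKW, &lock) == -1)   /* wait until the lock is granted */
        goto done;

    if ( (n = read(fd, line, sizeof(line) - 1)) < 0)
        goto done;
    line[n] = '\0';
    seqno = strtol(line, NULL, 10) + 1;

    len = snprintf(line, sizeof(line), "%ld\n", seqno);
    if (lseek(fd, 0, SEEK_SET) == -1 || write(fd, line, len) != len
        || ftruncate(fd, len) == -1)
        seqno = -1;

done:
    close(fd);                      /* closing the descriptor releases the lock */
    return seqno;
}
```

Without the lock, two processes could both read the same value and both write back the same successor; with it, each caller sees a distinct number.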
To provide a road map for the different topics that are covered in this text, Figures
1.9, 1.10, and 1.11 summarize the programs that we develop, and the starting figure
number and page number in which the source code appears.
1.9 Summary
IPC has traditionally been a messy area in Unix. Various solutions have been
implemented, none of which are perfect. Our coverage is divided into four main areas:
1. message passing (pipes, FIFOs, message queues),
2. synchronization (mutexes, condition variables, read-write locks, semaphores),
3. shared memory (anonymous, named), and
4. procedure calls (Solaris doors, Sun RPC).
We consider IPC between multiple threads in a single process, and between multiple
processes.
The persistence of each type of IPC can be process-persistent, kernel-persistent,
or filesystem-persistent, based on how long the IPC object stays in existence.
When choosing the type of IPC to use for a given application, we must be aware of the
persistence of that IPC object.
Another feature of each type of IPC is its name space: how IPC objects are identified
by the processes and threads that use the IPC object. Some have no name (pipes,
mutexes, condition variables, read-write locks), some have names in the filesystem
(FIFOs), some have what we describe in Chapter 2 as Posix IPC names, and some have
other types of names (what we describe in Chapter 3 as System V IPC keys or
identifiers). Typically, a server creates an IPC object with some name and the clients use that
name to access the IPC object.
Throughout the source code in the text, we use the wrapper functions described in
Section 1.6 to reduce the size of our code, yet still check every function call for an error
return. Our wrapper functions all begin with a capital letter.
The IEEE Posix standards (Posix.1 defining the basic C interface to Unix and
Posix.2 defining the standard commands) have been the standards that most vendors
are moving toward. The Posix standards, however, are rapidly being absorbed and
expanded by the commercial standards, notably The Open Group's Unix standards,
such as Unix 98.
Figure | Page | Description
4.8    | 47   | Uses two pipes, parent-child
4.15   | 53   | Uses popen and cat
4.16   | 55   | Uses two FIFOs, parent-child
4.18   | 57   | Uses two FIFOs, stand-alone server, unrelated client
4.23   | 62   | Uses FIFOs, stand-alone iterative server, multiple clients
4.25   | 68   | Uses pipe or FIFO: builds records on top of byte stream
6.9    | 141  | Uses two System V message queues
6.15   | 144  | Uses one System V message queue, multiple clients
6.20   | 148  | Uses one System V message queue per client, multiple clients
15.18  | 381  | Uses descriptor passing across a door
Figure 1.9 Different versions of the file server client-server example.
Figure | Page | Description
7.2    | 162  | Mutex only, multiple producers, one consumer
7.6    | 165  | Mutex and condition variable, multiple producers, one consumer
10.17  | 236  | Posix named semaphores, one producer, one consumer
10.20  | 242  | Posix memory-based semaphores, one producer, one consumer
10.21  | 243  | Posix memory-based semaphores, multiple producers, one consumer
10.24  | 246  | Posix memory-based semaphores, multiple producers, multiple consumers
10.33  | 254  | Posix memory-based semaphores, one producer, one consumer: multiple buffers
Figure 1.10 Different versions of the producer-consumer example.
Figure | Page | Description
9.1    | 194  | Seq# in file, no locking
9.3    | 201  | Seq# in file, fcntl locking
9.12   | 215  | Seq# in file, filesystem locking using open
10.19  | 239  | Seq# in file, Posix named semaphore locking
12.10  | 312  | Seq# in mmap shared memory, Posix named semaphore locking
12.12  | 314  | Seq# in mmap shared memory, Posix memory-based semaphore locking
12.14  | 316  | Seq# in 4.4BSD anonymous shared memory, Posix named semaphore locking
12.15  | 316  | Seq# in SVR4 /dev/zero shared memory, Posix named semaphore locking
13.7   | 334  | Seq# in Posix shared memory, Posix memory-based semaphore locking
A.34   | 457  | Performance measurement: mutex locking between threads
A.36   | 459  | Performance measurement: read-write locking between threads
A.39   | 461  | Performance measurement: Posix memory-based semaphore locking between threads
A.41   | 463  | Performance measurement: Posix named semaphore locking between threads
A.42   | 464  | Performance measurement: System V semaphore locking between threads
A.45   | 466  | Performance measurement: fcntl record locking between threads
A.48   | 469  | Performance measurement: mutex locking between processes
Figure 1.11 Different versions of the sequence-number-increment example.
Exercises
1.1 In Figure 1.1 we show two processes accessing a single file. If both processes are just
appending new data to the end of the file (a log file perhaps), what kind of synchronization
is required?
1.2 Look at your system's <errno.h> header and see how it defines errno.
1.3 Update Figure 1.5 by noting the features supported by the Unix systems that you use.
Posix IPC
2.1 Introduction
The three types of IPC,
* Posix message queues (Chapter 5),
* Posix semaphores (Chapter 10), and
* Posix shared memory (Chapter 13)
are collectively referred to as “Posix IPC.” They share some similarities in the functions
that access them, and in the information that describes them. This chapter describes all
these common properties: the pathnames used for identification, the flags specified
when opening or creating, and the access permissions.
A summary of their functions is shown in Figure 2.1.
2.2 IPC Names
                                     | Message queues                 | Semaphores                                             | Shared memory
Header                               | <mqueue.h>                     | <semaphore.h>                                          | <sys/mman.h>
Functions to create, open, or delete | mq_open, mq_close, mq_unlink   | sem_open, sem_close, sem_unlink, sem_init, sem_destroy | shm_open, shm_unlink
Functions for control operations     | mq_getattr, mq_setattr         |                                                        | ftruncate, fstat
Functions for IPC operations         | mq_send, mq_receive, mq_notify | sem_wait, sem_trywait, sem_post, sem_getvalue          | mmap, munmap
Figure 2.1 Summary of Posix IPC functions.
In Figure 1.4, we noted that the three types of Posix IPC use “Posix IPC names” for their
identification. The first argument to the three functions mq_open, sem_open, and
shm_open is such a name, which may or may not be a real pathname in a filesystem.
All that Posix.1 says about these names is:
* It must conform to existing rules for pathnames (must consist of at most
PATH_MAX bytes, including a terminating null byte).
* If it begins with a slash, then different calls to these functions all reference the
same queue. If it does not begin with a slash, the effect is implementation
dependent.
* The interpretation of additional slashes in the name is implementation defined.
So, for portability, these names must begin with a slash and must not contain any other
slashes. Unfortunately, these rules are inadequate and lead to portability problems.
Solaris 2.6 requires the initial slash but forbids any additional slashes. Assuming a
message queue, it then creates three files in /tmp that begin with .MQ. For example, if
the argument to mq_open is /queue.1234, then the three files are
/tmp/.MQDqueue.1234, /tmp/.MQLqueue.1234, and /tmp/.MQPqueue.1234.
Digital Unix 4.0B, on the other hand, creates the specified pathname in the filesystem.
The portability problem occurs if we specify a name with only one slash (as the first
character): we must have write permission in that directory, the root directory. For
example, /tmp.1234 abides by the Posix rules and would be OK under Solaris, but
Digital Unix would try to create this file, and unless we have write permission in the
root directory, this attempt would fail. If we specify a name of /tmp/test.1234, this
will succeed on all systems that create an actual file with that name (assuming that the
/tmp directory exists and that we have write permission in that directory, which is
normal for most Unix systems), but fails under Solaris.
To avoid these portability problems we should always #define the name in a
header that is easy to change if we move our application to another system.
This case is one in which the standard tries to be so general (in this case, the realtime standard
was trying to allow message queue, semaphore, and shared memory implementations all
within existing Unix kernels and as stand-alone diskless systems) that the standard's solution
is nonportable. Within Posix, this is called "a standard way of being nonstandard."
Posix.1 defines the three macros
S_TYPEISMQ(buf)
S_TYPEISSEM(buf)
S_TYPEISSHM(buf)
that take a single argument, a pointer to a stat structure, whose contents are filled in
by the fstat, lstat, or stat functions. These three macros evaluate to a nonzero
value if the specified IPC object (message queue, semaphore, or shared memory object)
is implemented as a distinct file type and the stat structure references such a file type.
Otherwise, the macros evaluate to 0.
Unfortunately, these macros are of little use, since there is no guarantee that these three types
of IPC are implemented using a distinct file type. Under Solaris 2.6, for example, all three
macros always evaluate to 0.
All the other macros that test for a given file type have names beginning with S_IS and their
single argument is the st_mode member of a stat structure. Since these three new macros
have a different argument, their names were changed to begin with S_TYPEIS.
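As a rough illustration of how these macros are called, the sketch below classifies a stat structure. The function name classify_ipc and the enum ipc_kind are hypothetical names introduced only for this example. Each macro is guarded with #ifdef as a precaution, in case a system's <sys/stat.h> does not provide it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical result type for this sketch. */
enum ipc_kind { IPC_KIND_OTHER = 0, IPC_KIND_MQ, IPC_KIND_SEM, IPC_KIND_SHM };

/* Classify a stat structure using the Posix.1 S_TYPEIS macros.
 * Note that the macros take a pointer to the structure, not st_mode. */
enum ipc_kind
classify_ipc(const struct stat *sb)
{
    assert(sb != NULL);
#ifdef S_TYPEISMQ
    if (S_TYPEISMQ(sb))
        return IPC_KIND_MQ;
#endif
#ifdef S_TYPEISSEM
    if (S_TYPEISSEM(sb))
        return IPC_KIND_SEM;
#endif
#ifdef S_TYPEISSHM
    if (S_TYPEISSHM(sb))
        return IPC_KIND_SHM;
#endif
    /* Also reached on systems where the macros always evaluate to 0. */
    return IPC_KIND_OTHER;
}
```

A caller would fill in the structure with stat, lstat, or fstat and then pass its address to classify_ipc. On an implementation that has no distinct file types for IPC objects, every object is reported as IPC_KIND_OTHER.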
px_ipc_name Function
Another solution to this portability problem is to define our own function named
px_ipc_name that prefixes the correct directory for the location of Posix IPC names.
#include "unpipc.h"
char *px_ipc_name(const char *name);
This is the notation we use for functions of our own throughout this book that are not standard
system functions: the box around the function prototype and return value is dashed. The
header that is included at the beginning is usually our unpipc.h header (Figure C.1).
The name argument should not contain any slashes. For example, the call
px_ipc_name("test1")
returns a pointer to the string /test1 under Solaris 2.6 or a pointer to the string
/tmp/test1 under Digital Unix 4.0B. The memory for the result string is dynamically
allocated and is returned by calling free. Additionally, the environment variable
PX_IPC_NAME can override the default directory.
Figure 2.2 shows our implementation of this function.
This may be your first encounter with snprintf. Lots of existing code calls sprintf instead,
but sprintf cannot check for overflow of the destination buffer. snprintf, on the other
hand, requires that the second argument be the size of the destination buffer, and this buffer
will not be overflowed. Providing input that intentionally overflows a program's sprintf
buffer has been used for many years by hackers breaking into systems.
snprintf is not yet part of the ANSI C standard but is being considered for a revision of the
standard, currently called C9X. Nevertheless, many vendors are providing it as part of the
standard C library. We use snprintf throughout the text, providing our own version that
just calls sprintf when it is not provided.
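The bounded behavior of snprintf can be sketched as follows. This assumes a snprintf with C99 semantics, in which the return value is the length the complete result would have had, so a value greater than or equal to the buffer size signals truncation. The helper name join_path is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "dir/name" into dst, which holds dstsize bytes.
 * Returns 0 on success, -1 if the result did not fit or on a format error.
 * The buffer is never overflowed and is always null terminated. */
int
join_path(char *dst, size_t dstsize, const char *dir, const char *name)
{
    int n;

    assert(dst != NULL && dstsize > 0);
    n = snprintf(dst, dstsize, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= dstsize)
        return -1;      /* output was truncated (or an error occurred) */
    return 0;
}
```

With an 8-byte buffer, join_path(buf, sizeof(buf), "/tmp", "ab") fits exactly, while a longer name is cut off at 7 characters plus the terminating null byte and reported with -1. An equivalent sprintf call would have written past the end of the buffer.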
lib/px_ipc_name.c
#include "unpipc.h"

char *
px_ipc_name(const char *name)
{
    char *dir, *dst, *slash;

    if ( (dst = malloc(PATH_MAX)) == NULL)
        return(NULL);

    /* can override default directory with environment variable */
    if ( (dir = getenv("PX_IPC_NAME")) == NULL) {
#ifdef POSIX_IPC_PREFIX
        dir = POSIX_IPC_PREFIX;     /* from "config.h" */
#else
        dir = "/tmp/";              /* default */
#endif
    }
    /* dir must end in a slash */
    slash = (dir[strlen(dir) - 1] == '/') ? "" : "/";
    snprintf(dst, PATH_MAX, "%s%s%s", dir, slash, name);

    return(dst);                    /* caller can free() this pointer */
}
Figure 2.2 Our px_ipc_name function.
2.3 Creating and Opening IPC Channels
The three functions that create or open an IPC object, mq_open, sem_open, and
shm_open, all take a second argument named oflag that specifies how to open the
requested object. This is similar to the second argument to the standard open function.
The various constants that can be combined to form this argument are shown in
Figure 2.3.
Description | mq_open | sem_open | shm_open
read-only | O_RDONLY | | O_RDONLY
write-only | O_WRONLY | |
read-write | O_RDWR | | O_RDWR
create if it does not already exist | O_CREAT | O_CREAT | O_CREAT
exclusive create | O_EXCL | O_EXCL | O_EXCL
nonblocking mode | O_NONBLOCK | |
truncate if it already exists | | | O_TRUNC
Figure 2.3 Various constants when opening or creating a Posix IPC object.
The first three rows specify how the object is being opened: read-only, write-only, or
read-write. A message queue can be opened in any of the three modes, whereas none
of these three constants is specified for a semaphore (read and write access is required
for any semaphore operation), and a shared memory object cannot be opened
write-only.
The remaining O_xxx flags in Figure 2.3 are optional.
O_CREAT Create the message queue, semaphore, or shared memory object if it
does not already exist. (Also see the O_EXCL flag, which is
described shortly.)
When creating a new message queue, semaphore, or shared memory
object at least one additional argument is required, called mode.
This argument specifies the permission bits and is formed as the
bitwise-OR of the constants shown in Figure 2.4.
Constant | Description
S_IRUSR | user read
S_IWUSR | user write
S_IRGRP | group read
S_IWGRP | group write
S_IROTH | other read
S_IWOTH | other write
Figure 2.4 mode constants when a new IPC object is created.
These constants are defined in the <sys/stat.h> header. The
specified permission bits are modified by the file mode creation mask
of the process, which can be set by calling the umask function
(pp. 83-85 of APUE) or by using the shell's umask command.
As with a newly created file, when a new message queue,
semaphore, or shared memory object is created, the user ID is set to
the effective user ID of the process. The group ID of a semaphore or
shared memory object is set to the effective group ID of the process,
or to a system default group ID. The group ID of a new message
queue is set to the effective group ID of the process. (Pages 77-78 of
APUE talk more about the user and group IDs.)
This difference in the setting of the group ID between the three types of Posix
IPC is strange. The group ID of a new file created by open is either the effective
group ID of the process or the group ID of the directory in which the file is
created, but the IPC functions cannot assume that a pathname in the filesystem
is created for an IPC object.
O_EXCL If this flag and O_CREAT are both specified, then the function creates
a new message queue, semaphore, or shared memory object only if
it does not already exist. If it already exists, and if O_CREAT |
O_EXCL is specified, an error of EEXIST is returned.
The check for the existence of the message queue, semaphore, or
shared memory object and its creation (if it does not already exist)
must be atomic with regard to other processes. We will see two similar
flags for System V IPC in Section 3.4.
O_NONBLOCK This flag makes a message queue nonblocking with regard to a read
on an empty queue or a write to a full queue. We talk about this
more with the mq_receive and mq_send functions in Section 5.4.
O_TRUNC If an existing shared memory object is opened read-write, this flag
specifies that the object be truncated to 0 length.
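The interaction between the mode argument and the file mode creation mask can be sketched as a small calculation: bits that are set in the mask are cleared from the requested permissions. The names DEMO_FILE_MODE and effective_mode are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical default: user read/write, group read, other read. */
#define DEMO_FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

/* Permission bits a new object would receive, given the mode passed to
 * mq_open, sem_open, or shm_open and the process's creation mask. */
mode_t
effective_mode(mode_t requested, mode_t mask)
{
    /* only the nine permission bits are meaningful in the mask */
    assert((mask & ~(mode_t) 0777) == 0);
    return requested & ~mask & (mode_t) 0777;
}
```

For example, requesting DEMO_FILE_MODE with a mask of 077 leaves only the user read and write bits. A process that needs exactly the permissions it asks for can call umask(0) before creating the object.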
Figure 2.5 shows the actual logic flow for opening an IPC object. We describe what we
mean by the test of the access permissions in Section 2.4. Another way of looking at
Figure 2.5 is shown in Figure 2.6.
[Flowchart: if the object does not already exist, a new object is created when O_CREAT is set; otherwise the error return is errno = ENOENT. If the object already exists and both O_CREAT and O_EXCL are set, the error return is errno = EEXIST. Otherwise the access permissions are tested and, if they allow the access, the return is OK.]
Figure 2.5 Logic for opening or creating an IPC object.
oflag argument | Object does not already exist | Object already exists
no special flags | error, errno = ENOENT | OK, references existing object
O_CREAT | OK, creates new object | OK, references existing object
O_CREAT|O_EXCL | OK, creates new object | error, errno = EEXIST
Figure 2.6 Logic for creating or opening an IPC object.
Note that in the middle line of Figure 2.6, the O_CREAT flag without O_EXCL, we do not
get an indication whether a new entry has been created or whether we are referencing
an existing entry.
2.4 IPC Permissions
A new message queue, named semaphore, or shared memory object is created by
mq_open, sem_open, or shm_open when the oflag argument contains the O_CREAT
flag. As noted in Figure 2.4, permission bits are associated with each of these forms of
IPC, similar to the permission bits associated with a Unix file.
When an existing message queue, semaphore, or shared memory object is opened
by these same three functions (either O_CREAT is not specified, or O_CREAT is specified
without O_EXCL and the object already exists), permission testing is performed based on
1. the permission bits assigned to the IPC object when it was created,
2. the type of access being requested (O_RDONLY, O_WRONLY, or O_RDWR), and
3. the effective user ID of the calling process, the effective group ID of the calling
process, and the supplementary group IDs of the process (if supported).
The tests performed by most Unix kernels are as follows:
1. If the effective user ID of the process is 0 (the superuser), access is allowed.
2. If the effective user ID of the process equals the owner ID of the IPC object: if the
appropriate user access permission bit is set, access is allowed, else access is
denied.
By appropriate access permission bit, we mean if the process is opening the IPC
object for reading, the user-read bit must be on. If the process is opening the
IPC object for writing, the user-write bit must be on.
3. If the effective group ID of the process or one of the supplementary group IDs of
the process equals the group ID of the IPC object: if the appropriate group
access permission bit is set, access is allowed, else permission is denied.
4. If the appropriate other access permission bit is set, access is allowed, else
permission is denied.
These four steps are tried in sequence in the order listed. Therefore, if the process owns
the IPC object (step 2), then access is granted or denied based only on the user access
permissions—the group permissions are never considered. Similarly, if the process
does not own the IPC object, but the process belongs to an appropriate group, then
access is granted or denied based only on the group access permissions; the other
permissions are not considered.
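The four steps can be sketched as a single function. This is only a model of the test described above, not kernel code. The structure ipc_owner and the function access_allowed are hypothetical names. The sketch also assumes the traditional bit layout, in which the group and other bits are the user bits shifted right by 3 and 6.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical description of an IPC object's ownership and permissions. */
struct ipc_owner {
    uid_t  uid;     /* owner ID */
    gid_t  gid;     /* group ID */
    mode_t mode;    /* S_IRUSR ... S_IWOTH */
};

/* Returns 1 if access is allowed, 0 if it is denied. */
int
access_allowed(const struct ipc_owner *obj, uid_t euid, gid_t egid,
               const gid_t *groups, size_t ngroups,
               int want_read, int want_write)
{
    mode_t need = 0;
    int    in_group;
    size_t i;

    assert(obj != NULL);
    if (euid == 0)
        return 1;                                   /* step 1: superuser */

    if (want_read)
        need |= S_IRUSR;
    if (want_write)
        need |= S_IWUSR;

    if (euid == obj->uid)                           /* step 2: owner */
        return (obj->mode & need) == need;

    in_group = (egid == obj->gid);                  /* step 3: group */
    for (i = 0; i < ngroups; i++)
        if (groups[i] == obj->gid)
            in_group = 1;
    if (in_group)
        return (obj->mode & (need >> 3)) == (need >> 3);

    return (obj->mode & (need >> 6)) == (need >> 6);   /* step 4: other */
}
```

With a mode of 0604 (user read-write, no group bits, other read), a process that belongs to the object's group is denied read access even though the other-read bit is set, because the test stops at step 3.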
We note from Figure 2.3 that sem_open does not use the O_RDONLY, O_WRONLY, or O_RDWR
flag. We note in Section 10.2, however, that some Unix implementations assume O_RDWR, since
any use of a semaphore involves reading and writing the semaphore value.
2.5 Summary
The three types of Posix IPC (message queues, semaphores, and shared memory) are
identified by pathnames. But these may or may not be real pathnames in the filesystem,
and this discrepancy can be a portability problem. The solution that we employ
throughout the text is to use our own px_ipc_name function.
When an IPC object is created or opened, we specify a set of flags that are similar to
those for the open function. When a new IPC object is created, we must specify the
permissions for the new object, using the same S_xxx constants that are used with open
(Figure 2.4). When an existing IPC object is opened, the permission testing that is
performed is the same as when an existing file is opened.
Exercises
2.1 In what way do the set-user-ID and set-group-ID bits (Section 4.4 of APUE) of a program
that uses Posix IPC affect the permission testing described in Section 2.4?
2.2 When a program opens a Posix IPC object, how can it determine whether a new object was
created or whether it is referencing an existing object?
3 System V IPC
3.1 Introduction
The three types of IPC,
* System V message queues (Chapter 6),
* System V semaphores (Chapter 11), and
* System V shared memory (Chapter 14)
are collectively referred to as "System V IPC." This term is commonly used for these
three IPC facilities, acknowledging their heritage from System V Unix. They share
many similarities in the functions that access them, and in the information that the
kernel maintains on them. This chapter describes all these common properties.
A summary of their functions is shown in Figure 3.1.
 | Message queues | Semaphores | Shared memory
Header | <sys/msg.h> | <sys/sem.h> | <sys/shm.h>
Function to create or open | msgget | semget | shmget
Function for control operations | msgctl | semctl | shmctl
Functions for IPC operations | msgsnd, msgrcv | semop | shmat, shmdt
Figure 3.1 Summary of System V IPC functions.
Information on the design and development of the System V IPC functions is hard to find.
[Rochkind 1985] provides the following information: System V message queues, semaphores,
and shared memory were developed in the late 1970s at a branch laboratory of Bell
Laboratories in Columbus, Ohio, for an internal version of Unix called (not surprisingly)
"Columbus Unix" or just "CB Unix." This version of Unix was used for "Operation Support
Systems," transaction processing systems that automated telephone company administration
and recordkeeping. System V IPC was added to the commercial Unix system with System V
around 1983.
3.2 key_t Keys and ftok Function
In Figure 1.4, the three types of System V IPC are noted as using key_t values for their
names. The <sys/types.h> header defines the key_t datatype, as an integer, normally
at least a 32-bit integer. These integer values are normally assigned by the ftok
function.
The function ftok converts an existing pathname and an integer identifier into a
key_t value (called an IPC key).
#include <sys/ipc.h>
key_t ftok(const char *pathname, int id);
Returns: IPC key if OK, -1 on error
This function takes information derived from the pathname and the low-order 8 bits of
id, and combines them into an integer IPC key.
This function assumes that for a given application using System V IPC, the server
and clients all agree on a single pathname that has some meaning to the application. It
could be the pathname of the server daemon, the pathname of a common data file used
by the server, or some other pathname on the system. If the client and server need only
a single IPC channel between them, an id of one, say, can be used. If multiple IPC
channels are needed, say one from the client to the server and another from the server to the
client, then one channel can use an id of one, and the other an id of two, for example.
Once the pathname and id are agreed on by the client and server, then both can call the
ftok function to convert these into the same IPC key.
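The two-channel arrangement just described might look like the following sketch. The id values and the names CHAN_TO_SERVER, CHAN_TO_CLIENT, and channel_keys are assumptions made for this example. Only ftok itself is a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>

/* Hypothetical channel ids agreed on by the client and the server. */
#define CHAN_TO_SERVER 1
#define CHAN_TO_CLIENT 2

/* Derive the keys for both directions from one agreed-on pathname.
 * Returns 0 on success, -1 if ftok fails (for example, because the
 * pathname does not exist or is not accessible). */
int
channel_keys(const char *pathname, key_t *to_server, key_t *to_client)
{
    assert(pathname != NULL && to_server != NULL && to_client != NULL);

    if ( (*to_server = ftok(pathname, CHAN_TO_SERVER)) == (key_t) -1)
        return -1;
    if ( (*to_client = ftok(pathname, CHAN_TO_CLIENT)) == (key_t) -1)
        return -1;
    return 0;
}
```

Both the client and the server call channel_keys with the same pathname. As long as the file is not removed and recreated in between, each side obtains the same pair of keys.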
Typical implementations of ftok call the stat function and then combine
1. information about the filesystem on which pathname resides (the st_dev member
of the stat structure),
2. the file's i-node number within the filesystem (the st_ino member of the stat
structure), and
3. the low-order 8 bits of the id.
The combination of these three values normally produces a 32-bit key. No guarantee
exists that two different pathnames combined with the same id generate different keys,
because the number of bits of information in the three items just listed (filesystem
identifier, i-node, and id) can be greater than the number of bits in an integer. (See
Exercise 3.5.)
The i-node number is never 0, so most implementations define IPC_PRIVATE (which we
describe in Section 3.4) to be 0.
If the pathname does not exist, or is not accessible to the calling process, ftok
returns -1. Be aware that the file whose pathname is used to generate the key must not
be a file that is created and deleted by the server during its existence, since each time it
is created, it can assume a new i-node number that can change the key returned by
ftok to the next caller.
Example
The program in Figure 3.2 takes a pathname as a command-line argument, calls stat,
calls ftok, and then prints the st_dev and st_ino members of the stat structure,
and the resulting IPC key. These three values are printed in hexadecimal, so we can
easily see how the IPC key is constructed from these two values and our id of 0x57.
svipc/ftok.c
#include "unpipc.h"

int
main(int argc, char **argv)
{
    struct stat stat;

    if (argc != 2)
        err_quit("usage: ftok <pathname>");

    Stat(argv[1], &stat);
    printf("st_dev: %lx, st_ino: %lx, key: %x\n",
           (u_long) stat.st_dev, (u_long) stat.st_ino,
           Ftok(argv[1], 0x57));

    exit(0);
}
Figure 3.2 Obtain and print filesystem information and resulting IPC key.
Executing this under Solaris 2.6 gives us the following:
solaris % ftok /etc/system
st_dev: 800018, st_ino: 4a1b, key: 57018a1b
solaris % ftok /usr/tmp
st_dev: 800015, st_ino: 10b78, key: 57015b78
solaris % ftok /home/rstevens/Mail.out
st_dev: 80001f, st_ino: 3b03, key: 5701fb03
Apparently, the id is in the upper 8 bits, the low-order 12 bits of st_dev in the next
12 bits, and the low-order 12 bits of st_ino in the low-order 12 bits.
Our purpose in showing this example is not to let us count on this combination of
information to form the IPC key, but to let us see how one implementation combines the
pathname and id. Other implementations may do this differently.
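The bit layout inferred from the Solaris output can be written down as a short sketch. It models only that one observed layout and is not the source of any real ftok. The function names compose_key and check_examples are hypothetical. The sample values are chosen to be consistent with the keys 57018a1b and 5701fb03 printed above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the observed layout: id in bits 31-24, low-order 12 bits of
 * st_dev in bits 23-12, low-order 12 bits of st_ino in bits 11-0. */
uint32_t
compose_key(uint32_t st_dev, uint32_t st_ino, int id)
{
    return ((uint32_t) (id & 0xff) << 24) |
           ((st_dev & 0xfff) << 12) |
           (st_ino & 0xfff);
}

/* Checks against two of the keys in the output, plus one collision. */
void
check_examples(void)
{
    assert(compose_key(0x800018, 0x4a1b, 0x57) == 0x57018a1b);
    assert(compose_key(0x80001f, 0x3b03, 0x57) == 0x5701fb03);
    /* i-nodes that differ only above bit 11 produce the same key */
    assert(compose_key(0x800018, 0x5a1b, 0x57) == 0x57018a1b);
}
```

The last check illustrates why different pathnames are not guaranteed to produce different keys: any bits that are masked off cannot influence the result.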
FreeBSD uses the lower 8 bits of the id, the lower 8 bits of st_dev, and the lower 16 bits of
st_ino.
Note that the mapping done by ftok is one-way, since some bits from st_dev and st_ino
are not used. That is, given a key, we cannot determine the pathname that was used to create
the key.
3.3 ipc_perm Structure
The kernel maintains a structure of information for each IPC object, similar to the
information it maintains for files.
struct ipc_perm {
    uid_t   uid;    /* owner's user id */
    gid_t   gid;    /* owner's group id */
    uid_t   cuid;   /* creator's user id */
    gid_t   cgid;   /* creator's group id */
    mode_t  mode;   /* read-write permissions */
    ulong_t seq;    /* slot usage sequence number */
    key_t   key;    /* IPC key */
};
This structure, and other manifest constants for the System V IPC functions, are defined
in the <sys/ipc.h> header. We talk about all the members of this structure in this
chapter.
3.4 Creating and Opening IPC Channels
The three getXXX functions that create or open an IPC object (Figure 3.1) all take an
IPC key value, whose type is key_t, and return an integer identifier. This identifier is
not the same as the id argument to the ftok function, as we see shortly. An application
has two choices for the key value that is the first argument to the three getXXX
functions:
1. call ftok, passing it a pathname and id, or
2. specify a key of IPC_PRIVATE, which guarantees that a new, unique IPC object
is created.
The sequence of steps is shown in Figure 3.3.
[Diagram: ftok(pathname, id) or IPC_PRIVATE supplies a key of type key_t; msgget(), semget(), or shmget() opens or creates the IPC channel and returns an int identifier; msgctl(), msgsnd(), msgrcv(), semctl(), semop(), shmctl(), shmat(), and shmdt() then access the IPC channel.]
Figure 3.3 Generating IPC identifiers from IPC keys.
All three getXXX functions (Figure 3.1) also take an oflag argument that specifies the
read-write permission bits (the mode member of the ipc_perm structure) for the IPC
object, and whether a new IPC object is being created or an existing one is being
referenced. The rules for whether a new IPC object is created or whether an existing one is
referenced are as follows:
* Specifying a key of IPC_PRIVATE guarantees that a unique IPC object is created.
No combinations of pathname and id exist that cause ftok to generate a key value
of IPC_PRIVATE.
* Setting the IPC_CREAT bit of the oflag argument creates a new entry for the
specified key, if it does not already exist. If an existing entry is found, that entry
is returned.
* Setting both the IPC_CREAT and IPC_EXCL bits of the oflag argument creates a
new entry for the specified key, only if the entry does not already exist. If an
existing entry is found, an error of EEXIST is returned, since the IPC object
already exists.
The combination of IPC_CREAT and IPC_EXCL with regard to IPC objects is
similar to the combination of O_CREAT and O_EXCL with regard to the open
function.
* Setting the IPC_EXCL bit, without setting the IPC_CREAT bit, has no meaning.
The actual logic flow for opening an IPC object is shown in Figure 3.4. Figure 3.5 shows
another way of looking at Figure 3.4.
Note that in the middle line of Figure 3.5, the IPC_CREAT flag without IPC_EXCL,
we do not get an indication whether a new entry has been created or whether we are
referencing an existing entry. In most applications, the server creates the IPC object and
specifies either IPC_CREAT (if it does not care whether the object already exists) or
IPC_CREAT | IPC_EXCL (if it needs to check whether the object already exists). The
clients specify neither flag (assuming that the server has already created the object).
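The server and client roles just described might be sketched as follows for a message queue. The function names server_queue and client_queue are hypothetical. Falling back to a plain IPC_CREAT call after EEXIST is one possible policy for a server that wants to reuse an existing queue while still learning whether it created it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Server side: create the queue, or open it if it already exists.
 * Sets *created to 1 or 0 accordingly. Returns the identifier, or -1. */
int
server_queue(key_t key, int perms, int *created)
{
    int id;

    assert(created != NULL);
    id = msgget(key, perms | IPC_CREAT | IPC_EXCL);
    if (id >= 0) {
        *created = 1;               /* a new entry was created */
        return id;
    }
    if (errno != EEXIST)
        return -1;

    *created = 0;                   /* the entry already existed */
    return msgget(key, perms | IPC_CREAT);
}

/* Client side: neither flag is specified, so the call fails with
 * ENOENT if the server has not yet created the queue. */
int
client_queue(key_t key)
{
    return msgget(key, 0);
}
```

A server would typically obtain key from ftok and pass its default permission bits as perms. When the queue is no longer needed, it is removed with msgctl and a command of IPC_RMID.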
The System V IPC functions define their own IPC_xxx constants, instead of using the
O_CREAT and O_EXCL constants that are used by the standard open function along with the
Posix IPC functions (Figure 2.3).
Also note that the System V IPC functions combine their IPC_xxx constants with the
permission bits (which we describe in the next section) into a single oflag argument. The open
function along with the Posix IPC functions have one argument named oflag that specifies the
various O_xxx flags, and another argument named mode that specifies the permission bits.
[Flowchart: if key equals IPC_PRIVATE, a new entry is created, unless the system tables are full, in which case the error return is errno = ENOSPC. Otherwise, if the key does not already exist, a new entry is created when IPC_CREAT is set, else the error return is errno = ENOENT. If the key already exists and both IPC_CREAT and IPC_EXCL are set, the error return is errno = EEXIST. Otherwise the access permissions are tested: if they are not OK, the error return is errno = EACCES; if they are OK, the identifier is returned.]
Figure 3.4 Logic for creating or opening an IPC object.
oflag argument | key does not exist | key already exists
no special flags | error, errno = ENOENT | OK, references existing object
IPC_CREAT | OK, creates new entry | OK, references existing object
IPC_CREAT|IPC_EXCL | OK, creates new entry | error, errno = EEXIST
Figure 3.5 Logic for creating or opening an IPC channel.
3.5 IPC Permissions
Whenever a new IPC object is created using one of the getXXX functions with the
IPC_CREAT flag, the following information is saved in the ipc_perm structure
(Section 3.3):
1. Some of the bits in the oflag argument initialize the mode member of the
ipc_perm structure. Figure 3.6 shows the permission bits for the three different
IPC mechanisms. (The notation >> 3 means the value is right shifted 3 bits.)
Numeric (octal) | Message queue | Semaphore | Shared memory | Description
0400 | MSG_R | SEM_R | SHM_R | read by user
0200 | MSG_W | SEM_A | SHM_W | write by user
0040 | MSG_R >> 3 | SEM_R >> 3 | SHM_R >> 3 | read by group
0020 | MSG_W >> 3 | SEM_A >> 3 | SHM_W >> 3 | write by group
0004 | MSG_R >> 6 | SEM_R >> 6 | SHM_R >> 6 | read by others
0002 | MSG_W >> 6 | SEM_A >> 6 | SHM_W >> 6 | write by others
Figure 3.6 mode values for IPC read-write permissions.
2. The two members cuid and cgid are set to the effective user ID and effective
group ID of the calling process, respectively. These two members are called the
creator IDs.
3. The two members uid and gid in the ipc_perm structure are also set to the
effective user ID and effective group ID of the calling process. These two members
are called the owner IDs.
The creator IDs never change, although a process can change the owner IDs by calling
the ctlXXX function for the IPC mechanism with a command of IPC_SET. The three
ctlXXX functions also allow a process to change the permission bits of the mode member
for the IPC object.
Most implementations define the six constants MSG_R, MSG_W, SEM_R, SEM_A, SHM_R, and
SHM_W shown in Figure 3.6 in the <sys/msg.h>, <sys/sem.h>, and <sys/shm.h> headers.
But these are not required by Unix 98. The suffix A in SEM_A stands for "alter."
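Because these constants are not required, portable code sometimes spells out the octal values of Figure 3.6 itself. The sketch below shows how the >> 3 and >> 6 notation builds a complete permission value. The names DEMO_IPC_R, DEMO_IPC_W, DEMO_SVMSG_MODE, and ipc_mode are hypothetical stand-ins and not names from any system header.

```c
#include <assert.h>

/* Hypothetical stand-ins for the user-level bits of Figure 3.6. */
#define DEMO_IPC_R 0400     /* read by user */
#define DEMO_IPC_W 0200     /* write (alter) by user */

/* user read-write, group read, other read */
#define DEMO_SVMSG_MODE \
    (DEMO_IPC_R | DEMO_IPC_W | DEMO_IPC_R >> 3 | DEMO_IPC_R >> 6)

/* Build a mode from per-category bits, each expressed with the
 * user-level constants: group bits are shifted 3, other bits 6. */
int
ipc_mode(int user, int group, int other)
{
    assert(((user | group | other) & ~(DEMO_IPC_R | DEMO_IPC_W)) == 0);
    return user | (group >> 3) | (other >> 6);
}
```

DEMO_SVMSG_MODE works out to octal 0644. Such a value is ORed with IPC_CREAT (and possibly IPC_EXCL) to form the oflag argument of a getXXX function.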
The three getXXX functions do not use the normal Unix file mode creation mask. The
permissions of the message queue, semaphore, or shared memory segment are set to exactly what the
function specifies.
Posix IPC does not let the creator of an IPC object change the owner. Nothing is like the
IPC_SET command with Posix IPC. But if the Posix IPC name is stored in the filesystem, then
the superuser can change the owner using the chown command.
Two levels of checking are done whenever an IPC object is accessed by any process,
once when the IPC object is opened (the getXXX function) and then each time the IPC
object is used:
1. Whenever a process establishes access to an existing IPC object with one of the
getXXX functions, an initial check is made that the caller's oflag argument does
not specify any access bits that are not in the mode member of the ipc_perm
structure. This is the bottom box in Figure 3.4. For example, a server process
can set the mode member for its input message queue so that the group-read
and other-read permission bits are off. Any process that tries to specify an oflag
argument that includes these bits gets an error return from the msgget function.
But this test that is done by the getXXX functions is of little use. It implies that
the caller knows which permission category it falls into: user, group, or other.
If the creator specifically turns off certain permission bits, and if the caller
specifies these bits, the error is detected by the getXXX function. Any process,
however, can totally bypass this check by just specifying an oflag argument of 0 if it
knows that the IPC object already exists.
2. Every IPC operation does a permission test for the process using the operation.
For example, every time a process tries to put a message onto a message queue
with the msgsnd function, the following tests are performed in the order listed.
As soon as a test grants access, no further tests are performed.
a. The superuser is always granted access.
b. If the effective user ID equals either the uid value or the cuid value for the
IPC object, and if the appropriate access bit is on in the mode member for the
IPC object, permission is granted. By "appropriate access bit," we mean the
read-bit must be set if the caller wants to do a read operation on the IPC
object (receiving a message from a message queue, for example), or the
write-bit must be set for a write operation.
c. If the effective group ID equals either the gid value or the cgid value for
the IPC object, and if the appropriate access bit is on in the mode member for
the IPC object, permission is granted.
d. If none of the above tests are true, the appropriate "other" access bit must be
on in the mode member for the IPC object, for permission to be allowed.
3.6 Identifier Reuse
The ipc_perm structure (Section 3.3) also contains a variable named seq, which is a
slot usage sequence number. This is a counter that is maintained by the kernel for every
potential IPC object in the system. Every time an IPC object is removed, the kernel
increments the slot number, cycling it back to zero when it overflows.
What we are describing in this section is the common SVR4 implementation. This
implementation technique is not mandated by Unix 98.
This counter is needed for two reasons. First, consider the file descriptors
maintained by the kernel for open files. They are small integers, but have meaning only
within a single process: they are process-specific values. If we try to read from file
descriptor 4, say, in a process, this approach works only if that process has a file open on
this descriptor. It has no meaning whatsoever for a file that might be open on file
descriptor 4 in some other unrelated process. System V IPC identifiers, however, are
systemwide and not process-specific.
We obtain an IPC identifier (similar to a file descriptor) from one of the getXXX
functions: msgget, semget, and shmget. These identifiers are also integers, but their
meaning applies to all processes. If two unrelated processes, a client and server, for
example, use a single message queue, the message queue identifier returned by the
msgget function must be the same integer value in both processes in order to access the
same message queue. This feature means that a rogue process could try to read a
message from some other application's message queue by trying different small integer
identifiers, hoping to find one that is currently in use that allows world read access. If
the potential values for these identifiers were small integers (like file descriptors), then
the probability of finding a valid identifier would be about 1 in 50 (assuming a
maximum of about 50 descriptors per process).
To avoid this problem, the designers of these IPC facilities decided to increase the
possible range of identifier values to include all integers, not just small integers. This
increase is implemented by incrementing the identifier value that is returned to the
calling process, by the number of IPC table entries, each time a table entry is reused. For
example, if the system is configured for a maximum of 50 message queues, then the first
time the first message queue table entry in the kernel is used, the identifier returned to
the process is zero. After this message queue is removed and the first table entry is
reused, the identifier returned is 50. The next time, the identifier is 100, and so on.
Since seq is often implemented as an unsigned long integer (see the ipc_perm
structure shown in Section 3.3), it cycles after the table entry has been used 85,899,346 times
(2^32/50, assuming 32-bit long integers).
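The arithmetic in this paragraph can be checked with a small model. It is a simplification of the scheme described above and not kernel code. The function names ipc_identifier and reuses_until_wrap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model: identifier = slot index + (reuse count * number of table entries),
 * computed in 32 bits so that it wraps as a 32-bit long integer would. */
uint32_t
ipc_identifier(uint32_t slot, uint32_t seq, uint32_t nslots)
{
    assert(nslots > 0 && slot < nslots);
    return slot + seq * nslots;
}

/* Smallest reuse count at which seq * nslots reaches 2^32, that is,
 * the ceiling of 2^32 / nslots. */
uint64_t
reuses_until_wrap(uint32_t nslots)
{
    uint64_t range = (uint64_t) 1 << 32;

    assert(nslots > 0);
    return (range + nslots - 1) / nslots;
}
```

With 50 table entries, the first slot yields the identifiers 0, 50, 100, and so on, and reuses_until_wrap(50) gives 85,899,346, which matches the figure quoted above.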
A second reason for incrementing the slot usage sequence number is to avoid short
term reuse of the System V IPC identifiers. This helps ensure that a server that
prematurely terminates and is then restarted, does not reuse an identifier.
As an example of this feature, the program in Figure 3.7 prints the first 10 identifier
values returned by msgget.
svmsg/slot.c
#include "unpipc.h"

int
main(int argc, char **argv)
{
    int i, msqid;

    for (i = 0; i < 10; i++) {
        msqid = Msgget(IPC_PRIVATE, SVMSG_MODE | IPC_CREAT);
        printf("msqid = %d\n", msqid);

        Msgctl(msqid, IPC_RMID, NULL);
    }
    exit(0);
}
Figure 3.7 Print kernel-assigned message queue identifier 10 times in a row.
Each time around the loop msgget creates a message queue, and then msgctl with a
command of IPC_RMID deletes the queue. The constant SVMSG_MODE is defined in our
unpipc.h header (Figure C.1) and specifies our default permission bits for a System V
message queue. The program's output is
solaris % slot
msqid = 0
msqid = 50
msqid = 100
msqid = 150
msqid = 200
msqid = 250
msqid = 300
msqid = 350
msqid = 400
msqid = 450
If we run the program again, we see that this slot usage sequence number is a kernel
variable that persists between processes.
solaris % slot
msqid = 500
msqid = 550
msqid = 600
msqid = 650
msqid = 700
msqid = 750
msqid = 800
msqid = 850
msqid = 900
msqid = 950
3.7 ipcs and ipcrm Programs
Since the three types of System V IPC are not identified by pathnames in the filesystem,
we cannot look at them or remove them using the standard ls and rm programs.
Instead, two special programs are provided by any system that implements these types
of IPC: ipcs, which prints various pieces of information about the System V IPC features,
and ipcrm, which removes a System V message queue, semaphore set, or shared
memory segment. The former supports about a dozen command-line options, which
affect which of the three types of IPC is reported and what information is output, and
the latter supports six command-line options. Consult your manual pages for the
details of all these options.
Since System V IPC is not part of Posix, these two commands are not standardized by Posix.2.
But these two commands are part of Unix 98.
3.8 Kernel Limits
Most implementations of System V IPC have inherent kernel limits, such as the maximum
number of message queues and the maximum number of semaphores per
semaphore set. We show some typical values for these limits in Figures 6.25, 11.9, and
14.5. These limits are often derived from the original System V implementation.
Section 11.2 of [Bach 1986] and Chapter 8 of [Goodheart and Cox 1994] both describe the
System V implementation of messages, semaphores, and shared memory. Some of these limits
are described therein.
Unfortunately, these kernel limits are often too small, because many are derived
from their original implementation on a small address system (the 16-bit PDP-11).
Fortunately, most systems allow the administrator to change some or all of these default
limits, but the required steps are different for each flavor of Unix. Most require rebooting
the running kernel after changing the values. Unfortunately, some implementations
still use 16-bit integers for some of the limits, providing a hard limit that cannot be
exceeded.
Solaris 2.6, for example, has 20 of these limits. Their current values are printed by
the sysdef command, although the values are printed as 0 if the corresponding kernel
module has not been loaded (i.e., the facility has not yet been used). These may be
changed by placing any of the following statements in the /etc/system file, which is
read when the kernel bootstraps.
set msgsys:msginfo_msgseg = value
set msgsys:msginfo_msgssz = value
set msgsys:msginfo_msgtql = value
set msgsys:msginfo_msgmap = value
set msgsys:msginfo_msgmax = value
set msgsys:msginfo_msgmnb = value
set msgsys:msginfo_msgmni = value
set semsys:seminfo_semopm = value
...
set shmsys:shminfo_shmmin = value
set shmsys:shminfo_shmseg = value
set shmsys:shminfo_shmmax = value
set shmsys:shminfo_shmmni = value
The last six characters of the name on the left-hand side of the equals sign are the
variables listed in Figures 6.25, 11.9, and 14.5.
With Digital Unix 4.0B, the sysconfig program can query or modify many kernel
parameters and limits. Here is the output of this program with the -q option, which
queries the kernel for the current limits, for the ipc subsystem. We have omitted some
lines unrelated to the System V IPC facility.
alpha % /sbin/sysconfig -q ipc
ipc:
msg-max = 8192
msg-mnb = 16384
msg-mni = 64
msg-tql = 40
shm-max = 4194304
...
sem-msl = 25
sem-opm = 10
sem-aem = 16384
num-of-sems = 60
Different defaults for these parameters can be specified in the /etc/sysconfigtab
file, which should be maintained using the sysconfigdb program. This file is read
when the system bootstraps.
3.9 Summary
The first argument to the three functions, msgget, semget, and shmget, is a System V
IPC key. These keys are normally created from a pathname using the system's ftok
function. The key can also be the special value of IPC_PRIVATE. These three functions
create a new IPC object or open an existing IPC object and return a System V IPC
identifier: an integer that is then used to identify the object to the remaining IPC functions.
These integers are not per-process identifiers (like descriptors) but are systemwide
identifiers. These identifiers are also reused by the kernel after some time.
Associated with every System V IPC object is an ipc_perm structure that contains
information such as the owner's user ID, group ID, read-write permissions, and so on.
One difference between Posix IPC and System V IPC is that this information is always
available for a System V IPC object (by calling one of the three XXXctl functions with
an argument of IPC_STAT), but access to this information for a Posix IPC object
depends on the implementation. If the Posix IPC object is stored in the filesystem, and
if we know its name in the filesystem, then we can access this same information using
the existing filesystem tools.
When a new System V IPC object is created or an existing object is opened, two
flags are specified to the getXXX function (IPC_CREAT and IPC_EXCL), combined
with nine permission bits.
Undoubtedly, the biggest problem in using System V IPC is that most implementations
have artificial kernel limits on the sizes of these objects, and these limits date back
to their original implementation. These mean that most applications that make heavy
use of System V IPC require that the system administrator modify these kernel limits,
and accomplishing this change differs for each flavor of Unix.
Exercises
3.1 Read about the msgctl function in Section 6.5 and modify the program in Figure 3.7 to
    print the seq member of the ipc_perm structure in addition to the assigned identifier.
3.2 Immediately after running the program in Figure 3.7, we run a program that creates two
    message queues. Assuming no other message queues have been used by any other
    applications since the kernel was booted, what two values are returned by the kernel as the
    message queue identifiers?
3.3 We noted in Section 3.5 that the System V IPC getXXX functions do not use the file mode
    creation mask. Write a test program that creates a FIFO (using the mkfifo function
    described in Section 4.6) and a System V message queue, specifying a permission of (octal)
    666 for both. Compare the permissions of the resulting FIFO and message queue. Make
    certain your shell umask value is nonzero before running this program.
3.4 A server wants to create a unique message queue for its clients. Which is preferable: using
    some constant pathname (say the server's executable file) as an argument to ftok, or using
    IPC_PRIVATE?
3.5 Modify Figure 3.2 to print just the IPC key and pathname. Run the find program to print
    all the pathnames on your system and run the output through the program just modified.
    How many pathnames map to the same key?
3.6 If your system supports the sar program ("system activity reporter"), run the command
        sar -m 5 6
    This prints the number of message queue operations per second and the number of
    semaphore operations per second, sampled every 5 seconds, 6 times.

Part 2
Message Passing

Chapter 4
Pipes and FIFOs

4.1 Introduction
Pipes are the original form of Unix IPC, dating back to the Third Edition of Unix in 1973
[Salus 1994]. Although useful for many operations, their fundamental limitation is that
they have no name, and can therefore be used only by related processes. This was
corrected in System III Unix (1982) with the addition of FIFOs, sometimes called named
pipes. Both pipes and FIFOs are accessed using the normal read and write functions.
Technically, pipes can be used between unrelated processes, given the ability to pass
descriptors between processes (which we describe in Section 15.8 of this text as well as
Section 14.7 of UNPv1). But for practical purposes, pipes are normally used between
processes that have a common ancestor.
This chapter describes the creation and use of pipes and FIFOs. We use a simple file
server example and also look at some client-server design issues: how many IPC channels
are needed, iterative versus concurrent servers, and byte streams versus message
interfaces.
4.2 A Simple Client-Server Example
The client-server example shown in Figure 4.1 is used throughout this chapter and
Chapter 6 to illustrate pipes, FIFOs, and System V message queues.
The client reads a pathname from the standard input and writes it to the IPC channel.
The server reads this pathname from the IPC channel and tries to open the file for
reading. If the server can open the file, the server responds by reading the file and writing
it to the IPC channel; otherwise, the server responds with an error message. The
client then reads from the IPC channel, writing what it receives to the standard output.
If the file cannot be read by the server, the client reads an error message from the IPC
channel. Otherwise, the client reads the contents of the file. The two dashed lines
between the client and server in Figure 4.1 are the IPC channel.
Figure 4.1 Client-server example.
4.3 Pipes
Pipes are provided with all flavors of Unix. A pipe is created by the pipe function and
provides a one-way (unidirectional) flow of data.
#include <unistd.h>

int pipe(int fd[2]);
                                                Returns: 0 if OK, -1 on error
Two file descriptors are returned: fd[0], which is open for reading, and fd[1], which is
open for writing.
Some versions of Unix, notably SVR4, provide full-duplex pipes, in which case, both ends are
available for reading and writing. Another way to create a full-duplex IPC channel is with the
socketpair function, described in Section 14.3 of UNPv1, and this works on most current
Unix systems. The most common use of pipes, however, is with the various shells, in which
case, a half-duplex pipe is adequate.
Posix.1 and Unix 98 require only half-duplex pipes, and we assume so in this chapter.
The S_ISFIFO macro can be used to determine if a descriptor or file is either a pipe
or a FIFO. Its single argument is the st_mode member of the stat structure and the
macro evaluates to true (nonzero) or false (0). For a pipe, this structure is filled in by the
fstat function. For a FIFO, this structure is filled in by the fstat, lstat, or stat
functions.
Figure 4.2 shows how a pipe looks in a single process.
Figure 4.2 A pipe in a single process.
Although a pipe is created by one process, it is rarely used within a single process.
(We show an example of a pipe within a single process in Figure 5.14.) Pipes are typically
used to communicate between two different processes (a parent and child) in the
following way. First, a process (which will be the parent) creates a pipe and then forks
to create a copy of itself, as shown in Figure 4.3.
Figure 4.3 Pipe in a single process, immediately after fork.
Next, the parent process closes the read end of one pipe, and the child process closes the
write end of that same pipe. This provides a one-way flow of data between the two
processes, as shown in Figure 4.4.
Figure 4.4 Pipe between two processes.
When we enter a command such as
who | sort | lp
to a Unix shell, the shell performs the steps described previously to create three
processes with two pipes between them. The shell also duplicates the read end of each
pipe to standard input and the write end of each pipe to standard output. We show this
pipeline in Figure 4.5.
[Figure 4.5: the who, sort, and lp processes connected by two pipes.]
All the pipes shown so far have been half-duplex or unidirectional, providing a one-way
flow of data only. When a two-way flow of data is desired, we must create two
pipes and use one for each direction. The actual steps are as follows:
1. create pipe 1 (fd1[0] and fd1[1]), create pipe 2 (fd2[0] and fd2[1]),
2. fork,
3. parent closes read end of pipe 1 (fd1[0]),
4. parent closes write end of pipe 2 (fd2[1]),
5. child closes write end of pipe 1 (fd1[1]), and
6. child closes read end of pipe 2 (fd2[0]).
We show the code for these steps in Figure 4.8. This generates the pipe arrangement
shown in Figure 4.6.
Figure 4.6 Two pipes to provide a bidirectional flow of data.
Example
Let us now implement the client-server example described in Section 4.2 using pipes.
The main function creates two pipes and forks a child. The client then runs in the parent
process and the server runs in the child process. The first pipe is used to send the
pathname from the client to the server, and the second pipe is used to send the contents
of that file (or an error message) from the server to the client. This setup gives us the
arrangement shown in Figure 4.7.
Figure 4.7 Implementation of Figure 4.1 using two pipes.
Realize that in this figure we show the two pipes connecting the two processes, but each
pipe goes through the kernel, as shown previously in Figure 4.6. Therefore, each byte of
data from the client to the server, and vice versa, crosses the user-kernel interface twice:
once when written to the pipe, and again when read from the pipe.
Figure 4.8 shows our main function for this example.
pipe/mainpipe.c
#include "unpipc.h"

void client(int, int), server(int, int);

int
main(int argc, char **argv)
{
    int pipe1[2], pipe2[2];
    pid_t childpid;

    Pipe(pipe1);                /* create two pipes */
    Pipe(pipe2);

    if ( (childpid = Fork()) == 0) {    /* child */
        Close(pipe1[1]);
        Close(pipe2[0]);

        server(pipe1[0], pipe2[1]);
        exit(0);
    }
    /* parent */
    Close(pipe1[0]);
    Close(pipe2[1]);

    client(pipe2[0], pipe1[1]);

    Waitpid(childpid, NULL, 0); /* wait for child to terminate */
    exit(0);
}
Figure 4.8 main function for client-server using two pipes.
Create pipes, fork
Two pipes are created and the six steps that we listed with Figure 4.6 are performed.
The parent calls the client function (Figure 4.9) and the child calls the server function
(Figure 4.10).
waitpid for child
The server (the child) terminates first, when it calls exit after writing the final data
to the pipe. It then becomes a zombie: a process that has terminated, but whose parent is
still running but has not yet waited for the child. When the child terminates, the kernel
also generates a SIGCHLD signal for the parent, but the parent does not catch this signal,
and the default action of this signal is to be ignored. Shortly thereafter, the parent's
client function returns after reading the final data from the pipe. The parent then
calls waitpid to fetch the termination status of the terminated child (the zombie). If
the parent did not call waitpid, but just terminated, the child would be inherited by
the init process, and another SIGCHLD signal would be sent to the init process,
which would then fetch the termination status of the zombie.
The client function is shown in Figure 4.9.
pipe/client.c
#include "unpipc.h"

void
client(int readfd, int writefd)
{
    size_t len;
    ssize_t n;
    char buff[MAXLINE];

    /* read pathname */
    Fgets(buff, MAXLINE, stdin);
    len = strlen(buff);         /* fgets() guarantees null byte at end */
    if (buff[len - 1] == '\n')
        len--;                  /* delete newline from fgets() */

    /* write pathname to IPC channel */
    Write(writefd, buff, len);

    /* read from IPC, write to standard output */
    while ( (n = Read(readfd, buff, MAXLINE)) > 0)
        Write(STDOUT_FILENO, buff, n);
}
Figure 4.9 client function for client-server using two pipes.
Read pathname from standard input
The pathname is read from standard input and written to the pipe, after deleting
the newline that is stored by fgets.
Copy from pipe to standard output
The client then reads everything that the server writes to the pipe, writing it to
standard output. Normally this is the contents of the file, but if the specified pathname
cannot be opened, what the server returns is an error message.
Figure 4.10 shows the server function.
pipe/server.c
#include "unpipc.h"

void
server(int readfd, int writefd)
{
    int fd;
    ssize_t n;
    char buff[MAXLINE + 1];

    /* read pathname from IPC channel */
    if ( (n = Read(readfd, buff, MAXLINE)) == 0)
        err_quit("end-of-file while reading pathname");
    buff[n] = '\0';             /* null terminate pathname */

    if ( (fd = open(buff, O_RDONLY)) < 0) {
        /* error: must tell client */
        snprintf(buff + n, sizeof(buff) - n, ": can't open, %s\n",
                 strerror(errno));
        n = strlen(buff);
        Write(writefd, buff, n);

    } else {
        /* open succeeded: copy file to IPC channel */
        while ( (n = Read(fd, buff, MAXLINE)) > 0)
            Write(writefd, buff, n);
        Close(fd);
    }
}
Figure 4.10 server function for client-server using two pipes.
Read pathname from pipe
The pathname written by the client is read from the pipe and null terminated. Note
that a read on a pipe returns as soon as some data is present; it need not wait for the
requested number of bytes (MAXLINE in this example).
Open file, handle error
The file is opened for reading, and if an error occurs, an error message string is
returned to the client across the pipe. We call the strerror function to return the error
message string corresponding to errno. (Pages 690-691 of UNPv1 talk more about the
strerror function.)
Copy file to pipe
If the open succeeds, the contents of the file are copied to the pipe.
We can see the output from the program when the pathname is OK, and when an
error occurs.
solaris % mainpipe
/etc/inet/ntp.conf                      a file consisting of two lines
multicastclient 224.0.1.2
driftfile /etc/inet/ntp.drift
solaris % mainpipe
/etc/shadow                             a file we cannot read
/etc/shadow: can't open, Permission denied
solaris % mainpipe
/no/such/file                           a nonexistent file
/no/such/file: can't open, No such file or directory
4.4 Full-Duplex Pipes
We mentioned in the previous section that some systems provide full-duplex pipes:
SVR4's pipe function and the socketpair function provided by many kernels. But
what exactly does a full-duplex pipe provide? First, we can think of a half-duplex pipe
as shown in Figure 4.11, a modification of Figure 4.2, which omits the process.
Figure 4.11 Half-duplex pipe.
A full-duplex pipe could be implemented as shown in Figure 4.12. This implies that
only one buffer exists for the pipe and everything written to the pipe (on either descriptor)
gets appended to the buffer and any read from the pipe (on either descriptor) just
takes data from the front of the buffer.
Figure 4.12 One possible (incorrect) implementation of a full-duplex pipe.
The problem with this implementation becomes apparent in a program such as
Figure A.29. We want two-way communication but we need two independent data
streams, one in each direction. Otherwise, when a process writes data to the full-duplex
pipe and then turns around and issues a read on that pipe, it could read back what it
just wrote.
Figure 4.13 shows the actual implementation of a full-duplex pipe.
Figure 4.13 Actual implementation of a full-duplex pipe.
Here, the full-duplex pipe is constructed from two half-duplex pipes. Anything written
to fd[1] will be available for reading by fd[0], and anything written to fd[0] will be
available for reading by fd[1].
The program in Figure 4.14 demonstrates that we can use a single full-duplex pipe
for two-way communication.
pipe/fduplex.c
#include "unpipc.h"

int
main(int argc, char **argv)
{
    int fd[2], n;
    char c;
    pid_t childpid;

    Pipe(fd);                   /* assumes a full-duplex pipe (e.g., SVR4) */
    if ( (childpid = Fork()) == 0) {    /* child */
        sleep(3);
        if ( (n = Read(fd[0], &c, 1)) != 1)
            err_quit("child: read returned %d", n);
        printf("child read %c\n", c);
        Write(fd[0], "c", 1);
        exit(0);
    }
    /* parent */
    Write(fd[1], "p", 1);
    if ( (n = Read(fd[1], &c, 1)) != 1)
        err_quit("parent: read returned %d", n);
    printf("parent read %c\n", c);
    exit(0);
}
Figure 4.14 Test a full-duplex pipe for two-way communication.
We create a full-duplex pipe and fork. The parent writes the character p to the
pipe, and then reads a character from the pipe. The child sleeps for 3 seconds, reads a
character from the pipe, and then writes the character c to the pipe. The purpose of the
sleep in the child is to allow the parent to call read before the child can call read, to see
whether the parent reads back what it wrote.
If we run this program under Solaris 2.6, which provides full-duplex pipes, we
observe the desired behavior.
solaris % fduplex
child read p
parent read c
The character p goes across the half-duplex pipe shown in the top of Figure 4.13, and
the character c goes across the half-duplex pipe shown in the bottom of Figure 4.13.
The parent does not read back what it wrote (the character p).
If we run this program under Digital Unix 4.0B, which by default provides half-duplex
pipes (it also provides full-duplex pipes like SVR4, if different options are specified
at compile time), we see the expected behavior of a half-duplex pipe.
alpha % fduplex
read error: Bad file number
alpha % child read p
write error: Bad file number
The parent writes the character p, which the child reads, but then the parent aborts
when it tries to read from fd[1], and the child aborts when it tries to write to fd[0]
(recall Figure 4.11). The error returned by read is EBADF, which means that the
descriptor is not open for reading. Similarly, write returns the same error if its
descriptor is not open for writing.
4.5 popen and pclose Functions
As another example of pipes, the standard I/O library provides the popen function that
creates a pipe and initiates another process that either reads from the pipe or writes to
the pipe.
#include <stdio.h>

FILE *popen(const char *command, const char *type);
                                        Returns: file pointer if OK, NULL on error
int pclose(FILE *stream);
                                        Returns: termination status of shell or -1 on error
command is a shell command line. It is processed by the sh program (normally a Bourne
shell), so the PATH environment variable is used to locate the command. A pipe is created
between the calling process and the specified command. The value returned by
popen is a standard I/O FILE pointer that is used for either input or output, depending
on the character string type.
* If type is r, the calling process reads the standard output of the command.
* If type is w, the calling process writes to the standard input of the command.
The pclose function closes a standard I/O stream that was created by popen, waits
for the command to terminate, and then returns the termination status of the shell.
Section 14.8 of APUE provides an implementation of popen and pclose.
Example
Figure 4.15 shows another solution to our client-server example using the popen function
and the Unix cat program.
pipe/mainpopen.c
#include "unpipc.h"

int
main(int argc, char **argv)
{
    size_t n;
    char buff[MAXLINE], command[MAXLINE];
    FILE *fp;

    /* read pathname */
    Fgets(buff, MAXLINE, stdin);
    n = strlen(buff);           /* fgets() guarantees null byte at end */
    if (buff[n - 1] == '\n')
        n--;                    /* delete newline from fgets() */

    snprintf(command, sizeof(command), "cat %s", buff);
    fp = Popen(command, "r");

    /* copy from pipe to standard output */
    while (Fgets(buff, MAXLINE, fp) != NULL)
        Fputs(buff, stdout);

    Pclose(fp);
    exit(0);
}
Figure 4.15 Client-server using popen.
The pathname is read from standard input, as in Figure 4.9. A command is built
and passed to popen. The output from either the shell or the cat program is copied to
standard output.
One difference between this implementation and the implementation in Figure 4.8
is that now we are dependent on the error message generated by the system's cat program,
which is often inadequate. For example, under Solaris 2.6, we get the following
error when trying to read a file that we do not have permission to read:
solaris % cat /etc/shadow
cat: cannot open /etc/shadow
But under BSD/OS 3.1, we get a more descriptive error when trying to read a similar
file:
bsdi % cat /etc/master.passwd
cat: /etc/master.passwd: cannot open [Permission denied]
Also realize that the call to popen succeeds in such a case, but fgets just returns an
end-of-file the first time it is called. The cat program writes its error message to standard
error, and popen does nothing special with it: only standard output is redirected
to the pipe that it creates.
4.6 FIFOs
Pipes have no names, and their biggest disadvantage is that they can be used only
between processes that have a parent process in common. Two unrelated processes cannot
create a pipe between them and use it for IPC (ignoring descriptor passing).
FIFO stands for first in, first out, and a Unix FIFO is similar to a pipe. It is a one-way
(half-duplex) flow of data. But unlike pipes, a FIFO has a pathname associated with it,
allowing unrelated processes to access a single FIFO. FIFOs are also called named pipes.
A FIFO is created by the mkfifo function.
#include <sys/types.h>
#include <sys/stat.h>

int mkfifo(const char *pathname, mode_t mode);
                                                Returns: 0 if OK, -1 on error
The pathname is a normal Unix pathname, and this is the name of the FIFO.
The mode argument specifies the file permission bits, similar to the second argument
to open. Figure 2.4 shows the six constants from the <sys/stat.h> header used to
specify these bits for a FIFO.
The mkfifo function implies O_CREAT | O_EXCL. That is, it creates a new FIFO or
returns an error of EEXIST if the named FIFO already exists. If the creation of a new
FIFO is not desired, call open instead of mkfifo. To open an existing FIFO or create a
new FIFO if it does not already exist, call mkfifo, check for an error of EEXIST, and if
this occurs, call open instead.
The mkfifo command also creates a FIFO. This can be used from shell scripts or
from the command line.
Once a FIFO is created, it must be opened for reading or writing, using either the
open function, or one of the standard I/O open functions such as fopen. A FIFO must
be opened either read-only or write-only. It must not be opened for read-write, because
a FIFO is half-duplex.
A write to a pipe or FIFO always appends the data, and a read always returns
what is at the beginning of the pipe or FIFO. If lseek is called for a pipe or FIFO, the
error ESPIPE is returned.
Example
We now redo our client-server from Figure 4.8 to use two FIFOs instead of two pipes.
Our client and server functions remain the same; all that changes is the main function,
which we show in Figure 4.16.
pipe/mainfifo.c
#include "unpipc.h"

#define FIFO1 "/tmp/fifo.1"
#define FIFO2 "/tmp/fifo.2"

void client(int, int), server(int, int);

int
main(int argc, char **argv)
{
    int readfd, writefd;
    pid_t childpid;

    /* create two FIFOs; OK if they already exist */
    if ((mkfifo(FIFO1, FILE_MODE) < 0) && (errno != EEXIST))
        err_sys("can't create %s", FIFO1);
    if ((mkfifo(FIFO2, FILE_MODE) < 0) && (errno != EEXIST)) {
        unlink(FIFO1);
        err_sys("can't create %s", FIFO2);
    }

    if ( (childpid = Fork()) == 0) {    /* child */
        readfd = Open(FIFO1, O_RDONLY, 0);
        writefd = Open(FIFO2, O_WRONLY, 0);

        server(readfd, writefd);
        exit(0);
    }
    /* parent */
    writefd = Open(FIFO1, O_WRONLY, 0);
    readfd = Open(FIFO2, O_RDONLY, 0);

    client(readfd, writefd);

    Waitpid(childpid, NULL, 0); /* wait for child to terminate */

    Close(readfd);
    Close(writefd);

    Unlink(FIFO1);
    Unlink(FIFO2);
    exit(0);
}
Figure 4.16 main function for our client-server that uses two FIFOs.
Create two FIFOs
Two FIFOs are created in the /tmp filesystem. If the FIFOs already exist, that is OK.
The FILE_MODE constant is defined in our unpipc.h header (Figure C.1) as
#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
                    /* default permissions for new files */
This allows user-read, user-write, group-read, and other-read. These permission bits are
modified by the file mode creation mask of the process.
fork
We call fork, the child calls our server function (Figure 4.10), and the parent calls
our client function (Figure 4.9). Before executing these calls, the parent opens the first
FIFO for writing and the second FIFO for reading, and the child opens the first FIFO for
reading and the second FIFO for writing. This is similar to our pipe example, and
Figure 4.17 shows this arrangement.
Figure 4.17 Client-server example using two FIFOs.
The changes from our pipe example to this FIFO example are as follows:
* To create and open a pipe requires one call to pipe. To create and open a FIFO
  requires one call to mkfifo followed by a call to open.
* A pipe automatically disappears on its last close. A FIFO's name is deleted from
  the filesystem only by calling unlink.
The benefit in the extra calls required for the FIFO is that a FIFO has a name in the
filesystem allowing one process to create a FIFO and another unrelated process to open the
FIFO. This is not possible with a pipe.
Subtle problems can occur with programs that do not use FIFOs correctly. Consider
Figure 4.16: if we swap the order of the two calls to open in the parent, the program
does not work. The reason is that the open of a FIFO for reading blocks if no process
currently has the FIFO open for writing. If we swap the order of these two opens in the
parent, both the parent and the child are opening a FIFO for reading when no process
has the FIFO open for writing, so both block. This is called a deadlock. We discuss this
scenario in the next section.
Example: Unrelated Client and Server
In Figure 4.16, the client and server are still related processes. But we can redo this
example with the client and server unrelated. Figure 4.18 shows the server program.
This program is nearly identical to the server portion of Figure 4.16.
The header fifo.h is shown in Figure 4.19 and provides the definitions of the two
FIFO names, which both the client and server must know.
Figure 4.20 shows the client program, which is nearly identical to the client portion
of Figure 4.16. Notice that the client, not the server, deletes the FIFOs when done,
because the client performs the last operation on the FIFOs.
1 Hincluce °fifo.b ie server_maine
2voia —server(ine, int);
3 ine
4 main(int arge, char *targv)
5¢
6 Ant readfd, writesdy
7 1% cxeate two FIFOs; OK if they alzeady exist */
& Af ((mkFigo(wxPoL, FILE voDE) < 0) bk (errno 1= FEXIST))
3 err_sys("can‘t create ts", PIFOL);
104 ((@KELEOUFTFO2, PILE MODE)’ < 0) && (exeno I= EEKTST)) (
a unl irk (FEFOL) +
12 erx_sys(*can't create ts", FIFO2);
fai 9
14 reaefa = Open(FIFOL, O_RDOULY, 0);
15 writefa = Open(FIFO2, _HRONLY, 0);
16 server{readfé, writefay:
17 exte(o)
18)
ipejserver_main.e
Figure 416 Stand-alone server mat function.
pipe/fifo.h
    #include    "unpipc.h"

    #define FIFO1   "/tmp/fifo.1"
    #define FIFO2   "/tmp/fifo.2"
pipe/fifo.h
Figure 4.19 fifo.h header that both the client and server include.
pipe/client_main.c
    #include    "fifo.h"

    void    client(int, int);

    int
    main(int argc, char **argv)
    {
        int     readfd, writefd;

        writefd = Open(FIFO1, O_WRONLY, 0);
        readfd = Open(FIFO2, O_RDONLY, 0);

        client(readfd, writefd);

        Close(readfd);
        Close(writefd);

        Unlink(FIFO1);
        Unlink(FIFO2);
        exit(0);
    }
pipe/client_main.c
Figure 4.20 Stand-alone client main function.
In the case of a pipe or FIFO, where the kernel keeps a reference count of the number of open
descriptors that refer to the pipe or FIFO, either the client or server could call unlink without
a problem. Even though this function removes the pathname from the filesystem, this does not
affect open descriptors that had previously opened the pathname. But for other forms of IPC,
such as System V message queues, no counter exists, and if the server were to delete the queue
after writing its final message to the queue, the queue could be gone when the client tries to
read the final message.
To run this client and server, start the server in the background
    % server_fifo &
and then start the client. Alternatively, we could start only the client and have it invoke
the server by calling fork and then exec. The client could also pass the names of the
two FIFOs to the server as command-line arguments through the exec function, instead
of coding them into a header. But this scenario would make the server a child of the
client, in which case, a pipe could just as easily be used.
4.7 Additional Properties of Pipes and FIFOs
We need to describe in more detail some properties of pipes and FIFOs with regard to
their opening, reading, and writing. First, a descriptor can be set nonblocking in two
ways.
1. The O_NONBLOCK flag can be specified when open is called. For example, the
first call to open in Figure 4.20 could be
    writefd = Open(FIFO1, O_WRONLY | O_NONBLOCK, 0);
2. If a descriptor is already open, fcntl can be called to enable the O_NONBLOCK
flag. This technique must be used with a pipe, since open is not called for a
pipe, and no way exists to specify the O_NONBLOCK flag in the call to pipe.
When using fcntl, we first fetch the current file status flags with the F_GETFL
command, bitwise-OR the O_NONBLOCK flag, and then store the file status flags
with the F_SETFL command:
    int     flags;

    if ( (flags = fcntl(fd, F_GETFL, 0)) < 0)
        err_sys("F_GETFL error");
    flags |= O_NONBLOCK;
    if (fcntl(fd, F_SETFL, flags) < 0)
        err_sys("F_SETFL error");
Beware of code that you may encounter that simply sets the desired flag,
because this also clears all the other possible file status flags:
        /* wrong way to set nonblocking */
    if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
        err_sys("F_SETFL error");
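The fetch, OR, and store sequence can be packaged as a small helper. The following is a minimal sketch under our own naming: set_nonblocking and read_errno_on_empty_pipe are hypothetical functions, not part of the book's source, and they return error codes instead of calling err_sys. The second function checks the result on an empty pipe whose write end is still open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: enable O_NONBLOCK while preserving every other
 * file status flag.  Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int     flags;

    if ((flags = fcntl(fd, F_GETFL, 0)) < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return 0;
}

/* Hypothetical check: read from an empty pipe that is still open for
 * writing, after making the read end nonblocking.  Returns the errno from
 * read, 0 if read did not fail, or -1 if the setup failed. */
int read_errno_on_empty_pipe(void)
{
    int     fd[2], err = 0;
    char    c;

    if (pipe(fd) < 0)
        return -1;
    if (set_nonblocking(fd[0]) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (read(fd[0], &c, 1) < 0)
        err = errno;            /* nothing to read, but a writer exists */
    close(fd[0]);
    close(fd[1]);
    return err;
}
```

With the flag set, the read returns immediately with an error of EAGAIN instead of blocking.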
Figure 4.21 shows the effect of the nonblocking flag for the opening of a FIFO and
for the reading of data from an empty pipe or from an empty FIFO.
Current         Existing opens of          Return, blocking (default)         Return, O_NONBLOCK set
operation       pipe or FIFO
--------------  -------------------------  ---------------------------------  ----------------------------
open FIFO       FIFO open for writing      returns OK                         returns OK
read-only       FIFO not open for writing  blocks until FIFO is opened        returns OK
                                           for writing
open FIFO       FIFO open for reading      returns OK                         returns OK
write-only      FIFO not open for reading  blocks until FIFO is opened        returns an error of ENXIO
                                           for reading
read empty      pipe or FIFO open          blocks until data is in the pipe   returns an error of EAGAIN
pipe or empty   for writing                or FIFO, or until the pipe or
FIFO                                       FIFO is no longer open for
                                           writing
                pipe or FIFO not open      read returns 0 (end-of-file)       read returns 0 (end-of-file)
                for writing
write to pipe   pipe or FIFO open          (see text)                         (see text)
or FIFO         for reading
                pipe or FIFO not open      SIGPIPE generated for thread       SIGPIPE generated for thread
                for reading
Figure 4.21 Effect of O_NONBLOCK flag on pipes and FIFOs.
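Two rows of Figure 4.21 can be checked from a single process, because the nonblocking opens never wait for a peer. The following is a minimal sketch: the function probe_nonblocking_opens and the pathname passed to it are hypothetical. It tries a write-only open while no process has the FIFO open for reading, and then a read-only open while no process has it open for writing.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe of a freshly created FIFO that no other process has
 * open.  On return:
 *   *wr_errno is the errno from open(O_WRONLY | O_NONBLOCK), or 0 if that
 *             open unexpectedly succeeded;
 *   *rd_ok    is 1 if open(O_RDONLY | O_NONBLOCK) returned a descriptor.
 * Returns 0 if the probe ran, -1 if the FIFO could not be created. */
int probe_nonblocking_opens(const char *path, int *rd_ok, int *wr_errno)
{
    int     fd;

    unlink(path);
    if (mkfifo(path, 0600) < 0)
        return -1;

    /* write-only open, FIFO not open for reading */
    *wr_errno = 0;
    if ((fd = open(path, O_WRONLY | O_NONBLOCK)) < 0)
        *wr_errno = errno;
    else
        close(fd);

    /* read-only open, FIFO not open for writing */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    *rd_ok = (fd >= 0);
    if (fd >= 0)
        close(fd);

    unlink(path);
    return 0;
}
```

The write-only open is expected to fail with ENXIO, and the read-only open is expected to return a descriptor at once.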
Note a few additional rules regarding the reading and writing of a pipe or FIFO.
• If we ask to read more data than is currently available in the pipe or FIFO, only
the available data is returned. We must be prepared to handle a return value
from read that is less than the requested amount.
• If the number of bytes to write is less than or equal to PIPE_BUF (a Posix limit
that we say more about in Section 4.11), the write is guaranteed to be atomic.
This means that if two processes each write to the same pipe or FIFO at about
the same time, either all the data from the first process is written, followed by all
the data from the second process, or vice versa. The system does not intermix
the data from the two processes. If, however, the number of bytes to write is
greater than PIPE_BUF, there is no guarantee that the write operation is
atomic.
Posix.1 requires that PIPE_BUF be at least 512 bytes. Commonly encountered values
range from 1024 for BSD/OS 3.1 to 5120 for Solaris 2.6. We show a program in
Section 4.11 that prints this value.
• The setting of the O_NONBLOCK flag has no effect on the atomicity of writes to a
pipe or FIFO; atomicity is determined solely by whether the requested number
of bytes is less than or equal to PIPE_BUF. But when a pipe or FIFO is set
nonblocking, the return value from write depends on the number of bytes to write
and the amount of space currently available in the pipe or FIFO. If the number
of bytes to write is less than or equal to PIPE_BUF:
a. If there is room in the pipe or FIFO for the requested number of bytes, all the
bytes are transferred.
b. If there is not enough room in the pipe or FIFO for the requested number of
bytes, return is made immediately with an error of EAGAIN. Since the
O_NONBLOCK flag is set, the process does not want to be put to sleep. But the
kernel cannot accept part of the data and still guarantee an atomic write, so
the kernel must return an error and tell the process to try again later.
If the number of bytes to write is greater than PIPE_BUF:
a. If there is room for at least 1 byte in the pipe or FIFO, the kernel transfers
whatever the pipe or FIFO can hold, and that is the return value from
write.
b. If the pipe or FIFO is full, return is made immediately with an error of
EAGAIN.
• If we write to a pipe or FIFO that is not open for reading, the SIGPIPE signal
is generated:
a. If the process does not catch or ignore SIGPIPE, the default action of
terminating the process is taken.
b. If the process ignores the SIGPIPE signal, or if it catches the signal and
returns from its signal handler, then write returns an error of EPIPE.
SIGPIPE is considered a synchronous signal, that is, a signal attributable to one
specific thread, the one that called write. But the easiest way to handle this
signal is to ignore it (set its disposition to SIG_IGN) and let write return an
error of EPIPE. An application should always detect an error return from
write, but detecting the termination of a process by SIGPIPE is harder. If the
signal is not caught, we must look at the termination status of the process from
the shell to determine that the process was killed by a signal, and which signal.
Section 5.13 of UNPv1 talks more about SIGPIPE.
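Two of the write rules above can be observed with an ordinary pipe. The following is a minimal sketch; both function names are hypothetical. The first ignores SIGPIPE and writes to a pipe whose read end has been closed. The second fills a nonblocking pipe with 512-byte writes, a size that is at most PIPE_BUF on any Posix system, until the kernel refuses the next one.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical check: errno from writing to a pipe that is not open for
 * reading, with SIGPIPE ignored.  Returns 0 if the write did not fail,
 * -1 if the pipe could not be created. */
int errno_write_no_reader(void)
{
    int     fd[2], err = 0;
    void    (*old)(int);

    if (pipe(fd) < 0)
        return -1;
    old = signal(SIGPIPE, SIG_IGN);     /* let write return an error instead */
    close(fd[0]);                       /* no reader remains */
    if (write(fd[1], "x", 1) < 0)
        err = errno;
    close(fd[1]);
    signal(SIGPIPE, old);               /* restore the previous disposition */
    return err;
}

/* Hypothetical check: errno from the first write that a full, nonblocking
 * pipe refuses.  Returns -1 if the setup failed. */
int errno_write_full_pipe(void)
{
    static const char chunk[512];       /* small enough to be written atomically */
    int     fd[2], flags, err;

    if (pipe(fd) < 0)
        return -1;
    if ((flags = fcntl(fd[1], F_GETFL, 0)) < 0 ||
        fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    /* each write either transfers all 512 bytes or fails outright */
    while (write(fd[1], chunk, sizeof(chunk)) == (ssize_t) sizeof(chunk))
        ;
    err = errno;
    close(fd[0]);
    close(fd[1]);
    return err;
}
```

The first function is expected to report EPIPE and the second EAGAIN.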
4.8 One Server, Multiple Clients
The real advantage of a FIFO is when the server is a long-running process (e.g., a daemon,
as described in Chapter 12 of UNPv1) that is unrelated to the client. The daemon
creates a FIFO with a well-known pathname, opens the FIFO for reading, and the client
then starts at some later time, opens the FIFO for writing, and sends its commands or
whatever to the daemon through the FIFO. One-way communication of this form
(client to server) is easy with a FIFO, but it becomes harder if the daemon needs to send
something back to the client. Figure 4.22 shows the technique that we use with our
example.
The server creates a FIFO with a well-known pathname, /tmp/fifo.serv in this
example. The server will read client requests from this FIFO. Each client creates its own
FIFO when it starts, with a pathname containing its process ID. Each client writes its
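[Figure: the server reads requests from its well-known FIFO, /tmp/fifo.serv, and replies to each client through that client's own FIFO: /tmp/fifo.1234 for client 1 and /tmp/fifo.9876 for client 2.]
Figure 4.22 One server, multiple clients.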
request to the server's well-known FIFO, and the request contains the client process ID
along with the pathname of the file that the client wants the server to open and send to
the client.
Figure 4.23 shows the server program.
Create well-known FIFO and open for read-only and write-only
The server's well-known FIFO is created, and it is OK if it already exists. We then
open the FIFO twice, once read-only and once write-only. The readfifo descriptor is
used to read each client request that arrives at the FIFO, but the dummyfd descriptor is
never used. The reason for opening the FIFO for writing can be seen in Figure 4.21. If
we do not open the FIFO for writing, then each time a client terminates, the FIFO
becomes empty and the server's read returns 0 to indicate an end-of-file. We would
then have to close the FIFO and call open again with the O_RDONLY flag, and this will
block until the next client request arrives. But if we always have a descriptor for the
FIFO that was opened for writing, read will never return 0 to indicate an end-of-file
when no clients exist. Instead, our server will just block in the call to read, waiting for
the next client request. This trick therefore simplifies our server code and reduces the
number of calls to open for its well-known FIFO.
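The effect of the extra write-only descriptor can be seen from one process. The following is a minimal sketch: the function empty_fifo_read_result and the pathname passed to it are hypothetical. It is not how the server itself is written, because it uses a nonblocking read in place of the server's blocking read, so that "would block" shows up as an error of EAGAIN instead of a process that sleeps.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe: read from an empty FIFO through a nonblocking
 * descriptor.  If hold_writer is nonzero, a write-only descriptor is kept
 * open during the read, as the server does with dummyfd.
 * Returns 0 if read reported end-of-file, the errno if read failed, or -1
 * if the setup failed. */
int empty_fifo_read_result(const char *path, int hold_writer)
{
    int     readfd, dummyfd = -1, result;
    char    c;
    ssize_t n;

    unlink(path);
    if (mkfifo(path, 0600) < 0)
        return -1;

    /* O_NONBLOCK lets one process open the read end with no writer present */
    if ((readfd = open(path, O_RDONLY | O_NONBLOCK)) < 0) {
        unlink(path);
        return -1;
    }
    if (hold_writer && (dummyfd = open(path, O_WRONLY)) < 0) {
        close(readfd);
        unlink(path);
        return -1;
    }

    n = read(readfd, &c, 1);
    if (n == 0)
        result = 0;             /* end-of-file: no process has it open for writing */
    else if (n < 0)
        result = errno;         /* EAGAIN: a writer exists but has sent no data */
    else
        result = -1;            /* unexpected: nothing was ever written */

    if (dummyfd >= 0)
        close(dummyfd);
    close(readfd);
    unlink(path);
    return result;
}
```

Without the extra descriptor the read reports end-of-file. With it, the read reports EAGAIN, which corresponds to the blocking server simply waiting in read for the next request.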
When the server starts, the first open (with the O_RDONLY flag) blocks until the first
        ... 0) {
            if (buff[n-1] == '\n')
                n--;                /* delete newline from readline() */
            buff[n] = '\0';         /* null terminate pathname */

            if ( (ptr = strchr(buff, ' ')) == NULL) {
                err_msg("bogus request: %s", buff);
                continue;
            }

            *ptr++ = 0;             /* null terminate PID, ptr = pathname */
            pid = atol(buff);
            snprintf(fifoname, sizeof(fifoname), "/tmp/fifo.%ld", (long) pid);
            if ( (writefifo = open(fifoname, O_WRONLY, 0)) < 0) {
                err_msg("cannot open: %s", fifoname);
                continue;
            }

            if ( (fd = open(ptr, O_RDONLY)) < 0) {
                    /* error: must tell client */
                snprintf(buff + n, sizeof(buff) - n, ": can't open, %s\n",
                         strerror(errno));
                n = strlen(ptr);
                Write(writefifo, ptr, n);
                Close(writefifo);
            } else {
                    /* open succeeded: copy file to FIFO */
                while ( (n = Read(fd, buff, MAXLINE)) > 0)
                    Write(writefifo, buff, n);
                Close(fd);
                Close(writefifo);
            }
        }
        exit(0);
    }
fifocliserv/mainserver.c
Figure 4.23 FIFO server that handles multiple clients.
Parse client's request
The newline that is normally returned by readline is deleted. This newline is
missing only if the buffer was filled before the newline was encountered, or if the final
line of input was not terminated by a newline. The strchr function returns a pointer
to the first blank in the line, and ptr is incremented to point to the first character of the
pathname that follows. The pathname of the client's FIFO is constructed from the
process ID, and the FIFO is opened for write-only by the server.
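The parsing step can be isolated into a function that is easy to exercise on its own. The following is a minimal sketch: parse_request is a hypothetical helper, not a function from the book's source, and it adds a guard for an empty line that the server code does not have.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a request line of the form "PID pathname\n"
 * in place.  On success, stores the process ID in *pid, points *path at the
 * pathname inside line, builds the client's FIFO name in fifoname, and
 * returns 0.  Returns -1 for a request that contains no blank. */
int parse_request(char *line, long *pid, char **path,
                  char *fifoname, size_t fifolen)
{
    size_t  n = strlen(line);
    char   *ptr;

    if (n > 0 && line[n - 1] == '\n')
        line[--n] = '\0';               /* delete trailing newline */

    if ((ptr = strchr(line, ' ')) == NULL)
        return -1;                      /* bogus request: no blank */

    *ptr++ = '\0';                      /* null terminate PID, ptr = pathname */
    *pid = atol(line);
    *path = ptr;
    snprintf(fifoname, fifolen, "/tmp/fifo.%ld", *pid);
    return 0;
}
```

For the request "1234 /etc/inet/ntp.conf\n", the helper yields a process ID of 1234, the pathname /etc/inet/ntp.conf, and the FIFO name /tmp/fifo.1234.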
Open file for client, send file to client's FIFO
The remainder of the server is similar to our server function from Figure 4.10.
The file is opened and if this fails, an error message is returned to the client across the
FIFO. If the open succeeds, the file is copied to the client's FIFO. When done, we must
close the server's end of the client's FIFO, which causes the client's read to return 0
(end-of-file). The server does not delete the client's FIFO; the client must do so after it
reads the end-of-file from the server.
We show the client program in Figure 4.24.
Create FIFO
The client's FIFO is created with the process ID as the final part of the pathname.
Build client request line
The client's request consists of its process ID, one blank, the pathname for the server
to send to the client, and a newline. This line is built in the array buff, reading the
pathname from the standard input.
Open server's FIFO and write request
The server's FIFO is opened and the request is written to the FIFO. If this client is
the first to open this FIFO since the server was started, then this open unblocks the
server from its call to open (with the O_RDONLY flag).
Read file contents or error message from server
The server's reply is read from the FIFO and written to standard output. The
client's FIFO is then closed and deleted.
We can start our server in one window and run the client in another window, and it
works as expected. We show only the client interaction.
    solaris % mainclient
    /etc/shadow                              a file we cannot read
    /etc/shadow: can't open, Permission denied
    solaris % mainclient
    /etc/inet/ntp.conf                       a 2-line file
    multicastclient 224.0.1.1
    driftfile /etc/inet/ntp.drift
We can also interact with the server from the shell, because FIFOs have names in the
filesystem.
fifocliserv/mainclient.c
    #include    "fifo.h"

    int
    main(int argc, char **argv)
    {
        int     readfifo, writefifo;
        size_t  len;
        ssize_t n;
        char    *ptr, fifoname[MAXLINE], buff[MAXLINE];
        pid_t   pid;

            /* create FIFO with our PID as part of name */
        pid = getpid();
        snprintf(fifoname, sizeof(fifoname), "/tmp/fifo.%ld", (long) pid);
        if ((mkfifo(fifoname, FILE_MODE) < 0) && (errno != EEXIST))
            err_sys("can't create %s", fifoname);

            /* start buffer with pid and a blank */
        snprintf(buff, sizeof(buff), "%ld ", (long) pid);
        len = strlen(buff);
        ptr = buff + len;

            /* read pathname */
        Fgets(ptr, MAXLINE - len, stdin);
        len = strlen(buff);     /* fgets() guarantees null byte at end */

            /* open FIFO to server and write PID and pathname to FIFO */
        writefifo = Open(SERV_FIFO, O_WRONLY, 0);
        Write(writefifo, buff, len);

            /* now open our FIFO; blocks until server opens for writing */
        readfifo = Open(fifoname, O_RDONLY, 0);

            /* read from IPC, write to standard output */
        while ( (n = Read(readfifo, buff, MAXLINE)) > 0)
            Write(STDOUT_FILENO, buff, n);

        Close(readfifo);
        Unlink(fifoname);
        exit(0);
    }
fifocliserv/mainclient.c
Figure 4.24 FIFO client that works with the server in Figure 4.23.
    solaris % Pid=$$                                    process ID of this shell
    solaris % mkfifo /tmp/fifo.$Pid                     make the client's FIFO
    solaris % echo "$Pid /etc/inet/ntp.conf" > /tmp/fifo.serv
    solaris % cat < /tmp/fifo.$Pid                      and read server's reply
    multicastclient 224.0.1.1
    driftfile /etc/inet/ntp.drift
    solaris % rm /tmp/fifo.$Pid
We send our process ID and pathname to the server with one shell command (echo)
and read the server's reply with another (cat). Any amount of time can occur between
these two commands. Therefore, the server appears to write the file to the FIFO, and
the client later executes cat to read the data from the FIFO, which might make us think
that the data remains in the FIFO somehow, even when no process has the FIFO open.
This is not what is happening. Indeed, the rule is that when the final close of a pipe or
FIFO occurs, any remaining data in the pipe or FIFO is discarded. What is happening in
our shell example is that after the server reads the request line from the client, the server
blocks in its call to open on the client's FIFO, because the client (our shell) has not yet
opened the FIFO for reading (recall Figure 4.21). Only when we execute cat sometime
later, which opens the client FIFO for reading, does the server's call to open for this
FIFO return. This timing also leads to a denial-of-service attack, which we discuss in the
next section.
Using the shell also allows simple testing of the server's error handling. We can
easily send a line to the server without a process ID, and we can also send a line to the
server specifying a process ID that does not correspond to a FIFO in the /tmp directory.
For example, if we invoke the client and enter the following lines
    solaris % cat > /tmp/fifo.serv
    /no/process/id
    999999 /invalid/process/id
then the server's output (in another window) is
    solaris % server
    bogus request: /no/process/id
    cannot open: /tmp/fifo.999999
Atomicity of FIFO writes
Our simple client-server also lets us see why the atomicity property of writes to pipes
and FIFOs is important. Assume that two clients send requests at about the same time
to the server. The first client's request is the line
    1234 /etc/inet/ntp.conf
and the second client's request is the line
    9876 /etc/passwd
If we assume that each client issues one write function call for its request line, and that
each line is less than or equal to PIPE_BUF (which is reasonable, since this limit is usually
between 1024 and 5120 and since pathnames are often limited to 1024 bytes), then
we are guaranteed that the data in the FIFO will be either
    1234 /etc/inet/ntp.conf
    9876 /etc/passwd
or
    9876 /etc/passwd
    1234 /etc/inet/ntp.conf
The data in the FIFO will not be something like
    1234 /etc/inet9876 /etc/passwd
    /ntp.conf
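The guarantee depends on each client issuing exactly one write of no more than PIPE_BUF bytes. The following is a minimal sketch of a sender that enforces both conditions: build_request and send_request are hypothetical names, the 2048-byte buffer is an arbitrary choice, and the limit is obtained with fpathconf, falling back to the Posix minimum of 512 if no value is reported.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format "PID pathname\n" into buf.  Returns the
 * length of the line, or -1 if it does not fit in buf or is longer than
 * the atomic-write limit given by pipe_buf. */
ssize_t build_request(char *buf, size_t size, long pid, const char *path,
                      long pipe_buf)
{
    int     len = snprintf(buf, size, "%ld %s\n", pid, path);

    if (len < 0 || (size_t) len >= size || len > pipe_buf)
        return -1;
    return len;
}

/* Hypothetical helper: send one request line with a single write, so that
 * the kernel cannot intermix it with a line from another client.
 * Returns 0 on success, -1 on error. */
int send_request(int fd, long pid, const char *path)
{
    char    buf[2048];
    long    limit = fpathconf(fd, _PC_PIPE_BUF);
    ssize_t len;

    if (limit < 0)
        limit = 512;            /* minimum value that Posix allows */
    if ((len = build_request(buf, sizeof(buf), pid, path, limit)) < 0)
        return -1;
    return (write(fd, buf, (size_t) len) == len) ? 0 : -1;
}
```

Building the whole line first matters: writing the process ID and the pathname with two separate calls would allow another client's line to land between them.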
FIFOs and NFS
FIFOs are a form of IPC that can be used on a single host. Although FIFOs have names
in the filesystem, they can be used only on local filesystems, and not on NFS-mounted
filesystems.
    solaris % mkfifo /nfs/bsdi/usr/rstevens/fifo.temp
    mkfifo: I/O error
In this example, the filesystem /nfs/bsdi/usr is the /usr filesystem on the host
bsdi.
Some systems (e.g., BSD/OS) do allow FIFOs to be created on an NFS-mounted
filesystem, but data cannot be passed between the two systems through one of these FIFOs.
In this scenario, the FIFO would be used only as a rendezvous point in the filesystem
between clients and servers on the same host. A process on one host cannot send data to
a process on another host through a FIFO, even though both processes may be able to
open a FIFO that is accessible to both hosts through NFS.
4.9 Iterative versus Concurrent Servers
The server in our simple example from the preceding section is an iterative server. It
iterates through the client requests, completely handling each client's request before
proceeding to the next client. For example, if two clients each send a request to the server
at about the same time (the first for a 10-megabyte file that takes, say, 10 seconds to
send to the client, and the second for a 10-byte file), the second client must wait at least
10 seconds for the first client to be serviced.
The alternative is a concurrent server. The most common type of concurrent server
under Unix is called a one-child-per-client server, and it has the server call fork to create
a new child each time a client request arrives. The new child handles the client request
to completion, and the multiprogramming features of Unix provide the concurrency of
all the different processes. But there are other techniques that are discussed in detail in
Chapter 27 of UNPv1:
• create a pool of children and service a new client with an idle child,
• create one thread per client, and
• create a pool of threads and service a new client with an idle thread.
Although the discussion in UNPv1 is for network servers, the same techniques apply to
IPC servers whose clients are on the same host.
Denial-of-Service Attacks
We have already mentioned one problem with an iterative server (some clients must
wait longer than expected because they are in line following other clients with longer
requests), but another problem exists. Recall our shell example following Figure 4.24
and our discussion of how the server blocks in its call to open for the client FIFO if the
        ... 0 */
        char    mesg_data[MAXMESGDATA];
    };

    ssize_t  mesg_send(int, struct mymesg *);
    void     Mesg_send(int, struct mymesg *);
    ssize_t  mesg_recv(int, struct mymesg *);
    ssize_t  Mesg_recv(int, struct mymesg *);
pipemesg/mesg.h
Figure 4.25 Our mymesg structure and related definitions.
Each message has a mesg_type, which we define as an integer whose value must be
greater than 0. We ignore the type field for now, but return to it in Chapter 6, when we
describe System V message queues. Each message also has a length, and we allow the
length to be zero. What we are doing with the mymesg structure is to precede each
message with its length, instead of using newlines to separate the messages. Earlier, we
mentioned two benefits of this design: the receiver need not scan each received byte
looking for the end of the message, and there is no need to escape the delimiter (a
newline) if it appears in the message.
Figure 4.26 shows a picture of the mymesg structure, and how we use it with pipes,
FIFOs, and System V message queues.
[Figure: the mymesg structure drawn as three consecutive fields: mesg_len, mesg_type, and mesg_data (mesg_len bytes). The whole structure, starting at mesg_len, is our message mymesg{}, used with pipes and FIFOs as the second argument for write and read. The part starting at mesg_type is the System V message msgbuf{}, used with System V message queues as the second argument for msgsnd and msgrcv.]
Figure 4.26 Our mymesg structure.
We define two functions to send and receive messages. Figure 4.27 shows our
mesg_send function, and Figure 4.28 shows our mesg_recv function.
pipemesg/mesg_send.c
    #include    "mesg.h"

    ssize_t
    mesg_send(int fd, struct mymesg *mptr)
    {
        return(write(fd, mptr, MESGHDRSIZE + mptr->mesg_len));
    }
pipemesg/mesg_send.c
Figure 4.27 mesg_send function.
pipemesg/mesg_recv.c
    #include    "mesg.h"

    ssize_t
    mesg_recv(int fd, struct mymesg *mptr)
    {
        size_t  len;
        ssize_t n;

            /* read message header first, to get len of data that follows */
        if ( (n = Read(fd, mptr, MESGHDRSIZE)) == 0)
            return(0);      /* end of file */
        else if (n != MESGHDRSIZE)
            err_quit("message header: expected %d, got %d", MESGHDRSIZE, n);

        if ( (len = mptr->mesg_len) > 0)
            if ( (n = Read(fd, mptr->mesg_data, len)) != len)
                err_quit("message data: expected %d, got %d", len, n);
        return(len);
    }
pipemesg/mesg_recv.c
Figure 4.28 mesg_recv function.
It now takes two reads for each message, one to read the length, and another to read
the actual message (if the length is greater than 0).
Careful readers may note that mesg_recv checks for all possible errors and terminates if one
occurs. Nevertheless, we still define a wrapper function named Mesg_recv and call it from
our programs, for consistency.
We now change our client and server functions to use the mesg_send and
mesg_recv functions. Figure 4.29 shows our client.
pipemesg/client.c
    #include    "mesg.h"

    void
    client(int readfd, int writefd)
    {
        size_t  len;
        ssize_t n;
        struct mymesg   mesg;

            /* read pathname */
        Fgets(mesg.mesg_data, MAXMESGDATA, stdin);
        len = strlen(mesg.mesg_data);
        if (mesg.mesg_data[len-1] == '\n')
            len--;              /* delete newline from fgets() */
        mesg.mesg_len = len;
        mesg.mesg_type = 1;

            /* write pathname to IPC channel */
        Mesg_send(writefd, &mesg);

            /* read from IPC, write to standard output */
        while ( (n = Mesg_recv(readfd, &mesg)) > 0)
            Write(STDOUT_FILENO, mesg.mesg_data, n);
    }
pipemesg/client.c
Figure 4.29 Our client function that uses messages.
Read pathname, send to server
The pathname is read from standard input and then sent to the server using
mesg_send.
Read file's contents or error message from server
The client calls mesg_recv in a loop, reading everything that the server sends back.
By convention, when mesg_recv returns a length of 0, this indicates the end of data
from the server. We will see that the server includes the newline in each message that it
sends to the client, so a blank line will have a message length of 1.
Figure 4.30 shows our server.
pipemesg/server.c
    #include    "mesg.h"

    void
    server(int readfd, int writefd)
    {
        FILE    *fp;
        ssize_t n;
        struct mymesg   mesg;

            /* read pathname from IPC channel */
        mesg.mesg_type = 1;
        if ( (n = Mesg_recv(readfd, &mesg)) == 0)
            err_quit("pathname missing");
        mesg.mesg_data[n] = '\0';   /* null terminate pathname */

        if ( (fp = fopen(mesg.mesg_data, "r")) == NULL) {
                /* error: must tell client */
            snprintf(mesg.mesg_data + n, sizeof(mesg.mesg_data) - n,
                     ": can't open, %s\n", strerror(errno));
            mesg.mesg_len = strlen(mesg.mesg_data);
            Mesg_send(writefd, &mesg);
        } else {
                /* fopen succeeded: copy file to IPC channel */
            while (Fgets(mesg.mesg_data, MAXMESGDATA, fp) != NULL) {
                mesg.mesg_len = strlen(mesg.mesg_data);
                Mesg_send(writefd, &mesg);
            }
            Fclose(fp);
        }

            /* send a 0-length message to signify the end */
        mesg.mesg_len = 0;
        Mesg_send(writefd, &mesg);
    }
pipemesg/server.c
Figure 4.30 Our server function that uses messages.
Read pathname from IPC channel, open file
The pathname is read from the client. Although the assignment of 1 to mesg_type
appears useless (it is overwritten by mesg_recv in Figure 4.28), we call this same
function when using System V message queues (Figure 6.10), in which case, this assignment
is needed (e.g., Figure 6.13). The standard I/O function fopen opens the file, which
differs from Figure 4.10, where we called the Unix I/O function open to obtain a
descriptor for the file. The reason we call the standard I/O library here is to call fgets
to read the file one line at a time, and then send each line to the client as a message.
Copy file to client
If the call to fopen succeeds, the file is read using fgets and sent to the client, one
line per message. A message with a length of 0 indicates the end of the file.
When using either pipes or FIFOs, we could also close the IPC channel to notify the
peer that the end of the input file was encountered. We send back a message with a
length of 0, however, because we will encounter other types of IPC that do not have the
concept of an end-of-file.
The main functions that call our client and server functions do not change at
all. We can use either the pipe version (Figure 4.8) or the FIFO version (Figure 4.16).
4.11 Pipe and FIFO Limits
The only system-imposed limits on pipes and FIFOs are
OPEN_MAX    the maximum number of descriptors open at any time by a process
            (Posix requires that this be at least 16), and
PIPE_BUF    the maximum amount of data that can be written to a pipe or FIFO
            atomically (we described this in Section 4.7; Posix requires that this be
            at least 512).
The value of OPEN_MAX can be queried by calling the sysconf function, as we show
shortly. It can normally be changed from the shell by executing the ulimit command
(Bourne shell and KornShell, as we show shortly) or the limit command (C shell). It
can also be changed from a process by calling the setrlimit function (described in
detail in Section 7.11 of APUE).
The value of PIPE_BUF is often defined in the <limits.h> header, but it is considered
a pathname variable by Posix. This means that its value can differ, depending on the
pathname that is specified (for a FIFO, since pipes do not have names), because different
pathnames can end up on different filesystems, and these filesystems might have
different characteristics. The value can therefore be obtained at run time by calling
either pathconf or fpathconf. Figure 4.31 shows an example that prints these two
limits.
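The three run-time queries can be wrapped as follows. This is a minimal sketch with hypothetical function names, and it is separate from the program in Figure 4.31. Each wrapper returns the value reported by the system, or -1 if the call fails or the limit is indeterminate.

```c
#include <unistd.h>

/* Hypothetical wrapper: PIPE_BUF for the pipe or FIFO open on fd. */
long pipe_buf_of(int fd)
{
    return fpathconf(fd, _PC_PIPE_BUF);
}

/* Hypothetical wrapper: PIPE_BUF for a FIFO named by path, or for FIFOs
 * created in the directory that path names. */
long pipe_buf_at(const char *path)
{
    return pathconf(path, _PC_PIPE_BUF);
}

/* Hypothetical wrapper: maximum number of descriptors that this process
 * can have open at one time. */
long open_max(void)
{
    return sysconf(_SC_OPEN_MAX);
}
```

A pipe has no pathname, so pipe_buf_of is the form to use with a descriptor returned by pipe, while pipe_buf_at applies to a FIFO.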
pipe/pipeconf.c
    #include    "unpipc.h"

    int
    main(int argc, char **argv)
    {
        if (argc != 2)
            err_quit("usage: pipeconf
mqd_t mq_open(const char *name, int oflag, ...
              /* mode_t mode, struct mq_attr *attr */ );
Returns: message queue descriptor if OK, -1 on error