
Unix Network Programming Volume 2 PDF

Uploaded by

Balaji Venkatesh
UNIX NETWORK PROGRAMMING, SECOND EDITION

Well-implemented interprocess communications (IPC) are key to the performance of virtually every non-trivial UNIX program. In UNIX Network Programming, Volume 2, Second Edition, legendary UNIX expert W. Richard Stevens presents a comprehensive guide to every form of IPC, including message passing, synchronization, shared memory, and Remote Procedure Calls (RPC).

If you've read Stevens' best-selling first edition of UNIX Network Programming, this book expands its IPC coverage by a factor of five! You won't just learn about IPC "from the outside." You'll actually create implementations of Posix message queues, read-write locks, and semaphores, gaining an in-depth understanding of these capabilities you simply can't get anywhere else.

The book contains extensive new source code, all carefully optimized and available on the Web. You'll even find a complete guide to measuring IPC performance with message passing bandwidth and latency programs, and thread and process synchronization programs.

The better you understand IPC, the better your UNIX software will run. This book contains all you need to know.

ABOUT THE AUTHOR

W. RICHARD STEVENS is author of UNIX Network Programming, First Edition, widely recognized as the classic text in UNIX networking, and UNIX Network Programming, Volume 1, Second Edition. He is also author of Advanced Programming in the UNIX Environment and the TCP/IP Illustrated Series. Stevens is an acknowledged UNIX and networking expert, sought-after instructor, and occasional consultant.

PRENTICE HALL
Upper Saddle River, NJ 07458
www.phptr.com

Function prototype

    bool_t clnt_control(CLIENT *cl, unsigned int request, char *ptr);
    CLIENT *clnt_create(const char *host, unsigned long prognum, unsigned long versnum, const char *protocol);
    void clnt_destroy(CLIENT *cl);
    int door_bind(int fd);
    int door_call(int fd, door_arg_t *argp);
    int door_create(Door_server_proc *proc, void *cookie, u_int attr);
    int door_cred(door_cred_t *cred);
    int door_info(int fd, door_info_t *info);
    int door_return(char *dataptr, size_t datasize, door_desc_t *descptr, size_t ndesc);
    int door_revoke(int fd);
    Door_create_proc *door_server_create(Door_create_proc *proc);
    int door_unbind(void);
    void err_dump(const char *fmt, ...);
    void err_msg(const char *fmt, ...);
    void err_quit(const char *fmt, ...);
    void err_ret(const char *fmt, ...);
    void err_sys(const char *fmt, ...);
    int fcntl(int fd, int cmd, ... /* struct flock *arg */ );
    int fstat(int fd, struct stat *buf);
    key_t ftok(const char *pathname, int id);
    int ftruncate(int fd, off_t length);
    int mq_close(mqd_t mqdes);
    int mq_getattr(mqd_t mqdes, struct mq_attr *attr);
    int mq_notify(mqd_t mqdes, const struct sigevent *notification);
    mqd_t mq_open(const char *name, int oflag, ... /* mode_t mode, struct mq_attr *attr */ );
    int mq_unlink(const char *name);
    int msgctl(int msqid, int cmd, struct msqid_ds *buff);
    int msgget(key_t key, int oflag);
    FILE *popen(const char *command, const char *type);

Function prototype

    int pthread_cancel(pthread_t tid);
    void pthread_cleanup_pop(int execute);
    void pthread_cleanup_push(void (*function)(void *), void *arg);
    int pthread_create(pthread_t *tid, const pthread_attr_t *attr, void *(*func)(void *), void *arg);
    int pthread_detach(pthread_t tid);
    void pthread_exit(void *status);
    int pthread_join(pthread_t tid, void **status);
    pthread_t pthread_self(void);
    int pthread_condattr_destroy(pthread_condattr_t *attr);
    int pthread_condattr_getpshared(const pthread_condattr_t *attr, int *valptr);
    int pthread_condattr_init(pthread_condattr_t *attr);
    int pthread_condattr_setpshared(pthread_condattr_t *attr, int value);
    int pthread_cond_broadcast(pthread_cond_t *cptr);
    int pthread_cond_destroy(pthread_cond_t *cptr);
    int pthread_cond_init(pthread_cond_t *cptr, const pthread_condattr_t *attr);
    int pthread_cond_signal(pthread_cond_t *cptr);

Figure 1.2 Persistence of IPC objects.

1. A process-persistent IPC object remains in existence until the last process that holds the object open closes the object. Examples are pipes and FIFOs.
2. A kernel-persistent IPC object remains in existence until the kernel reboots or until the object is explicitly deleted. Examples are System V message queues, semaphores, and shared memory. Posix message queues, semaphores, and shared memory must be at least kernel-persistent, but may be filesystem-persistent, depending on the implementation.
3. A filesystem-persistent IPC object remains in existence until the object is explicitly deleted. The object retains its value even if the kernel reboots. Posix message queues, semaphores, and shared memory have this property, if they are implemented using mapped files (not a requirement).

We must be careful when defining the persistence of an IPC object because it is not always as it seems. For example, the data within a pipe is maintained within the kernel, but pipes have process persistence and not kernel persistence: after the last process that has the pipe open for reading closes the pipe, the kernel discards all the data and removes the pipe. Similarly, even though FIFOs have names within the filesystem, they also have process persistence because all the data in a FIFO is discarded after the last process that has the FIFO open closes the FIFO.
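The process persistence of FIFO data can be observed directly. The following is a minimal sketch, not code from this text: it assumes a Posix system with a writable /tmp, and the function name bytes_surviving_last_close (and the path used in the test) are hypothetical. It writes into a FIFO, closes every descriptor, reopens the FIFO, and counts how many bytes are still there.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg into the FIFO at path, close every
 * descriptor (so no process has the FIFO open), then reopen the FIFO and
 * read. Returns the number of bytes that survived the last close, or -1
 * on error. The FIFO's name stays in the filesystem until it is unlinked. */
ssize_t bytes_surviving_last_close(const char *path, const char *msg)
{
    char    buf[128];
    int     rfd, wfd;
    ssize_t n;

    if (mkfifo(path, S_IRUSR | S_IWUSR) == -1 && errno != EEXIST)
        return -1;

    /* A nonblocking open for reading returns at once, even with no writer;
       the blocking open for writing then succeeds because a reader exists. */
    if ((rfd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
        return -1;
    if ((wfd = open(path, O_WRONLY)) == -1) {
        close(rfd);
        return -1;
    }
    if (write(wfd, msg, strlen(msg)) == -1) {
        close(wfd);
        close(rfd);
        return -1;
    }

    /* Last close of the FIFO: the kernel discards the unread data. */
    close(wfd);
    close(rfd);

    /* Reopen and read. With no writer and no data, read returns 0 (EOF). */
    if ((rfd = open(path, O_RDONLY | O_NONBLOCK)) == -1)
        return -1;
    n = read(rfd, buf, sizeof(buf));
    close(rfd);
    return n;
}
```

Under these assumptions the call returns 0, while stat still reports the pathname as a FIFO: the name lives in the filesystem, but the data had only process persistence.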
Figure 1.3 summarizes the persistence of the IPC objects that we describe in this text.

Type of IPC                      Persistence
Pipe                             process
FIFO                             process
Posix mutex                      process
Posix condition variable         process
Posix read-write lock            process
fcntl record locking             process
Posix message queue              kernel
Posix named semaphore            kernel
Posix memory-based semaphore     process
Posix shared memory              kernel
System V message queue           kernel
System V semaphore               kernel
System V shared memory           kernel
TCP socket                       process
UDP socket                       process
Unix domain socket               process

Figure 1.3 Persistence of various types of IPC objects.

Note that no type of IPC has filesystem persistence, but we have mentioned that the three types of Posix IPC may, depending on the implementation. Obviously, writing data to a file provides filesystem persistence, but this is normally not used as a form of IPC. Most forms of IPC are not intended to survive a system reboot, because the processes do not survive the reboot. Requiring filesystem persistence would probably degrade the performance for a given form of IPC, and a common design goal for IPC is high performance.

1.4 Name Spaces

When two unrelated processes use some type of IPC to exchange information between themselves, the IPC object must have a name or identifier of some form so that one process (often a server) can create the IPC object and other processes (often one or more clients) can specify that same IPC object.

Pipes do not have names (and therefore cannot be used between unrelated processes), but FIFOs have a Unix pathname in the filesystem as their identifier (and can therefore be used between unrelated processes). As we move to other forms of IPC in the following chapters, we use additional naming conventions. The set of possible names for a given type of IPC is called its name space.
The name space is important, because with all forms of IPC other than plain pipes, the name is how the client and server connect with each other to exchange messages. Figure 1.4 summarizes the naming conventions used by the different forms of IPC.

Type of IPC                   Name space to open   Identification after      Posix.1  Unix 98
                              or create            IPC opened                1996
Pipe                          (no name)            descriptor                •        •
FIFO                          pathname             descriptor                •        •
Posix mutex                   (no name)            pthread_mutex_t ptr       •        •
Posix condition variable      (no name)            pthread_cond_t ptr        •        •
Posix read-write lock         (no name)            pthread_rwlock_t ptr               •
fcntl record locking          pathname             descriptor                •        •
Posix message queue           Posix IPC name       mqd_t value               •        •
Posix named semaphore         Posix IPC name       sem_t pointer             •        •
Posix memory-based semaphore  (no name)            sem_t pointer             •        •
Posix shared memory           Posix IPC name       descriptor                •        •
System V message queue        key_t key            System V IPC identifier            •
System V semaphore            key_t key            System V IPC identifier            •
System V shared memory        key_t key            System V IPC identifier            •
Doors                         pathname             descriptor
Sun RPC                       program/version      RPC handle
TCP socket                    IP addr & TCP port   descriptor                .1g      •
UDP socket                    IP addr & UDP port   descriptor                .1g      •
Unix domain socket            pathname             descriptor                .1g      •

Figure 1.4 Name spaces for the various forms of IPC.

We also indicate which forms of IPC are standardized by the 1996 version of Posix.1 and Unix 98, both of which we say more about in Section 1.7. For comparison purposes, we include three types of sockets, which are described in detail in UNPv1. Note that the sockets API (application program interface) is being standardized by the Posix.1g working group and should eventually become part of a future Posix.1 standard.

Even though Posix.1 standardizes semaphores, they are an optional feature. Figure 1.5 summarizes which features are specified by Posix.1 and Unix 98. Each feature is mandatory, not defined, or optional. For the optional features, we specify the name of the constant (e.g., _POSIX_THREADS) that is defined (normally in the <unistd.h> header) if the feature is supported.
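The optional-feature constants can be inspected at compile time. This is a minimal sketch that assumes headers following the later Posix convention, in which each constant has a numeric value (greater than 0: supported; 0: query sysconf at run time; -1: not supported); the 1996-era headers described in the text may simply define the name. The function report_posix_options and the UNDEFINED_OPTION marker are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical marker of our own meaning "macro not defined by <unistd.h>". */
#define UNDEFINED_OPTION (-2L)

#ifdef _POSIX_THREADS
#define OPT_THREADS _POSIX_THREADS
#else
#define OPT_THREADS UNDEFINED_OPTION
#endif
#ifdef _POSIX_THREAD_PROCESS_SHARED
#define OPT_PSHARED _POSIX_THREAD_PROCESS_SHARED
#else
#define OPT_PSHARED UNDEFINED_OPTION
#endif
#ifdef _POSIX_MESSAGE_PASSING
#define OPT_MSG _POSIX_MESSAGE_PASSING
#else
#define OPT_MSG UNDEFINED_OPTION
#endif
#ifdef _POSIX_SEMAPHORES
#define OPT_SEM _POSIX_SEMAPHORES
#else
#define OPT_SEM UNDEFINED_OPTION
#endif
#ifdef _POSIX_SHARED_MEMORY_OBJECTS
#define OPT_SHM _POSIX_SHARED_MEMORY_OBJECTS
#else
#define OPT_SHM UNDEFINED_OPTION
#endif
#ifdef _POSIX_MAPPED_FILES
#define OPT_MMAP _POSIX_MAPPED_FILES
#else
#define OPT_MMAP UNDEFINED_OPTION
#endif
#ifdef _POSIX_REALTIME_SIGNALS
#define OPT_RTSIG _POSIX_REALTIME_SIGNALS
#else
#define OPT_RTSIG UNDEFINED_OPTION
#endif

struct posix_option {
    const char *name;
    long        value;
};

static const struct posix_option options[] = {
    { "_POSIX_THREADS",               OPT_THREADS },
    { "_POSIX_THREAD_PROCESS_SHARED", OPT_PSHARED },
    { "_POSIX_MESSAGE_PASSING",       OPT_MSG     },
    { "_POSIX_SEMAPHORES",            OPT_SEM     },
    { "_POSIX_SHARED_MEMORY_OBJECTS", OPT_SHM     },
    { "_POSIX_MAPPED_FILES",          OPT_MMAP    },
    { "_POSIX_REALTIME_SIGNALS",      OPT_RTSIG   },
};

/* Print one line per option and return how many are advertised as
   always supported (value greater than 0). */
int report_posix_options(void)
{
    int    supported = 0;
    size_t i;

    for (i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        long v = options[i].value;

        if (v == UNDEFINED_OPTION)
            printf("%-30s not defined\n", options[i].name);
        else if (v > 0) {
            printf("%-30s %ld (supported)\n", options[i].name, v);
            supported++;
        } else if (v == 0)
            printf("%-30s 0 (query sysconf at run time)\n", options[i].name);
        else
            printf("%-30s %ld (not supported)\n", options[i].name, v);
    }
    return supported;
}
```

A compile-time check like this only reports what the headers advertise; a value of 0 still requires a run-time sysconf call before relying on the feature.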
Note that Unix 98 is a superset of Posix.1.

Type of IPC                   Posix.1 1996                   Unix 98
Pipe                          mandatory                      mandatory
FIFO                          mandatory                      mandatory
Posix mutex                   _POSIX_THREADS                 mandatory
Posix condition variable      _POSIX_THREADS                 mandatory
process-shared mutex/CV       _POSIX_THREAD_PROCESS_SHARED   mandatory
Posix read-write lock         (not defined)                  mandatory
fcntl record locking          mandatory                      mandatory
Posix message queue           _POSIX_MESSAGE_PASSING         _XOPEN_REALTIME
Posix semaphores              _POSIX_SEMAPHORES              _XOPEN_REALTIME
Posix shared memory           _POSIX_SHARED_MEMORY_OBJECTS   _XOPEN_REALTIME
System V message queue        (not defined)                  mandatory
System V semaphore            (not defined)                  mandatory
System V shared memory        (not defined)                  mandatory
Doors                         (not defined)                  (not defined)
Sun RPC                       (not defined)                  (not defined)
mmap                          _POSIX_MAPPED_FILES or         mandatory
                              _POSIX_SHARED_MEMORY_OBJECTS
Realtime signals              _POSIX_REALTIME_SIGNALS        _XOPEN_REALTIME

Figure 1.5 Availability of the various forms of IPC.

1.5 Effect of fork, exec, and exit on IPC Objects

We need to understand the effect of the fork, exec, and _exit functions on the various forms of IPC that we discuss. (The latter is called by the exit function.) We summarize this in Figure 1.6.

Most of these features are described later in the text, but we need to make a few points. First, the calling of fork from a multithreaded process becomes messy with regard to unnamed synchronization variables (mutexes, condition variables, read-write locks, and memory-based semaphores). Section 6.1 of [Butenhof 1997] provides the details. We simply note in the table that if these variables reside in shared memory and are created with the process-shared attribute, then they remain accessible to any thread of any process with access to that shared memory. Second, the three forms of System V IPC have no notion of being open or closed. We will see in Figure 6.8 and Exercises 11.1 and 14.1 that all we need to know to access these three forms of IPC is an identifier.
So these three forms of IPC are available to any process that knows the identifier, although some special handling is indicated for semaphores and shared memory.

Pipes and FIFOs
    fork:  child gets copies of all parent's open descriptors
    exec:  all open descriptors remain open unless descriptor's FD_CLOEXEC bit set
    _exit: all open descriptors closed; all data removed from pipe or FIFO on last close

Posix message queues
    fork:  child gets copies of all parent's open message queue descriptors
    exec:  all open message queue descriptors are closed
    _exit: all open message queue descriptors are closed

System V message queues
    fork:  no effect
    exec:  no effect
    _exit: no effect

Posix mutexes and condition variables
    fork:  shared if in shared memory and process-shared attribute
    exec:  vanishes unless in shared memory that stays open and process-shared attribute
    _exit: vanishes unless in shared memory that stays open and process-shared attribute

Posix read-write locks
    fork:  shared if in shared memory and process-shared attribute
    exec:  vanishes unless in shared memory that stays open and process-shared attribute
    _exit: vanishes unless in shared memory that stays open and process-shared attribute

Posix memory-based semaphores
    fork:  shared if in shared memory and process-shared attribute
    exec:  vanishes unless in shared memory that stays open and process-shared attribute
    _exit: vanishes unless in shared memory that stays open and process-shared attribute

Posix named semaphores
    fork:  all open in parent remain open in child
    exec:  any open are closed
    _exit: any open are closed

System V semaphores
    fork:  all semadj values in child are set to 0
    exec:  all semadj values carried over to new program
    _exit: all semadj values are added to corresponding semaphore value

fcntl record locking
    fork:  locks held by parent are not inherited by child
    exec:  locks are unchanged as long as descriptor remains open
    _exit: all outstanding locks owned by process are unlocked

mmap memory mappings
    fork:  memory mappings in parent are retained by child
    exec:  memory mappings are unmapped
    _exit: memory mappings are unmapped

Posix shared memory
    fork:  memory mappings in parent are retained by child
    exec:  memory mappings are unmapped
    _exit: memory mappings are unmapped

System V shared memory
    fork:  attached shared memory segments remain attached by child
    exec:  attached shared memory segments are detached
    _exit: attached shared memory segments are detached

Doors
    fork:  child gets copies of all parent's open descriptors but only parent is a server for door invocations on door descriptors
    exec:  all door descriptors should be closed because they are created with FD_CLOEXEC bit set
    _exit: all open descriptors closed

Figure 1.6 Effect of calling fork, exec, and _exit on IPC.

1.6 Error Handling: Wrapper Functions

In any real-world program, we must check every function call for an error return. Since terminating on an error is the common case, we can shorten our programs by defining a wrapper function that performs the actual function call, tests the return value, and terminates on an error. The convention we use is to capitalize the name of the function, as in

    Sem_post(ptr);

Our wrapper function is shown in Figure 1.7.

    lib/wrapunix.c

    void
    Sem_post(sem_t *sem)
    {
        if (sem_post(sem) == -1)
            err_sys("sem_post error");
    }

Figure 1.7 Our wrapper function for the sem_post function.

Whenever you encounter a function name in the text that begins with a capital letter, that is a wrapper function of our own. It calls a function whose name is the same but begins with the lowercase letter. The wrapper function always terminates with an error message if an error is encountered.

When describing the source code that is presented in the text, we always refer to the lowest-level function being called (e.g., sem_post) and not the wrapper function (e.g., Sem_post). Similarly, the index always refers to the lowest-level function being called, and not the wrapper functions.

The format of the source code just shown is used throughout the text. Each nonblank line is numbered. The text describing portions of the code begins with the starting and ending line numbers in the left margin.
Sometimes the paragraph is preceded by a short descriptive bold heading, providing a summary statement of the code being described.

The horizontal rules at the beginning and end of the code fragment specify the source code filename: the file wrapunix.c in the directory lib for this example. Since the source code for all the examples in the text is freely available (see the Preface), you can locate the appropriate source file. Compiling, running, and especially modifying these programs while reading this text is an excellent way to learn the concepts of interprocess communications.

Although these wrapper functions might not seem like a big savings, when we discuss threads in Chapter 7, we will find that the thread functions do not set the standard Unix errno variable when an error occurs; instead the errno value is the return value of the function. This means that every time we call one of the pthread functions, we must allocate a variable, save the return value in that variable, and then set errno to this value before calling our err_sys function (Figure C.4). To avoid cluttering the code with braces, we can use C's comma operator to combine the assignment into errno and the call of err_sys into a single statement, as in the following:

    int     n;

    if ( (n = pthread_mutex_lock(&ndone_mutex)) != 0)
        errno = n, err_sys("pthread_mutex_lock error");

Alternately, we could define a new error function that takes the system's error number as an argument. But we can make this piece of code much easier to read as just

    Pthread_mutex_lock(&ndone_mutex);

by defining our own wrapper function, shown in Figure 1.8.

    lib/wrappthread.c

    void
    Pthread_mutex_lock(pthread_mutex_t *mptr)
    {
        int     n;

        if ( (n = pthread_mutex_lock(mptr)) == 0)
            return;
        errno = n;              /* pthread functions return the error number */
        err_sys("pthread_mutex_lock error");
    }

Figure 1.8 Our wrapper function for pthread_mutex_lock.

With careful C coding, we could use macros instead of functions, providing a little run-time efficiency, but these wrapper functions are rarely, if ever, the performance bottleneck of a program.

Our choice of capitalizing the first character of the function name is a compromise. Many other styles were considered: prefixing the function name with an e (as done on p. 182 of [Kernighan and Pike 1984]), appending _e to the function name, and so on. Our style seems the least distracting while still providing a visual indication that some other function is really being called.

This technique has the side benefit of checking for errors from functions whose error returns are often ignored: close and pthread_mutex_lock, for example.

Throughout the rest of this book, we use these wrapper functions unless we need to check for an explicit error and handle it in some form other than terminating the process. We do not show the source code for all our wrapper functions, but the code is freely available (see the Preface).

Unix errno Value

When an error occurs in a Unix function, the global variable errno is set to a positive value, indicating the type of error, and the function normally returns -1. Our err_sys function looks at the value of errno and prints the corresponding error message string (e.g., "Resource temporarily unavailable" if errno equals EAGAIN).

The value of errno is set by a function only if an error occurs. Its value is undefined if the function does not return an error. All the positive error values are constants with an all-uppercase name beginning with E and are normally defined in the <sys/errno.h> header. No error has the value of 0.

With multiple threads, each thread must have its own errno variable. Providing a per-thread errno is handled automatically, although this normally requires telling the compiler that the program being compiled must be reentrant.
Specifying something like -D_REENTRANT or -D_POSIX_C_SOURCE=199506L to the compiler is typically required. Often the <errno.h> header defines errno as a macro that expands into a function call when _REENTRANT is defined, referencing a per-thread copy of the error variable.

Throughout the text, we use phrases of the form "the mq_send function returns EMSGSIZE" as shorthand to mean that the function returns an error (typically a return value of -1) with errno set to the specified constant.

1.7 Unix Standards

Most activity these days with regard to Unix standardization is being done by Posix and The Open Group.

Posix

Posix is an acronym for "Portable Operating System Interface." Posix is not a single standard, but a family of standards being developed by the Institute for Electrical and Electronics Engineers, Inc., normally called the IEEE. The Posix standards are also being adopted as international standards by ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission), called ISO/IEC. The Posix standards have gone through the following iterations:

• IEEE Std 1003.1-1988 (317 pages) was the first of the Posix standards. It specified the C language interface into a Unix-like kernel covering the following areas: process primitives (fork, exec, signals, timers), the environment of a process (user IDs, process groups), files and directories (all the I/O functions), terminal I/O, the system databases (password file and group file), and the tar and cpio archive formats.

  The first Posix standard was a trial use version in 1986 known as "IEEEIX." The name Posix was suggested by Richard Stallman.

• IEEE Std 1003.1-1990 (256 pages) was next, and it was also International Standard ISO/IEC 9945-1: 1990. Minimal changes were made from the 1988 version to the 1990 version. Appended to the title was "Part 1: System Application Program Interface (API) [C Language]," indicating that this standard was the C language API.
• IEEE Std 1003.2-1992 was published in two volumes, totaling about 1300 pages, and its title contained "Part 2: Shell and Utilities." This part defines the shell (based on the System V Bourne shell) and about 100 utilities (programs normally executed from a shell, from awk and basename to vi and yacc). Throughout this text, we refer to this standard as Posix.2.

• IEEE Std 1003.1b-1993 (590 pages) was originally known as IEEE P1003.4. This was an update to the 1003.1-1990 standard to include the realtime extensions developed by the P1003.4 working group: file synchronization, asynchronous I/O, semaphores, memory management (mmap and shared memory), execution scheduling, clocks and timers, and message queues.

• IEEE Std 1003.1, 1996 Edition [IEEE 1996] (743 pages) includes 1003.1-1990 (the base API), 1003.1b-1993 (realtime extensions), 1003.1c-1995 (Pthreads), and 1003.1i-1995 (technical corrections to 1003.1b). This standard is also called ISO/IEC 9945-1: 1996. Three chapters on threads were added, along with additional sections on thread synchronization (mutexes and condition variables), thread scheduling, and synchronization scheduling. Throughout this text, we refer to this standard as Posix.1.

  Over one-quarter of the 743 pages are an appendix titled "Rationale and Notes." This rationale contains historical information and reasons why certain features were included or omitted. Often the rationale is as informative as the official standard.

  Unfortunately, the IEEE standards are not freely available on the Internet. Ordering information is given in the Bibliography entry for [IEEE 1996].

  Note that semaphores were defined in the realtime standard, separately from mutexes and condition variables (which were defined in the Pthreads standard), which accounts for some of the differences that we see in their APIs. Finally, note that read-write locks are not yet part of any Posix standard.
  We say more about this in Chapter 8.

Sometime in the future, a new version of IEEE Std 1003.1 should be printed to include the P1003.1g standard, the networking APIs (sockets and XTI), which are described in UNPv1.

The Foreword of the 1996 Posix.1 standard states that ISO/IEC 9945 consists of the following parts:

• Part 1: System application program interface (API) [C language],
• Part 2: Shell and utilities, and
• Part 3: System administration (under development).

Parts 1 and 2 are what we call Posix.1 and Posix.2.

Work on all of the Posix standards continues and it is a moving target for any book that attempts to cover it. The current status of the various Posix standards is available from http://www.pasc.org/standing/sd11.html.

The Open Group

The Open Group was formed in 1996 by the consolidation of the X/Open Company (founded in 1984) and the Open Software Foundation (OSF, founded in 1988). It is an international consortium of vendors and end-user customers from industry, government, and academia. Their standards have gone through the following iterations:

• X/Open published the X/Open Portability Guide, Issue 3 (XPG3) in 1989.
• Issue 4 was published in 1992 followed by Issue 4, Version 2 in 1994. This latest version was also known as "Spec 1170," with the magic number 1170 being the sum of the number of system interfaces (926), the number of headers (70), and the number of commands (174). The latest name for this set of specifications is the "X/Open Single Unix Specification," although it is also called "Unix 95."
• In March 1997, Version 2 of the Single Unix Specification was announced. Products conforming to this specification can be called "Unix 98," which is how we refer to this specification throughout this text.
  The number of interfaces required by Unix 98 increases from 1170 to 1434, although for a workstation, this jumps to 3030, because it includes the CDE (Common Desktop Environment), which in turn requires the X Window System and the Motif user interface. Details are available in [Josey 1997] and http://www.UNIX-systems.org/version2.

  Much of the Single Unix Specification is freely available on the Internet from this URL.

Unix Versions and Portability

Most Unix systems today conform to some version of Posix.1 and Posix.2. We use the qualifier "some" because as updates to Posix occur (e.g., the realtime extensions in 1993 and the Pthreads addition in 1996), vendors take a year or two (sometimes more) to incorporate these latest changes.

Historically, most Unix systems show either a Berkeley heritage or a System V heritage, but these differences are slowly disappearing as most vendors adopt the Posix standards. The main differences still existing deal with system administration, one area that no Posix standard currently addresses.

Throughout this text, we use Solaris 2.6 and Digital Unix 4.0B for most examples. The reason is that at the time of this writing (late 1997 to early 1998), these were the only two Unix systems that supported System V IPC, Posix IPC, and Posix threads.

1.8 Road Map to IPC Examples in the Text

Three patterns of interaction are used predominantly throughout the text to illustrate various features:

1. File server: a client-server application in which the client sends the server a pathname and the server returns the contents of that file to the client.
2. Producer-consumer: one or more threads or processes (producers) place data into a shared buffer, and one or more threads or processes (consumers) operate on the data in the shared buffer.
3. Sequence-number-increment: one or more threads or processes increment a shared sequence number. Sometimes the sequence number is in a shared file, and sometimes it is in shared memory.
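The sequence-number-increment pattern is easiest to see in its threads form, where the shared sequence number is simply a global variable. This is a minimal sketch assuming a Pthreads implementation; the names seqnum, incr_thread, and run_increment are hypothetical, and the lock is a Posix mutex rather than any of the file or semaphore locking techniques the text develops.

```c
#include <pthread.h>

#define MAXTHREADS 16

/* Hypothetical shared state: the sequence number and the mutex guarding it. */
static long            seqnum;
static pthread_mutex_t seq_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments seqnum *arg times, holding the mutex around every
   increment so that no update is lost. */
static void *incr_thread(void *arg)
{
    long loops = *(const long *) arg;

    for (long i = 0; i < loops; i++) {
        pthread_mutex_lock(&seq_mutex);
        seqnum++;
        pthread_mutex_unlock(&seq_mutex);
    }
    return NULL;
}

/* Start nthreads threads, wait for all of them, and return the final
   sequence number, or -1 on bad arguments or a failed pthread_create. */
long run_increment(int nthreads, long loops)
{
    pthread_t tid[MAXTHREADS];
    int       created = 0;

    if (nthreads < 1 || nthreads > MAXTHREADS)
        return -1;
    seqnum = 0;
    while (created < nthreads &&
           pthread_create(&tid[created], NULL, incr_thread, &loops) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return (created == nthreads) ? seqnum : -1;
}
```

With the lock in place, run_increment(4, 100000) returns 400000. Removing the lock and unlock calls typically makes the result fall short of nthreads * loops, which is the lost-update problem every locked variant of this example is meant to prevent.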
The first example illustrates the various forms of message passing, whereas the other two examples illustrate the various types of synchronization and shared memory.

To provide a road map for the different topics that are covered in this text, Figures 1.9, 1.10, and 1.11 summarize the programs that we develop, and the starting figure number and page number in which the source code appears.

1.9 Summary

IPC has traditionally been a messy area in Unix. Various solutions have been implemented, none of which are perfect. Our coverage is divided into four main areas:

1. message passing (pipes, FIFOs, message queues),
2. synchronization (mutexes, condition variables, read-write locks, semaphores),
3. shared memory (anonymous, named), and
4. procedure calls (Solaris doors, Sun RPC).

We consider IPC between multiple threads in a single process, and between multiple processes.

Each type of IPC is process-persistent, kernel-persistent, or filesystem-persistent, depending on how long the IPC object stays in existence. When choosing the type of IPC to use for a given application, we must be aware of the persistence of that IPC object.

Another feature of each type of IPC is its name space: how IPC objects are identified by the processes and threads that use the IPC object. Some have no name (pipes, mutexes, condition variables, read-write locks), some have names in the filesystem (FIFOs), some have what we describe in Chapter 2 as Posix IPC names, and some have other types of names (what we describe in Chapter 3 as System V IPC keys or identifiers). Typically, a server creates an IPC object with some name and the clients use that name to access the IPC object.

Throughout the source code in the text, we use the wrapper functions described in Section 1.6 to reduce the size of our code, yet still check every function call for an error return. Our wrapper functions all begin with a capital letter.
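The wrapper convention covers both error-reporting styles: functions that return -1 and set errno, and Pthreads functions that return the error number. The sketch below is self-contained, so err_sys here is a simplified stand-in for the text's err_sys (it only prints the message with strerror and exits); Pipe and Pthread_mutex_unlock follow the capitalization convention but are illustrations, not copies of the text's library.

```c
#include <errno.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Simplified stand-in for err_sys: print the message and the text for the
   current errno, then terminate. errno is saved first because the stdio
   calls below may change it. */
static void err_sys(const char *fmt, ...)
{
    int     saved = errno;
    va_list ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fprintf(stderr, ": %s\n", strerror(saved));
    exit(1);
}

/* Style 1: pipe returns -1 and sets errno on error. */
void Pipe(int fd[2])
{
    if (pipe(fd) == -1)
        err_sys("pipe error");
}

/* Style 2: pthread functions return the error number instead of setting
   errno, so copy it into errno before calling err_sys. */
void Pthread_mutex_unlock(pthread_mutex_t *mptr)
{
    int n;

    if ((n = pthread_mutex_unlock(mptr)) == 0)
        return;
    errno = n;
    err_sys("pthread_mutex_unlock error");
}
```

A caller then writes just Pipe(fds) or Pthread_mutex_unlock(&m), and any failure terminates the process with a message.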
The IEEE Posix standards (Posix.1 defining the basic C interface to Unix and Posix.2 defining the standard commands) have been the standards that most vendors are moving toward. The Posix standards, however, are rapidly being absorbed and expanded by the commercial standards, notably The Open Group's Unix standards, such as Unix 98.

Figure  Page  Description
        47    Uses two pipes, parent-child
        53    Uses popen and cat
4.16    55    Uses two FIFOs, parent-child
4.18    57    Uses two FIFOs, stand-alone server, unrelated client
4.23    62    Uses FIFOs, stand-alone iterative server, multiple clients
4.25          Uses pipe or FIFO: builds records on top of byte stream
6.15    144   Uses one System V message queue, multiple clients
6.20    148   Uses one System V message queue per client, multiple clients
              Uses two System V message queues
              Uses descriptor passing across a door

Figure 1.9 Different versions of the file server client-server example.

Figure  Page  Description
7.2           Mutex only, multiple producers, one consumer
7.6     165   Mutex and condition variable, multiple producers, one consumer
10.17   236   Posix named semaphores, one producer, one consumer
10.20   242   Posix memory-based semaphores, one producer, one consumer
10.21   243   Posix memory-based semaphores, multiple producers, one consumer
10.24   246   Posix memory-based semaphores, multiple producers, multiple consumers
10.33   254   Posix memory-based semaphores, one producer, one consumer: multiple buffers

Figure 1.10 Different versions of the producer-consumer example.

Figure  Page  Description
9.1           Seq# in file, no locking
9.3           Seq# in file, fcntl locking
9.12    215   Seq# in file, filesystem locking using open
10.19   239   Seq# in file, Posix named semaphore locking
12.10   312   Seq# in mmap shared memory, Posix named semaphore locking
12.12   314   Seq# in mmap shared memory, Posix memory-based semaphore locking
12.14   316   Seq# in 4.4BSD anonymous shared memory, Posix named semaphore locking
12.15   316   Seq# in SVR4 /dev/zero shared memory, Posix named semaphore locking
13.7    334   Seq# in Posix shared memory, Posix memory-based semaphore locking
A.34          Performance measurement: mutex locking between threads
A.36    489   Performance measurement: read-write locking between threads
A.39          Performance measurement: Posix memory-based semaphore locking between threads
A.41    493   Performance measurement: Posix named semaphore locking between threads
A.42    494   Performance measurement: System V semaphore locking between threads
A.45    496   Performance measurement: fcntl record locking between threads
A.48    499   Performance measurement: mutex locking between processes

Figure 1.11 Different versions of the sequence-number-increment example.

Exercises

1.1 In Figure 1.1 we show two processes accessing a single file. If both processes are just appending new data to the end of the file (a log file perhaps), what kind of synchronization is required?

1.2 Look at your system's <errno.h> header and see how it defines errno.

1.3 Update Figure 1.5 by noting the features supported by the Unix systems that you use.

2 Posix IPC

2.1 Introduction

The three types of IPC,

• Posix message queues (Chapter 5),
• Posix semaphores (Chapter 10), and
• Posix shared memory (Chapter 13)

are collectively referred to as "Posix IPC." They share some similarities in the functions that access them, and in the information that describes them. This chapter describes all these common properties: the pathnames used for identification, the flags specified when opening or creating, and the access permissions. A summary of their functions is shown in Figure 2.1.
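The common properties listed above (a Posix IPC name, open flags such as O_CREAT and O_EXCL, and permission bits) take the same shape in all three open functions. The sketch below uses shm_open because it returns an ordinary descriptor; it assumes a system that supports Posix shared memory (older Linux systems also need -lrt when linking), and the helper create_size_remove and the object name in the test are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>      /* O_* constants */
#include <sys/mman.h>   /* shm_open, shm_unlink */
#include <sys/stat.h>   /* S_IRUSR, S_IWUSR */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close */

/* Hypothetical helper: exclusively create the shared memory object called
 * name with owner read/write permission, give it size bytes, then close
 * the descriptor and remove the name. Returns 0 on success or -1 with
 * errno set. */
int create_size_remove(const char *name, off_t size)
{
    int fd, saved;

    /* O_CREAT | O_EXCL: fail with EEXIST rather than reuse an existing object. */
    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, size) == -1) {
        saved = errno;
        close(fd);
        shm_unlink(name);
        errno = saved;
        return -1;
    }
    close(fd);                  /* the object outlives this close ...     */
    return shm_unlink(name);    /* ... until its name is explicitly removed */
}
```

The same O_CREAT | O_EXCL plus mode pattern applies to mq_open and sem_open, which take the mode (and, for those two, an extra attribute or initial value) as additional arguments.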
                                      Message queues  Semaphores     Shared memory
Header                                <mqueue.h>      <semaphore.h>  <sys/mman.h>
Functions to create, open, or delete  mq_open         sem_open       shm_open
                                      mq_close        sem_close      shm_unlink
                                      mq_unlink       sem_unlink
                                                      sem_init
                                                      sem_destroy
Functions for control operations      mq_getattr                     ftruncate
                                      mq_setattr                     fstat
Functions for IPC operations          mq_send         sem_wait       mmap
                                      mq_receive      sem_trywait    munmap
                                      mq_notify       sem_post
                                                      sem_getvalue
Figure 2.1 Summary of Posix IPC functions.

2.2 IPC Names

In Figure 1.4, we noted that the three types of Posix IPC use "Posix IPC names" for their identification. The first argument to the three functions mq_open, sem_open, and shm_open is such a name, which may or may not be a real pathname in a filesystem. All that Posix.1 says about these names is:

* It must conform to existing rules for pathnames (must consist of at most PATH_MAX bytes, including a terminating null byte).
* If it begins with a slash, then different calls to these functions all reference the same queue. If it does not begin with a slash, the effect is implementation dependent.
* The interpretation of additional slashes in the name is implementation defined.

So, for portability, these names must begin with a slash and must not contain any other slashes. Unfortunately, these rules are inadequate and lead to portability problems.

Solaris 2.6 requires the initial slash but forbids any additional slashes. Assuming a message queue, it then creates three files in /tmp that begin with .MQ. For example, if the argument to mq_open is /queue.1234, then the three files are /tmp/.MQDqueue.1234, /tmp/.MQLqueue.1234, and /tmp/.MQPqueue.1234. Digital Unix 4.0B, on the other hand, creates the specified pathname in the filesystem.

The portability problem occurs if we specify a name with only one slash (as the first character): we must have write permission in that directory, the root directory.
For example, /tmp.1234 abides by the Posix rules and would be OK under Solaris, but Digital Unix would try to create this file, and unless we have write permission in the root directory, this attempt would fail. If we specify a name of /tmp/test.1234, this will succeed on all systems that create an actual file with that name (assuming that the /tmp directory exists and that we have write permission in that directory, which is normal for most Unix systems), but fails under Solaris.

To avoid these portability problems we should always #define the name in a header that is easy to change if we move our application to another system.

This case is one in which the standard tries to be so general (in this case, the realtime standard was trying to allow message queue, semaphore, and shared memory implementations all within existing Unix kernels and as stand-alone diskless systems) that the standard's solution is nonportable. Within Posix, this is called "a standard way of being nonstandard."

Posix.1 defines the three macros

    S_TYPEISMQ(buf)
    S_TYPEISSEM(buf)
    S_TYPEISSHM(buf)

that take a single argument, a pointer to a stat structure, whose contents are filled in by the fstat, lstat, or stat functions. These three macros evaluate to a nonzero value if the specified IPC object (message queue, semaphore, or shared memory object) is implemented as a distinct file type and the stat structure references such a file type. Otherwise, the macros evaluate to 0.

Unfortunately, these macros are of little use, since there is no guarantee that these three types of IPC are implemented using a distinct file type. Under Solaris 2.6, for example, all three macros always evaluate to 0.

All the other macros that test for a given file type have names beginning with S_IS and their single argument is the st_mode member of a stat structure. Since these three new macros have a different argument, their names were changed to begin with S_TYPEIS.
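The portable subset of these naming rules (a leading slash, no other slashes, at most PATH_MAX bytes including the terminating null) can be checked mechanically. The following is a minimal sketch; the function name is_portable_ipc_name is hypothetical (it is not part of Posix or of this book's library), and the fallback PATH_MAX of 4096 is an assumption for systems whose <limits.h> does not define it.

```c
#include <limits.h>   /* PATH_MAX, where the system defines it */
#include <string.h>   /* strchr, strlen */

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumed fallback, not a Posix guarantee */
#endif

/* Hypothetical helper: returns 1 if name follows the portable subset of
 * the Posix IPC name rules (begins with a slash, contains no other
 * slashes, fits in PATH_MAX bytes with its terminating null), else 0. */
int
is_portable_ipc_name(const char *name)
{
    if (name == NULL || name[0] != '/')
        return 0;                           /* must begin with a slash */
    if (strchr(name + 1, '/') != NULL)
        return 0;                           /* no additional slashes */
    return strlen(name) + 1 <= PATH_MAX;    /* room for the null byte */
}
```

A name that passes this check, such as /tmp.1234, can still fail at run time on a system like Digital Unix 4.0B that creates a real file in the root directory; the check only rejects names whose meaning Posix leaves implementation dependent or implementation defined.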
px_ipc_name Function

Another solution to this portability problem is to define our own function named px_ipc_name that prefixes the correct directory for the location of Posix IPC names.

    #include "unpipc.h"

    char *px_ipc_name(const char *name);

        Returns: nonnull pointer if OK, NULL on error

This is the notation we use for functions of our own throughout this book that are not standard system functions: the box around the function prototype and return value is dashed. The header that is included at the beginning is usually our unpipc.h header (Figure C.1).

The name argument should not contain any slashes. For example, the call

    px_ipc_name("test1")

returns a pointer to the string /test1 under Solaris 2.6 or a pointer to the string /tmp/test1 under Digital Unix 4.0B. The memory for the result string is dynamically allocated and should be released by calling free. Additionally, the environment variable PX_IPC_NAME can override the default directory.

Figure 2.2 shows our implementation of this function.

This may be your first encounter with snprintf. Lots of existing code calls sprintf instead, but sprintf cannot check for overflow of the destination buffer. snprintf, on the other hand, requires that the second argument be the size of the destination buffer, and this buffer will not be overflowed. Providing input that intentionally overflows a program's sprintf buffer has been used for many years by hackers breaking into systems.

snprintf is not yet part of the ANSI C standard but is being considered for a revision of the standard, currently called C9X. Nevertheless, many vendors are providing it as part of the standard C library.
We use snprintf throughout the text, providing our own version that just calls sprintf when it is not provided.

lib/px_ipc_name.c

    #include    "unpipc.h"

    char *
    px_ipc_name(const char *name)
    {
        char    *dir, *dst, *slash;

        if ( (dst = malloc(PATH_MAX)) == NULL)
            return(NULL);

            /* can override default directory with environment variable */
        if ( (dir = getenv("PX_IPC_NAME")) == NULL) {
    #ifdef  POSIX_IPC_PREFIX
            dir = POSIX_IPC_PREFIX;     /* from "config.h" */
    #else
            dir = "/tmp/";              /* default */
    #endif
        }
            /* dir must end in a slash; an empty dir is treated as "/" */
        slash = (dir[0] != '\0' && dir[strlen(dir) - 1] == '/') ? "" : "/";
        snprintf(dst, PATH_MAX, "%s%s%s", dir, slash, name);

        return(dst);                    /* caller can free() this pointer */
    }

Figure 2.2 Our px_ipc_name function.

2.3 Creating and Opening IPC Channels

The three functions that create or open an IPC object, mq_open, sem_open, and shm_open, all take a second argument named oflag that specifies how to open the requested object. This is similar to the second argument to the standard open function. The various constants that can be combined to form this argument are shown in Figure 2.3.

Description                          mq_open     sem_open   shm_open
read-only                            O_RDONLY               O_RDONLY
write-only                           O_WRONLY
read-write                           O_RDWR                 O_RDWR
create if it does not already exist  O_CREAT     O_CREAT    O_CREAT
exclusive create                     O_EXCL      O_EXCL     O_EXCL
nonblocking mode                     O_NONBLOCK
truncate if it already exists                               O_TRUNC
Figure 2.3 Various constants when opening or creating a Posix IPC object.

The first three rows specify how the object is being opened: read-only, write-only, or read-write. A message queue can be opened in any of the three modes, whereas none of these three constants is specified for a semaphore (read and write access is required for any semaphore operation), and a shared memory object cannot be opened write-only. The remaining O_xxx flags in Figure 2.3 are optional.

O_CREAT     Create the message queue, semaphore, or shared memory object if it does not already exist.
(Also see the O_EXCL flag, which is described shortly.) When creating a new message queue, semaphore, or shared memory object, at least one additional argument is required, called mode. This argument specifies the permission bits and is formed as the bitwise-OR of the constants shown in Figure 2.4.

Constant  Description
S_IRUSR   user read
S_IWUSR   user write
S_IRGRP   group read
S_IWGRP   group write
S_IROTH   other read
S_IWOTH   other write
Figure 2.4 mode constants when a new IPC object is created.

These constants are defined in the <sys/stat.h> header. The specified permission bits are modified by the file mode creation mask of the process, which can be set by calling the umask function (pp. 83-85 of APUE) or by using the shell's umask command.

As with a newly created file, when a new message queue, semaphore, or shared memory object is created, the user ID is set to the effective user ID of the process. The group ID of a semaphore or shared memory object is set to the effective group ID of the process or to a system default group ID. The group ID of a new message queue is set to the effective group ID of the process. (Pages 77-78 of APUE talk more about the user and group IDs.)

This difference in the setting of the group ID between the three types of Posix IPC is strange. The group ID of a new file created by open is either the effective group ID of the process or the group ID of the directory in which the file is created, but the IPC functions cannot assume that a pathname in the filesystem is created for an IPC object.

O_EXCL      If this flag and O_CREAT are both specified, then the function creates a new message queue, semaphore, or shared memory object only if it does not already exist. If it already exists, and if O_CREAT | O_EXCL is specified, an error of EEXIST is returned.

The check for the existence of the message queue, semaphore, or shared memory object and its creation (if it does not already exist) must be atomic with regard to other processes.
We will see two similar flags for System V IPC in Section 3.4.

O_NONBLOCK  This flag makes a message queue nonblocking with regard to a read on an empty queue or a write to a full queue. We talk about this more with the mq_receive and mq_send functions in Section 5.4.

O_TRUNC     If an existing shared memory object is opened read-write, this flag specifies that the object be truncated to 0 length.

Figure 2.5 shows the actual logic flow for opening an IPC object. We describe what we mean by the test of the access permissions in Section 2.4. Another way of looking at Figure 2.5 is shown in Figure 2.6.

[Flowchart not reproduced.]
Figure 2.5 Logic for opening or creating an IPC object.

oflag argument     Object does not exist    Object already exists
no special flags   error, errno = ENOENT    OK, references existing object
O_CREAT            OK, creates new object   OK, references existing object
O_CREAT | O_EXCL   OK, creates new object   error, errno = EEXIST
Figure 2.6 Logic for creating or opening an IPC object.

Note that in the middle line of Figure 2.6, the O_CREAT flag without O_EXCL, we do not get an indication whether a new entry has been created or whether we are referencing an existing entry.

2.4 IPC Permissions

A new message queue, named semaphore, or shared memory object is created by mq_open, sem_open, or shm_open when the oflag argument contains the O_CREAT flag. As noted in Figure 2.4, permission bits are associated with each of these forms of IPC, similar to the permission bits associated with a Unix file.

When an existing message queue, semaphore, or shared memory object is opened by these same three functions (either O_CREAT is not specified, or O_CREAT is specified without O_EXCL and the object already exists), permission testing is performed based on

1. the permission bits assigned to the IPC object when it was created,
2. the type of access being requested (O_RDONLY, O_WRONLY, or O_RDWR), and
3. the effective user ID of the calling process, the effective group ID of the calling process, and the supplementary group IDs of the process (if supported).

The tests performed by most Unix kernels are as follows:

1. If the effective user ID of the process is 0 (the superuser), access is allowed.
2. If the effective user ID of the process equals the owner ID of the IPC object: if the appropriate user access permission bit is set, access is allowed, else access is denied. By appropriate access permission bit, we mean if the process is opening the IPC object for reading, the user-read bit must be on. If the process is opening the IPC object for writing, the user-write bit must be on.
3. If the effective group ID of the process or one of the supplementary group IDs of the process equals the group ID of the IPC object: if the appropriate group access permission bit is set, access is allowed, else permission is denied.
4. If the appropriate other access permission bit is set, access is allowed, else permission is denied.

These four steps are tried in sequence in the order listed. Therefore, if the process owns the IPC object (step 2), then access is granted or denied based only on the user access permissions; the group permissions are never considered. Similarly, if the process does not own the IPC object, but the process belongs to an appropriate group, then access is granted or denied based only on the group access permissions; the other permissions are not considered.

We note from Figure 2.3 that sem_open does not use the O_RDONLY, O_WRONLY, or O_RDWR flag. We note in Section 10.2, however, that some Unix implementations assume O_RDWR, since any use of a semaphore involves reading and writing the semaphore value.

2.5 Summary

The three types of Posix IPC (message queues, semaphores, and shared memory) are identified by pathnames.
But these may or may not be real pathnames in the filesystem, and this discrepancy can be a portability problem. The solution that we employ throughout the text is to use our own px_ipc_name function.

When an IPC object is created or opened, we specify a set of flags that are similar to those for the open function. When a new IPC object is created, we must specify the permissions for the new object, using the same S_xxx constants that are used with open (Figure 2.4). When an existing IPC object is opened, the permission testing that is performed is the same as when an existing file is opened.

Exercises

2.1 In what way do the set-user-ID and set-group-ID bits (Section 4.4 of APUE) of a program that uses Posix IPC affect the permission testing described in Section 2.4?

2.2 When a program opens a Posix IPC object, how can it determine whether a new object was created or whether it is referencing an existing object?

Chapter 3  System V IPC

3.1 Introduction

The three types of IPC,

* System V message queues (Chapter 6),
* System V semaphores (Chapter 11), and
* System V shared memory (Chapter 14)

are collectively referred to as "System V IPC." This term is commonly used for these three IPC facilities, acknowledging their heritage from System V Unix. They share many similarities in the functions that access them, and in the information that the kernel maintains on them. This chapter describes all these common properties.

A summary of their functions is shown in Figure 3.1.

                                 Message queues  Semaphores   Shared memory
Header                           <sys/msg.h>     <sys/sem.h>  <sys/shm.h>
Function to create or open       msgget          semget       shmget
Function for control operations  msgctl          semctl       shmctl
Functions for IPC operations     msgsnd          semop        shmat
                                 msgrcv                       shmdt
Figure 3.1 Summary of System V IPC functions.

Information on the design and development of the System V IPC functions is hard to find. [Rochkind 1985] provides the following information: System V message queues, semaphores, and shared memory were developed in the late 1970s at a branch laboratory of Bell Laboratories in Columbus, Ohio, for an internal version of Unix called (not surprisingly) "Columbus Unix" or just "CB Unix." This version of Unix was used for "Operation Support Systems," transaction processing systems that automated telephone company administration and recordkeeping. System V IPC was added to the commercial Unix system with System V around 1983.

3.2 key_t Keys and ftok Function

In Figure 1.4, the three types of System V IPC are noted as using key_t values for their names. The <sys/types.h> header defines the key_t datatype, as an integer, normally at least a 32-bit integer. These integer values are normally assigned by the ftok function.

The function ftok converts an existing pathname and an integer identifier into a key_t value (called an IPC key).

    #include <sys/ipc.h>

    key_t ftok(const char *pathname, int id);

        Returns: IPC key if OK, -1 on error

This function takes information derived from the pathname and the low-order 8 bits of id, and combines them into an integer IPC key.

This function assumes that for a given application using System V IPC, the server and clients all agree on a single pathname that has some meaning to the application. It could be the pathname of the server daemon, the pathname of a common data file used by the server, or some other pathname on the system. If the client and server need only a single IPC channel between them, an id of one, say, can be used. If multiple IPC channels are needed, say one from the client to the server and another from the server to the client, then one channel can use an id of one, and the other an id of two, for example.
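A minimal sketch of the agreement just described, assuming the client and server share a header that fixes the pathname and the two ids. The names CHANNEL_PATH, ID_CLI_TO_SERV, ID_SERV_TO_CLI, and make_channel_keys are hypothetical, and /tmp merely stands in for whatever pathname the application chooses; only ftok itself is a standard function.

```c
#include <sys/types.h>
#include <sys/ipc.h>    /* ftok */

/* Hypothetical shared definitions, e.g. in a header included by both
 * the client and the server. The pathname must already exist and must
 * not be deleted and re-created while the application runs. */
#define CHANNEL_PATH   "/tmp"
#define ID_CLI_TO_SERV 1        /* channel from client to server */
#define ID_SERV_TO_CLI 2        /* channel from server to client */

/* Derive both IPC keys from the agreed pathname and ids.
 * Returns 0 on success, -1 if ftok fails (e.g., no such pathname). */
int
make_channel_keys(const char *path, key_t *to_serv, key_t *to_cli)
{
    if ((*to_serv = ftok(path, ID_CLI_TO_SERV)) == (key_t) -1)
        return -1;
    if ((*to_cli = ftok(path, ID_SERV_TO_CLI)) == (key_t) -1)
        return -1;
    return 0;
}
```

Because both processes pass the same pathname and ids, both obtain the same pair of keys. The two ids differ in their low-order 8 bits, so on common implementations the two keys differ as well.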
Once the pathname and id are agreed on by the client and server, then both can call the ftok function to convert these into the same IPC key.

Typical implementations of ftok call the stat function and then combine

1. information about the filesystem on which pathname resides (the st_dev member of the stat structure),
2. the file's i-node number within the filesystem (the st_ino member of the stat structure), and
3. the low-order 8 bits of the id.

The combination of these three values normally produces a 32-bit key. No guarantee exists that two different pathnames combined with the same id generate different keys, because the number of bits of information in the three items just listed (filesystem identifier, i-node, and id) can be greater than the number of bits in an integer. (See Exercise 3.5.)

The i-node number is never 0, so most implementations define IPC_PRIVATE (which we describe in Section 3.4) to be 0.

If the pathname does not exist, or is not accessible to the calling process, ftok returns -1. Be aware that the file whose pathname is used to generate the key must not be a file that is created and deleted by the server during its existence, since each time it is created, it can assume a new i-node number that can change the key returned by ftok to the next caller.

Example

The program in Figure 3.2 takes a pathname as a command-line argument, calls stat, calls ftok, and then prints the st_dev and st_ino members of the stat structure, and the resulting IPC key. These three values are printed in hexadecimal, so we can easily see how the IPC key is constructed from these two values and our id of 0x57.
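To see why two distinct pathnames can yield the same key, here is an illustrative model of one packing scheme. It assumes the layout used by FreeBSD and by glibc (low 8 bits of id in the top byte, low 8 bits of st_dev next, low 16 bits of st_ino at the bottom); model_ftok is a hypothetical name, it takes the stat values directly instead of calling stat, and other implementations pack the bits differently.

```c
#include <stdint.h>

/* Illustrative model of one ftok bit layout (an assumption; real
 * implementations differ). Bits that do not fit are simply dropped,
 * which is why the mapping is one-way and collisions are possible. */
uint32_t
model_ftok(uint32_t dev, uint32_t ino, int id)
{
    return ((uint32_t)(id & 0xff) << 24) |  /* low 8 bits of id      */
           ((dev & 0xff) << 16) |           /* low 8 bits of st_dev  */
           (ino & 0xffff);                  /* low 16 bits of st_ino */
}
```

For example, with st_dev 0x801 and id 0x57, i-nodes 0x12345 and 0x22345 both map to the key 0x57012345, because only the low 16 bits of the i-node number survive.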
svipc/ftok.c

    #include    "unpipc.h"

    int
    main(int argc, char **argv)
    {
        struct stat stat;

        if (argc != 2)
            err_quit("usage: ftok <pathname>");

        Stat(argv[1], &stat);
        printf("st_dev: %lx, st_ino: %lx, key: %x\n",
               (u_long) stat.st_dev, (u_long) stat.st_ino,
               Ftok(argv[1], 0x57));

        exit(0);
    }

Figure 3.2 Obtain and print filesystem information and resulting IPC key.

Executing this under Solaris 2.6 gives us the following:

    solaris % ftok /etc/system
    st_dev: 800018, st_ino: 4a1b, key: 57018a1b
    solaris % ftok /usr/tmp
    st_dev: 800015, st_ino: 10b78, key: 57015b78
    solaris % ftok /home/rstevens/Mail.out
    st_dev: 80001f, st_ino: 5b03, key: 5701fb03

Apparently, the id is in the upper 8 bits, the low-order 12 bits of st_dev in the next 12 bits, and the low-order 12 bits of st_ino in the low-order 12 bits.

Our purpose in showing this example is not to let us count on this combination of information to form the IPC key, but to let us see how one implementation combines the pathname and id. Other implementations may do this differently.

FreeBSD uses the lower 8 bits of the id, the lower 8 bits of st_dev, and the lower 16 bits of st_ino.

Note that the mapping done by ftok is one-way, since some bits from st_dev and st_ino are not used. That is, given a key, we cannot determine the pathname that was used to create the key.

3.3 ipc_perm Structure

The kernel maintains a structure of information for each IPC object, similar to the information it maintains for files.

    struct ipc_perm {
      uid_t   uid;    /* owner's user id */
      gid_t   gid;    /* owner's group id */
      uid_t   cuid;   /* creator's user id */
      gid_t   cgid;   /* creator's group id */
      mode_t  mode;   /* read-write permissions */
      ulong_t seq;    /* slot usage sequence number */
      key_t   key;    /* IPC key */
    };

This structure, and other manifest constants for the System V IPC functions, are defined in the <sys/ipc.h> header. We talk about all the members of this structure in this chapter.
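The ipc_perm structure of an existing object can be read back with the control function's IPC_STAT command. The following minimal sketch does this for a message queue. get_msq_perm is a hypothetical helper; it relies only on the members whose names are portable (uid, gid, cuid, cgid, mode), because some systems rename the others (glibc on Linux, for example, calls the key and sequence members __key and __seq).

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>    /* msgctl, struct msqid_ds */

/* Hypothetical helper: copy the ipc_perm structure of message queue
 * msqid into *perm. Returns 0 on success, -1 on error (errno is set
 * by msgctl, e.g., if the queue has been removed). */
int
get_msq_perm(int msqid, struct ipc_perm *perm)
{
    struct msqid_ds buf;

    if (msgctl(msqid, IPC_STAT, &buf) == -1)
        return -1;
    *perm = buf.msg_perm;   /* the ipc_perm member of msqid_ds */
    return 0;
}
```

Right after a process creates a queue, the owner and creator IDs it reads back are its own effective user and group IDs, as the rules in Section 3.5 describe.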
3.4 Creating and Opening IPC Channels

The three getXXX functions that create or open an IPC object (Figure 3.1) all take an IPC key value, whose type is key_t, and return an integer identifier. This identifier is not the same as the id argument to the ftok function, as we see shortly. An application has two choices for the key value that is the first argument to the three getXXX functions:

1. call ftok, passing it a pathname and id, or
2. specify a key of IPC_PRIVATE, which guarantees that a new, unique IPC object is created.

The sequence of steps is shown in Figure 3.3.

[Diagram: pathname and id go to ftok(), producing a key_t key (or IPC_PRIVATE is used); msgget(), semget(), or shmget() ("open or create IPC channel") returns an int identifier, which is passed to msgctl(), msgsnd(), msgrcv(); semctl(), semop(); or shmctl(), shmat(), shmdt() ("access IPC channel").]
Figure 3.3 Generating IPC identifiers from IPC keys.

All three getXXX functions (Figure 3.1) also take an oflag argument that specifies the read-write permission bits (the mode member of the ipc_perm structure) for the IPC object, and whether a new IPC object is being created or an existing one is being referenced. The rules for whether a new IPC object is created or whether an existing one is referenced are as follows:

* Specifying a key of IPC_PRIVATE guarantees that a unique IPC object is created. No combinations of pathname and id exist that cause ftok to generate a key value of IPC_PRIVATE.
* Setting the IPC_CREAT bit of the oflag argument creates a new entry for the specified key, if it does not already exist. If an existing entry is found, that entry is returned.
* Setting both the IPC_CREAT and IPC_EXCL bits of the oflag argument creates a new entry for the specified key, only if the entry does not already exist. If an existing entry is found, an error of EEXIST is returned, since the IPC object already exists.

The combination of IPC_CREAT and IPC_EXCL with regard to IPC objects is similar to the combination of O_CREAT and O_EXCL with regard to the open function.

Setting the IPC_EXCL bit, without setting the IPC_CREAT bit, has no meaning.

The actual logic flow for opening an IPC object is shown in Figure 3.4. Figure 3.5 shows another way of looking at Figure 3.4.

Note that in the middle line of Figure 3.5, the IPC_CREAT flag without IPC_EXCL, we do not get an indication whether a new entry has been created or whether we are referencing an existing entry. In most applications, the server creates the IPC object and specifies either IPC_CREAT (if it does not care whether the object already exists) or IPC_CREAT | IPC_EXCL (if it needs to check whether the object already exists). The clients specify neither flag (assuming that the server has already created the object).

The System V IPC functions define their own IPC_xxx constants, instead of using the O_CREAT and O_EXCL constants that are used by the standard open function along with the Posix IPC functions (Figure 2.3).

Also note that the System V IPC functions combine their IPC_xxx constants with the permission bits (which we describe in the next section) into a single oflag argument. The open function along with the Posix IPC functions have one argument named oflag that specifies the various O_xxx flags, and another argument named mode that specifies the permission bits.

[Flowchart not reproduced.]
Figure 3.4 Logic for creating or opening an IPC object.

oflag argument        Key does not exist      Key already exists
no special flags      error, errno = ENOENT   OK, references existing object
IPC_CREAT             OK, creates new entry   OK, references existing object
IPC_CREAT | IPC_EXCL  OK, creates new entry   error, errno = EEXIST
Figure 3.5 Logic for creating or opening an IPC channel.

3.5 IPC Permissions

Whenever a new IPC object is created using one of the getXXX functions with the IPC_CREAT flag, the following information is saved in the ipc_perm structure (Section 3.3):

1. Some of the bits in the oflag argument initialize the mode member of the ipc_perm structure. Figure 3.6 shows the permission bits for the three different IPC mechanisms. (The notation >> 3 means the value is right shifted 3 bits.)

   Numeric  Message queue  Semaphore   Shared memory  Description
   0400     MSG_R          SEM_R       SHM_R          read by user
   0200     MSG_W          SEM_A       SHM_W          write by user
   0040     MSG_R >> 3     SEM_R >> 3  SHM_R >> 3     read by group
   0020     MSG_W >> 3     SEM_A >> 3  SHM_W >> 3     write by group
   0004     MSG_R >> 6     SEM_R >> 6  SHM_R >> 6     read by others
   0002     MSG_W >> 6     SEM_A >> 6  SHM_W >> 6     write by others
   Figure 3.6 mode values for IPC read-write permissions.

2. The two members cuid and cgid are set to the effective user ID and effective group ID of the calling process, respectively. These two members are called the creator IDs.
3. The two members uid and gid in the ipc_perm structure are also set to the effective user ID and effective group ID of the calling process. These two members are called the owner IDs.

The creator IDs never change, although a process can change the owner IDs by calling the ctlXXX function for the IPC mechanism with a command of IPC_SET. The three ctlXXX functions also allow a process to change the permission bits of the mode member for the IPC object.
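The decision table in Figure 3.5 can be written as a small function. This is a model, not a kernel: sysv_get_outcome is a hypothetical name, and it leaves out the other checks of the Figure 3.4 flowchart, such as the access-permission test. The Posix table in Figure 2.6 has the same shape, with O_CREAT and O_EXCL in place of IPC_CREAT and IPC_EXCL.

```c
#include <errno.h>      /* ENOENT, EEXIST */
#include <sys/ipc.h>    /* IPC_CREAT, IPC_EXCL */

/* Hypothetical model of Figure 3.5. 'exists' says whether the key is
 * already in use. Returns 0 on success (setting *created to 1 if a new
 * entry would be created, 0 if an existing one is referenced), or the
 * errno value the getXXX function would report. */
int
sysv_get_outcome(int exists, int oflag, int *created)
{
    *created = 0;
    if (!exists) {
        if (!(oflag & IPC_CREAT))
            return ENOENT;      /* no IPC_CREAT: the key must exist */
        *created = 1;           /* IPC_CREAT, with or without IPC_EXCL */
        return 0;
    }
    if ((oflag & IPC_CREAT) && (oflag & IPC_EXCL))
        return EEXIST;          /* exclusive create of an existing key */
    return 0;                   /* references the existing object */
}
```

Note how IPC_EXCL on its own falls through to the "no special flags" rows, which matches the statement that IPC_EXCL without IPC_CREAT has no meaning.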
Most implementations define the six constants MSG_R, MSG_W, SEM_R, SEM_A, SHM_R, and SHM_W shown in Figure 3.6 in the <sys/msg.h>, <sys/sem.h>, and <sys/shm.h> headers. But these are not required by Unix 98. The suffix A in SEM_A stands for "alter."

The three getXXX functions do not use the normal Unix file mode creation mask. The permissions of the message queue, semaphore, or shared memory segment are set to exactly what the function specifies.

Posix IPC does not let the creator of an IPC object change the owner. Nothing is like the IPC_SET command with Posix IPC. But if the Posix IPC name is stored in the filesystem, then the superuser can change the owner using the chown command.

Two levels of checking are done whenever an IPC object is accessed by any process, once when the IPC object is opened (the getXXX function) and then each time the IPC object is used:

1. Whenever a process establishes access to an existing IPC object with one of the getXXX functions, an initial check is made that the caller's oflag argument does not specify any access bits that are not in the mode member of the ipc_perm structure. This is the bottom box in Figure 3.4. For example, a server process can set the mode member for its input message queue so that the group-read and other-read permission bits are off. Any process that tries to specify an oflag argument that includes these bits gets an error return from the msgget function. But this test that is done by the getXXX functions is of little use. It implies that the caller knows which permission category it falls into: user, group, or other. If the creator specifically turns off certain permission bits, and if the caller specifies these bits, the error is detected by the getXXX function. Any process, however, can totally bypass this check by just specifying an oflag argument of 0 if it knows that the IPC object already exists.

2. Every IPC operation does a permission test for the process using the operation.
For example, every time a process tries to put a message onto a message queue with the msgsnd function, the following tests are performed in the order listed. As soon as a test grants access, no further tests are performed.

   a. The superuser is always granted access.
   b. If the effective user ID equals either the uid value or the cuid value for the IPC object, and if the appropriate access bit is on in the mode member for the IPC object, permission is granted. By "appropriate access bit," we mean the read-bit must be set if the caller wants to do a read operation on the IPC object (receiving a message from a message queue, for example), or the write-bit must be set for a write operation.
   c. If the effective group ID equals either the gid value or the cgid value for the IPC object, and if the appropriate access bit is on in the mode member for the IPC object, permission is granted.
   d. If none of the above tests are true, the appropriate "other" access bit must be on in the mode member for the IPC object, for permission to be allowed.

3.6 Identifier Reuse

The ipc_perm structure (Section 3.3) also contains a variable named seq, which is a slot usage sequence number. This is a counter that is maintained by the kernel for every potential IPC object in the system. Every time an IPC object is removed, the kernel increments the slot number, cycling it back to zero when it overflows.

What we are describing in this section is the common SVR4 implementation. This implementation technique is not mandated by Unix 98.

This counter is needed for two reasons. First, consider the file descriptors maintained by the kernel for open files. They are small integers, but have meaning only within a single process: they are process-specific values. If we try to read from file descriptor 4, say, in a process, this approach works only if that process has a file open on this descriptor. It has no meaning whatsoever for a file that might be open on file descriptor 4 in some other unrelated process.
System V IPC identifiers, however, are systemwide and not process-specific. We obtain an IPC identifier (similar to a file descriptor) from one of the get functions: msgget, semget, and shmget. These identifiers are also integers, but their meaning applies to all processes. If two unrelated processes, a client and server, for example, use a single message queue, the message queue identifier returned by the msgget function must be the same integer value in both processes in order to access the same message queue. This feature means that a rogue process could try to read a message from some other application's message queue by trying different small integer identifiers, hoping to find one that is currently in use that allows world read access. If the potential values for these identifiers were small integers (like file descriptors), then the probability of finding a valid identifier would be about 1 in 50 (assuming a maximum of about 50 descriptors per process).

To avoid this problem, the designers of these IPC facilities decided to increase the possible range of identifier values to include all integers, not just small integers. This increase is implemented by incrementing the identifier value that is returned to the calling process, by the number of IPC table entries, each time a table entry is reused. For example, if the system is configured for a maximum of 50 message queues, then the first time the first message queue table entry in the kernel is used, the identifier returned to the process is zero. After this message queue is removed and the first table entry is reused, the identifier returned is 50. The next time, the identifier is 100, and so on. Since seq is often implemented as an unsigned long integer (see the ipc_perm structure shown in Section 3.3), it cycles after the table entry has been used 85,899,346 times (2^32/50, assuming 32-bit long integers).
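The identifier arithmetic just described can be modeled in a few lines. This sketch assumes the simple SVR4-style scheme in the text (identifier = sequence number times table size, plus slot index); sysv_ipc_id and sysv_ipc_slot are hypothetical names, and other kernels differ (Linux, for example, multiplies by a fixed constant rather than by the configured table size).

```c
#include <stdint.h>

/* Hypothetical model: identifier handed out for table entry 'slot'
 * when that entry's slot usage sequence number is 'seq' and the table
 * holds 'nslots' entries. Unsigned arithmetic wraps modulo 2^32. */
uint32_t
sysv_ipc_id(uint32_t slot, uint32_t seq, uint32_t nslots)
{
    return seq * nslots + slot;
}

/* Recover the table entry from an identifier. Valid only while
 * seq * nslots + slot has not wrapped past 2^32, since 2^32 is not in
 * general a multiple of nslots. */
uint32_t
sysv_ipc_slot(uint32_t id, uint32_t nslots)
{
    return id % nslots;
}
```

With a table of 50 entries, slot 0 yields 0, 50, 100, and so on as it is reused, the same progression that the program in Figure 3.7 prints.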
A second reason for incrementing the slot usage sequence number is to avoid short term reuse of the System V IPC identifiers. This helps ensure that a server that prematurely terminates and is then restarted does not reuse an identifier.

As an example of this feature, the program in Figure 3.7 prints the first 10 identifier values returned by msgget.

svmsg/slot.c

    #include    "unpipc.h"

    int
    main(int argc, char **argv)
    {
        int     i, msqid;

        for (i = 0; i < 10; i++) {
            msqid = Msgget(IPC_PRIVATE, SVMSG_MODE | IPC_CREAT);
            printf("msqid = %d\n", msqid);

            Msgctl(msqid, IPC_RMID, NULL);
        }
        exit(0);
    }

Figure 3.7 Print kernel assigned message queue identifier 10 times in a row.

Each time around the loop msgget creates a message queue, and then msgctl with a command of IPC_RMID deletes the queue. The constant SVMSG_MODE is defined in our unpipc.h header (Figure C.1) and specifies our default permission bits for a System V message queue. The program's output is

    solaris % slot
    msqid = 0
    msqid = 50
    msqid = 100
    msqid = 150
    msqid = 200
    msqid = 250
    msqid = 300
    msqid = 350
    msqid = 400
    msqid = 450

If we run the program again, we see that this slot usage sequence number is a kernel variable that persists between processes.

    solaris % slot
    msqid = 500
    msqid = 550
    msqid = 600
    msqid = 650
    msqid = 700
    msqid = 750
    msqid = 800
    msqid = 850
    msqid = 900
    msqid = 950

3.7 ipcs and ipcrm Programs

Since the three types of System V IPC are not identified by pathnames in the filesystem, we cannot look at them or remove them using the standard ls and rm programs. Instead, two special programs are provided by any system that implements these types of IPC: ipcs, which prints various pieces of information about the System V IPC features, and ipcrm, which removes a System V message queue, semaphore set, or shared memory segment.
The former supports about a dozen command-line options, which affect which of the three types of IPC is reported and what information is output, and the latter supports six command-line options. Consult your manual pages for the details of all these options.

Since System V IPC is not part of Posix, these two commands are not standardized by Posix.2. But these two commands are part of Unix 98.

3.8 Kernel Limits

Most implementations of System V IPC have inherent kernel limits, such as the maximum number of message queues and the maximum number of semaphores per semaphore set. We show some typical values for these limits in Figures 6.25, 11.9, and 14.5. These limits are often derived from the original System V implementation.

Section 11.2 of [Bach 1986] and Chapter 8 of [Goodheart and Cox 1994] both describe the System V implementation of messages, semaphores, and shared memory. Some of these limits are described therein.

Unfortunately, these kernel limits are often too small, because many are derived from their original implementation on a small address system (the 16-bit PDP-11). Fortunately, most systems allow the administrator to change some or all of these default limits, but the required steps are different for each flavor of Unix. Most require rebooting the running kernel after changing the values. Unfortunately, some implementations still use 16-bit integers for some of the limits, providing a hard limit that cannot be exceeded.

Solaris 2.6, for example, has 20 of these limits. Their current values are printed by the sysdef command, although the values are printed as 0 if the corresponding kernel module has not been loaded (i.e., the facility has not yet been used). These may be changed by placing any of the following statements in the /etc/system file, which is read when the kernel bootstraps.
    set msgsys:msginfo_msgseg = value
    set msgsys:msginfo_msgssz = value
    set msgsys:msginfo_msgtql = value
    set msgsys:msginfo_msgmap = value
    set msgsys:msginfo_msgmax = value
    set msgsys:msginfo_msgmnb = value
    set msgsys:msginfo_msgmni = value
    set semsys:seminfo_semopm = value
    set semsys:seminfo_semume = value
    set semsys:seminfo_semaem = value
    set semsys:seminfo_semmap = value
    set semsys:seminfo_semvmx = value
    set semsys:seminfo_semmsl = value
    set semsys:seminfo_semmni = value
    set semsys:seminfo_semmns = value
    set semsys:seminfo_semmnu = value
    set shmsys:shminfo_shmmin = value
    set shmsys:shminfo_shmseg = value
    set shmsys:shminfo_shmmax = value
    set shmsys:shminfo_shmmni = value

The last six characters of the name on the left-hand side of the equals sign are the variables listed in Figures 6.25, 11.9, and 14.5.

With Digital Unix 4.0B, the sysconfig program can query or modify many kernel parameters and limits. Here is the output of this program with the -q option, which queries the kernel for the current limits, for the ipc subsystem. We have omitted some lines unrelated to the System V IPC facility.

    alpha % /sbin/sysconfig -q ipc
    ipc:
    msg-max = 8192
    msg-mnb = 16384
    msg-mni = 64
    msg-tql = 40
    shm-max = 4194304
    ...
    sem-msl = 25
    sem-opm = 10
    ...
    sem-aem = 16384
    num-of-sems = 60

Different defaults for these parameters can be specified in the /etc/sysconfigtab file, which should be maintained using the sysconfigdb program. This file is read when the system bootstraps.

3.9 Summary

The first argument to the three functions, msgget, semget, and shmget, is a System V IPC key. These keys are normally created from a pathname using the system's ftok function. The key can also be the special value of IPC_PRIVATE. These three functions create a new IPC object or open an existing IPC object and return a System V IPC identifier: an integer that is then used to identify the object to the remaining IPC functions. These integers are not per-process identifiers (like descriptors) but are systemwide identifiers. These identifiers are also reused by the kernel after some time.
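The key, flag, and IPC_STAT points in this summary can be tied together in a short sketch. It is an illustration under stated assumptions, not code from this book: open_or_create_queue and queue_mode are hypothetical helper names, and the pathname and project ID handed to ftok are up to the caller. IPC_CREAT | IPC_EXCL makes the first msgget succeed only if it creates the queue; on EEXIST the sketch falls back to opening the existing one.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Open the message queue whose key is ftok(path, proj), creating it with
 * permission bits 'mode' if it does not exist. *created is set to 1 if
 * this call created the queue, 0 if it opened an existing one.
 * Returns the queue identifier, or -1 on error. */
int open_or_create_queue(const char *path, int proj, int mode, int *created)
{
    key_t key = ftok(path, proj);
    int id;

    if (key == (key_t) -1)
        return -1;

    /* IPC_CREAT | IPC_EXCL: succeed only if a new queue is created. */
    id = msgget(key, IPC_CREAT | IPC_EXCL | mode);
    if (id >= 0) {
        *created = 1;
        return id;
    }
    if (errno != EEXIST)
        return -1;

    /* A queue with this key already exists: open it instead. */
    *created = 0;
    return msgget(key, 0);
}

/* Return the nine permission bits stored in the queue's ipc_perm
 * structure, fetched with IPC_STAT, or -1 on error. */
int queue_mode(int msqid)
{
    struct msqid_ds info;

    if (msgctl(msqid, IPC_STAT, &info) < 0)
        return -1;
    return info.msg_perm.mode & 0777;
}
```

Because the System V get functions do not apply the file mode creation mask (see Exercise 3.3), the mode read back with IPC_STAT should equal the mode that was requested, whatever the process's umask.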
Associated with every System V IPC object is an ipc_perm structure that contains information such as the owner's user ID, group ID, read-write permissions, and so on. One difference between Posix IPC and System V IPC is that this information is always available for a System V IPC object (by calling one of the three XXXctl functions with an argument of IPC_STAT), but access to this information for a Posix IPC object depends on the implementation. If the Posix IPC object is stored in the filesystem, and if we know its name in the filesystem, then we can access this same information using the existing filesystem tools.

When a new System V IPC object is created or an existing object is opened, two flags are specified to the getXXX function (IPC_CREAT and IPC_EXCL), combined with nine permission bits.

Undoubtedly, the biggest problem in using System V IPC is that most implementations have artificial kernel limits on the sizes of these objects, and these limits date back to their original implementation. These mean that most applications that make heavy use of System V IPC require that the system administrator modify these kernel limits, and accomplishing this change differs for each flavor of Unix.

Exercises

3.1 Read about the msgctl function in Section 6.5 and modify the program in Figure 3.7 to print the seq member of the ipc_perm structure in addition to the assigned identifier.

3.2 Immediately after running the program in Figure 3.7, we run a program that creates two message queues. Assuming no other message queues have been used by any other applications since the kernel was booted, what two values are returned by the kernel as the message queue identifiers?

3.3 We noted in Section 3.5 that the System V IPC getXXX functions do not use the file mode creation mask. Write a test program that creates a FIFO (using the mkfifo function described in Section 4.6) and a System V message queue, specifying a permission of (octal) 666 for both.
Compare the permissions of the resulting FIFO and message queue. Make certain your shell umask value is nonzero before running this program.

3.4 A server wants to create a unique message queue for its clients. Which is preferable: using some constant pathname (say the server's executable file) as an argument to ftok, or using IPC_PRIVATE?

3.5 Modify Figure 3.2 to print just the IPC key and pathname. Run the find program to print all the pathnames on your system and run the output through the program just modified. How many pathnames map to the same key?

3.6 If your system supports the sar program ("system activity reporter"), run the command

    sar -m 5 6

This prints the number of message queue operations per second and the number of semaphore operations per second, sampled every 5 seconds, 6 times.

Part 2 Message Passing

Chapter 4 Pipes and FIFOs

4.1 Introduction

Pipes are the original form of Unix IPC, dating back to the Third Edition of Unix in 1973 [Salus 1994]. Although useful for many operations, their fundamental limitation is that they have no name, and can therefore be used only by related processes. This was corrected in System III Unix (1982) with the addition of FIFOs, sometimes called named pipes. Both pipes and FIFOs are accessed using the normal read and write functions.

Technically, pipes can be used between unrelated processes, given the ability to pass descriptors between processes (which we describe in Section 15.8 of this text as well as Section 14.7 of UNPv1). But for practical purposes, pipes are normally used between processes that have a common ancestor.

This chapter describes the creation and use of pipes and FIFOs. We use a simple file server example and also look at some client-server design issues: how many IPC channels are needed, iterative versus concurrent servers, and byte streams versus message interfaces.
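Because a pipe is accessed with the ordinary read and write functions, its basic behavior can be sketched within a single process using the pipe function described in Section 4.3. This is a minimal illustration, not code from this book: pipe_echo is a hypothetical helper name, and the message is assumed to be much smaller than the pipe's capacity, since the write would otherwise block with no reader draining the pipe. It also shows that read returns only the bytes the pipe currently holds, even when asked for more.

```c
#include <string.h>
#include <unistd.h>

/* Write 'msg' into the write end (fd[1]) of a new pipe, then read it
 * back from the read end (fd[0]) in the same process. The read asks for
 * up to 'len' bytes but returns only what the pipe currently holds.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_echo(const char *msg, char *buf, size_t len)
{
    int fd[2];
    ssize_t n;
    size_t msglen = strlen(msg);

    if (msglen == 0)
        return 0;       /* reading an empty pipe whose write end is open would block */
    if (pipe(fd) < 0)
        return -1;
    if (write(fd[1], msg, msglen) < 0)
        n = -1;
    else
        n = read(fd[0], buf, len);
    close(fd[0]);
    close(fd[1]);
    return n;
}
```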
4.2 A Simple Client-Server Example

The client-server example shown in Figure 4.1 is used throughout this chapter and Chapter 6 to illustrate pipes, FIFOs, and System V message queues.

The client reads a pathname from the standard input and writes it to the IPC channel. The server reads this pathname from the IPC channel and tries to open the file for reading. If the server can open the file, the server responds by reading the file and writing it to the IPC channel; otherwise, the server responds with an error message. The client then reads from the IPC channel, writing what it receives to the standard output. If the file cannot be read by the server, the client reads an error message from the IPC channel. Otherwise, the client reads the contents of the file. The two dashed lines between the client and server in Figure 4.1 are the IPC channel.

Figure 4.1 Client-server example.

4.3 Pipes

Pipes are provided with all flavors of Unix. A pipe is created by the pipe function and provides a one-way (unidirectional) flow of data.

    #include <unistd.h>

    int pipe(int fd[2]);

                Returns: 0 if OK, -1 on error

Two file descriptors are returned: fd[0], which is open for reading, and fd[1], which is open for writing.

Some versions of Unix, notably SVR4, provide full-duplex pipes, in which case, both ends are available for reading and writing. Another way to create a full-duplex IPC channel is with the socketpair function, described in Section 14.3 of UNPv1, and this works on most current Unix systems. The most common use of pipes, however, is with the various shells, in which case, a half-duplex pipe is adequate.

Posix.1 and Unix 98 require only half-duplex pipes, and we assume so in this chapter.

The S_ISFIFO macro can be used to determine if a descriptor or file is either a pipe or a FIFO. Its single argument is the st_mode member of the stat structure and the macro evaluates to true (nonzero) or false (0).
For a pipe, this structure is filled in by the fstat function. For a FIFO, this structure is filled in by the fstat, lstat, or stat functions.

Figure 4.2 shows how a pipe looks in a single process.

Figure 4.2 A pipe in a single process.

Although a pipe is created by one process, it is rarely used within a single process. (We show an example of a pipe within a single process in Figure 5.14.) Pipes are typically used to communicate between two different processes (a parent and child) in the following way. First, a process (which will be the parent) creates a pipe and then forks to create a copy of itself, as shown in Figure 4.3.

Figure 4.3 Pipe in a single process, immediately after fork.

Next, the parent process closes the read end of one pipe, and the child process closes the write end of that same pipe. This provides a one-way flow of data between the two processes, as shown in Figure 4.4.

Figure 4.4 Pipe between two processes.

When we enter a command such as

    who | sort | lp

to a Unix shell, the shell performs the steps described previously to create three processes with two pipes between them. The shell also duplicates the read end of each pipe to standard input and the write end of each pipe to standard output. We show this pipeline in Figure 4.5.

Figure 4.5 Pipes between three processes in a shell pipeline.

All the pipes shown so far have been half-duplex or unidirectional, providing a one-way flow of data only. When a two-way flow of data is desired, we must create two pipes and use one for each direction. The actual steps are as follows:

    1. create pipe 1 (fd1[0] and fd1[1]), create pipe 2 (fd2[0] and fd2[1]),
    2. fork,
    3. parent closes read end of pipe 1 (fd1[0]),
    4. parent closes write end of pipe 2 (fd2[1]),
    5. child closes write end of pipe 1 (fd1[1]), and
    6. child closes read end of pipe 2 (fd2[0]).

We show the code for these steps in Figure 4.8.
This generates the pipe arrangement shown in Figure 4.6.

Figure 4.6 Two pipes to provide a bidirectional flow of data.

Example

Let us now implement the client-server example described in Section 4.2 using pipes. The main function creates two pipes and forks a child. The client then runs in the parent process and the server runs in the child process. The first pipe is used to send the pathname from the client to the server, and the second pipe is used to send the contents of that file (or an error message) from the server to the client. This setup gives us the arrangement shown in Figure 4.7.

Figure 4.7 Implementation of Figure 4.1 using two pipes.

Realize that in this figure we show the two pipes connecting the two processes, but each pipe goes through the kernel, as shown previously in Figure 4.6. Therefore, each byte of data from the client to the server, and vice versa, crosses the user-kernel interface twice: once when written to the pipe, and again when read from the pipe.

Figure 4.8 shows our main function for this example.

pipe/mainpipe.c

    #include    "unpipc.h"

    void    client(int, int), server(int, int);

    int
    main(int argc, char **argv)
    {
        int     pipe1[2], pipe2[2];
        pid_t   childpid;

        Pipe(pipe1);        /* create two pipes */
        Pipe(pipe2);

        if ( (childpid = Fork()) == 0) {    /* child */
            Close(pipe1[1]);
            Close(pipe2[0]);

            server(pipe1[0], pipe2[1]);
            exit(0);
        }
            /* parent */
        Close(pipe1[0]);
        Close(pipe2[1]);

        client(pipe2[0], pipe1[1]);

        Waitpid(childpid, NULL, 0);     /* wait for child to terminate */
        exit(0);
    }

Figure 4.8 main function for client-server using two pipes.

Create pipes, fork

Two pipes are created and the six steps that we listed with Figure 4.6 are performed. The parent calls the client function (Figure 4.9) and the child calls the server function (Figure 4.10).
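The six steps listed with Figure 4.6 can also be packaged into one self-contained helper. This is a minimal sketch, not the book's code: round_trip is a hypothetical name, it uses the plain system calls instead of the capitalized wrapper functions, and the child is only a stand-in server that upper-cases whatever it reads instead of acting as the file server of Figure 4.10.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send 'msg' to a child over pipe 1 and read the reply over pipe 2.
 * Returns the number of reply bytes stored in 'reply', or -1 on error. */
ssize_t round_trip(const char *msg, char *reply, size_t replylen)
{
    int fd1[2], fd2[2];
    pid_t pid;
    size_t total = 0;
    ssize_t n;

    if (pipe(fd1) < 0)                          /* step 1: pipe 1 */
        return -1;
    if (pipe(fd2) < 0) {                        /* step 1: pipe 2 */
        close(fd1[0]); close(fd1[1]);
        return -1;
    }
    if ((pid = fork()) < 0) {                   /* step 2 */
        close(fd1[0]); close(fd1[1]);
        close(fd2[0]); close(fd2[1]);
        return -1;
    }

    if (pid == 0) {                             /* child: stand-in server */
        char buf[256];

        close(fd1[1]);                          /* step 5 */
        close(fd2[0]);                          /* step 6 */
        while ((n = read(fd1[0], buf, sizeof(buf))) > 0) {
            for (ssize_t i = 0; i < n; i++)
                buf[i] = (char) toupper((unsigned char) buf[i]);
            if (write(fd2[1], buf, (size_t) n) != n)
                break;
        }
        _exit(0);
    }

    close(fd1[0]);                              /* step 3 */
    close(fd2[1]);                              /* step 4 */
    n = write(fd1[1], msg, strlen(msg));
    close(fd1[1]);              /* child's read now sees end-of-file */
    if (n >= 0) {
        while (total < replylen &&
               (n = read(fd2[0], reply + total, replylen - total)) > 0)
            total += (size_t) n;
    }
    close(fd2[0]);
    waitpid(pid, NULL, 0);                      /* reap the child */
    return n < 0 ? -1 : (ssize_t) total;
}
```

Closing the parent's write end right after sending is what lets the child's read loop terminate. The sketch assumes short messages: the parent writes everything before it reads, so a message larger than the two pipes can buffer could leave both processes blocked in write.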
waitpid for child

The server (the child) terminates first, when it calls exit after writing the final data to the pipe. It then becomes a zombie: a process that has terminated but whose parent, still running, has not yet waited for it. When the child terminates, the kernel also generates a SIGCHLD signal for the parent, but the parent does not catch this signal, and the default action of this signal is to be ignored. Shortly thereafter, the parent's client function returns after reading the final data from the pipe. The parent then calls waitpid to fetch the termination status of the terminated child (the zombie).

If the parent did not call waitpid, but just terminated, the child would be inherited by the init process, and another SIGCHLD signal would be sent to the init process, which would then fetch the termination status of the zombie.

The client function is shown in Figure 4.9.

pipe/client.c

    #include    "unpipc.h"

    void
    client(int readfd, int writefd)
    {
        size_t  len;
        ssize_t n;
        char    buff[MAXLINE];

            /* read pathname */
        Fgets(buff, MAXLINE, stdin);
        len = strlen(buff);         /* fgets() guarantees null byte at end */
        if (buff[len-1] == '\n')
            len--;                  /* delete newline from fgets() */

            /* write pathname to IPC channel */
        Write(writefd, buff, len);

            /* read from IPC, write to standard output */
        while ( (n = Read(readfd, buff, MAXLINE)) > 0)
            Write(STDOUT_FILENO, buff, n);
    }

Figure 4.9 client function for client-server using two pipes.

Read pathname from standard input

The pathname is read from standard input and written to the pipe, after deleting the newline that is stored by fgets.

Copy from pipe to standard output

The client then reads everything that the server writes to the pipe, writing it to standard output. Normally this is the contents of the file, but if the specified pathname cannot be opened, what the server returns is an error message.

Figure 4.10 shows the server function.
pipe/server.c

    #include    "unpipc.h"

    void
    server(int readfd, int writefd)
    {
        int     fd;
        ssize_t n;
        char    buff[MAXLINE+1];

            /* read pathname from IPC channel */
        if ( (n = Read(readfd, buff, MAXLINE)) == 0)
            err_quit("end-of-file while reading pathname");
        buff[n] = '\0';             /* null terminate pathname */

        if ( (fd = open(buff, O_RDONLY)) < 0) {
                /* error: must tell client */
            snprintf(buff + n, sizeof(buff) - n, ": can't open, %s\n",
                     strerror(errno));
            n = strlen(buff);
            Write(writefd, buff, n);

        } else {
                /* open succeeded: copy file to IPC channel */
            while ( (n = Read(fd, buff, MAXLINE)) > 0)
                Write(writefd, buff, n);
            Close(fd);
        }
    }

Figure 4.10 server function for client-server using two pipes.

Read pathname from pipe

The pathname written by the client is read from the pipe and null terminated. Note that a read on a pipe returns as soon as some data is present; it need not wait for the requested number of bytes (MAXLINE in this example).

Open file, handle error

The file is opened for reading, and if an error occurs, an error message string is returned to the client across the pipe. We call the strerror function to return the error message string corresponding to errno. (Pages 690-691 of UNPv1 talk more about the strerror function.)

Copy file to pipe

If the open succeeds, the contents of the file are copied to the pipe.

We can see the output from the program when the pathname is OK, and when an error occurs.

    solaris % mainpipe
    /etc/inet/ntp.conf                  a file consisting of two lines
    multicastclient 224.0.1.1
    driftfile /etc/inet/ntp.drift
    solaris % mainpipe
    /etc/shadow                         a file we cannot read
    /etc/shadow: can't open, Permission denied
    solaris % mainpipe
    /no/such/file                       a nonexistent file
    /no/such/file: can't open, No such file or directory

4.4 Full-Duplex Pipes

We mentioned in the previous section that some systems provide full-duplex pipes: SVR4's pipe function and the socketpair function provided by many kernels.
But what exactly does a full-duplex pipe provide? First, we can think of a half-duplex pipe as shown in Figure 4.11, a modification of Figure 4.2, which omits the process.

Figure 4.11 Half-duplex pipe.

A full-duplex pipe could be implemented as shown in Figure 4.12. This implies that only one buffer exists for the pipe and everything written to the pipe (on either descriptor) gets appended to the buffer and any read from the pipe (on either descriptor) just takes data from the front of the buffer.

Figure 4.12 One possible (incorrect) implementation of a full-duplex pipe.

The problem with this implementation becomes apparent in a program such as Figure A.29. We want two-way communication but we need two independent data streams, one in each direction. Otherwise, when a process writes data to the full-duplex pipe and then turns around and issues a read on that pipe, it could read back what it just wrote.

Figure 4.13 shows the actual implementation of a full-duplex pipe.

Figure 4.13 Actual implementation of a full-duplex pipe.

Here, the full-duplex pipe is constructed from two half-duplex pipes. Anything written to fd[1] will be available for reading by fd[0], and anything written to fd[0] will be available for reading by fd[1].

The program in Figure 4.14 demonstrates that we can use a single full-duplex pipe for two-way communication.

pipe/fduplex.c

    #include    "unpipc.h"

    int
    main(int argc, char **argv)
    {
        int     fd[2], n;
        char    c;
        pid_t   childpid;

        Pipe(fd);       /* assumes a full-duplex pipe (e.g., SVR4) */
        if ( (childpid = Fork()) == 0) {    /* child */
            sleep(3);
            if ( (n = Read(fd[0], &c, 1)) != 1)
                err_quit("child: read returned %d", n);
            printf("child read %c\n", c);
            Write(fd[0], "c", 1);
            exit(0);
        }
            /* parent */
        Write(fd[1], "p", 1);
        if ( (n = Read(fd[1], &c, 1)) != 1)
            err_quit("parent: read returned %d", n);
        printf("parent read %c\n", c);
        exit(0);
    }

Figure 4.14 Test a full-duplex pipe for two-way communication.

We create a full-duplex pipe and fork. The parent writes the character p to the pipe, and then reads a character from the pipe. The child sleeps for 3 seconds, reads a character from the pipe, and then writes the character c to the pipe. The purpose of the sleep in the child is to allow the parent to call read before the child can call read, to see whether the parent reads back what it wrote.

If we run this program under Solaris 2.6, which provides full-duplex pipes, we observe the desired behavior.

    solaris % fduplex
    child read p
    parent read c

The character p goes across the half-duplex pipe shown in the top of Figure 4.13, and the character c goes across the half-duplex pipe shown in the bottom of Figure 4.13. The parent does not read back what it wrote (the character p).

If we run this program under Digital Unix 4.0B, which by default provides half-duplex pipes (it also provides full-duplex pipes like SVR4, if different options are specified at compile time), we see the expected behavior of a half-duplex pipe.

    alpha % fduplex
    read error: Bad file number
    alpha % child read p
    write error: Bad file number

The parent writes the character p, which the child reads, but then the parent aborts when it tries to read from fd[1], and the child aborts when it tries to write to fd[0] (recall Figure 4.11).
The error returned by read is EBADF, which means that the descriptor is not open for reading. Similarly, write returns the same error if its descriptor is not open for writing.

4.5 popen and pclose Functions

As another example of pipes, the standard I/O library provides the popen function that creates a pipe and initiates another process that either reads from the pipe or writes to the pipe.

    #include <stdio.h>

    FILE *popen(const char *command, const char *type);

                Returns: file pointer if OK, NULL on error

    int pclose(FILE *stream);

                Returns: termination status of shell or -1 on error

command is a shell command line. It is processed by the sh program (normally a Bourne shell), so the PATH environment variable is used to locate the command. A pipe is created between the calling process and the specified command. The value returned by popen is a standard I/O FILE pointer that is used for either input or output, depending on the character string type.

    • If type is r, the calling process reads the standard output of the command.
    • If type is w, the calling process writes to the standard input of the command.

The pclose function closes a standard I/O stream that was created by popen, waits for the command to terminate, and then returns the termination status of the shell.

Section 14.3 of APUE provides an implementation of popen and pclose.

Example

Figure 4.15 shows another solution to our client-server example using the popen function and the Unix cat program.

pipe/mainpopen.c

    #include    "unpipc.h"

    int
    main(int argc, char **argv)
    {
        size_t  n;
        char    buff[MAXLINE], command[MAXLINE];
        FILE    *fp;

            /* read pathname */
        Fgets(buff, MAXLINE, stdin);
        n = strlen(buff);           /* fgets() guarantees null byte at end */
        if (buff[n-1] == '\n')
            n--;                    /* delete newline from fgets() */

        snprintf(command, sizeof(command), "cat %s", buff);
        fp = Popen(command, "r");

            /* copy from pipe to standard output */
        while (Fgets(buff, MAXLINE, fp) != NULL)
            Fputs(buff, stdout);

        Pclose(fp);
        exit(0);
    }

Figure 4.15 Client-server using popen.

The pathname is read from standard input, as in Figure 4.9. A command is built and passed to popen. The output from either the shell or the cat program is copied to standard output.

One difference between this implementation and the implementation in Figure 4.8 is that now we are dependent on the error message generated by the system's cat program, which is often inadequate. For example, under Solaris 2.6, we get the following error when trying to read a file that we do not have permission to read:

    solaris % cat /etc/shadow
    cat: cannot open /etc/shadow

But under BSD/OS 3.1, we get a more descriptive error when trying to read a similar file:

    bsdi % cat /etc/master.passwd
    cat: /etc/master.passwd: cannot open [Permission denied]

Also realize that the call to popen succeeds in such a case, but fgets just returns an end-of-file the first time it is called. The cat program writes its error message to standard error, and popen does nothing special with it; only standard output is redirected to the pipe that it creates.

4.6 FIFOs

Pipes have no names, and their biggest disadvantage is that they can be used only between processes that have a parent process in common. Two unrelated processes cannot create a pipe between them and use it for IPC (ignoring descriptor passing).

FIFO stands for first in, first out, and a Unix FIFO is similar to a pipe. It is a one-way (half-duplex) flow of data. But unlike pipes, a FIFO has a pathname associated with it, allowing unrelated processes to access a single FIFO. FIFOs are also called named pipes.

A FIFO is created by the mkfifo function.

    #include <sys/types.h>
    #include <sys/stat.h>

    int mkfifo(const char *pathname, mode_t mode);

                Returns: 0 if OK, -1 on error

The pathname is a normal Unix pathname, and this is the name of the FIFO.
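Creating a FIFO while tolerating one that already exists (the EEXIST case discussed next) can be sketched as follows, using the S_ISFIFO test from Section 4.3 to make sure that an existing name really is a FIFO. This is an illustrative sketch, not code from this book: ensure_fifo and perm_bits are hypothetical helper names. Unlike the checks in Figure 4.16, it rejects a name that is already taken by some other kind of file.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create FIFO 'path' with permission bits 'mode', or accept one that
 * already exists. mkfifo implies O_CREAT | O_EXCL, so an existing name
 * yields EEXIST; S_ISFIFO then checks that the existing file is a FIFO
 * and not, say, a directory. Returns 0 on success, -1 on error. */
int ensure_fifo(const char *path, mode_t mode)
{
    struct stat st;

    if (mkfifo(path, mode) == 0)
        return 0;                   /* created a new FIFO */
    if (errno != EEXIST)
        return -1;                  /* some other failure */
    if (stat(path, &st) < 0)
        return -1;
    if (!S_ISFIFO(st.st_mode)) {
        errno = EEXIST;             /* name taken by a non-FIFO */
        return -1;
    }
    return 0;                       /* reuse the existing FIFO */
}

/* Permission bits the file actually received, after the umask. */
int perm_bits(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    return st.st_mode & 0777;
}
```

With a umask of 022, a FIFO created with mode 0666 ends up with 0644. This is the contrast Exercise 3.3 asks about, because the System V get functions ignore the file mode creation mask.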
The mode argument specifies the file permission bits, similar to the second argument to open. Figure 2.4 shows the six constants from the <sys/stat.h> header used to specify these bits for a FIFO.

The mkfifo function implies O_CREAT | O_EXCL. That is, it creates a new FIFO or returns an error of EEXIST if the named FIFO already exists. If the creation of a new FIFO is not desired, call open instead of mkfifo. To open an existing FIFO or create a new FIFO if it does not already exist, call mkfifo, check for an error of EEXIST, and if this occurs, call open instead.

The mkfifo command also creates a FIFO. This can be used from shell scripts or from the command line.

Once a FIFO is created, it must be opened for reading or writing, using either the open function, or one of the standard I/O open functions such as fopen. A FIFO must be opened either read-only or write-only. It must not be opened for read-write, because a FIFO is half-duplex.

A write to a pipe or FIFO always appends the data, and a read always returns what is at the beginning of the pipe or FIFO. If lseek is called for a pipe or FIFO, the error ESPIPE is returned.

Example

We now redo our client-server from Figure 4.8 to use two FIFOs instead of two pipes. Our client and server functions remain the same; all that changes is the main function, which we show in Figure 4.16.

pipe/mainfifo.c

    #include    "unpipc.h"

    #define FIFO1   "/tmp/fifo.1"
    #define FIFO2   "/tmp/fifo.2"

    void    client(int, int), server(int, int);

    int
    main(int argc, char **argv)
    {
        int     readfd, writefd;
        pid_t   childpid;

            /* create two FIFOs; OK if they already exist */
        if ((mkfifo(FIFO1, FILE_MODE) < 0) && (errno != EEXIST))
            err_sys("can't create %s", FIFO1);
        if ((mkfifo(FIFO2, FILE_MODE) < 0) && (errno != EEXIST)) {
            unlink(FIFO1);
            err_sys("can't create %s", FIFO2);
        }

        if ( (childpid = Fork()) == 0) {    /* child */
            readfd = Open(FIFO1, O_RDONLY, 0);
            writefd = Open(FIFO2, O_WRONLY, 0);

            server(readfd, writefd);
            exit(0);
        }
            /* parent */
        writefd = Open(FIFO1, O_WRONLY, 0);
        readfd = Open(FIFO2, O_RDONLY, 0);

        client(readfd, writefd);

        Waitpid(childpid, NULL, 0);     /* wait for child to terminate */

        Close(readfd);
        Close(writefd);

        Unlink(FIFO1);
        Unlink(FIFO2);
        exit(0);
    }

Figure 4.16 main function for our client-server that uses two FIFOs.

Create two FIFOs

Two FIFOs are created in the /tmp filesystem. If the FIFOs already exist, that is OK. The FILE_MODE constant is defined in our unpipc.h header (Figure C.1) as

    #define FILE_MODE   (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)
                        /* default permissions for new files */

This allows user-read, user-write, group-read, and other-read. These permission bits are modified by the file mode creation mask of the process.

fork

We call fork, the child calls our server function (Figure 4.10), and the parent calls our client function (Figure 4.9). Before executing these calls, the parent opens the first FIFO for writing and the second FIFO for reading, and the child opens the first FIFO for reading and the second FIFO for writing. This is similar to our pipe example, and Figure 4.17 shows this arrangement.
Figure 4.17 Client-server example using two FIFOs.

The changes from our pipe example to this FIFO example are as follows:

    • To create and open a pipe requires one call to pipe. To create and open a FIFO requires one call to mkfifo followed by a call to open.
    • A pipe automatically disappears on its last close. A FIFO's name is deleted from the filesystem only by calling unlink.

The benefit in the extra calls required for the FIFO is that a FIFO has a name in the filesystem allowing one process to create a FIFO and another unrelated process to open the FIFO. This is not possible with a pipe.

Subtle problems can occur with programs that do not use FIFOs correctly. Consider Figure 4.16: if we swap the order of the two calls to open in the parent, the program does not work. The reason is that the open of a FIFO for reading blocks if no process currently has the FIFO open for writing. If we swap the order of these two opens in the parent, both the parent and the child are opening a FIFO for reading when no process has the FIFO open for writing, so both block. This is called a deadlock. We discuss this scenario in the next section.

Example: Unrelated Client and Server

In Figure 4.16, the client and server are still related processes. But we can redo this example with the client and server unrelated. Figure 4.18 shows the server program. This program is nearly identical to the server portion of Figure 4.16.

The header fifo.h is shown in Figure 4.19 and provides the definitions of the two FIFO names, which both the client and server must know.

Figure 4.20 shows the client program, which is nearly identical to the client portion of Figure 4.16. Notice that the client, not the server, deletes the FIFOs when done, because the client performs the last operation on the FIFOs.
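The deadlock described above comes from the rule that opening a FIFO for reading blocks until some process opens it for writing, and the reverse for writers. Those rules can be observed without any risk of blocking by adding the O_NONBLOCK flag (covered in Section 4.7 and summarized in Figure 4.21). This is a minimal sketch under stated assumptions: open_fifo_reader and open_fifo_writer are hypothetical helper names, and the FIFO path used in the usage check is arbitrary.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* A nonblocking read-only open of a FIFO returns at once, even when no
 * process has the FIFO open for writing. Returns a descriptor or -1. */
int open_fifo_reader(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* A nonblocking write-only open fails with ENXIO when no process has the
 * FIFO open for reading; without O_NONBLOCK this open would block (the
 * source of the deadlock). Returns a descriptor or -1 with errno set. */
int open_fifo_writer(const char *path)
{
    return open(path, O_WRONLY | O_NONBLOCK);
}
```

Once a reader exists, the writer's open succeeds. A nonblocking read of the empty FIFO then returns 0 (end-of-file) while no writer is present, and fails with EAGAIN once a writer has the FIFO open.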
pipe/server_main.c

    #include    "fifo.h"

    void    server(int, int);

    int
    main(int argc, char **argv)
    {
        int     readfd, writefd;

            /* create two FIFOs; OK if they already exist */
        if ((mkfifo(FIFO1, FILE_MODE) < 0) && (errno != EEXIST))
            err_sys("can't create %s", FIFO1);
        if ((mkfifo(FIFO2, FILE_MODE) < 0) && (errno != EEXIST)) {
            unlink(FIFO1);
            err_sys("can't create %s", FIFO2);
        }

        readfd = Open(FIFO1, O_RDONLY, 0);
        writefd = Open(FIFO2, O_WRONLY, 0);

        server(readfd, writefd);
        exit(0);
    }

Figure 4.18 Stand-alone server main function.

pipe/fifo.h

    #include    "unpipc.h"

    #define FIFO1   "/tmp/fifo.1"
    #define FIFO2   "/tmp/fifo.2"

Figure 4.19 fifo.h header that both the client and server include.

pipe/client_main.c

    #include    "fifo.h"

    void    client(int, int);

    int
    main(int argc, char **argv)
    {
        int     readfd, writefd;

        writefd = Open(FIFO1, O_WRONLY, 0);
        readfd = Open(FIFO2, O_RDONLY, 0);

        client(readfd, writefd);

        Close(readfd);
        Close(writefd);

        Unlink(FIFO1);
        Unlink(FIFO2);
        exit(0);
    }

Figure 4.20 Stand-alone client main function.

In the case of a pipe or FIFO, where the kernel keeps a reference count of the number of open descriptors that refer to the pipe or FIFO, either the client or server could call unlink without a problem. Even though this function removes the pathname from the filesystem, this does not affect open descriptors that had previously opened the pathname. But for other forms of IPC, such as System V message queues, no counter exists and if the server were to delete the queue after writing its final message to the queue, the queue could be gone when the client tries to read the final message.

To run this client and server, start the server in the background

    % server_fifo &

and then start the client. Alternately, we could start only the client and have it invoke the server by calling fork and then exec.
The client could also pass the names of the two FIFOs to the server as command-line arguments through the exec function, instead of coding them into a header. But this scenario would make the server a child of the client, in which case, a pipe could just as easily be used.

4.7 Additional Properties of Pipes and FIFOs

We need to describe in more detail some properties of pipes and FIFOs with regard to their opening, reading, and writing. First, a descriptor can be set nonblocking in two ways.

    1. The O_NONBLOCK flag can be specified when open is called. For example, the first call to open in Figure 4.20 could be

           writefd = Open(FIFO1, O_WRONLY | O_NONBLOCK, 0);

    2. If a descriptor is already open, fcntl can be called to enable the O_NONBLOCK flag. This technique must be used with a pipe, since open is not called for a pipe, and no way exists to specify the O_NONBLOCK flag in the call to pipe. When using fcntl, we first fetch the current file status flags with the F_GETFL command, bitwise-OR the O_NONBLOCK flag, and then store the file status flags with the F_SETFL command:

           int     flags;

           if ( (flags = fcntl(fd, F_GETFL, 0)) < 0)
               err_sys("F_GETFL error");
           flags |= O_NONBLOCK;
           if (fcntl(fd, F_SETFL, flags) < 0)
               err_sys("F_SETFL error");

       Beware of code that you may encounter that simply sets the desired flag, because this also clears all the other possible file status flags:

               /* wrong way to set nonblocking */
           if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0)
               err_sys("F_SETFL error");

Figure 4.21 shows the effect of the nonblocking flag for the opening of a FIFO and for the reading of data from an empty pipe or from an empty FIFO.
Current operation              Existing opens of pipe or FIFO     Blocking (default)                                  O_NONBLOCK set
open FIFO read-only            FIFO open for writing              returns OK                                          returns OK
                               FIFO not open for writing          blocks until FIFO is opened for writing             returns OK
open FIFO write-only           FIFO open for reading              returns OK                                          returns OK
                               FIFO not open for reading          blocks until FIFO is opened for reading             returns an error of ENXIO
read empty pipe or empty FIFO  pipe or FIFO open for writing      blocks until data is in the pipe or FIFO, or until  returns an error of EAGAIN
                                                                  the pipe or FIFO is no longer open for writing
                               pipe or FIFO not open for writing  read returns 0 (end-of-file)                        read returns 0 (end-of-file)
write to pipe or FIFO          pipe or FIFO open for reading      (see text)                                          (see text)
                               pipe or FIFO not open for reading  SIGPIPE generated for thread                        SIGPIPE generated for thread

Figure 4.21 Effect of O_NONBLOCK flag on pipes and FIFOs.

Note a few additional rules regarding the reading and writing of a pipe or FIFO.

• If we ask to read more data than is currently available in the pipe or FIFO, only the available data is returned. We must be prepared to handle a return value from read that is less than the requested amount.

• If the number of bytes to write is less than or equal to PIPE_BUF (a Posix limit that we say more about in Section 4.11), the write is guaranteed to be atomic. This means that if two processes each write to the same pipe or FIFO at about the same time, either all the data from the first process is written, followed by all the data from the second process, or vice versa. The system does not intermix the data from the two processes. If, however, the number of bytes to write is greater than PIPE_BUF, there is no guarantee that the write operation is atomic.

    Posix requires that PIPE_BUF be at least 512 bytes. Commonly encountered values range from 1024 for BSD/OS 3.1 to 5120 for Solaris 2.6. We show a program in Section 4.11 that prints this value.
• The setting of the O_NONBLOCK flag has no effect on the atomicity of writes to a pipe or FIFO; atomicity is determined solely by whether the requested number of bytes is less than or equal to PIPE_BUF. But when a pipe or FIFO is set nonblocking, the return value from write depends on the number of bytes to write and the amount of space currently available in the pipe or FIFO. If the number of bytes to write is less than or equal to PIPE_BUF:

    a. If there is room in the pipe or FIFO for the requested number of bytes, all the bytes are transferred.
    b. If there is not enough room in the pipe or FIFO for the requested number of bytes, return is made immediately with an error of EAGAIN. Since the O_NONBLOCK flag is set, the process does not want to be put to sleep. But the kernel cannot accept part of the data and still guarantee an atomic write, so the kernel must return an error and tell the process to try again later.

  If the number of bytes to write is greater than PIPE_BUF:

    a. If there is room for at least 1 byte in the pipe or FIFO, the kernel transfers whatever the pipe or FIFO can hold, and that is the return value from write.
    b. If the pipe or FIFO is full, return is made immediately with an error of EAGAIN.

• If we write to a pipe or FIFO that is not open for reading, the SIGPIPE signal is generated:

    a. If the process does not catch or ignore SIGPIPE, the default action of terminating the process is taken.
    b. If the process ignores the SIGPIPE signal, or if it catches the signal and returns from its signal handler, then write returns an error of EPIPE.

    SIGPIPE is considered a synchronous signal, that is, a signal attributable to one specific thread, the one that called write. But the easiest way to handle this signal is to ignore it (set its disposition to SIG_IGN) and let write return an error of EPIPE. An application should always detect an error return from write, but detecting the termination of a process by SIGPIPE is harder.
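The write rules above can be exercised with a short sketch, assuming a POSIX system. fill_nonblocking_pipe and write_without_reader are hypothetical helper names, and the 100-byte write size used with them is arbitrary. The total a pipe accepts before EAGAIN depends on the system's pipe capacity, so the sketch reports it rather than assuming a value. Because each write is no larger than 512 bytes (the Posix minimum for PIPE_BUF), every write should either transfer all its bytes or fail with EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fill an empty nonblocking pipe using writes of 'chunk' bytes, where
 * 0 < chunk <= 512, so each write is atomic. Returns the total bytes
 * accepted; *err receives the errno of the write that stopped the loop
 * (expected: EAGAIN) and *partial is set to 1 if any write returned a
 * short count, which the atomicity rule says cannot happen. */
size_t fill_nonblocking_pipe(size_t chunk, int *err, int *partial)
{
    char buf[512];
    size_t total = 0;
    ssize_t n;
    int fd[2], flags;

    *err = 0;
    *partial = 0;
    if (chunk == 0 || chunk > sizeof(buf)) {
        *err = EINVAL;
        return 0;
    }
    if (pipe(fd) < 0) {
        *err = errno;
        return 0;
    }
    memset(buf, 'x', sizeof(buf));

    /* set O_NONBLOCK on the write end, keeping its other status flags */
    if ((flags = fcntl(fd[1], F_GETFL, 0)) < 0 ||
        fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        *err = errno;
        close(fd[0]);
        close(fd[1]);
        return 0;
    }

    for (;;) {
        if ((n = write(fd[1], buf, chunk)) < 0) {
            *err = errno;          /* not enough room for all 'chunk' bytes */
            break;
        }
        if ((size_t) n != chunk)
            *partial = 1;
        total += (size_t) n;
    }
    close(fd[0]);
    close(fd[1]);
    return total;
}

/* Ignore SIGPIPE, then write to a pipe whose only read end is closed.
 * Returns the errno from write (expected: EPIPE), 0 if the write did not
 * fail, or -1 if setup failed. */
int write_without_reader(void)
{
    int fd[2], err = 0;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR || pipe(fd) < 0)
        return -1;
    close(fd[0]);                  /* no process has the pipe open for reading */
    if (write(fd[1], "x", 1) < 0)
        err = errno;
    close(fd[1]);
    return err;
}
```

Note that write_without_reader leaves SIGPIPE ignored for the rest of the process, which is the simple approach described above for programs that check every write for EPIPE.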
    If SIGPIPE is not caught, we must look at the termination status of the process from the shell to determine that the process was killed by a signal, and which signal. Section 5.13 of UNPv1 talks more about SIGPIPE.

4.8 One Server, Multiple Clients

The real advantage of a FIFO is when the server is a long-running process (e.g., a daemon, as described in Chapter 12 of UNPv1) that is unrelated to the client. The daemon creates a FIFO with a well-known pathname, opens the FIFO for reading, and the client then starts at some later time, opens the FIFO for writing, and sends its commands or whatever to the daemon through the FIFO. One-way communication of this form (client to server) is easy with a FIFO, but it becomes harder if the daemon needs to send something back to the client. Figure 4.22 shows the technique that we use with our example.

The server creates a FIFO with a well-known pathname, /tmp/fifo.serv in this example. The server will read client requests from this FIFO. Each client creates its own FIFO when it starts, with a pathname containing its process ID. Each client writes its request to the server's well-known FIFO, and the request contains the client process ID along with the pathname of the file that the client wants the server to open and send to the client.

Figure 4.22 One server, multiple clients. [Diagram: clients write requests to the server's well-known FIFO, /tmp/fifo.serv; the server writes each reply to the requesting client's own FIFO, named by client PID, such as /tmp/fifo.1234 or /tmp/fifo.9876.]

Figure 4.23 shows the server program.

Create well-known FIFO and open for read-only and write-only

The server's well-known FIFO is created, and it is OK if it already exists. We then open the FIFO twice, once read-only and once write-only. The readfifo descriptor is used to read each client request that arrives at the FIFO, but the dummyfd descriptor is never used. The reason for opening the FIFO for writing can be seen in Figure 4.21.
If we do not open the FIFO for writing, then each time a client terminates, the FIFO becomes empty and the server's read returns 0 to indicate an end-of-file. We would then have to close the FIFO and call open again with the O_RDONLY flag, and this will block until the next client request arrives. But if we always have a descriptor for the FIFO that was opened for writing, read will never return 0 to indicate an end-of-file when no clients exist. Instead, our server will just block in the call to read, waiting for the next client request. This trick therefore simplifies our server code and reduces the number of calls to open for its well-known FIFO.

When the server starts, the first open (with the O_RDONLY flag) blocks until the first [...]

[The rest of this paragraph and the beginning of Figure 4.23 are missing from this copy.]

fifocliserv/mainserver.c (end of listing)

            ...
            if (buff[n-1] == '\n')
                n--;                /* delete newline from readline() */
            buff[n] = '\0';         /* null terminate pathname */

            if ( (ptr = strchr(buff, ' ')) == NULL) {
                err_msg("bogus request: %s", buff);
                continue;
            }

            *ptr++ = 0;             /* null terminate PID, ptr = pathname */
            pid = atol(buff);
            snprintf(fifoname, sizeof(fifoname), "/tmp/fifo.%ld", (long) pid);
            if ( (writefifo = open(fifoname, O_WRONLY, 0)) < 0) {
                err_msg("cannot open: %s", fifoname);
                continue;
            }

            if ( (fd = open(ptr, O_RDONLY)) < 0) {
                    /* error: must tell client */
                snprintf(buff + n, sizeof(buff) - n, ": can't open, %s\n",
                         strerror(errno));
                n = strlen(ptr);
                Write(writefifo, ptr, n);
                Close(writefifo);

            } else {
                    /* open succeeded: copy file to FIFO */
                while ( (n = Read(fd, buff, MAXLINE)) > 0)
                    Write(writefifo, buff, n);
                Close(fd);
                Close(writefifo);
            }
        }
    }

Figure 4.23 FIFO server that handles multiple clients.

Parse client's request

The newline that is normally returned by readline is deleted. This newline is missing only if the buffer was filled before the newline was encountered, or if the final line of input was not terminated by a newline.
The strchr function returns a pointer to the first blank in the line, and ptr is incremented to point to the first character of the pathname that follows. The pathname of the client's FIFO is constructed from the process ID, and the FIFO is opened for write-only by the server.

Open file for client, send file to client's FIFO

The remainder of the server is similar to our server function from Figure 4.10. The file is opened, and if this fails, an error message is returned to the client across the FIFO. If the open succeeds, the file is copied to the client's FIFO. When done, we must close the server's end of the client's FIFO, which causes the client's read to return 0 (end-of-file). The server does not delete the client's FIFO; the client must do so after it reads the end-of-file from the server.

We show the client program in Figure 4.24.

Create FIFO

The client's FIFO is created with the process ID as the final part of the pathname.

Build client request line

The client's request consists of its process ID, one blank, the pathname for the server to send to the client, and a newline. This line is built in the array buff, reading the pathname from the standard input.

Open server's FIFO and write request

The server's FIFO is opened and the request is written to the FIFO. If this client is the first to open this FIFO since the server was started, then this open unblocks the server from its call to open (with the O_RDONLY flag).

Read file contents or error message from server

The server's reply is read from the FIFO and written to standard output. The client's FIFO is then closed and deleted.

We can start our server in one window and run the client in another window, and it works as expected. We show only the client interaction.
    solaris % mainclient
    /etc/shadow                         a file we cannot read
    /etc/shadow: can't open, Permission denied
    solaris % mainclient
    /etc/inet/ntp.conf                  a 2-line file
    multicastclient 224.0.1.1
    driftfile /etc/inet/ntp.drift

fifocliserv/mainclient.c

    #include "fifo.h"

    int
    main(int argc, char **argv)
    {
        int     readfifo, writefifo;
        size_t  len;
        ssize_t n;
        char    *ptr, fifoname[MAXLINE], buff[MAXLINE];
        pid_t   pid;

            /* create FIFO with our PID as part of name */
        pid = getpid();
        snprintf(fifoname, sizeof(fifoname), "/tmp/fifo.%ld", (long) pid);
        if ((mkfifo(fifoname, FILE_MODE) < 0) && (errno != EEXIST))
            err_sys("can't create %s", fifoname);

            /* start buffer with pid and a blank */
        snprintf(buff, sizeof(buff), "%ld ", (long) pid);
        len = strlen(buff);
        ptr = buff + len;

            /* read pathname */
        Fgets(ptr, MAXLINE - len, stdin);
        len = strlen(buff);         /* fgets() guarantees null byte at end */

            /* open FIFO to server and write PID and pathname to FIFO */
        writefifo = Open(SERV_FIFO, O_WRONLY, 0);
        Write(writefifo, buff, len);

            /* now open our FIFO; blocks until server opens for writing */
        readfifo = Open(fifoname, O_RDONLY, 0);

            /* read from IPC, write to standard output */
        while ( (n = Read(readfifo, buff, MAXLINE)) > 0)
            Write(STDOUT_FILENO, buff, n);

        Close(readfifo);
        Unlink(fifoname);
        exit(0);
    }

Figure 4.24 FIFO client that works with the server in Figure 4.23.

We can also interact with the server from the shell, because FIFOs have names in the filesystem.

    solaris % Pid=$$                            process ID of this shell
    solaris % mkfifo /tmp/fifo.$Pid             make the client's FIFO
    solaris % echo "$Pid /etc/inet/ntp.conf" > /tmp/fifo.serv
    solaris % cat < /tmp/fifo.$Pid              and read server's reply
    multicastclient 224.0.1.1
    driftfile /etc/inet/ntp.drift
    solaris % rm /tmp/fifo.$Pid

We send our process ID and pathname to the server with one shell command (echo) and read the server's reply with another (cat).
Any amount of time can elapse between these two commands. Therefore, the server appears to write the file to the FIFO, and the client later executes cat to read the data from the FIFO, which might make us think that the data remains in the FIFO somehow, even when no process has the FIFO open. This is not what is happening. Indeed, the rule is that when the final close of a pipe or FIFO occurs, any remaining data in the pipe or FIFO is discarded. What is happening in our shell example is that after the server reads the request line from the client, the server blocks in its call to open on the client's FIFO, because the client (our shell) has not yet opened the FIFO for reading (recall Figure 4.21). Only when we execute cat sometime later, which opens the client FIFO for reading, does the server's call to open for this FIFO return. This timing also leads to a denial-of-service attack, which we discuss in the next section.

Using the shell also allows simple testing of the server's error handling. We can easily send a line to the server without a process ID, and we can also send a line to the server specifying a process ID that does not correspond to a FIFO in the /tmp directory. For example, if we use cat as the client and enter the following lines

    solaris % cat > /tmp/fifo.serv
    /no/process/id
    999999 /invalid/process/id

then the server's output (in another window) is

    solaris % server
    bogus request: /no/process/id
    cannot open: /tmp/fifo.999999

Atomicity of FIFO writes

Our simple client-server also lets us see why the atomicity property of writes to pipes and FIFOs is important. Assume that two clients send requests at about the same time to the server.
The first client's request is the line

    1234 /etc/inet/ntp.conf

and the second client's request is the line

    9876 /etc/passwd

If we assume that each client issues one write function call for its request line, and that each line is less than or equal to PIPE_BUF (which is reasonable, since this limit is usually between 1024 and 5120 and since pathnames are often limited to 1024 bytes), then we are guaranteed that the data in the FIFO will be either

    1234 /etc/inet/ntp.conf
    9876 /etc/passwd

or

    9876 /etc/passwd
    1234 /etc/inet/ntp.conf

The data in the FIFO will not be something like

    1234 /etc/inet9876 /etc/passwd
    /ntp.conf

FIFOs and NFS

FIFOs are a form of IPC that can be used on a single host. Although FIFOs have names in the filesystem, they can be used only on local filesystems, and not on NFS-mounted filesystems.

    solaris % mkfifo /nfs/bsdi/usr/rstevens/fifo.temp
    mkfifo: I/O error

In this example, the filesystem /nfs/bsdi/usr is the /usr filesystem on the host bsdi.

Some systems (e.g., BSD/OS) do allow FIFOs to be created on an NFS-mounted filesystem, but data cannot be passed between the two systems through one of these FIFOs. In this scenario, the FIFO would be used only as a rendezvous point in the filesystem between clients and servers on the same host. A process on one host cannot send data to a process on another host through a FIFO, even though both processes may be able to open a FIFO that is accessible to both hosts through NFS.

4.9 Iterative versus Concurrent Servers

The server in our simple example from the preceding section is an iterative server. It iterates through the client requests, completely handling each client's request before proceeding to the next client.
For example, if two clients each send a request to the server at about the same time (the first for a 10-megabyte file that takes, say, 10 seconds to send to the client, and the second for a 10-byte file), the second client must wait at least 10 seconds for the first client to be serviced.

The alternative is a concurrent server. The most common type of concurrent server under Unix is called a one-child-per-client server, and it has the server call fork to create a new child each time a client request arrives. The new child handles the client request to completion, and the multiprogramming features of Unix provide the concurrency of all the different processes. But there are other techniques that are discussed in detail in Chapter 27 of UNPv1:

• create a pool of children and service a new client with an idle child,
• create one thread per client, and
• create a pool of threads and service a new client with an idle thread.

Although the discussion in UNPv1 is for network servers, the same techniques apply to IPC servers whose clients are on the same host.

Denial-of-Service Attacks

We have already mentioned one problem with an iterative server (some clients must wait longer than expected because they are in line following other clients with longer requests), but another problem exists. Recall our shell example following Figure 4.24 and our discussion of how the server blocks in its call to open for the client FIFO if the [...]

[The rest of this discussion, the start of Section 4.10 (Streams and Messages), and the beginning of Figure 4.25 are missing from this copy.]

pipemesg/mesg.h (end of listing)

        ...
        char    mesg_data[MAXMESGDATA];
    };

    ssize_t  mesg_send(int, struct mymesg *);
    void     Mesg_send(int, struct mymesg *);
    ssize_t  mesg_recv(int, struct mymesg *);
    ssize_t  Mesg_recv(int, struct mymesg *);

Figure 4.25 Our mymesg structure and related definitions.

Each message has a mesg_type, which we define as an integer whose value must be greater than 0. We ignore the type field for now, but return to it in Chapter 6, when we describe System V message queues. Each message also has a length, and we allow the length to be zero.
What we are doing with the mymesg structure is to precede each message with its length, instead of using newlines to separate the messages. Earlier, we mentioned two benefits of this design: the receiver need not scan each received byte looking for the end of the message, and there is no need to escape the delimiter (a newline) if it appears in the message.

Figure 4.26 shows a picture of the mymesg structure, and how we use it with pipes, FIFOs, and System V message queues.

Figure 4.26 Our mymesg structure. [Diagram: mesg_len | mesg_type | mesg_data. Our message, mymesg, used with pipes and FIFOs: the second argument for write and read points at mesg_len. System V message, used with System V message queues: the second argument for msgsnd and msgrcv points at mesg_type.]

We define two functions to send and receive messages. Figure 4.27 shows our mesg_send function, and Figure 4.28 shows our mesg_recv function.

pipemesg/mesg_send.c

    #include "mesg.h"

    ssize_t
    mesg_send(int fd, struct mymesg *mptr)
    {
        return(write(fd, mptr, MESGHDRSIZE + mptr->mesg_len));
    }

Figure 4.27 mesg_send function.

pipemesg/mesg_recv.c

    #include "mesg.h"

    ssize_t
    mesg_recv(int fd, struct mymesg *mptr)
    {
        size_t  len;
        ssize_t n;

            /* read message header first, to get len of data that follows */
        if ( (n = Read(fd, mptr, MESGHDRSIZE)) == 0)
            return(0);      /* end of file */
        else if (n != MESGHDRSIZE)
            err_quit("message header: expected %d, got %d", MESGHDRSIZE, n);

        if ( (len = mptr->mesg_len) > 0)
            if ( (n = Read(fd, mptr->mesg_data, len)) != len)
                err_quit("message data: expected %d, got %d", len, n);
        return(len);
    }

Figure 4.28 mesg_recv function.

It now takes two reads for each message, one to read the length, and another to read the actual message (if the length is greater than 0).
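The same length-prefix framing can be sketched without the book's unpipc.h wrappers. The names here (struct demo_msg, DEMO_MAXDATA, DEMO_HDRSIZE, readn, demo_send, demo_recv) are hypothetical, and the 512-byte data limit is an arbitrary choice that keeps each message small. One deliberate difference from mesg_recv: readn keeps reading until it has the requested count instead of treating a short read as fatal, since, as noted in Section 4.7, read may return less than was asked for.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define DEMO_MAXDATA 512          /* arbitrary limit for this sketch */

struct demo_msg {
    long len;                     /* #bytes in data, may be 0 */
    long type;                    /* message type, > 0 */
    char data[DEMO_MAXDATA];
};

/* size of the len and type fields that precede the data */
#define DEMO_HDRSIZE (sizeof(struct demo_msg) - DEMO_MAXDATA)

/* Read exactly n bytes unless end-of-file or an error occurs. Returns the
 * number of bytes read (less than n only at end-of-file), or -1 on error. */
static ssize_t readn(int fd, void *buf, size_t n)
{
    size_t left = n;
    char *p = buf;

    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;                /* end-of-file */
        left -= (size_t) r;
        p += r;
    }
    return (ssize_t) (n - left);
}

/* Send one message: the header (len, type) followed by len data bytes,
 * in a single write. Returns write's result, or -1 if len is too big. */
ssize_t demo_send(int fd, long type, const void *data, size_t len)
{
    struct demo_msg m;

    if (len > DEMO_MAXDATA)
        return -1;
    m.len = (long) len;
    m.type = type;
    memcpy(m.data, data, len);
    return write(fd, &m, DEMO_HDRSIZE + len);
}

/* Receive one message into *m. Returns the data length, 0 at end-of-file
 * or for a zero-length message, -1 on error or a malformed message. */
ssize_t demo_recv(int fd, struct demo_msg *m)
{
    ssize_t n = readn(fd, m, DEMO_HDRSIZE);

    if (n == 0)
        return 0;                                  /* end-of-file */
    if (n != (ssize_t) DEMO_HDRSIZE || m->len < 0 || m->len > DEMO_MAXDATA)
        return -1;
    if (m->len > 0 && readn(fd, m->data, (size_t) m->len) != m->len)
        return -1;
    return m->len;
}
```

As with mesg_recv, a return of 0 means either end-of-file or a zero-length message, which is the convention a loop such as while ((n = demo_recv(fd, &m)) > 0) relies on to stop.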
    (Careful readers may note that mesg_recv checks for all possible errors and terminates if one occurs. Nevertheless, we still define a wrapper function named Mesg_recv and call it from our programs, for consistency.)

We now change our client and server functions to use the mesg_send and mesg_recv functions. Figure 4.29 shows our client.

pipemesg/client.c

    #include "mesg.h"

    void
    client(int readfd, int writefd)
    {
        size_t  len;
        ssize_t n;
        struct mymesg   mesg;

            /* read pathname */
        Fgets(mesg.mesg_data, MAXMESGDATA, stdin);
        len = strlen(mesg.mesg_data);
        if (mesg.mesg_data[len-1] == '\n')
            len--;              /* delete newline from fgets() */
        mesg.mesg_len = len;
        mesg.mesg_type = 1;

            /* write pathname to IPC channel */
        Mesg_send(writefd, &mesg);

            /* read from IPC, write to standard output */
        while ( (n = Mesg_recv(readfd, &mesg)) > 0)
            Write(STDOUT_FILENO, mesg.mesg_data, n);
    }

Figure 4.29 Our client function that uses messages.

Read pathname, send to server

The pathname is read from standard input and then sent to the server using mesg_send.

Read file's contents or error message from server

The client calls mesg_recv in a loop, reading everything that the server sends back. By convention, when mesg_recv returns a length of 0, this indicates the end of data from the server. We will see that the server includes the newline in each message that it sends to the client, so a blank line will have a message length of 1.

Figure 4.30 shows our server.

pipemesg/server.c

    #include "mesg.h"

    void
    server(int readfd, int writefd)
    {
        FILE    *fp;
        ssize_t n;
        struct mymesg   mesg;

            /* read pathname from IPC channel */
        mesg.mesg_type = 1;
        if ( (n = Mesg_recv(readfd, &mesg)) == 0)
            err_quit("pathname missing");
        mesg.mesg_data[n] = '\0';   /* null terminate pathname */

        if ( (fp = fopen(mesg.mesg_data, "r")) == NULL) {
                /* error: must tell client */
            snprintf(mesg.mesg_data + n, sizeof(mesg.mesg_data) - n,
                     ": can't open, %s\n", strerror(errno));
            mesg.mesg_len = strlen(mesg.mesg_data);
            Mesg_send(writefd, &mesg);

        } else {
                /* fopen succeeded: copy file to IPC channel */
            while (Fgets(mesg.mesg_data, MAXMESGDATA, fp) != NULL) {
                mesg.mesg_len = strlen(mesg.mesg_data);
                Mesg_send(writefd, &mesg);
            }
            Fclose(fp);
        }

            /* send a 0-length message to signify the end */
        mesg.mesg_len = 0;
        Mesg_send(writefd, &mesg);
    }

Figure 4.30 Our server function that uses messages.

Read pathname from IPC channel, open file

The pathname is read from the client. Although the assignment of 1 to mesg_type appears useless (it is overwritten by mesg_recv in Figure 4.28), we call this same function when using System V message queues (Figure 6.10), in which case this assignment is needed (e.g., Figure 6.13). The standard I/O function fopen opens the file, which differs from Figure 4.10, where we called the Unix I/O function open to obtain a descriptor for the file. The reason we call the standard I/O library here is to call fgets to read the file one line at a time, and then send each line to the client as a message.

Copy file to client

If the call to fopen succeeds, the file is read using fgets and sent to the client, one line per message. A message with a length of 0 indicates the end of the file.

    When using either pipes or FIFOs, we could also close the IPC channel to notify the peer that the end of the input file was encountered. We send back a message with a length of 0, however, because we will encounter other types of IPC that do not have the concept of an end-of-file.

The main functions that call our client and server functions do not change at all. We can use either the pipe version (Figure 4.8) or the FIFO version (Figure 4.16).
4.11 Pipe and FIFO Limits

The only system-imposed limits on pipes and FIFOs are

    OPEN_MAX    the maximum number of descriptors open at any time by a process (Posix requires that this be at least 16), and
    PIPE_BUF    the maximum amount of data that can be written to a pipe or FIFO atomically (we described this in Section 4.7; Posix requires that this be at least 512).

The value of OPEN_MAX can be queried by calling the sysconf function, as we show shortly. It can normally be changed from the shell by executing the ulimit command (Bourne shell and KornShell, as we show shortly) or the limit command (C shell). It can also be changed from a process by calling the setrlimit function (described in detail in Section 7.11 of APUE).

The value of PIPE_BUF is often defined in the <limits.h> header, but it is considered a pathname variable by Posix. This means that its value can differ, depending on the pathname that is specified (for a FIFO, since pipes do not have names), because different pathnames can end up on different filesystems, and these filesystems might have different characteristics. The value can therefore be obtained at run time by calling either pathconf or fpathconf. Figure 4.31 shows an example that prints these two limits.

pipe/pipeconf.c (beginning of listing)

    #include "unpipc.h"

    int
    main(int argc, char **argv)
    {
        if (argc != 2)
            err_quit("usage: pipeconf <pathname>");
        ...

[The rest of Figure 4.31 and the pages that follow are missing from this copy; the text resumes in Chapter 5 (Posix Message Queues) with the mq_open prototype.]

    mqd_t mq_open(const char *name, int oflag, ...
                  /* mode_t mode, struct mq_attr *attr */ );

    Returns: message queue descriptor if OK, -1 on error
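Because the listing of Figure 4.31 is incomplete in this copy, here is a minimal sketch, assuming a POSIX system, of querying the two limits of Section 4.11 at run time. The function name query_pipe_limits is hypothetical. OPEN_MAX comes from sysconf(_SC_OPEN_MAX); PIPE_BUF, being a pathname variable, is asked of an actual pipe descriptor with fpathconf and of a directory (where a FIFO would be created) with pathconf. All three calls return -1 both on error and when a value is indeterminate; the sketch does not distinguish the two cases.

```c
#include <unistd.h>

/* Fill in the run-time values of OPEN_MAX and PIPE_BUF. PIPE_BUF is queried
 * twice: for a pipe (fpathconf) and for FIFOs created in 'dir' (pathconf).
 * Returns 0 on success, -1 if the pipe could not be created. */
int query_pipe_limits(const char *dir, long *open_max,
                      long *pipe_buf_pipe, long *pipe_buf_dir)
{
    int fd[2];

    *open_max = sysconf(_SC_OPEN_MAX);
    *pipe_buf_dir = pathconf(dir, _PC_PIPE_BUF);
    if (pipe(fd) < 0)
        return -1;
    *pipe_buf_pipe = fpathconf(fd[0], _PC_PIPE_BUF);
    close(fd[0]);
    close(fd[1]);
    return 0;
}
```

On many systems the two PIPE_BUF values will be equal; Posix merely allows them to differ from one filesystem to another.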
