Addison-Wesley Professional Computing Series
Brian W. Kernighan, Consulting Editor
Ken Arnold/John Peyton, A C User's Guide to ANSI C
David R. Butenhof, Programming with POSIX® Threads
Tom Cargill, C++ Programming Style
William R. Cheswick/Steven M. Bellovin, Firewalls and Internet Security: Repelling the Wily Hacker
David A. Curry, UNIX® System Security: A Guide for Users and System Administrators
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns: Elements of
Reusable Object-Oriented Software
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns CD: Elements of
Reusable Object-Oriented Software
David R. Hanson, C Interfaces and Implementations: Techniques for Creating Reusable Software
S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, The Internet, and
the Telephone Network
John Lakos, Large-Scale C++ Software Design
Scott Meyers, Effective C++, Second Edition: 50 Specific Ways to Improve Your Programs and Designs
Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs
Robert B. Murray, C++ Strategies and Tactics
David R. Musser/Atul Saini, STL Tutorial and Reference Guide: C++ Programming with the
Standard Template Library
John K. Ousterhout, Tcl and the Tk Toolkit
Craig Partridge, Gigabit Networking
J. Stephen Pendergrast, Jr., Desktop KornShell Graphical Programming
Radia Perlman, Interconnections: Bridges and Routers
David M. Piscitello/A. Lyman Chapin, Open Systems Networking: TCP/IP and OSI
Stephen A. Rago, UNIX® System V Network Programming
Curt Schimmel, UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and
Caching for Kernel Programmers
W. Richard Stevens, Advanced Programming in the UNIX® Environment
W. Richard Stevens, TCP/IP Illustrated, Volume 1: The Protocols
W. Richard Stevens, TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and
UNIX® Domain Protocols
Gary R. Wright/W. Richard Stevens, TCP/IP Illustrated, Volume 2: The Implementation

Programming with POSIX® Threads

David R. Butenhof

ADDISON-WESLEY
An Imprint of Addison Wesley Longman, Inc.
Reading, Massachusetts • Harlow, England • Menlo Park, California
Berkeley, California • Don Mills, Ontario • Sydney
Bonn • Amsterdam • Tokyo • Mexico City

Trademark acknowledgments:
UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd. Digital, DEC, Digital UNIX, DECthreads, VMS, and OpenVMS are trademarks of Digital Equipment Corporation. Solaris, SPARC, SunOS, and Sun are trademarks of Sun Microsystems Incorporated. SGI and IRIX are trademarks of Silicon Graphics, Incorporated. HP-UX is a trademark of Hewlett-Packard Company. AIX, IBM, and OS/2 are trademarks or registered trademarks of the IBM Corporation. X/Open is a trademark of X/Open Company Ltd. POSIX is a registered trademark of the Institute of Electrical and Electronics Engineers, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial caps or all caps.
The authors and publishers have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
The publisher offers discounts on this book when ordered in quantity for special sales.
For more information, please contact
Corporate & Professional Publishing Group
Addison-Wesley Publishing Company
One Jacob Way
Reading, Massachusetts 01867
Library of Congress Cataloging-in-Publication Data
Butenhof, David R., 1956-
Programming with POSIX threads / David R. Butenhof.
p. cm. -- (Addison-Wesley professional computing series)
Includes bibliographical references and index.
ISBN 0-201-63392-2 (pbk.)
1. Threads (Computer programs) 2. POSIX (Computer software
standard) 3. Electronic digital computers--Programming. I. Title.
II. Series.
QA76.76.T55B88 1997
005.4'32--dc21 97-6835
CIP
Copyright © 1997 by Addison Wesley Longman, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada.
Text printed on recycled and acid-free paper.
2 3 4 5 6 7 8 9 MA 00 99 98 97
2nd Printing October, 1997

To Anne, Amy,
and
Alyssa.

Quote acknowledgments:
American Heritage Dictionary of the English Language: page 1
ISO/IEC 9945-1:1996, © 1996 by IEEE: page 29.
Lewis Carroll, Alice's Adventures in Wonderland: pages xv, 47, 70, 88, 97, 98, 106, 131,
142, 161, 189, 197. Reproduced by permission of Macmillan Children's Books.
Lewis Carroll, Through the Looking Glass: pages 1, 4, 8, 20, 25, 29, 35, 45, 172, 214,
241, 283, 290, 302. Reproduced by permission of Macmillan Children’s Books.
Lewis Carroll, The Hunting of the Snark: pages 3, 13, 28, 39, 120, 131, 134, 289, 967.
Reproduced by permission of Macmillan Children's Books.

Contents
List of Example Programs
Preface
    Intended audience
    About the author
    Acknowledgments
1 Introduction
    1.1 The "bailing programmers"
    1.2 Definitions and terminology
        1.2.1 Asynchronous
        1.2.2 Concurrency
        1.2.3 Uniprocessor and multiprocessor
        1.2.4 Parallelism
        1.2.5 Thread safety and reentrancy
        1.2.6 Concurrency control functions
    1.3 Asynchronous programming is intuitive...
        1.3.1 ...because UNIX is asynchronous
        1.3.2 ...because the world is asynchronous
    1.4 About the examples in this book
    1.5 Asynchronous programming, by example
        1.5.1 The baseline, synchronous version
        1.5.2 A version using multiple processes
        1.5.3 A version using multiple threads
        1.5.4 Summary
    1.6 Benefits of threading
        1.6.1 Parallelism
        1.6.2 Concurrency
        1.6.3 Programming model
    1.7 Costs of threading
        1.7.1 Computing overhead
        1.7.2 Programming discipline
        1.7.3 Harder to debug
    1.8 To thread or not to thread?
    1.9 POSIX thread concepts
        1.9.1 Architectural overview
        1.9.2 Types and interfaces
        1.9.3 Checking for errors
2 Threads
    2.1 Creating and using threads
    2.2 The life of a thread
        2.2.1 Creation
        2.2.2 Startup
        2.2.3 Running and blocking
        2.2.4 Termination
        2.2.5 Recycling
3 Synchronization
    3.1 Invariants, critical sections, and predicates
    3.2 Mutexes
        3.2.1 Creating and destroying a mutex
        3.2.2 Locking and unlocking a mutex
            3.2.2.1 Nonblocking mutex locks
        3.2.3 Using mutexes for atomicity
        3.2.4 Sizing a mutex to fit the job
        3.2.5 Using more than one mutex
            3.2.5.1 Lock hierarchy
            3.2.5.2 Lock chaining
    3.3 Condition variables
        3.3.1 Creating and destroying a condition variable
        3.3.2 Waiting on a condition variable
        3.3.3 Waking condition variable waiters
        3.3.4 One final alarm program
    3.4 Memory visibility between threads
4 A few ways to use threads
    4.1 Pipeline
    4.2 Work crew
    4.3 Client/server
5 Advanced threaded programming
    5.1 One-time initialization
    5.2 Attributes objects
        5.2.1 Mutex attributes
        5.2.2 Condition variable attributes
        5.2.3 Thread attributes
    5.3 Cancellation
        5.3.1 Deferred cancelability
        5.3.2 Asynchronous cancelability
        5.3.3 Cleaning up
    5.4 Thread-specific data
        5.4.1 Creating thread-specific data
        5.4.2 Using thread-specific data
        5.4.3 Using destructor functions
    5.5 Realtime scheduling
        5.5.1 POSIX realtime options
        5.5.2 Scheduling policies and priorities
        5.5.3 Contention scope and allocation domain
        5.5.4 Problems with realtime scheduling
        5.5.5 Priority-aware mutexes
            5.5.5.1 Priority ceiling mutexes
            5.5.5.2 Priority inheritance mutexes
    5.6 Threads and kernel entities
        5.6.1 Many-to-one (user level)
        5.6.2 One-to-one (kernel level)
        5.6.3 Many-to-few (two level)
6 POSIX adjusts to threads
    6.1 fork
        6.1.1 Fork handlers
    6.2 exec
    6.3 Process exit
    6.4 Stdio
        6.4.1 flockfile and funlockfile
        6.4.2 getchar_unlocked and putchar_unlocked
    6.5 Thread-safe functions
        6.5.1 User and terminal identification
        6.5.2 Directory searching
        6.5.3 String token
        6.5.4 Time representation
        6.5.5 Random number generation
        6.5.6 Group and user database
    6.6 Signals
        6.6.1 Signal actions
        6.6.2 Signal masks
        6.6.3 pthread_kill
        6.6.4 sigwait and sigwaitinfo
        6.6.5 SIGEV_THREAD
        6.6.6 Semaphores: synchronizing with a signal-catching function
7 "Real code"
    7.1 Extended synchronization
        7.1.1 Barriers
        7.1.2 Read-write locks
    7.2 Work queue manager
    7.3 But what about existing libraries?
        7.3.1 Modifying libraries to be thread-safe
        7.3.2 Living with legacy libraries
8 Hints to avoid debugging
    8.1 Avoiding incorrect code
        8.1.1 Avoid relying on "thread inertia"
        8.1.2 Never bet your mortgage on a thread race
        8.1.3 Cooperate to avoid deadlocks
        8.1.4 Beware of priority inversion
        8.1.5 Never share condition variables between predicates
        8.1.6 Sharing stacks and related memory corrupters
    8.2 Avoiding performance problems
        8.2.1 Beware of concurrent serialization
        8.2.2 Use the right number of mutexes
            8.2.2.1 Too many mutexes will not help
        8.2.3 Never fight over cache lines
9 POSIX threads mini-reference
    9.1 POSIX 1003.1c-1995 options
    9.2 POSIX 1003.1c-1995 limits
    9.3 POSIX 1003.1c-1995 interfaces
        9.3.1 Error detection and reporting
        9.3.2 Use of void* type
        9.3.3 Threads
        9.3.4 Mutexes
        9.3.5 Condition variables
        9.3.6 Cancellation
        9.3.7 Thread-specific data
        9.3.8 Realtime scheduling
        9.3.9 Fork handlers
        9.3.10 Stdio
        9.3.11 Thread-safe functions
        9.3.12 Signals
        9.3.13 Semaphores
10 Future standardization
    10.1 X/Open XSH5 [UNIX98]
        10.1.1 POSIX options for XSH5
        10.1.2 Mutex type
        10.1.3 Set concurrency level
        10.1.4 Stack guard size
        10.1.5 Parallel I/O
        10.1.6 Cancellation points
    10.2 POSIX 1003.1j
        10.2.1 Barriers
        10.2.2 Read-write locks
        10.2.3 Spinlocks
        10.2.4 Condition variable wait clock
        10.2.5 Thread abort
    10.3 POSIX 1003.14
Bibliography
Thread resources on the Internet
Index

Example programs
sample.c part 1 sample info
alarm.c
alarm_fork.c
alarm_thread.c part 1 definitions
alarm_thread.c part 2 alarm thread
alarm_thread.c part 3 main
thread_error.c
errors.h
lifecycle.c
mutex_static.c
mutex_dynamic.c
alarm_mutex.c part 1 definitions
alarm_mutex.c part 2 alarm thread
alarm_mutex.c part 3 main
trylock.c
backoff.c
cond_static.c
cond_dynamic.c
cond.c
alarm_cond.c part 1 declarations
alarm_cond.c part 2 alarm insert
alarm_cond.c part 3 alarm routine
alarm_cond.c part 4 main
pipe.c part 1 definitions
pipe.c part 2 pipe_send
pipe.c part 3 pipe_stage
pipe.c part 4 pipe_create
pipe.c part 5 pipe_start, pipe_result
pipe.c part 6 main
crew.c part 1 definitions
crew.c part 2 worker_routine
crew.c part 3 crew_create
crew.c part 4 crew_start
crew.c part 5 main
server.c part 1 definitions
server.c part 2 tty_server_routine
server.c part 3 tty_server_request
server.c part 4 client_routine
server.c part 5 main
once.c
mutex_attr.c
cond_attr.c
thread_attr.c
cancel.c
cancel_disable.c
cancel_async.c
cancel_cleanup.c
cancel_subcontract.c
tsd_once.c
tsd_destructor.c
sched_attr.c
sched_thread.c
atfork.c
flock.c
putchar.c
getlogin.c
susp.c part 1 signal-catching functions
susp.c part 2 initialization
susp.c part 3 thd_suspend
susp.c part 4 thd_continue
susp.c part 5 sample program
sigwait.c
sigev_thread.c
semaphore_signal.c
barrier.h part 1 barrier_t
barrier.h part 2 interfaces
barrier.c part 1 barrier_init
barrier.c part 2 barrier_destroy
barrier.c part 3 barrier_wait
barrier_main.c
rwlock.h part 1 rwlock_t
rwlock.h part 2 interfaces
rwlock.c part 1 rwl_init
rwlock.c part 2 rwl_destroy
rwlock.c part 3 cleanup handlers
rwlock.c part 4 rwl_readlock
rwlock.c part 5 rwl_readtrylock
rwlock.c part 6 rwl_readunlock
rwlock.c part 7 rwl_writelock
rwlock.c part 8 rwl_writetrylock
rwlock.c part 9 rwl_writeunlock
rwlock_main.c
workq.h part 1 workq_t
workq.h part 2 interfaces
workq.c part 1 workq_init
workq.c part 2 workq_destroy
workq.c part 3 workq_add
workq.c part 4 workq_server
workq_main.c
inertia.c

Preface
The White Rabbit put on his spectacles.
"Where shall I begin, please your Majesty?" he asked.
"Begin at the beginning," the King said, very gravely,
"and go on till you come to the end: then stop."
Lewis Carroll, Alice's Adventures in Wonderland
This book is about "threads" and how to use them. "Thread" is just a name for a basic software "thing" that can do work on a computer. A thread is smaller, faster, and more maneuverable than a traditional process. In fact, once threads have been added to an operating system, a "process" becomes just data—address space, files, and so forth—plus one or more threads that do something with all that data.
With threads, you can build applications that utilize system resources more efficiently, that are more friendly to users, that run blazingly fast on multiprocessors, and that may even be easier to maintain. To accomplish all this, you need only add some relatively simple function calls to your code, adjust to a new way of thinking about programming, and leap over a few yawning chasms. Reading this book carefully will, I hope, help you to accomplish all that without losing your sense of humor.
The threads model used in this book is commonly called "Pthreads," or "POSIX threads." Or, more formally (since you haven't yet been properly introduced), the POSIX 1003.1c-1995 standard. I'll give you a few other names later—but for now, "Pthreads" is all you need to worry about.
As I write this, Sun's Solaris, Digital's Digital UNIX, and SGI's IRIX already
support Pthreads. The other major commercial UNIX operating systems will soon
have Pthreads as well, maybe even by the time you read this, including IBM's AIX
and Hewlett-Packard’s HP-UX. Pthreads implementations are also available for
Linux and other UNIX operating systems.
In the personal computer market, Microsoft's Win32 API (the primary pro-
gramming interface to both Windows NT and Windows 95) supports threaded
programming, as does IBM's OS/2. These threaded programming models are
quite different from Pthreads, but the important first step toward using them pro-
ductively is understanding concurrency, synchronization, and scheduling. The
rest is (more or less) a matter of syntax and style, and an experienced thread programmer can adapt to any of these models.
The threaded model can be (and has been) applied with great success to a wide range of programming problems. Here are just a few:
• Large scale, computationally intensive programs
• High-performance application programs and library code that can take advantage of multiprocessor systems
• Library code that can be used by threaded application programs
• Realtime application programs and library code
• Application programs and library code that perform I/O to slow external devices (such as networks and human beings)
Intended audience
This book assumes that you are an experienced programmer, familiar with
developing code for an operating system in “the UNIX family” using the ANSI C
language. I have tried not to assume that you have any experience with threads
or other forms of asynchronous programming. The Introduction chapter provides
a general overview of the terms and concepts you'll need for the rest of the book.
If you don't want to read the Introduction first, that's fine, but if you ever feel like you're "missing something" you might try skipping back to get introduced.
Along the way you'll find examples and simple analogies for everything. In the
end I hope that you'll be able to continue comfortably threading along on your
own. Have fun, and “happy threading.”
About the author
I have been involved in the Pthreads standard since it began, although I stayed at home for the first few meetings. I was finally forced to spend a grueling week in the avalanche-proof concrete bunker at the base of Snowbird ski resort in Utah, watching hard-working standards representatives from around the world wax their skis. This was very distracting, because I had expected a standards meeting to be a formal and stuffy environment. As a result of this misunderstanding, I was forced to rent ski equipment instead of using my own.
After the Pthreads standard went into balloting, I worked on additional thread
synchronization interfaces and multiprocessor issues with several POSIX work-
ing groups. I also helped to define the Aspen threads extensions, which were fast-
tracked into X/Open XSH5.
I have worked at Digital Equipment Corporation for (mumble, mumble) years, in various locations throughout Massachusetts and New Hampshire. I was one of the creators of Digital's own threading architecture, and I designed (and implemented much of) the Pthreads interfaces on Digital UNIX 4.0. I have been helping people develop and debug threaded code for more than eight years.
My unofficial motto is “Better Living Through Concurrency.” Threads are not
sliced bread, but then, we're programmers, not bakers, so we do what we can.
Acknowledgments
This is the part where I write the stuff that I'd like to see printed, and that my friends and coworkers want to see. You probably don't care, and I promise not to be annoyed if you skip over it—but if you're curious, by all means read on.
No project such as this book can truly be accomplished by a single person,
despite the fact that only one name appears on the cover. I could have written a
book about threads without any help—I know a great deal about threads, and I am at least reasonably competent at written communication. However, the result
would not have been this book, and this book is better than that hypothetical
work could possibly have been.
Thanks first and foremost to my manager Jean Fullerton, who gave me the time and encouragement to write this book on the job—and thanks to the rest of the DECthreads team who kept things going while I wrote, including Brian
Keane, Webb Scales, Jacqueline Berg, Richard Love, Peter Portante, Brian Silver,
Mark Simons, and Steve Johnson.
Thanks to Garret Swart who, while he was with Digital at the Systems
Research Center, got us involved with POSIX. Thanks to Nawaf Bitar who worked
with Garret to create, literally overnight, the first draft of what became Pthreads,
and who became POSIX thread evangelist through the difficult period of getting
everyone to understand just what the heck this threading thing was all about
anyway. Without Garret, and especially Nawaf, Pthreads might not exist, and cer-
tainly wouldn't be as good as it is. (The lack of perfection is not their
responsibility—that's the way life is.)
Thanks to everyone who contributed to the design of cma, Pthreads, UNIX98, and to the users of DCE threads and DECthreads, for all the help, thought-provoking discourse, and assorted skin-thickening exercises, including Andrew Birrell, Paul Borman, Bob Conti, Bill Cox, Jeff Denham, Peter Gilbert, Rick Greer, Mike Grier, Kevin Harris, Ken Hobday, Mike Jones, Steve Kleiman, Bob Knighten, Leslie Lamport, Doug Locke, Paula Long, Finnbarr P. Murphy, Bill Noyce, Simon Patience, Harold Seigel, Al Simons, Jim Woodward, and John Zolnowsky.
Many thanks to all those who patiently reviewed the drafts of this book (and even to those who didn't seem so patient at times). Brian Kernighan, Rich Stevens, Dave Brownell, Bill Gallmeister, Ian Ginzburg, Will Morse, Bryan O'Sullivan, Bob Robillard, Dave Ruddock, Bil Lewis, and many others suggested or motivated improvements in structure and detail—and provided additional skin-thickening exercises to keep me in shape. Devang Shah and Bart Smaalders answered some Solaris questions, and Bryan O'Sullivan suggested what became the "bailing programmers" analogy.
Thanks to John Wait and Lana Langlois at Addison Wesley Longman, who waited with great patience as a first-time writer struggled to balance writing a book with engineering and consulting commitments. Thanks to Pamela Yee and Erin Sweeney, who managed the book's production process, and to all the team (many of whose names I'll never know), who helped.
Thanks to my wife, Anne Lederhos, and our daughters Amy and Alyssa, for all the things for which any writers may thank their families, including support, tolerance, and just being there. And thanks to Charles Dodgson (Lewis Carroll), who
wrote extensively about threaded programming (and nearly everything else) in his
classic works Alice’s Adventures in Wonderland, Through the Looking Glass, and
The Hunting of the Snark.
Dave Butenhof
Digital Equipment Corporation
110 Spit Brook Road, ZKO2-3/Q18
Nashua, NH 03062
butenhof@zko.dec.com
December 1996

1 Introduction
"The time has come," the Walrus said,
"To talk of many things;
Of shoes—and ships—and sealing wax—
Of cabbages—and kings—
And why the sea is boiling hot—
And whether pigs have wings."
Lewis Carroll, Through the Looking Glass
In a dictionary, you would probably see that one of several definitions for "thread" is along the lines of the third definition in the American Heritage paperback dictionary on my desk: "Anything suggestive of the continuousness and sequence of thread." In computer terms, a thread is the set of properties that suggest "continuousness and sequence" within the machine. A thread comprises the machine state necessary to execute a sequence of machine instructions—the location of the current instruction, the machine's address and data registers, and so forth.
A UNIX process can be thought of as a thread, plus an address space, file descriptors, and an assortment of other data. Some versions of UNIX support "lightweight" or "variable weight" processes that allow you to strip some or all of that data from some of your processes for efficiency. Now, whether you're using a "thread" or a "lightweight process," you still need the address space, file descriptors, and everything else. So, you might ask, what's the point? The point is that you can have many threads sharing an address space, doing different things. On a multiprocessor, the threads in a process can be doing different things simultaneously.
When computers lived in glass caves and were fed carefully prepared punch
cards, the real world outside could be kept waiting with no consequences more
severe than some grumbling programmers. But the real world doesn't do one
thing at a time, and gradually computers began to model that world by adding
capabilities such as multiprogramming, time sharing, multiprocessing, and, even-
tually, threads.
Threads can help you bring your application out of the cave, and Pthreads helps you do it in a way that will be neat, efficient, and portable. This chapter briefly introduces you to what you need to begin understanding and using threads. Don't worry—the rest of the book will follow up on the details left dangling in this chapter.
Section 1.1 presents the framework for a number of analogies that I will use to explain threading as we go. There is nothing all that unusual in the brief story—but hereafter you will understand when I talk about programmers and buckets, which, otherwise, might seem mildly odd.
Section 1.2 defines some essential concepts and terms used in this book. The most important of these concepts deserves a special introduction, which will also serve to demonstrate the convention with which various particularly important points shall be emphasized throughout this book:
Asynchronous:
Any two operations are "asynchronous" when they can proceed
independently of each other.
Section 1.3 describes how you already use asynchronous programming on a regular basis, both as a UNIX programmer and user, and as a human being in the real world. I wouldn't dare to claim that asynchronous programming is easy, but the basic concepts it tries to model are so easy and natural that you rarely need even to think about them until you try to apply them to software.
Threads are, to some extent, just one more way to make applications asynchronous, but threads have some advantages over other models that have been used to build asynchronous applications. Section 1.5 will show you some of the advantages as we apply various programming methods in several versions of a simple alarm clock. You will get to see "threads in action" right away, with a brief description of the few Pthreads interfaces needed to build this simple application.
Armed, now, with a basic understanding of what threads are all about, you can go on to Section 1.6, where we will explore some of the fundamental advantages of a threaded programming model.
Although there are a lot of excellent reasons to use threads, there is a price to be paid. Section 1.7 provides a counterpoint to the previous section by describing some of the costs. What it boils down to, though, is simply that you need to learn how the model works, and then apply it carefully. It's not as hard as some folks would have you believe.
You have seen some of the fundamental benefits and costs. It may be obvious that you do not want to rush out and put threads into every application or library you write. Section 1.8 asks the question "To thread, or not to thread?" and I will attempt to guide you toward determining the proper answer in various cases.
You will know at that point what threads are, what they do, and when to use them. Aside from brief examples, you haven't yet seen any detailed information about the particular programming interfaces (APIs) that compose Pthreads. Section 1.9 points out some of the basic landmarks of the Pthreads universe to get you oriented before we plunge ahead. The most important part of this section is 1.9.3, which describes the Pthreads model for reporting errors—which is somewhat different from the rest of UNIX and POSIX.
1.1 The "bailing programmers"
This was charming, no doubt; but they shortly found out
That the Captain they trusted so well
Had only one notion for crossing the ocean,
And that was to tingle his bell.
Lewis Carroll, The Hunting of the Snark
Three programmers sail out to sea one fine day in a small boat. They sail quite some distance from shore, enjoying the sun and sea breeze, allowing the wind to carry them.

The sky darkens, and a storm strikes. The small boat is tossed violently about, and when the storm abates the programmers are missing their boat's sail and most of the mast. The boat has sprung a small leak, and there is no land in sight.

The boat is equipped with food, water, oars, and a bailing bucket, and the programmers set to work. One programmer rows, and monitors the accumulating water in the bottom of the boat. The other programmers alternately sleep, watch the water level, or scan the horizon for sight of land or another ship.

An idle programmer may notice rising water in the boat and begin bailing. When both idle programmers are awake, and become simultaneously concerned regarding their increasing dampness, they may both lunge for the bailing bucket—but one will inevitably reach it first, and the other will have to wait.

If the rower decides that bailing is required while both his companions sleep, a nudge is usually sufficient to awaken a programmer, allowing the other to continue sleeping. But if the rower is in a bad mood, he may resort to a loud yell, awakening both sleeping programmers. While one programmer assumes the necessary duty, the other can try to fall asleep again.

When the rower tires, he can signal one of the other programmers to take over the task, and immediately fall into a deep sleep waiting to be signaled in turn. In this way, they journey on for some time.
So, just what do the Bailing Programmers have to do with threads? I'm glad
you asked! The elements of the story represent analogies that apply to the
Pthreads programming model. We'll explore some additional analogies in later
sections, and even expand the story a little, but for now consider a few basics:
A programmer is an entity that is capable of independent activity. Our programmers represent threads. A thread is not really much like a programmer, who, as we all know, is a fascinatingly sophisticated mixture of engineer, mathematician, and artist that no computer can match. Still, as a representation of the "active element" in our programming model, it will be sufficient.
The bailing bucket and the oars are "tokens" that can be held by only one individual at a time. They can be thought of as shared data, or as synchronization objects. The primary Pthreads synchronization object, by the way, is called a mutex.
Nudges and shouts are communication mechanisms associated with a synchronization object, on which individuals wait for some condition. Pthreads provides condition variables, which may be signaled or broadcast to indicate changes in shared data state.
1.2 Definitions and terminology
"When I use a word," Humpty Dumpty said, in rather a scornful tone,
"it means just what I choose it to mean—neither more nor less."
Lewis Carroll, Through the Looking Glass
This book will use several critical terms that may be unfamiliar to you unless you've already had some experience with parallel or asynchronous programming. Even if you are familiar with them, some of the terms have seen assorted and even contradictory uses within research and industry, and that is clearly not going to help communication. We need to begin by coming to a mutual agreement regarding the meaning of these terms, and, since I am writing the book, we will agree to use my definitions. (Thank you.)
1.2.1 Asynchronous
Asynchronous means that things happen independently (concurrently) unless there's some enforced dependency. Life is asynchronous. The dependencies are supplied by nature, and events that are not dependent on one another can occur simultaneously. A programmer cannot row without the oars, or bail effectively without the bucket—but a programmer with oars can row while another programmer with a bucket bails. Traditional computer programming, on the other hand, causes all events to occur in series unless the programmer takes "extraordinary measures" to allow them to happen concurrently.
The greatest complication of "asynchrony" has been that there's little advantage to being asynchronous unless you can have more than one activity going at a time. If you can start an asynchronous operation, but then you can do nothing but wait for it, you're not getting much benefit from the asynchrony.
1.2.2 Concurrency
Concurrency, which an English dictionary will tell you refers to things hap-
pening at the same time, is used to refer to things that appear to happen at the
same time, but which may occur serially. Concurrency describes the behavior of
threads or processes on a uniprocessor system. The definition of concurrent
execution in POSIX requires that “functions that suspend the execution of the
calling thread shall not cause the execution of other threads to be indefinitely
suspended.”
Concurrent operations may be arbitrarily interleaved so that they make
progress independently (one need not be completed before another begins), but
concurrency does not imply that the operations proceed simultaneously. Never-
theless, concurrency allows applications to take advantage of asynchronous
capabilities, and “do work” while independent operations are proceeding.
Most programs have asynchronous aspects that may not be immediately obvi-
ous. Users, for example, prefer asynchronous interfaces. They expect to be able to
issue a command while they're thinking about it, even before the program has
finished with the last one. And when a windowing interface provides separate
windows, don't you intuitively expect those windows to act asynchronously?
Nobody likes a “busy” cursor. Pthreads provides you with both concurrency and
asynchrony, and the combination is exactly what you need to easily write respon-
sive and efficient programs. Your program can “wait in parallel” for slow I/O
devices, and automatically take advantage of multiprocessor systems to compute
in parallel.
1.2.3 Uniprocessor and multiprocessor
The terms uniprocessor and multiprocessor are fairly straightforward, but let's
define them just to make sure there's no confusion. By uniprocessor, I mean a
computer with a single programmer-visible execution unit (processor). A single
general-purpose processor with superscalar processing, or vector processors, or
other math or I/O coprocessors is still usually considered a uniprocessor.
By multiprocessor, I mean a computer with more than one processor sharing a
common instruction set and access to the same physical memory. While the pro-
cessors need not have equal access to all physical memory, it should be possible
for any processor to gain access to most memory. A “massively parallel processor”
(MPP) may or may not qualify as a multiprocessor for the purposes of this book.
Many MPP systems do qualify, because they provide access to all physical mem-
ory from every processor, even though the access times may vary widely.
1.2.4 Parallelism
Parallelism describes concurrent sequences that proceed simultaneously. In
other words, software “parallelism” is the same as English “concurrency” and dif-
ferent from software “concurrency.” Parallelism has a vaguely redeeming analogy
to the English definition: It refers to things proceeding in the same direction
independently (without intersection).
True parallelism can occur only on a multiprocessor system, but concurrency
can occur on both uniprocessor and multiprocessor systems. Concurrency can
occur on a uniprocessor because concurrency is, essentially, the illusion of paral-
lelism. While parallelism requires that a program be able to perform two
computations at once, concurrency requires only that the programmer be able to
pretend that two things can happen at once.
1.2.5 Thread safety and reentrancy
“Thread-safe” means that the code can be called from multiple threads with-
out destructive results. It does not require that the code run efficiently in
multiple threads, only that it can operate safely in multiple threads. Most exist-
ing functions can be made thread-safe using tools provided by Pthreads—
mutexes, condition variables, and thread-specific data. Functions that don't
require persistent context can be made thread-safe by serializing the entire func-
tion, for example, by locking a mutex on entry to the function, and unlocking the
mutex before returning. Functions made thread-safe by serializing the entire
function can be called in multiple threads—but only one thread can truly perform
the function at a time.
More usefully, thread-safe functions can be broken down into smaller critical
sections. That allows more than one thread to execute within the function,
although not within the same part. Even better, the code can be redesigned to
protect critical data rather than critical code, which may allow fully parallel exe-
cution of the code, when the threads don’t need to use the same data at the same
time.
The putchar function, for example, which writes a character into a standard
I/O (stdio) buffer, might be made thread-safe by turning putchar into a critical
section. That is, putchar might lock a “putchar mutex,” write the character, and
then unlock the putchar mutex. You could call putchar from two threads, and no
data would be corrupted—it would be thread-safe. However, only one thread
could write its character at a time, and the others would wait, even if they were
writing to different stdio streams.
The correct solution is to associate the mutex with the stream, protecting the
data rather than the code. Now your threads, as long as they are writing to differ-
ent streams, can execute putchar in parallel. More importantly, all functions that
access a stream can use the same mutex to safely coordinate their access to that
stream.
The term “reentrant” is sometimes used to mean “efficiently thread-safe.” That
is, the code was made thread-safe by some more sophisticated measures than
converting the function or library into a single serial region. Although existing
code can usually be made thread-safe by adding mutexes and thread-specific
data, it is often necessary to change the interface to make a function reentrant.
Reentrant code should avoid relying on static data and, ideally, should avoid reli-
ance on any form of synchronization between threads.
Often, a function can avoid internal synchronization by saving state in a “con-
text structure” that is controlled by the caller. The caller is then responsible for
any necessary synchronization of the data. The UNIX readdir function, for exam-
ple, returns each directory entry in sequence. To make readdir thread-safe, you
might add a mutex that readdir locked each time it was called, and unlocked
before it returned to the caller. Another approach, as Pthreads has taken with
readdir_r, is to avoid any locking within the function, letting the caller allocate a
structure that maintains the context of readdir_r as it searches a directory.
At first glance, it may seem that we're just making the caller perform what
ought to be the job of readdir_r. But remember that only the caller knows how
the data will be used. If only one thread uses this particular directory context, for
example, then no synchronization is needed. Even when the data is shared
between threads, the caller may be able to supply more efficient synchronization,
for example, if the context can be protected using a mutex that the application
also uses for other data.
1.2.6 Concurrency control functions
Any “concurrent system” must provide a core set of essential functions that
you need to create concurrent execution contexts, and control how they operate
within your library or application. Here are three essential facilities, or aspects,
of any concurrent system:
1. Execution context is the state of a concurrent entity. A concurrent system
must provide a way to create and delete execution contexts, and maintain
their state independently. It must be able to save the state of one context
and dispatch to another at various times, for example, when one needs to
wait for an external event. It must be able to continue a context from the
point where it last executed, with the same register contents, at a later
time.
2. Scheduling determines which context (or set of contexts) should execute at
any given point in time, and switches between contexts when necessary.
3. Synchronization provides mechanisms for concurrent execution contexts to
coordinate their use of shared resources. We use this term in a way that is
nearly the opposite of the standard dictionary meaning. You'll find a defini-
tion much like “cause to occur at the same time,” whereas we usually
mean something that might better be expressed as “prevent from occurring
at the same time.” In a thesaurus, you may find that “cooperate” is a
synonym for “synchronize”—and synchronization is the mechanism by
which threads cooperate to accomplish a task. This book will use the term
“synchronization,” though, because that is what you'll see used, almost
universally.
There are many ways to provide each of these facilities—but they are always
present in some form. The particular choices presented in this book are dictated
by the book's subject—Pthreads. Table 1.1 shows a few examples of the three
facilities in various systems.
                  Execution context   Scheduling             Synchronization

Real traffic      automobile          traffic lights and     turn signals and
                                      signs                  brake lights
UNIX              process             priority (nice)        wait and pipes
(before threads)
Pthreads          thread              policy, priority       condition variables
                                                             and mutexes

TABLE 1.1 Execution contexts, schedulers, and synchronization
A system's scheduling facility may allow each thread to run until it voluntarily
yields the processor to another thread (“run until block”). It may provide time-
slicing, where each thread is forced to periodically yield so that other threads may
run (“round-robin”). It may provide various scheduling policies that allow the
application to control how each thread is scheduled according to that thread's
function. It may provide a “class scheduler” where dependencies between threads
are described so that, for example, the scheduler can ensure that members of a
tightly coupled parallel algorithm are scheduled at the same time.
Synchronization may be provided using a wide variety of mechanisms. Some
of the most common forms are mutexes, condition variables, semaphores, and
events. You may also use message passing mechanisms, such as UNIX pipes,
sockets, POSIX message queues, or other protocols for communicating between
asynchronous processes—on the same system or across a network. Any form of
communication protocol contains some form of synchronization, because passing
data around with no synchronization results in chaos, not communication.
The terms thread, mutex, and condition variable are the main topics of this
book. For now, it is enough to know that a thread represents an “executable thing”
on your computer. A mutex provides a mechanism to prevent threads from collid-
ing unexpectedly, and a condition variable allows a thread, once it has avoided
such a collision, to wait until it is safe to proceed. Both mutexes and condition
variables are used to synchronize the operation of threads.
1.3 Asynchronous programming is intuitive
“In most gardens,” the Tiger-lily said,
“they make the beds too soft—so that the flowers are always asleep.”
This sounded a very good reason, and Alice was quite
pleased to know it.
“I never thought of that before!” she said.
—Lewis Carroll, Through the Looking-Glass
If you haven't been involved in traditional realtime programming, asynchro-
nous programming may seem new and different. But you've probably been using
asynchronous programming techniques all along. You've probably used UNIX, for
example, and, even as a user, the common UNIX shells from sh to ksh have been
designed for asynchronous programming. You've also been using asynchronous
“programming” techniques in real life since you were born.
Most people understand asynchronous behavior much more thoroughly than
they expect, once they get past the complications of formal and restricted
definitions.
1.3.1 . . . because UNIX is asynchronous
In any UNIX system, processes execute asynchronously with respect to each
other, even when there is only a single processor. Yes, until recently it was diffi-
cult to write individual programs for UNIX that behaved asynchronously—but
UNIX has always made it fairly easy for you to behave asynchronously. When you
type a command to a shell, you are really starting an independent program—if
you run the program in the background, it runs asynchronously with the shell.
When you pipe the output of one command to another, you are starting several
independent programs, which synchronize between themselves using the pipe.
| Time is a synchronization mechanism.
In many cases you provide synchronization between a series of processes your-
self, maybe without even thinking about it. For example, you run the compiler
after you've finished editing the source files. It wouldn't occur to you to compile
them first, or even at the same time. That's elementary real-life synchronization.
| UNIX pipes and files can be synchronization mechanisms.
In other cases you may use more complicated software synchronization mech-
anisms. When you type “ls | more” to a shell to pass the output of the ls
command into the more command, you're describing synchronization by specify-
ing a data dependency. The shell starts both commands right away, but the more
command can't generate any output until it receives input from ls through the
pipe. Both commands proceed concurrently (or even in parallel on a multiproces-
sor) with ls supplying data and more processing that data, independently of each
other. If the pipe buffer is big enough, ls could complete before more ever started;
but more can't ever get ahead of ls.
Some UNIX commands perform synchronization internally. For example, the
command “cc -o thread thread.c” might involve a number of separate pro-
cesses. The cc command might be a “front end” to the C language environment,
which runs a filter to expand preprocessor commands (like #include and #if), a
compiler to translate the program into an intermediate form, an optimizer to reor-
der the translation, an assembler to translate the intermediate form into object
language, and a loader to translate that into an executable binary file. As with
ls | more, all these programs may be running at the same time, with synchroniza-
tion provided by pipes, or by access to temporary files.
UNIX processes can operate asynchronously because each process includes
all the information needed to execute code. The operating system can save the
state of one process and switch to another without affecting the operation of
either. Any general-purpose asynchronous “entity” needs enough state to enable
the operating system to switch between them arbitrarily. But a UNIX process
includes additional state that is not directly related to “execution context,” such
as an address space and file descriptors.
A thread is the part of a process that's necessary to execute code. On most
computers that means each thread has a pointer to the thread's current instruc-
tion (often called a “PC” or “program counter”), a pointer to the top of the thread's
stack (SP), general registers, and floating-point or address registers if they are
kept separate. A thread may have other things, such as processor status and
coprocessor control registers. A thread does not include most of the rest of the
state associated with a process; for example, threads do not have their own file
descriptors or address space. All threads within a process share all of the files
and memory, including the program text and data segments.
| Threads are “simpler” than processes.
You can think of a thread as a sort of “stripped down” process, lean and mean
and ready to go. The system can switch between two threads within a process
much faster than it can switch between processes. A large part of this advantage
comes from the fact that threads within a process share the address space—code,
data, stack, everything.
When a processor switches between two processes, all of the hardware state
for that process becomes invalid. Some may need to be changed as part of the
context switch procedure—data cache and virtual memory translation entries
may be flushed, for example. Even when they do not need to be flushed immedi-
ately, however, the data is not useful to the new process. Each process has a
separate virtual memory address space, but threads running within the same
process share the virtual address space and all other process data.
Threads can make high-bandwidth communication easier between indepen-
dent parts of your program. You don't have to worry about message passing
mechanisms like pipes or about keeping shared memory region address refer-
ences consistent between several different address spaces. Synchronization is
faster, and programming is much more natural. If you create or open a file, all
threads can use it. If you allocate a dynamic data structure with malloc, you can
pass the address to other threads and they can reference it. Threads make it easy
to take advantage of concurrency.
1.3.2 . . . because the world is asynchronous
Thinking asynchronously can seem awkward at first, but it'll become natural
with a little practice. Start by getting over the unnatural expectation that every-
thing will happen serially unless you do something “unusual.” On a one-lane
road cars proceed one at a time—but on a two-lane road two cars go at once. You
can go out for a cup of coffee, leaving your computer compiling some code and
fully expecting that it will proceed without you. Parallelism happens everywhere
in the real world, and you expect it.
A row of cashiers in a store serve customers in parallel; the customers in each
line generally wait their turn. You can improve throughput by opening more lines,
as long as there are registers and cashiers to serve them, and enough customers
to be served by them. Creating two lines for the same register may avoid confu-
sion by keeping lines shorter—but nobody will get served faster. Opening three
registers to serve two customers may look good, but it is just a waste of resources.
In an assembly line, workers perform various parts of the complete job in par-
allel, passing work down the line. Adding a station to the line may improve
performance if it parallels or subdivides a step in the assembly that was so com-
plicated that the operator at the next station spent a lot of time waiting for each
piece. Beware of improving one step so much that it generates more work than
the next step on the assembly line can handle.
In an office, each project may be assigned to a “specialist.” Common special-
ties include marketing, management, engineering, typing pool, sales, support,
and so forth. Each specialist handles her project independently on behalf of the
customer or some other specialist, reporting back in some fashion when done.
Assigning a second specialist to some task, or defining narrower specialties (for
example, assigning an engineer or manager permanently to one product) may
improve performance as long as there's enough work to keep her busy. If not,
some specialists play games while others’ in-baskets overflow.
Motor vehicles move in parallel on a highway. They can move at different
speeds, pass each other, and enter and exit the highway independently. The driv-
ers must agree to certain conventions in order to avoid collisions. Despite speed
limits and traffic signs, compliance with the “rules of the road” is mostly volun-
tary. Similarly, threads must be coded to “agree” to rules that protect the program,
or risk ending up undergoing emergency debugging at the thread hospital.
Software can apply parallelism in the same ways you might use it in real life,
and for the same reasons. When you have more than one “thing” capable of doing
work, you naturally expect them to all do work at the same time. A multiproces-
sor system can perform multiple computations, and any time-sharing system can
perform computations while waiting for an external device to respond. Software
parallelism is subject to all of the complications and problems that we have seen
in real life—and the solutions may not be as easy to see or to apply. You need
enough threads, but not too many; enough communication, but not too much. A
key to good threaded programming is learning how to judge the proper balance
for each situation.
Each thread can process similar parts of a problem, just like supermarket
cashiers handling customers. Each thread can perform a specific operation on
each data item in turn, just like the workers on an assembly line. Each thread
can specialize in some specific operation and perform that operation repeatedly
on behalf of other threads. You can combine these basic models in all sorts of
ways: for example, in parallel assembly lines with some steps performed by a pool
of servers.
As you read this book you'll be introduced to concepts that may seem unfamil-
iar: mutexes, condition variables, race conditions, deadlocks, and priority
inversions. Threaded programming may feel daunting and unnatural. But I'll
explain all those concepts as we move through this book, and once you've been
writing multithreaded code for a while you may find yourself noticing real-world
analogies to the concepts. Threads and all this other stuff are formalized and
restricted representations of things you already understand.
If you find yourself thinking that someone shouldn't interrupt you because
you have the conversation mutex locked, you've begun to develop an intuitive
understanding of threaded programming.* You can apply that understanding to
help you design better threaded code with less effort. If something wouldn't make
sense in real life, you probably shouldn't try it in a program either.
1.4 About the examples in this book
This book contains a number of examples. All are presented as complete pro-
grams, and they have been built and tested on Digital UNIX 4.0d and Solaris 2.5.
All of these programs do something, but many do not do anything of any par-
ticular importance. The purpose of the examples is to demonstrate thread
management and synchronization techniques, which are mere overhead in most
real programs. They would be less effective at revealing the details if that “over-
head” was buried within large programs that “did something.”
Within the book, examples are presented in sections, usually one function at a
time. The source code is separated from the surrounding text by a header and
trailer block which include the file name and, if the example comprises more than
one section, a section number and the name of the function. Each line of the
source code has a line number at the left margin. Major functional blocks of each
section are described in specially formatted paragraphs preceding the source
code. These paragraphs are marked by line numbers outside the left margin of
* It may also be a good time to take a break and read some healthy escapist fiction for a while.
the paragraph, denoting the line numbers in the source listing to which the para-
graph refers. Here's an example:
These lines show the header files included in most of the examples. The
<pthread.h> header file declares constants and prototypes for the Pthreads func-
tions, and the errors.h header file includes various other headers and some
error-checking functions.

■ sample.c  part 1  sampleinfo

1  #include <pthread.h>
2  #include "errors.h"

■ sample.c  part 1  sampleinfo
I have written these examples to use error checking everywhere. That is, I
check for errors on each function call. As long as you code carefully, this isn't
necessary, and some experts recommend testing only for errors that can result
from insufficient resources or other problems beyond your control. I disagree,
unless of course you're the sort of programmer who never makes a mistake.
Checking for errors is not that tedious, and may save you a lot of trouble during
debugging.
You can build and run all of the examples for yourself—the source code is
available online at http://www.aw.com/butenhof/posixcode.html. A Makefile is
provided to build all of the examples, though it requires modifications for various
platforms. On Digital UNIX, the examples were built with CFLAGS=-pthread
-std1 -w1. On Solaris, they were built with CFLAGS=-D_REENTRANT -D_POSIX_C_
SOURCE=199506 -lpthread. Some of the examples require interfaces that may not
be in the Pthreads library on your system, for example, clock_gettime, which is
part of the POSIX.1b realtime standard. The additional realtime library is speci-
fied by the RTFLAGS variable, which is defined as RTFLAGS=-lrt on Digital UNIX,
and as RTFLAGS=-lposix4 on Solaris.
On Solaris 2.5 systems, several of the examples require calls to thr_setcon-
currency to ensure proper operation. This function causes Solaris to provide the
process with additional concurrency. In a few cases, the example will not operate
at all without this call, and in other cases, the example would fail to demonstrate
some behavior.
1.5 Asynchronous programming, by example
“In one moment I've seen what has hitherto been
Enveloped in absolute mystery,
And without extra charge I will give you at large
A Lesson in Natural History.”
—Lewis Carroll, The Hunting of the Snark
This section demonstrates some basic asynchronous programming, using a
simple program that does something vaguely useful, by pretending to be an alarm
clock with a command interface for which you would not consider paying a dime
in a store. But then, this book is about threads, not user interfaces, and the code
that I need to show takes up quite enough space already.
The program prompts for input lines in a loop until it receives an error or end
of file on stdin. On each line, the first nonblank token is interpreted as the num-
ber of seconds to wait, and the rest of the line (up to 64 characters) is a message
that will be printed when the wait completes. I will offer two additional versions—
one using multiple processes, and one using multiple threads. We'll use the three
examples to compare the approaches.
1.5.1 The baseline, synchronous version
Include the header file errors.h, which includes standard headers like
<unistd.h> and <stdio.h> and defines error reporting macros that are used
throughout the examples in this book. We don't use the error reporting macros in
this particular example, but consistency is nice, sometimes.
The “baseline” version, alarm.c, is a synchronous alarm program with a sin-
gle routine, main. Most of main is a loop, which processes simple commands until
fgets returns a NULL (error or end of file). Each line is “parsed” with sscanf to
separate the number of seconds to wait (%d, the first sequence of digits) from the
message string to print (%64[^\n], the rest of the line, up to 64 characters exclud-
ing newline).
■ alarm.c

 1  #include "errors.h"
 2
 3  int main (int argc, char *argv[])
 4  {
 5      int seconds;
 6      char line[128];
 7      char message[64];
 8
 9      while (1) {
10          printf ("Alarm> ");
11          if (fgets (line, sizeof (line), stdin) == NULL) exit (0);
12          if (strlen (line) <= 1) continue;
13
14          /*
15           * Parse input line into seconds (%d) and a message
16           * (%64[^\n]), consisting of up to 64 characters
17           * separated from the seconds by whitespace.
18           */
19          if (sscanf (line, "%d %64[^\n]",
20              &seconds, message) < 2) {
21              fprintf (stderr, "Bad command\n");
22          } else {
23              sleep (seconds);
24              printf ("(%d) %s\n", seconds, message);
25          }
26      }
27  }

■ alarm.c
The problem with the program alarm.c is that only one alarm request can be
active at a time. If you set an alarm to remind you to do something in 10 minutes
(600 seconds), you can't decide to have it remind you of something else in 5 min-
utes. The program is doing something synchronously that you would probably
like to be asynchronous.
1.5.2 A version using multiple processes
There are lots of ways to make this program asynchronous; for example, you
could run more than one copy of the program. One way to run multiple copies is
to fork a child process for each command, as shown in alarm_fork.c. The new
version is asynchronous—you can enter commands at any time. and they will be
carried out independently. It isn't much more complicated than the original,
which is nice.
The main difference between alarm.c and alarm_fork.c is that instead of
calling sleep directly, it uses fork to create a new child process, which then calls
sleep (and, eventually, printf) asynchronously, while the parent process
continues.
The primary complication in this version is the need to “reap” any child pro-
cesses that have terminated. If the program fails to do this, the system will save
them all until the program terminates. The normal way to reap terminated child
processes is to call one of the wait functions. In this case, we call waitpid, which
allows the caller to specify the WNOHANG flag. The function will immediately reap
one child process if any have terminated, or will immediately return with a pro-
cess ID (pid) of 0. The parent process continues to reap terminated child
processes until there are no more to reap. When the loop terminates, main loops
back to line 13 to read a new command.
■ alarm_fork.c

 1  #include <sys/types.h>
 2  #include <wait.h>
 3  #include "errors.h"
 4
 5  int main (int argc, char *argv[])
 6  {
 7      int status;
 8      char line[128];
 9      int seconds;
10      pid_t pid;
11      char message[64];
12
13      while (1) {
14          printf ("Alarm> ");
15          if (fgets (line, sizeof (line), stdin) == NULL) exit (0);
16          if (strlen (line) <= 1) continue;
17
18          /*
19           * Parse input line into seconds (%d) and a message
20           * (%64[^\n]), consisting of up to 64 characters
21           * separated from the seconds by whitespace.
22           */
23          if (sscanf (line, "%d %64[^\n]",
24              &seconds, message) < 2) {
25              fprintf (stderr, "Bad command\n");
26          } else {
27              pid = fork ();
28              if (pid == (pid_t)-1)
29                  errno_abort ("Fork");
30              if (pid == (pid_t)0) {
31                  /*
32                   * In the child, wait and then print a message
33                   */
34                  sleep (seconds);
35                  printf ("(%d) %s\n", seconds, message);
36                  exit (0);
37              } else {
38                  /*
39                   * In the parent, call waitpid() to collect children
40                   * that have already terminated.
41                   */
42                  do {
43                      pid = waitpid ((pid_t)-1, NULL, WNOHANG);
44                      if (pid == (pid_t)-1)
45                          errno_abort ("Wait for child");
46                  } while (pid != (pid_t)0);
47              }
48          }
49      }
50  }

■ alarm_fork.c
1.5.3 A version using multiple threads
Now, let us try another alarm program, alarm_thread.c. It is much like the
fork version in alarm_fork.c, except that it uses threads instead of child pro-
cesses to create asynchronous alarms. Four Pthreads calls are used in this
program:
• pthread_create creates a thread running the routine specified in the third
  argument (alarm_thread), returning an identifier for the new thread to the
  variable referenced by thread.
• pthread_detach allows Pthreads to reclaim the thread's resources as soon
  as it terminates.
• pthread_exit terminates the calling thread.
• pthread_self returns the calling thread's identifier.
The alarm_t structure defines the information stored for each alarm com-
mand, the number of seconds until the alarm is due, and the message string that
will be printed by the thread.
■ alarm_thread.c  part 1  definitions

1  #include <pthread.h>
2  #include "errors.h"
3
4  typedef struct alarm_tag {
5      int         seconds;
6      char        message[64];
7  } alarm_t;

■ alarm_thread.c  part 1  definitions
The alarm_thread function is the “alarm thread.” That is, each alarm thread
is created running this function, and when the function returns the thread termi-
nates. The function's argument (void *arg) is the fourth argument that was
passed to pthread_create, in this case, a pointer to the control packet (alarm_t)
created for the alarm request that the thread is to satisfy. The thread starts by
“mapping” the void * argument as a pointer to a control packet. The thread
detaches itself by calling pthread_detach, which informs Pthreads that the appli-
cation does not need to know when the thread terminates or its termination
status.
The thread sleeps for the number of seconds specified in its control packet,
and then prints the message string. Finally, the thread frees the control packet
and returns. When a thread returns from its initial routine, as it does here, the
thread terminates. Normally, Pthreads would hold the thread's resources so that
another thread could later determine that it had exited and retrieve a final result.
Because the thread detached itself, none of that is necessary.
18 CHAPTER 1 Introduction
alarm_thread.c  part 2  alarm_thread

 1  void *alarm_thread (void *arg)
 2  {
 3      alarm_t *alarm = (alarm_t*)arg;
 4      int status;
 5
 6      status = pthread_detach (pthread_self ());
 7      if (status != 0)
 8          err_abort (status, "Detach thread");
 9      sleep (alarm->seconds);
10      printf ("(%d) %s\n", alarm->seconds, alarm->message);
11      free (alarm);
12      return NULL;
13  }

alarm_thread.c  part 2  alarm_thread
The main program of alarm_thread.c is much the same as the other two vari-
ants. It loops, reading and interpreting command lines as long as it can read them
from stdin. An alarm thread is created, running function alarm_thread, with the
alarm data (alarm_t) as the thread's argument.
alarm_thread.c  part 3  main

int main (int argc, char *argv[])
{
    int status;
    char line[128];
    alarm_t *alarm;
    pthread_t thread;

    while (1) {
        printf ("Alarm> ");
        if (fgets (line, sizeof (line), stdin) == NULL) exit (0);
        if (strlen (line) <= 1) continue;
        alarm = (alarm_t*)malloc (sizeof (alarm_t));
        if (alarm == NULL)
            errno_abort ("Allocate alarm");

        /*
         * Parse input line into seconds (%d) and a message
         * (%64[^\n]), consisting of up to 64 characters
         * separated from the seconds by whitespace.
         */
        if (sscanf (line, "%d %64[^\n]",
                &alarm->seconds, alarm->message) < 2) {
            fprintf (stderr, "Bad command\n");
            free (alarm);
        } else {
            status = pthread_create (
                &thread, NULL, alarm_thread, alarm);
            if (status != 0)
                err_abort (status, "Create alarm thread");
        }
    }
}

alarm_thread.c  part 3  main
1.5.4 Summary
A good way to start thinking about threads is to compare the two asynchro-
nous versions of the alarm program. First, in the fork version, each alarm has an
independent address space, copied from the main program. That means we can
put the seconds and message values into local variables—once the child has
been created (when fork returns), the parent can change the values without
affecting the alarm. In the threaded version, on the other hand, all threads share
the same address space—so we call malloc to create a new structure for each
alarm, which is passed to the new thread. The extra bookkeeping required intro-
duces a little complexity into the threaded version.
In the version using fork, the main program needs to tell the kernel to free
resources used by each child process it creates, by calling waitpid or some other
member of the wait "family." The alarm_fork.c program, for example, calls
waitpid in a loop after each command, to collect all child processes that have
completed. You do not need to wait for a thread unless you need the thread's
return value—in alarm_thread.c, for example, each alarm thread detaches itself
(at line 6, part 2) so that the resources held by the thread will be returned imme-
diately when it terminates.
In the threaded version, the "primary activities" (sleeping and printing the
message) must be coded in a separate routine. In alarm.c and alarm_fork.c,
those activities were performed without a call. In simple cases such as our alarm
program, it is often easier to understand the program with all code in one place,
so that might seem like an advantage for alarm_fork.c. In more complicated pro-
grams, though, it is rare that a program's "primary activities" are so simple that
they can be performed in a single routine without resulting in total confusion.
In a real alarm program, you wouldn't want to create a process for each alarm.
You might easily have hundreds of alarms active, and the system probably
wouldn't let you create that many processes. On the other hand, you probably can
create hundreds of threads within a process. While there is no real need to main-
tain a stack and thread context for each alarm request, it is a perfectly viable
design.
A more sophisticated version of alarm_thread.c might use only two threads:
one to read input from the user, and another to wait for expiration of the next
alarm—I'll show that version later, after we've worked through some more basic
concepts. You could do the same thing with two processes, of course, but it would
be more cumbersome. Passing information between two threads is easy and
fast—no shared memory to map, no pipes to read or write, no concerns about
whether you are passing addresses that may not mean the same thing in both
processes. Threads share everything in their address space—any address that's
valid in one thread is valid in all threads.
1.6 Benefits of threading
"O Looking-Glass creatures," quoth Alice, "draw near!
'Tis an honour to see me, a favour to hear:
'Tis a privilege high to have dinner and tea
Along with the Red Queen, the White Queen, and me!"
Lewis Carroll, Through the Looking-Glass
Some advantages of the multithreaded programming model follow:
1. Exploitation of program parallelism on multiprocessor hardware. Parallel-
ism is the only benefit that requires special hardware. The others can help
most programs without special hardware.
2. More efficient exploitation of a program's natural concurrency, by allowing
the program to perform computations while waiting for slow I/O opera-
tions to complete.
3. A modular programming model that clearly expresses relationships
between independent "events" within the program.
These advantages are detailed in the following sections.
1.6.1 Parallelism
On a multiprocessor system, threading allows a process to perform more than
one independent computation at the same time. A computation-intensive
threaded application running on two processors may achieve nearly twice the
performance of a traditional single-threaded version. "Nearly twice" takes into
account the fact that you'll always have some overhead due to creating the extra
thread(s) and performing synchronization. This effect is often referred to as
"scaling." A two-processor system may perform 1.95 times as fast as a single pro-
cessor, a three-processor system 2.9 times as fast, a four-processor system 3.8
times as fast, and so forth. Scaling almost always falls off as the number of pro-
cessors increases because there's more chance of lock and memory collisions,
which cost time.
Scaling can be predicted by "Amdahl's law," which is shown in Figure 1.1. In
the equation, p represents the ratio of "parallelizable code" over "total execution
time," and n represents the number of processors the code can use. The total
elapsed time for a parallel job is the sum of the elapsed time for the nonparalleliz-
able work (1 - p) and the elapsed time for each processor executing the
parallelizable work (p/n).
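The equation of Figure 1.1 does not survive in this text; reconstructed from the description above (serial fraction 1 - p plus parallel fraction p spread over n processors), it is:

```latex
\mathrm{speedup} = \frac{1}{(1 - p) + \dfrac{p}{n}}
```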
Amdahl's law is a simple relationship showing how parallelism is limited by
the amount of serialization needed. When the program has no parallelizable code
(p is 0), the speedup is 1. That is, it is not a parallel program. If the program
requires no synchronization or other serial code (p is 1), then the speedup is n
(the number of processors). As more synchronization is required, parallelism pro-
vides less benefit. To put it another way, you'll get better scaling with activities
that are completely independent than with activities that are highly dependent:
The independent activities need less synchronization.
The diagram in Figure 1.2 shows the effect of Amdahl's law. "Clock time"
progresses from left to right across the page, and the diagram shows the number
of processors working in parallel at any moment. Areas where the diagram has
only a single horizontal line show that the process is serialized. Areas that have
several horizontal lines in parallel show where the process benefits from multiple
processors. If you can apply multiple processors for only 10% of your program's
execution time, and you have four processors, then Amdahl's law predicts a
speedup of 1/(0.9 + (0.1/4)), or about 8%.
[FIGURE 1.2: Parallelism charted against time]
Operations on large matrices can often be "parallelized" by splitting the matrix
into pieces. For example, each thread may be able to operate on a set of rows or
columns without requiring any data written by threads operating on other slices.
You still generally need to synchronize threads at the beginning and end of pro-
cessing the matrix, frequently using a barrier.* Amdahl's law shows that you'll get
better performance by giving each thread a large and relatively independent
"chunk" of work, requiring infrequent synchronization, than by giving them
smaller chunks.
Amdahl's law is an excellent thought exercise to help you understand scaling.
It is not, however, a practical tool, because it is nearly impossible to accurately
compute p for any program. To be accurate, you need to consider not only all
serialized regions within your code, but also within the operating system kernel
and even in hardware. Multiprocessor hardware must have some mechanism to
synchronize access to the contents of memory. When each processor has a pri-
vate data cache, the contents of those caches must be kept consistent with each
other and with the data in memory. All of this serialization must be included in
any accurate calculation.
1.6.2 Concurrency
The threaded programming model allows the program to make computational
progress while waiting for blocking operations like I/O. This is useful for network
servers and clients, and it is the main reason that client/server systems (such as
OSF DCE) use threads. While one thread waits for a long network read or write
operation, that thread is blocked and other threads in your application can exe-
cute independently. Some systems support asynchronous I/O operations, which
can give similar advantages; but most UNIX-based systems do not have asynchro-
nous I/O.† Furthermore, asynchronous I/O is generally a lot more complicated to
use than threads.
For example, you need to either handle asynchronous notification when the
I/O completes, or poll for completion. If you issued an asynchronous I/O and
then entered a polling loop, you would lose the advantage of asynchronous I/O—
your application would just wait. If you poll elsewhere, or handle asynchronous
notification, then issuing the I/O and processing the resulting data occur in dif-
ferent locations within your program. That makes the code more difficult to
* A barrier is a simple synchronization mechanism that blocks each thread until a certain
number has reached the barrier; then all threads are unblocked. Barriers can be used, for
example, to keep any thread from executing a parallel region of code until all threads are ready
to execute the region. Section 7.1.1 describes barriers in more detail, and demonstrates the con-
struction of a simple barrier package.
† UNIX systems support "nonblocking I/O," but this is not the same thing as asynchronous
I/O. Nonblocking I/O allows the program to defer issuing an I/O operation until it can complete
without blocking, but asynchronous I/O can proceed while the program does something else.
analyze and maintain. When you use synchronous I/O you just perform the I/O
and then do whatever comes next. Synchronous I/O within multiple threads
gives nearly all the advantages of asynchronous I/O. In most cases you will find it
much easier to write complex asynchronous code using threads than using tradi-
tional asynchronous programming techniques.
You could write an alarm program, like those shown in Section 1.5, as an
asynchronous program without using processes or threads, with timer signals for
the alarms and asynchronous reads for input. Using timer signals is more com-
plicated in many ways, since you are severely limited in what you can do within a
signal handler. Asynchronous I/O does not allow you to take advantage of the
convenience of stdio functions. The basic program function will be scattered
through a series of signal handlers and functions, and will probably be harder to
understand.
Asynchronous I/O does have one advantage over threaded concurrency,
though. Just as a thread is usually "cheaper" (in execution time and storage
space) than a process, the context required for an asynchronous I/O operation is
almost always cheaper than a thread. If you plan to have a lot of asynchronous
I/O operations active at the same time, that might be important enough to justify
using the more complicated programming model. But watch out—some "asyn-
chronous I/O" packages just distribute your I/O requests across a pool of
threads! Most of the time you will be better off using threads.
Another method of coding an asynchronous application is for each action to
be treated as an "event." Events are queued by some "hidden" process, and dis-
patched serially to be handled by the application, usually through "callback"
routines registered with the dispatcher. Event dispatchers have been popularized
by windowing interface systems such as the Apple Macintosh toolbox, Microsoft
Windows, and X Windows on UNIX (used by Motif and CDE).
The event mechanism alleviates much of the complication of using signals and
asynchronous I/O, as long as the events are supported directly by the event dis-
patcher. All, for example, handle input from the keyboard or pointer device, and
generally one can request a timer event to be inserted automatically at a desired
time. Thus, the alarm program, written to an event interface, need only initialize
the event dispatcher and enter a loop to process events. Input events would be
dispatched to the parser, resulting in a request for a new timer event; and timer
events would be dispatched to a function that would format and print the alarm
message.
For very simple applications (and the alarm program here is certainly one
example), an event-based implementation may be simpler than the multiprocess
or multithread variations I've shown—at least when the (often substantial) over-
head of initializing the event dispatcher is removed. The limitations of events
become more obvious when you build larger and more sophisticated applica-
tions—the problem is that the events are sequential.
Events are not concurrent, and the program can do only one thing at a time.
Your application receives an event, processes it, and then receives the next event.
If processing an event takes a long time, for example, sorting a large database,
the user interface may remain unresponsive for quite a while. If an event involves
a long wait, for example, reading data over a slow network connection, then,
again, the user must wait.
The response problems can be minimized by liberally sprinkling extended
operations with calls to the event dispatcher—but getting them in the right place,
without substantially impacting the performance of the operation, can be diffi-
cult. Furthermore, you may not have that option, if the database sort is taking
place in a shared library you bought from somebody else.
On the other hand, one might code the application to create a new thread that
runs the database sort, or reads from the slow network, leaving the "user inter-
face" thread to immediately request another event. The application becomes
responsive, while the slow operation continues to run. You can do this even if a
database package, for example, cannot tolerate being run in multiple threads, by
queuing a "sort" command to a server thread that runs database operations
serially—while still retaining interface responsiveness.
1.6.3 Programming model
It may be surprising that programming with threads is a good idea even if you
know your code will never run on a multiprocessor. But it is true. Writing with
threads forces you to think about and plan for the synchronization requirements
of your program. You've always had to think about program dependencies, but
threads help to move the requirements from comments into the executable struc-
ture of the program.
Assembly language programs can use all the same sequential control struc-
tures (loops, conditional code) as programs written in a high-level language.
However, it can be difficult to determine whether a branch instruction represents
the top or bottom of a loop, a simple conditional, a "conditional goto," or some-
thing more exotic. Switching to a higher-level language that supports these
sequential controls directly in source, for example, the C language do, while, for,
if, and switch statements, makes these sequential programming constructs
explicit in the source language. Making control structures explicit in the program
source code means that more of your program's design is explicit in the source,
and that makes it easier for someone else to understand.
Similarly, a C language program (or even an assembler program) may use data
encapsulation and polymorphism by adhering to programming conventions, and
with luck those conventions may be carefully documented and the documenta-
tion kept updated. But if that same code is written in an object-oriented language,
the encapsulation and polymorphism become explicit in the source language.
In a sequential program, synchronization requirements are implicit in the
ordering of operations. The true synchronization requirements, for example, that
"a file must be opened before data can be read from the file," may be documented
only by source comments, if at all. When you program using threads, sequential
assumptions are (or at least should be) limited to small segments of contiguous
code—for example, within a single function. More global assumptions, to be at all
safe, must be protected by explicit synchronization constructs.
In traditional serial programming you call function A to do one thing, then call
another function B to do something else, even when those two functions don't
require serialization. If a developer is trying to determine what the program is
doing, perhaps to trace a bug, it often isn't obvious that there may be no need to
follow both calls. Furthermore, the strictly serial model makes it easy for some-
one to inadvertently make function B dependent on some side effect of function A.
If a later modification reverses the order of the calls, the program may break in
ways that aren't obvious. Program dependencies may be documented using source
code comment blocks, but comments are often ambiguous and may not be prop-
erly updated when code is changed.
The threaded programming model isolates independent or loosely coupled
functional execution streams (threads) in a clear way that's made explicit in the
program's source code. If activities are designed as threads, each function must
include explicit synchronization to enforce its dependencies. Because synchroni-
zation is executable code, it can't be ignored when dependencies are changed.
The presence of synchronization constructs allows anyone reading the code to
follow temporal dependencies within the code, which can make maintenance
substantially easier, especially for large programs with a lot of independent code.
An assembly language programmer can write better, more maintainable assem-
bly code by understanding high-level language programming; a C language
programmer can write better, more maintainable C code by understanding object-
oriented programming. Even if you never write a threaded program, you may ben-
efit from understanding the threaded programming model of independent
functions with explicit dependencies. These are "mental models" (or that dread-
fully overused word, "paradigms") that are more or less independent of the specific
code sequences you write. Cleanly isolating functionally independent code may
even make sequential programs easier to understand and maintain.
1.7 Costs of threading
All this time the Guard was looking at her, first through a telescope, then
through a microscope, and then through an opera-glass.
At last he said, "You're traveling the wrong way,"
and shut up the window, and went away.
Lewis Carroll, Through the Looking-Glass
Of course there's always "the flip side." As I showed in the previous section,
threads provide definite and powerful advantages, even on uniprocessor systems.
They provide even more advantages on a multiprocessor.
So why wouldn't you want to use threads? Everything has a cost, and
threaded programming is no exception. In many cases the advantage exceeds the
cost; in others it doesn't. To be fair, the following subsections discuss the cost of
threaded programming.

1.7.1 Computing overhead
Overhead costs in threaded code include direct effects such as the time it
takes to synchronize your threads. Many clever algorithms are available for
avoiding synchronization in some cases, but none of them is portable. You'll have
to use some synchronization in just about any threaded code. It is easy to lose
performance by using too much synchronization; for example, by separately pro-
tecting two variables that are always used together. Protecting each variable
separately means you spend a lot more time on synchronization without gaining
parallelism, since any thread that needs one variable will need the other as well.
The overhead of threaded programming can also include more subtle effects.
For example, threads that constantly write the same memory locations may spend
a lot of time synchronizing the memory system on processors that support "read/
write ordering." Other processors may spend that time synchronizing only when
you use special instructions such as a memory barrier, or a "multiprocessor
atomic" operation like test-and-set. Section 3.4 says a lot more about these effects.
Removing a bottleneck in your code, for example, by adding threads to perform
multiple concurrent I/O operations, may end up revealing another bottleneck at a
lower level—in the ANSI C library, the operating system, the file system, the device
driver, the memory or I/O architecture, or the device controller. These effects are
often difficult to predict, or measure, and are usually not well documented.
A compute-bound thread, which rarely blocks for any external event, cannot
effectively share a processor with other compute-bound threads. An I/O thread
might interrupt it once in a while, but the I/O thread would block for another
external event and the compute-bound thread would run again. When you create
more compute-bound threads than there are available processors, you may gain
better code structuring over a single-threaded implementation, but you will have
worse performance. The performance suffers because the multithreaded imple-
mentation adds thread synchronization and scheduling overhead to the work you
wanted to accomplish—and does it all using the same compute resources.
1.7.2 Programming discipline
Despite the basic simplicity of the threaded programming model, writing real-
world code is never trivial. Writing code that works well in multiple threads takes
careful thought and planning. You have to keep track of synchronization proto-
cols and program invariants. You have to avoid deadlocks, races, and priority
inversions. I'll describe all of these things in later sections, show how to design
code to avoid the problems, and how to find and repair them after the fact.
You will almost certainly use library code that you did not write. Some will be
supplied with the operating system you use, and most of the more common
libraries will likely be safe to use within multiple threads. POSIX guarantees that
most functions specified by ANSI C and POSIX must be safe for use by multi-
threaded applications. However, a lot of "interesting" functions you will probably
need are not included in that list. You will often need to call libraries that are not
supplied with the operating system, for example, database software. Some of that
code will not be thread-safe. I will discuss techniques to allow you to use most
unsafe code, but they will not always work, and they can be ugly.
All threads within a process share the same address space, and there's no
protection boundary between the threads. If a thread writes to memory through
an uninitialized pointer, it can wipe out another thread's stack, or heap memory
being used by some other thread. The eventual failure will most likely occur in
the innocent victim, possibly long after the perpetrator has gone on to other
things. This can be especially important if arbitrary code is run within a thread.
For example, in a library that supports callbacks to functions supplied by its
caller, be sure that the callback, as well as the library, is thread-safe.
The important points are that good sequential code is not necessarily good
threaded code, and bad threaded code will break in ways that are more difficult to
locate and repair. Thinking about real-life parallelism can help a lot, but pro-
gramming requires a lot more detailed work than most things in real life.
1.7.3 Harder to debug
You will learn more about debugging threaded code, and, more importantly,
about not debugging threaded code, in Chapter 8. You will see some of the tools you
may encounter as well as some techniques you can use on your own. By then
you will know all about mutexes and memory visibility, and you will be ready to
deal with deadlocks and races. Don't worry about the details now—the point of
this brief section is to demonstrate that you will have to learn about threaded
debugging, and it is not as easy yet as anyone would like it to be. (So when was
debugging ever easy?)
Systems that support threads generally extend traditional sequential debug-
ging tools to provide basic debugging support. The system may provide a
debugger that allows you to see the call tree for all of your program's threads, for
example, and set breakpoints that activate only in particular threads. The system
may provide some form of performance analyzer that lets you measure the pro-
cessor time accumulated within each function for a specific thread or across all
threads.
Unfortunately that's only the beginning of the problems when you're debug-
ging asynchronous code. Debugging inevitably changes the timing of events. That
doesn't matter much when you're debugging sequential code, but it is critical
when you're debugging asynchronous code. If one thread runs even slightly
slower than another because it had to process a debugger trap, the problem you're
trying to track down may not happen. Every programmer has run into problems
that won't reproduce under the debugger. You'll run into a lot more of them when
you use threads.
It is difficult to track down a memory corruptor, for example, a function that
writes through an uninitialized pointer, in a sequential program. It is even harder
in a threaded program. Did some other thread write to memory without using a
mutex? Did it use the wrong mutex? Did it count on another thread setting up a
pointer without explicit synchronization? Was it just an old-fashioned sequential
memory corruptor?
Various additional tools are provided by some systems to help you. None of
these is standard or widely available. Tools may check source code for obvious
violations of locking protocol, given a definition of which variables are shared and
how they should be locked. They may record thread interactions while the pro-
gram runs, and allow you to analyze or even replay the interactions to determine
what happened. They may record and measure synchronization contention and
overhead. They may detect complicated deadlock conditions between a set of
mutexes.
Your most powerful and portable thread debugging tool is your mind, applied
through the old-fashioned manual human-powered code review. You'll probably
spend a lot of time setting up a few breakpoints and examining lots of states to try
to narrow the problem down a little and then carefully reading the code to find
problems. It is best if you have someone available who didn't write the code,
because a lot of the worst errors are embarrassingly obvious to someone who's
not burdened with detailed knowledge of what the code was supposed to do.
1.8 To thread or not to thread?
"My poor client's fate now depends on your votes."
Here the speaker sat down in his place,
And directed the Judge to refer to his notes,
And briefly to sum up the case.
Lewis Carroll, The Hunting of the Snark
Threads don't necessarily provide the best solution to every programming
problem. They're not always easier to use, and they don't always result in better
performance.
A few problems are really "inherently nonconcurrent," and adding threads will
only slow the program down and complicate it. If every step in your program
depends directly on the result of the previous step, then using threads probably
won't help. Each thread would have to wait for another thread to complete.
The most obvious candidates for threaded coding are new applications that
accomplish the following:
1. Perform extensive computation that can be parallelized (or "decomposed")
into multiple threads, and which is intended to run on multiprocessor
hardware, or
2. Perform substantial I/O which can be overlapped to improve throughput—
many threads can wait for different I/O requests at the same time. Distrib-
uted server applications are good candidates, since they may have work to
do in response to multiple clients, and they must also be prepared for
unsolicited I/O over relatively slow network connections.
Most programs have some natural concurrency, even if it is only reading a
command from the input device while processing the previous command.
Threaded applications are often faster, and usually more responsive, than
sequential programs that do the same job. They are generally much easier to
develop and maintain than nonthreaded asynchronous applications that do the
same job.
So should you use threads? You probably won't find them appropriate for
every programming job you approach. But threaded programming is a technique
that all software developers should understand.
1.9 POSIX thread concepts
"You seem very clever at explaining words, Sir," said Alice.
"Would you kindly tell me the meaning of the poem
called 'Jabberwocky'?"
"Let's hear it," said Humpty Dumpty. "I can explain all
the poems that ever were invented—and a good many
that haven't been invented just yet."
Lewis Carroll, Through the Looking-Glass
First of all, this book focuses on "POSIX threads." Technically, that means the
thread "application programming interfaces" (API) specified by the international
formal standard POSIX 1003.1c-1995. This standard was approved by the IEEE
in June 1995. A new edition of POSIX 1003.1, called ISO/IEC 9945-1:1996 (ANSI/
IEEE Std 1003.1, 1996 Edition) is available from the IEEE.* This new document
includes 1003.1b-1993 (realtime), 1003.1c-1995 (threads), and 1003.1i-1995
(corrections to 1003.1b-1993). Unless you are writing an implementation of the
standard, or are extremely curious, you probably don't want to bother buying the

* Contact the IEEE at 1-800-678-IEEE. 9945-1:1996 Information Technology—Portable Opera-
ting System Interface (POSIX)—Part 1: System Application Program Interface (API) [C Language],
ISBN 1-55937-573-6, order number SH94352.
POSIX standard. For writing threaded code, you'll find books like this one much
more useful, supplemented by the programming documentation for the operating
system you're using.
As I explained in the preface, I will use the informal term "Pthreads" to refer to
"POSIX 1003.1c-1995." I will use the slightly more formal term "POSIX.1b" to
refer to "POSIX 1003.1b-1993" in the text, "POSIX.14" to refer to the POSIX
1003.14 "Multiprocessor Profile," and similar abbreviated notation for other
POSIX standards. I'll use the full names where precision is important, for exam-
ple, to compare POSIX 1003.1-1990 and POSIX 1003.1-1996, and also in section
titles and captions that appear in the table of contents.
1.9.1 Architectural overview
You may remember from Section 1.2 that the three essential aspects of a
thread system are execution context, scheduling, and synchronization. When
you evaluate any thread system, or compare any two thread systems, start by
categorizing the features into capabilities that support execution contexts,
scheduling, and synchronization.
With Pthreads, you create an execution context (thread) by calling pthread_
create. Creating a thread also schedules the thread for execution, and it will
begin by calling a “thread start function” that you specified. Pthreads allows you
to specify scheduling parameters either at the time you create the thread, or later
on while the thread is running. A thread normally terminates when it calls
pthread_exit, or returns from the thread start function, although we will
encounter a few other possibilities later.
The primary Pthreads synchronization model uses mutexes for protection and
condition variables for communication. You can also use other synchronization
mechanisms such as semaphores, pipes, and message queues. A mutex allows
one thread to lock shared data while using it so that other threads cannot
accidentally interfere. A condition variable allows a thread to wait for shared data
to reach some desired state (such as “queue not empty” or “resource available”).
1.9.2 Types and interfaces
This section briefly outlines the Pthreads data types, and some of the rules for
interpreting them. For a full description of the “object” represented by each type
and how to create and use those objects in a program, see the appropriate
sections later in this book, as shown in Table 1.2.
Type                   Section   Description
pthread_t              2         thread identifier
pthread_mutex_t        3.2       mutex
pthread_cond_t         3.3       condition variable
pthread_key_t          5.4       “thread-specific data” key
pthread_attr_t         5.2.3     thread attributes object
pthread_mutexattr_t    5.2.1     mutex attributes object
pthread_condattr_t     5.2.2     condition variable attributes object
pthread_once_t         5.1       “one time initialization” control context
TABLE 1.2 POSIX threads types
All Pthreads types are “opaque.”
Portable code cannot make assumptions regarding the representation
of these types.
All of the “pthread” types listed in Table 1.2 are considered opaque. There is no
public definition of these types’ representation, and programmers should never
assume anything about the representation. You should use them only in the
manner specifically described by the standard. A thread identifier, for example, may be
an integer, or a pointer, or a structure, and any code that uses a thread identifier
in a way that is not compatible with all of those definitions is incorrect.
1.9.3 Checking for errors
Pthreads introduces a new way to report errors, without using the
errno variable.
The Pthreads amendment is the first part of POSIX to depart from the ancient
UNIX and C language conventions regarding error status. Traditionally, functions
that succeeded returned a useful value if appropriate, or otherwise indicated
success by returning the value 0. On failure, they returned the special value -1, and
set the global value errno to a code specifying the type of error.
The old mechanism has a number of problems, including the fact that it is
difficult to create a function that can both report an error and return a useful value
of -1. There are even worse problems when you add multiple threads to a process.
In traditional UNIX systems, and in the original POSIX.1-1990 standard,
errno was an extern int variable. Since such a variable can have only one value
at a time, it can support only a single stream of execution within the process.
Pthreads functions don't set errno on errors!
(But most other POSIX functions do.)
New functions in the Pthreads standard reserve the return value for error
status, and errno is not used. Pthreads functions return the value 0 on success,
and include an extra output parameter to specify an address where “useful
results” are stored. When a function cannot complete successfully, an error code
from the <errno.h> header file is returned instead of 0 as the function value.
Pthreads also provides a per-thread errno, which supports other code that
uses errno. This means that when one thread calls some function that reports an
error using errno, the value cannot be overwritten, or read, by any other thread—
you may go on using errno just as you always have. But if you're designing new
interfaces you should consider following the new Pthreads convention for
reporting errors. Setting or reading the per-thread errno involves more overhead than
reading or writing a memory location, or returning a value from a function.
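To make the convention concrete, here is a hedged sketch of what a new interface following the Pthreads style might look like (the divide function and its choice of error codes are invented for illustration): the return value is reserved for an error code, and the useful result goes through an output parameter.

```c
#include <errno.h>
#include <stddef.h>

/* A hypothetical function following the Pthreads convention:
   return 0 on success or an error code from <errno.h> on
   failure; the "useful result" is stored through an output
   parameter rather than returned. */
int divide (int dividend, int divisor, int *result)
{
    if (result == NULL)
        return EINVAL;          /* bad argument */
    if (divisor == 0)
        return EDOM;            /* an error code, not -1 and errno */
    *result = dividend / divisor;
    return 0;
}
```

A caller checks the returned status directly, with no trip through errno: `int quotient; if (divide (10, 2, &quotient) != 0) ...`.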
To wait for a thread, for example, and check for an error, you might use code
like that shown in the following code example, thread_error.c. The pthread_join
function, used to wait for a thread to terminate, will report an invalid thread
identifier by returning the error code ESRCH. An uninitialized pthread_t is likely
to be an invalid thread identifier on most implementations. The result of running
this program should be a message such as “error 3: no such process.”
In the unlikely event that the uninitialized thread variable has a pthread_t
value that is not invalid, it should be the ID of the initial thread (there are no
other threads in this process). In this case, pthread_join should either fail with
EDEADLK, if your implementation of Pthreads detects self-deadlock, or the thread
will hang waiting for itself to exit.
thread_error.c
#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main (int argc, char *argv[])
{
    pthread_t thread;
    int status;

    /*
     * Attempt to join with an uninitialized thread ID. On most
     * implementations, this will return an ESRCH error code. If
     * the local (and uninitialized) pthread_t happens to be a valid
     * thread ID, it is almost certainly that of the initial thread,
     * which is running main(). In that case, your Pthreads
     * implementation may either return EDEADLK (self-deadlock),
     * or it may hang. If it hangs, quit and try again.
     */
    status = pthread_join (thread, NULL);
    if (status != 0)
        fprintf (stderr, "error %d: %s\n", status, strerror (status));
    return status;
}
thread_error.c
Note that there is no equivalent to the perror function to format and print an
error value returned by the Pthreads interfaces. Instead, use strerror to get a
string description of the error number, and print the string to the file stream
stderr.
To avoid cluttering each function call in the example programs with a block of
code to report each error and call abort, I have built two error macros—err_abort
detects a standard Pthreads error, and errno_abort is used when a value of -1
means that errno contains an error code. The following header file, called
errors.h, shows these macros. The errors.h header file also includes several
system header files, which would otherwise be required by most of the example
programs—this helps to reduce the size of the examples.
#ifndef __errors_h
#define __errors_h

#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Define a macro that can be used for diagnostic output from
 * examples. When compiled -DDEBUG, it results in calling printf
 * with the specified argument list. When DEBUG is not defined, it
 * expands to nothing.
 */
#ifdef DEBUG
# define DPRINTF(arg) printf arg
#else
# define DPRINTF(arg)
#endif

/*
 * NOTE: the "do {" ... "} while (0);" bracketing around the macros
 * allows the err_abort and errno_abort macros to be used as if they
 * were function calls, even in contexts where a trailing ";" would
 * generate a null statement. For example,
 *
 *      if (status != 0)
 *          err_abort (status, "message");
 *      else
 *          return status;
 *
 * will not compile if err_abort is a macro ending with "}", because
 * C does not expect a ";" to follow the "}". Because C does expect
 * a ";" following the ")" in the do...while construct, err_abort and
 * errno_abort can be used as if they were function calls.
 */
#define err_abort(code,text) do { \
    fprintf (stderr, "%s at \"%s\":%d: %s\n", \
        text, __FILE__, __LINE__, strerror (code)); \
    abort (); \
    } while (0)
#define errno_abort(text) do { \
    fprintf (stderr, "%s at \"%s\":%d: %s\n", \
        text, __FILE__, __LINE__, strerror (errno)); \
    abort (); \
    } while (0)

#endif
The one exception to the Pthreads error rules is pthread_getspecific, which
returns the thread-specific data value of a shared “key.” Section 5.4 describes
thread-specific data. The pthread_getspecific function doesn't report errors at
all. If the pthread_key_t value is illegal, or if no value has been set in the thread,
pthread_getspecific just returns the value NULL.

2 Threads
“If seven maids with seven mops
Swept it for half a year,
Do you suppose,” the Walrus said,
“That they could get it clear?”
“I doubt it,” said the Carpenter,
And shed a bitter tear.
—Lewis Carroll, Through the Looking-Glass
Threads are (and perhaps this will come as no surprise) the essential basis of
the style of programming that I am advocating. Although this chapter focuses on
threads, you will never learn everything you need to know about threads by simply
skipping to this chapter and reading it. Threads are a critical part of the
landscape, but you can't do much with only threads. Nevertheless, one must
start somewhere, and here we are.
Section 2.1 describes the programming aspects of creating and managing
threads in your program, that is, how to create threads, how they are represented
in your program, and the most basic things you can do to them once you've
created them.
Section 2.2 describes the life cycle of a thread, from creation through
“recycling,” taking you through all the scheduling states threads can assume along
the way.
2.1 Creating and using threads
“A loaf of bread,” the Walrus said,
“Is what we chiefly need:
Pepper and vinegar besides
Are very good indeed—
Now, if you're ready, Oysters dear,
We can begin to feed.”
—Lewis Carroll, Through the Looking-Glass
pthread_t thread;
int pthread_equal (pthread_t t1, pthread_t t2);
int pthread_create (pthread_t *thread,
    const pthread_attr_t *attr,
    void *(*start)(void *), void *arg);
pthread_t pthread_self (void);
int sched_yield (void);
void pthread_exit (void *value_ptr);
int pthread_detach (pthread_t thread);
int pthread_join (pthread_t thread, void **value_ptr);
The introduction covered some of the basics of what a thread is, and what it
means to the computer hardware. This section begins where the introduction left
off. It explains how a thread is represented in your program, what it means to
your program, and some of the operations you can perform on threads. If you
haven't read the introduction, this would be a good time to skip back to it. (I'll
wait for you here.)
Within your program a thread is represented by a thread identifier, of the
opaque type pthread_t. To create a thread, you must declare a variable of type
pthread_t somewhere in your program. If the identifier is needed only within a
function, or if the function won't return until the thread is done, you could
declare the identifier with auto storage class. Most of the time, though, the
identifier will be stored in a shared (static or extern) variable, or in a structure
allocated from the heap.
A Pthreads thread begins by calling some function that you provide. This
“thread function” should expect a single argument of type void *, and should
return a value of the same type. You create a thread by passing the thread func-
tion's address, and the argument value with which you want the function to be
called, to pthread_create.
When you create a thread, pthread_create returns an identifier, in the
pthread_t value referred to by the thread argument, by which your code refers to
the new thread. A thread can also get its own identifier using the pthread_self
function. There is no way to find a thread's identifier unless either the creator or
the thread itself stores the identifier somewhere. You need to have a thread's
identifier to do anything to the thread. If you'll need to know when a thread com-
pletes, for example, you must keep the identifier somewhere.
Pthreads provides the pthread_equal function to compare two thread
identifiers. You can only test for equality. It doesn't make any sense to ask whether one
thread identifier is “greater than” or “less than” another, because there is no
ordering between threads. The pthread_equal function returns a nonzero value
if the thread identifiers refer to the same thread, and the value 0 if they do not
refer to the same thread.
The initial thread (main) is special.
When a C program runs, it begins in a special function named main. In a
threaded program, this special stream of execution is called the “initial thread” or
sometimes the “main thread.” You can do anything within the initial thread that
you can do within any other thread. It can determine its own thread identifier by
calling pthread_self, for example, or terminate itself by calling pthread_exit. If
the initial thread stores its thread identifier somewhere accessible to another
thread, that thread can wait for the initial thread to terminate, or detach the ini-
tial thread.
The initial thread is special because Pthreads retains traditional UNIX process
behavior when the function main returns; that is, the process terminates without
allowing other threads to complete. In general, you do not want to do this in a
threaded program, but sometimes it can be convenient. In many of the programs
in this book, for example, threads are created that have no effect on anything
outside the process. It doesn't really matter what those threads are doing, then, if
the process goes away. When the process exits, those threads, all their states,
and anything they might accomplish, simply “evaporate"—there's no reason to
clean up.
Detaching a thread that is still running doesn't affect the thread in any
way—it just informs the system that the thread's resources can be
reclaimed when the thread eventually terminates.
Although “thread evaporation” is sometimes useful, most of the time your pro-
cess will outlive the individual threads you create. To be sure that resources used
by terminated threads are available to the process, you should always detach
each thread you create when you're finished with it. Threads that have terminated
but are not detached may retain virtual memory, including their stacks, as
well as other system resources. Detaching a thread tells the system that you no
longer need that thread, and allows the system to reclaim the resources it has
allocated to the thread.
If you create a thread that you will never need to control, you can use an
attribute to create the thread so that it is already detached. (We'll get to attributes
later, in Section 5.2.3.) If you do not want to wait for a thread that you created,
and you know that you will no longer need to control that thread, you can detach
it at any time by calling pthread_detach. A thread may detach itself, or any other
thread that knows its pthread_t identifier may detach it at any time. If you need
to know a thread's return value, or if you need to know when a thread has
completed, call pthread_join. The pthread_join function will block the caller until
the thread you specify has terminated, and then, optionally, store the terminated
thread's return value. Calling pthread_join detaches the specified thread
automatically.
As we've seen, threads within a process can execute different instructions,
using different stacks, all at the same time. Although the threads execute
independently of each other, they always share the same address space and file
descriptors. The shared address space provides an important advantage of the
threaded programming model by allowing threads to communicate efficiently.
Some programs may create threads that perform unrelated activities, but
most often a set of threads works together toward a common goal. For example,
one set of threads may form an assembly line in which each performs some
specific task on a shared data stream and then passes the data on to the next
thread. A set of threads may form a work crew and divide independent parts of a
common task. Or one "manager" thread may take control and divide work among
a “crew” of worker threads. You can combine these models in a variety of ways;
for example, a work crew might perform some complicated step in a pipeline,
such as transforming a slice of an array.
The following program, lifecycle.c, creates a thread. We'll refer to this simple
example in the following sections about a thread's life cycle.
The thread function, thread_routine, returns a value to satisfy the standard
thread function prototype. In this example the thread returns its argument, and
the value is always NULL.
The program creates a thread by calling pthread_create, and then waits for it
by calling pthread_join. You don't need to wait for a thread, but if you don't,
you'll need to do something else to make sure the process runs until the thread
completes. Returning from main will cause the process to terminate, along with
all threads. You could, for example, code the main thread to terminate by calling
pthread_exit, which would allow the process to continue until all threads have
terminated.
When the join completes, the program checks the thread's return value, to be
sure that the thread returned the value it was given. The program exits with 0
(success) if the value is NULL, or with 1 otherwise.
It is a good idea for all thread functions to return something, even if it is
simply NULL. If you omit the return statement, pthread_join will still return some
value—whatever happens to be in the place where the thread's start function
would have stored a return value (probably a register).
lifecycle.c
 1  #include <pthread.h>
 2  #include "errors.h"
 3
 4  /*
 5   * Thread start routine.
 6   */
 7  void *thread_routine (void *arg)
 8  {
 9      return arg;
10  }
11
12  main (int argc, char *argv[])
13  {
14      pthread_t thread_id;
15      void *thread_result;
16      int status;
17
18      status = pthread_create (
19          &thread_id, NULL, thread_routine, NULL);
20      if (status != 0)
21          err_abort (status, "Create thread");
22
23      status = pthread_join (thread_id, &thread_result);
24      if (status != 0)
25          err_abort (status, "Join thread");
26      if (thread_result == NULL)
27          return 0;
28      else
29          return 1;
30  }
lifecycle.c
If the “joining” thread doesn't care about the return value, or if it knows that
the “joinee” (the thread with which it is joining) didn't return a value, then it can
pass NULL instead of &retval in the call to pthread_join. The joinee's return
value will be ignored.
When the call to pthread_join returns, the joinee has been detached and you
can't join with it again. In the rare cases where more than one thread might need
to know when some particular thread has terminated, the threads should wait on
a condition variable instead of calling pthread_join. The terminating thread
would store its return value (or any other information) in some known location,
and broadcast the condition variable to wake all threads that might be interested.
2.2 The life of a thread
Come, listen, my men, while I tell you again
The five unmistakable marks
By which you may know, wheresoever you go,
The warranted genuine Snarks.
—Lewis Carroll, The Hunting of the Snark
At any instant, a thread is in one of the four basic states described in Table 2.1.
In implementations, you may see additional “states” that distinguish between
various reasons for entering the four basic states. Digital UNIX, for example,
represents these finer distinctions as “substates,” of which each state may have several.
Whether they're called “substates” or additional states, “terminated” might be
divided into “exited” and “cancelled”; “blocked” might be broken up into “blocked
on condition variable,” “blocked on mutex,” “blocked in read,” and so forth.
State        Meaning
Ready        The thread is able to run, but is waiting for a processor. It may
             have just started, or just been unblocked, or preempted by
             another thread.
Running      The thread is currently running; on a multiprocessor there may be
             more than one running thread in the process.
Blocked      The thread is not able to run because it is waiting for something;
             for example, it may be waiting for a condition variable, or waiting
             to lock a mutex, or waiting for an I/O operation to complete.
Terminated   The thread has terminated by returning from its start function,
             calling pthread_exit, or having been cancelled and completing all
             cleanup handlers. It was not detached, and has not yet been
             joined. Once it is detached or joined, it will be recycled.
TABLE 2.1 Thread states
These finer distinctions can be important in debugging and analyzing
threaded programs. However, they do not add substantially to the basic
understanding of thread scheduling, and we will not deal with them here.
Threads begin in the ready state. When the new thread runs it calls your
specified thread start function. It may be preempted by other threads, or block
itself to wait for external events any number of times. Eventually it completes and
either returns from the thread start function or calls the pthread_exit function.
In either case it terminates. If the thread has been detached, it is immediately
recycled. (Doesn't that sound nicer than “destroyed”—and most systems reuse
the resources to make new threads.) Otherwise the thread remains in the
terminated state until joined or detached. Figure 2.1 shows the relationships between
these thread states, and the events that cause threads to move from one state to
another.
2.2.1 Creation
The “initial thread” of a process is created when the process is created. In a
system that fully supports threaded programming, there's probably no way to
execute any code without a thread. A thread is likely to be the only software
context that includes the hardware state needed to execute code: registers, program
counter, stack pointer, and so forth.
Additional threads are created by explicit calls. The primary way to create
threads on a Pthreads system is to call pthread_create. Threads may also be
created when the process receives a POSIX signal if the process signal notify
mechanism is set to SIGEV_THREAD. Your system may provide additional
nonstandard mechanisms to create a thread.
[FIGURE 2.1 Thread state transitions: a created thread starts out ready; being
scheduled makes it running; waiting for a resource makes it blocked; a satisfied
wait makes it ready again; when done, or cancelled, it becomes terminated.]
When a new thread is created, its state is ready. Depending on scheduling
constraints, it may remain in that state for a substantial period of time before
executing. Section 5.5 contains more information on thread scheduling. Going
back to lifecycle.c, the thread running thread_routine becomes ready during
main's call to pthread_create, at line 18.
The most important thing to remember about thread creation is that there is
no synchronization between the creating thread's return from pthread_create
and the scheduling of the new thread. That is, the thread may start before the
creating thread returns. The thread may even run to completion and terminate
before pthread_create returns. Refer to Section 8.1.1 for more information and
warnings about what to expect when you create a thread.
2.2.2 Startup
Once a thread has been created, it will eventually begin executing machine
instructions. The initial sequence of instructions will lead to the execution of the
thread start function that you specified to pthread_create. The thread start
function is called with the argument value you specified when you created the thread.
In lifecycle.c, for example, the thread begins executing user code at function
thread_routine, with the formal parameter arg having a value of NULL.
In the initial thread, the thread “start function” (main) is called from outside
your program: for example, many UNIX systems link your program with a file
called crt0.o, which initializes the process and then calls your main. This is a
minor implementation distinction, but it is important to remember because there
are a few ways in which the initial thread is different. For one thing, main is called
with different arguments than a thread start function: the program's argument
array (argc and argv) instead of a single void* argument. For another thing,
when a thread start function returns, the thread terminates but other threads
continue to run. When the function main returns in the initial thread, the process
will be terminated immediately. If you want to terminate the initial thread while
allowing other threads in the process to continue running, call pthread_exit
instead of returning from main.
Another important difference to remember is that on most systems, the initial
thread runs on the default process stack, which can grow to a substantial size.
“Thread” stacks may be much more limited on some implementations, and the
program will fail with a segmentation fault or bus error if a thread overflows its
stack.
2.2.3 Running and blocking
Like us, threads usually can't stay awake their entire life. Most threads
occasionally go to sleep. A thread can go to sleep because it needs a resource that is
not available (it becomes “blocked”) or because the system reassigned the processor
on which it was running (it is “preempted”). A thread spends most of its
active life in three states: ready, running, and blocked.
A thread is ready when it is first created, and whenever it's unblocked so that
it is once again eligible to run. Ready threads are waiting for a processor. Also,
when a running thread is preempted, for example, if it is timesliced (because it
has run too long), the thread immediately becomes ready.
A thread becomes running when it was ready and a processor selects the
thread for execution. Usually this means that some other thread has blocked, or
has been preempted by a timeslice—the blocking (or preempted) thread saves its
context and restores the context of the next ready thread to replace itself. On a
multiprocessor, however, a previously unused processor may execute a readied
thread without any other thread blocking.
A thread becomes blocked when it attempts to lock a mutex that is currently
locked, when it waits on a condition variable, when it calls sigwait for a signal
that is not currently pending, or when it attempts an I/O operation that cannot
be immediately completed. A thread may also become blocked for other system
operations, such as a page fault.
When a thread is unblocked after a wait for some event, it is made ready
again. It may execute immediately, for example, if a processor is available. In
lifecycle.c, the main thread blocks at line 23, in pthread_join, to wait for the
thread it created to run. If the thread had not already run at this point, it would
move from ready to running when main becomes blocked. As the thread runs to
completion and returns, the main thread will be unblocked—returning to the
ready state. When processor resources are available, either immediately or after
the thread becomes terminated, main will again become running, and complete.
2.2.4 Termination
A thread usually terminates by returning from its start function (the one you
pass to the pthread_create function). The thread shown in lifecycle.c
terminates by returning the value NULL, for example. Threads that call pthread_exit
or that are cancelled using pthread_cancel also terminate after calling each
cleanup handler that the thread registered by calling pthread_cleanup_push and
that hasn't yet been removed by calling pthread_cleanup_pop. Cleanup handlers
are discussed in Section 5.3.3.
Threads may have private “thread-specific data” values (thread-specific data is
discussed in Section 5.4). If the thread has any non-NULL thread-specific data
values, the associated destructor functions for those keys (if any) are called.
If the thread was already detached it moves immediately to the next section,
recycling. Otherwise, the thread becomes terminated. It will remain available for
another thread to join with it using pthread_join. This is analogous to a UNIX
process that's terminated but hasn't yet been “reaped” by a wait operation. Some-
times it is called a “zombie” because it still exists even though it is “dead.” A
zombie may retain most or all of the system resources that it used when running,
so it is not a good idea to leave threads in this state for longer than necessary.
Whenever you create a thread with which you won't need to join, you should use
the detachstate attribute to create it “detached” (see Section 5.2.3).
At a minimum, a terminated thread retains the identification (pthread_t
value) and the void* return value that was returned from the thread's start
function or specified in a call to pthread_exit. The only external difference between a
thread that terminated “normally” by returning or calling pthread_exit, and one
that terminated through cancellation, is that a cancelled thread's return value is
always PTHREAD_CANCELED. (This is why “cancelled” is not considered a distinct
thread state.)
If any other thread is waiting to join with the terminating thread, that thread
is awakened. It will return from its call to pthread_join with the appropriate
return value. Once pthread_join has extracted the return value, the terminated
thread is detached by pthread_join, and may be recycled before the call to
pthread_join returns. This means that, among other things, the returned value
should never be a stack address associated with the terminated thread's stack—
the value at that address could be overwritten by the time the caller could use it.
In lifecycle.c, the main thread will return from the pthread_join call at line 23
with the value NULL.
pthread_join is a convenience, not a rule.
Even when you need a return value from a thread that you create, it is often at
least as simple to create the thread detached and devise your own customized
return mechanism as it is to use pthread_join. For example, if you pass infor-
mation to a worker thread in some form of structure that another thread can find
later, you might have the worker thread simply place the result in that same
structure and broadcast a condition variable when done. The Pthreads context
for the thread, including the thread identifier, can then be recycled immediately
when the thread is done, and you still have the part you really need, the return
value, where you can find it easily at any time.
If pthread_join does exactly what you want, then by all means use it. But
remember that it is nothing more than a convenience for the simplest and most
limited model of communicating a thread's results. If it does not do exactly what
you need, build your own return mechanism instead of warping your design to fit
the limitations of pthread_join.
2.2.5 Recycling
If the thread was created with the detachstate attribute set to PTHREAD_
CREATE_DETACHED (see Section 5.2.3), or if the thread or some other thread has
already called pthread_detach for the thread's identifier, then the thread is
immediately recycled when it becomes terminated.
If the thread has not been detached when it terminates, it remains in the
terminated state until the thread's pthread_t identifier is passed to pthread_detach
or pthread_join. When either function returns, the thread cannot be accessed
again. In lifecycle.c, for example, the thread that had run thread_routine will
be recycled by the time the main thread returns from the pthread_join call at
line 23.
Recycling releases any system or process resources that weren't released at
termination. That includes the storage used for the thread's return value, the
stack, memory used to store register state, and so forth. Some of these resources
may have been released at termination; it is important to remember that none of
it should be accessed from any other thread after termination. For example, if a
thread passes a pointer to its stack storage to another thread through shared
data, you should treat that information as obsolete from the time the thread that
owns the stack terminates.

3 Synchronization
"That's right," said the Tiger-lily. "The daisies are worst of all.
When one speaks, they all begin together, and it's
enough to make one wither to hear the way they go on!"
—Lewis Carroll, Through the Looking-Glass
To write a program of any complexity using threads, you'll need to share data
between threads, or cause various actions to be performed in some coherent
order across multiple threads. To do this, you need to synchronize the activity of
your threads.
Section 3.1 describes a few of the basic terms we'll be using to talk about
thread synchronization: critical section and invariant.
Section 3.2 describes the basic Pthreads synchronization mechanism, the
mutex.
Section 3.3 describes the condition variable, a mechanism that your code can
use to communicate changes to the state of invariants protected by a mutex.
Section 3.4 completes this chapter on synchronization with some important
information about threads and how they view the computer's memory.
3.1 Invariants, critical sections, and predicates
"I know what you're thinking about,"
said Tweedledum; "but it isn't so, nohow."
"Contrariwise," continued Tweedledee,
"if it was so, it might be; and if it were so, it would be;
but as it isn't, it ain't. That's logic."
—Lewis Carroll, Through the Looking-Glass
Invariants are assumptions made by a program, especially assumptions about
the relationships between sets of variables. When you build a queue package, for
example, you need certain data. Each queue has a queue header, which is a
pointer to the first queued data element. Each data element includes a pointer to
the next data element. But the data isn't all that's important—your queue
package relies on relationships between that data. The queue header, for example,
must either be NULL or contain a pointer to the first queued data element. Each
data element must contain a pointer to the next data element, or NULL if it is the
last. Those relationships are the invariants of your queue package.
It is hard to write a program that doesn't have invariants, though many of
them are subtle. When a program encounters a broken invariant, for example, if
it dereferences a queue header containing a pointer to something that is not a
valid data element, the program will probably produce incorrect results or fail
immediately.
Critical sections (also sometimes called "serial regions") are areas of code that
affect a shared state. Since most programmers are trained to think about
program functions instead of program data, you may well find it easier to recognize
critical sections than data invariants. However, a critical section can almost
always be translated into a data invariant, and vice versa. When you remove an
element from a queue, for example, you can see the code performing the removal
as a critical section, or you can see the state of the queue as an invariant. Which
you see first may depend on how you're thinking about that aspect of your
design.
Most invariants can be "broken," and are routinely broken, during isolated
areas of code. The trick is to be sure that broken invariants are always repaired
before "unsuspecting" code can encounter them. That is a large part of what
"synchronization" is all about in an asynchronous program. Synchronization protects
your program from broken invariants. If your code locks a mutex whenever it
must (temporarily) break an invariant, then other threads that rely on the invariant,
and which also lock the mutex, will be delayed until the mutex is unlocked—
when the invariant has been restored.
Synchronization is voluntary, and the participants must cooperate for the sys-
tem to work. The programmers must agree not to fight for (or against) possession
of the bailing bucket. The bucket itself does not somehow magically ensure that
one and only one programmer bails at any time. Rather, the bucket is a reliable
shared token that, if used properly, can allow the programmers to manage their
resources effectively.
"Predicates" are logical expressions that describe the state of invariants
needed by your code. In English, predicates can be expressed as statements like
"the queue is empty" or "the resource is available." A predicate may be a boolean
variable with a TRUE or FALSE value, or it may be the result of testing whether a
pointer is NULL. A predicate may also be a more complicated expression, such as
determining whether a counter is greater than some threshold. A predicate may
even be a value returned from some function. For example, you might call select
or poll to determine whether a file is ready for input.
3.2 Mutexes
"How are you getting on?" said the Cat,
as soon as there was mouth enough for it to speak with.
Alice waited till the eyes appeared, and then nodded.
"It's no use speaking to it," she thought,
"till its ears have come, or at least one of them."
—Lewis Carroll, Alice's Adventures in Wonderland
Most threaded programs need to share some data between threads. There may
be trouble if two threads try to access shared data at the same time, because one
thread may be in the midst of modifying some data invariant while another acts
on the data as if it were consistent. This section is all about protecting the
program from that sort of trouble.
The most common and general way to synchronize between threads is to
ensure that all memory accesses to the same (or related) data are "mutually
exclusive." That means that only one thread is allowed to write at a time—others
must wait for their turn. Pthreads provides mutual exclusion using a special
form of Edsger Dijkstra's semaphore [Dijkstra, 1968a], called a mutex. The word
mutex is a clever combination of "mut" from the word "mutual" and "ex" from the
word "exclusion."
Experience has shown that it is easier to use mutexes correctly than it is to
use other synchronization models such as a more general semaphore. It is also
easy to build any synchronization model using mutexes in combination with
condition variables (we'll meet them at the next corner, in Section 3.3). Mutexes
are simple, flexible, and can be implemented efficiently.
The programmers' bailing bucket is something like a mutex (Figure 3.1). Both
are "tokens" that can be handed around, and used to preserve the integrity of the
concurrent system. The bucket can be thought of as protecting the bailing critical
section—each programmer accepts the responsibility of bailing while holding the
bucket, and of avoiding interference with the current bailer while not holding the
bucket. Or, the bucket can be thought of as protecting the invariant that water
can be removed by only one programmer at any time.
Synchronization isn't important just when you modify data. You also need
synchronization when a thread needs to read data that was written by another
thread, if the order in which the data was written matters. As we'll see a little
later, in Section 3.4, many hardware systems don't guarantee that one processor
will see shared memory accesses in the same order as another processor without
a "nudge" from software.
FIGURE 3.1. Mutex analogy
Consider, for example, a thread that writes new data to an element in an
array, and then updates a max_index variable to indicate that the array element
is valid. Now consider another thread, running simultaneously on another
processor, that steps through the array performing some computation on each valid
element. If the second thread "sees" the new value of max_index before it sees the
new value of the array element, the computation would be incorrect. This may
seem irrational, but memory systems that work this way can be substantially
faster than memory systems that guarantee predictable ordering of memory
accesses. A mutex is one general solution to this sort of problem. If each thread
locks a mutex around the section of code that's using shared data, only one
thread will be able to enter the section at a time.
Figure 3.2 shows a timing diagram of three threads sharing a mutex. Sections
of the lines that are above the rounded box labeled "mutex" show where the
associated thread does not own the mutex. Sections of the lines that are below the
center line of the box show where the associated thread owns the mutex, and
sections of the lines hovering above the center line show where the thread is waiting
to own the mutex.
Initially, the mutex is unlocked. Thread 1 locks the mutex and, because there
is no contention, it succeeds immediately—thread 1's line moves below the center