4 Threads
Processes
• Recall that a process includes many things
• An address space (code, data)
• OS resources (e.g., open files) and accounting information
• Execution state (program counter, registers, stack pointer, etc.)
Example: Web Server
• How does a web server handle a request?
• Solution 1 (sketched below):
• Fork off copies of the server process to handle multiple simultaneous requests
• Processes communicate with each other via inter-process communication (IPC)
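A minimal sketch of Solution 1 in C: an accept loop that forks one child process per connection. The port number is a hypothetical choice and error handling is omitted for brevity.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>

int main(void) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                /* hypothetical port */
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, 16);

    for (;;) {
        int conn = accept(listener, NULL, NULL);
        if (fork() == 0) {                      /* child: handle one request */
            close(listener);
            const char *resp = "HTTP/1.0 200 OK\r\n\r\nhello\n";
            write(conn, resp, strlen(resp));
            close(conn);
            _exit(0);
        }
        close(conn);                            /* parent: keep accepting */
        while (waitpid(-1, NULL, WNOHANG) > 0)  /* reap finished children */
            ;
    }
}
```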
Concurrent Programs
• To execute these programs, we need to
• Create several processes that execute in parallel
• Cause each to map to the same address space to share data
• They are all part of the same computation
• Have the OS schedule these processes in parallel (logically or physically)
Rethinking Processes
• What is similar in these cooperating processes?
• They all share the same code and data (address space)
• They all share the same privileges
• They all share the same resources (files, communication channels, etc.)
• Idea: Why not separate the process concept from its execution state?
• Process: address space, privileges, resources, etc.
• Execution state: PC, SP, registers
• Memory-management information: memory allocated to the process (segments, page tables, etc.)
Thread Control Block (TCB)
• Shared information
• Process info: parent process
• Memory: code/data segment, page table, and stats
• I/O and file: communication ports, open file descriptors
• Private information (one copy per thread; see the sketch below)
• State (ready, running, or waiting)
• Registers, program counter
• Execution stack
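A sketch of how the private, per-thread part of a TCB might be laid out in C; the field names, types, and layout here are illustrative assumptions, not taken from any particular kernel:

```c
#include <stdint.h>

typedef enum { READY, RUNNING, WAITING } thread_state_t;

/* Per-thread (private) information; shared information
   lives in the enclosing process's control block. */
typedef struct tcb {
    thread_state_t state;     /* ready, running, or waiting */
    uint64_t       pc;        /* saved program counter */
    uint64_t       sp;        /* saved stack pointer */
    uint64_t       regs[16];  /* saved general-purpose registers */
    void          *stack;     /* base of this thread's execution stack */
    struct pcb    *process;   /* back-pointer to the shared process info */
    struct tcb    *next;      /* e.g., link in the ready queue */
} tcb_t;
```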
Threads in a Process
[Figure: several threads within one process, sharing the address space while each keeps its own stack and register state]
Threads vs. Processes
• Processes are containers in which threads execute; threads are the unit of scheduling
• Each process operates independently of the others; a thread can access and alter another thread's data
• If a process is blocked by some I/O request, no further execution is possible until it gets unblocked; if a thread is blocked by some I/O request, another thread in the same task can continue its execution
Multicore Programming
• Parallelism implies a system can perform more than one task simultaneously
• Types of parallelism
• Data parallelism – distributes subsets of the same data across multiple cores, performing the same operation on each
• Task parallelism – distributes threads across cores, each thread performing a unique operation
Amdahl’s Law
• Identifies the performance gain from adding additional cores to an application that has both serial and parallel components
• If S is the serial portion and N the number of processing cores, speedup is bounded by 1 / (S + (1 − S)/N) (worked example below)
• That is, if an application is 75% parallel / 25% serial, moving from 1 to 2 cores results in a speedup of 1.6 times
• As N approaches infinity, speedup approaches 1/S
• The serial portion of an application has a disproportionate effect on the performance gained by adding additional cores
• But does the law take into account contemporary multicore systems?
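Working the 75%/25% example through the bound (S = 0.25, N = 2):

```latex
\mathrm{speedup} \;\le\; \frac{1}{S + \frac{1 - S}{N}}
  \;=\; \frac{1}{0.25 + \frac{0.75}{2}}
  \;=\; \frac{1}{0.625}
  \;=\; 1.6
```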
Motivation
• Most modern applications are multithreaded
• Threads run within application
• Multiple tasks with the application can be implemented by separate threads
• Update display
• Fetch data
• Spell checking
• Answer a network request
• Process creation is heavy-weight while thread creation is light-weight
• Can simplify code, increase efficiency
• Kernels are generally multithreaded
Benefits of Threads
• Responsiveness – may allow continued execution if part of process is
blocked, especially important for user interfaces.
• E.g., user interaction and image loading can proceed concurrently in a multi-threaded web browser
• Scalability – process can take advantage of multiprocessor architectures
Kernel-Level Threads
• All thread operations are implemented in the kernel
• Examples:
• Windows
• Solaris
• Linux
• Tru64 UNIX
• Mac OS X
Kernel Thread Limitations
• Every thread operation must go through kernel
• create, join, exit, synchronize, or switch for any reason
• Generally, thread operations are 10x to 30x slower when implemented in the kernel
• Examples:
• POSIX Pthreads,
• Java threads
User-Level Thread Limitations
• Cannot take advantage of multiple CPUs or cores
• User-level threads
• Faster to create, manipulate, synchronize
• Not integrated with OS (uninformed scheduling)
• In a user-level thread implementation, all user threads of the same process are effectively mapped onto one kernel thread
Many-to-One
• Many user-level threads mapped to single kernel thread
• One thread blocking causes all to block
• Thread management is done by a thread library in user space
• Multiple threads may not run in parallel on multi-core system because only one
may be in kernel at a time
• Few systems currently use this model
• Examples:
• Solaris Green Threads
• GNU Portable Threads
One-to-One
• Each user-level thread maps to a kernel thread
• Creating a user-level thread creates a kernel thread
• More concurrency than many-to-one model
• System keeps running while one thread executes a blocking system call
• It also allows multiple threads to run in parallel on a multiprocessor system
• Number of threads per process sometimes restricted due to overhead
• May lead to too many kernel threads
• Examples:
• Windows
• Linux
Many-to-Many
• Allows many user-level threads to be mapped to many kernel threads
• Examples:
• Windows ThreadFiber package
• Otherwise not very common
Two-level Model
• Similar to the many-to-many model, except that it also allows a user thread to be bound to a kernel thread
• Examples:
• IRIX
• HP-UX
• Tru64 UNIX
• Solaris 8 and earlier
Thread Context Switch
• The context switch routine does all of the magic
• Saves context of the currently running thread
• Push all machine state onto its stack
• Restore context of the next thread
• Pop all machine state from the next thread’s stack
• The next thread becomes the current thread
• Returns to caller as new thread
Context Switching of User-Level Threads
• If both threads belong to the same process
• Handled by the dispatcher in the library
• Only need to store/load the TCB information
• OS does not do anything
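A minimal sketch of such a library-level switch in C, using the POSIX ucontext API as a stand-in for the dispatcher's save/restore logic (production thread libraries typically use hand-written assembly instead):

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, thread_ctx;
static char stack[64 * 1024];             /* the new thread's stack */

static void thread_func(void) {
    printf("in thread\n");
    swapcontext(&thread_ctx, &main_ctx);  /* save self, resume main */
}

int main(void) {
    getcontext(&thread_ctx);              /* initialize a context */
    thread_ctx.uc_stack.ss_sp = stack;
    thread_ctx.uc_stack.ss_size = sizeof(stack);
    thread_ctx.uc_link = &main_ctx;       /* where to go if it returns */
    makecontext(&thread_ctx, thread_func, 0);

    printf("before switch\n");
    swapcontext(&main_ctx, &thread_ctx);  /* save main's state, run thread */
    printf("back in main\n");
    return 0;
}
```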
Programming with Threads: Thread Libraries
• Thread library provides the programmer with an API for creating & managing threads
Programming with Threads: Pthreads
• May be provided either as a user-level or a kernel-level library
• A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization
Programming with Threads: Pthreads
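A typical Pthreads example: the main thread creates a worker with pthread_create() and waits for it with pthread_join(); the worker sums 1..N.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum;                              /* shared by the threads */

void *runner(void *param) {           /* the thread's start routine */
    int upper = atoi(param);
    for (int i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}

int main(int argc, char *argv[]) {
    pthread_t tid;                    /* thread identifier */
    pthread_attr_t attr;              /* thread attributes */

    if (argc != 2) return 1;
    pthread_attr_init(&attr);                      /* default attributes */
    pthread_create(&tid, &attr, runner, argv[1]);  /* create the thread */
    pthread_join(tid, NULL);                       /* wait for it to exit */
    printf("sum = %d\n", sum);
    return 0;
}
```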
Programming with Threads: Windows
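The analogous pattern with the Windows API: CreateThread() spawns the thread and WaitForSingleObject() waits for it to finish. A sketch, with the upper bound hard-coded for brevity:

```c
#include <windows.h>
#include <stdio.h>

DWORD Sum;                                  /* shared by the threads */

DWORD WINAPI Summation(LPVOID Param) {      /* thread body */
    DWORD Upper = *(DWORD *)Param;
    for (DWORD i = 1; i <= Upper; i++)
        Sum += i;
    return 0;
}

int main(void) {
    DWORD ThreadId;
    DWORD Param = 10;
    HANDLE ThreadHandle = CreateThread(
        NULL, 0,                   /* default security, default stack size */
        Summation, &Param,         /* start routine and its argument */
        0, &ThreadId);             /* run immediately */

    if (ThreadHandle != NULL) {
        WaitForSingleObject(ThreadHandle, INFINITE);  /* join */
        CloseHandle(ThreadHandle);
        printf("sum = %lu\n", Sum);
    }
    return 0;
}
```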
Programming with Threads: Java Threads
• Java threads are managed by the JVM
• Creating a thread: define a class that implements the Runnable interface, pass an instance to a Thread object, and call its start() method
• Waiting on a thread: call join() on the Thread object
Programming with Threads: Java Threads
• Rather than explicitly creating threads, Java also allows tasks to be run via the Executor interface, whose execute() method accepts a Runnable task
Programming with Threads: Implicit Threading
• Growing in popularity as the number of threads increases; program correctness is more difficult with explicit threads
• One common approach is a thread pool: create a number of threads that wait in a pool for work (a minimal sketch follows this list)
• Advantages:
• Usually slightly faster to service a request with an existing thread than to create a new thread
• Allows the number of threads in the application(s) to be bound to the size of the pool
• Separating the task to be performed from the mechanics of creating the task allows different strategies for running the task
• e.g., tasks could be scheduled to run periodically
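A minimal thread-pool sketch in C with Pthreads, assuming a fixed pool size and a small circular task queue; the pool_init()/pool_submit() names are made up for this sketch, and there is no shutdown or overflow handling:

```c
#include <pthread.h>

#define POOL_SIZE  4
#define QUEUE_CAP 16

typedef struct { void (*fn)(void *); void *arg; } task_t;

static task_t queue[QUEUE_CAP];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&nonempty, &lock);  /* await work */
        task_t t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                              /* run the task */
    }
    return NULL;
}

void pool_submit(void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&lock);
    queue[tail] = (task_t){fn, arg};              /* assumes queue not full */
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&nonempty);               /* wake one worker */
    pthread_mutex_unlock(&lock);
}

void pool_init(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
    }
}
```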
Programming with Threads: Fork-Join Parallelism
• Multiple threads (tasks) are forked, and then joined.
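A fork-join sketch in C: each level forks a thread per half of an array, joins both, and combines the partial sums. Forking a thread per subtask is wasteful in practice; real fork-join frameworks schedule the tasks onto a thread pool instead.

```c
#include <pthread.h>
#include <stdio.h>

typedef struct { int *a; int lo, hi; long sum; } job_t;

static void *fork_join_sum(void *arg) {
    job_t *j = arg;
    if (j->hi - j->lo <= 2) {                 /* small: compute directly */
        j->sum = 0;
        for (int i = j->lo; i < j->hi; i++)
            j->sum += j->a[i];
    } else {                                  /* large: fork two subtasks */
        int mid = (j->lo + j->hi) / 2;
        job_t left  = { j->a, j->lo, mid, 0 };
        job_t right = { j->a, mid, j->hi, 0 };
        pthread_t lt, rt;
        pthread_create(&lt, NULL, fork_join_sum, &left);
        pthread_create(&rt, NULL, fork_join_sum, &right);
        pthread_join(lt, NULL);               /* join both halves... */
        pthread_join(rt, NULL);
        j->sum = left.sum + right.sum;        /* ...then combine */
    }
    return NULL;
}

int main(void) {
    int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    job_t root = { a, 0, 8, 0 };
    fork_join_sum(&root);
    printf("sum = %ld\n", root.sum);          /* prints 36 */
    return 0;
}
```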
Programming with Threads: Fork-Join in Java
• ForkJoinTask is an abstract base class
• The RecursiveTask and RecursiveAction classes extend ForkJoinTask
• RecursiveTask returns a result (via the return value of its compute() method)
• RecursiveAction does not return a result
Programming with Threads: OpenMP
• A set of compiler directives and an API for C, C++, and FORTRAN
• Provides support for parallel programming in shared-memory environments
• Identifies parallel regions – blocks of code that can run in parallel (see the sketch below)
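A minimal OpenMP parallel region in C (compile with -fopenmp); the directive marks the block as a parallel region, which runs once on each thread in the team:

```c
#include <stdio.h>

int main(void) {
    /* Sequential code here... */

    #pragma omp parallel
    {
        /* This parallel region runs once per thread in the team. */
        printf("I am a parallel region.\n");
    }

    /* ...and sequential again after the region. */
    return 0;
}
```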
Programming with Threads: Grand Central Dispatch
• Apple technology for Mac OS X and iOS operating systems
• Extensions to C, C++ and Objective-C languages, API, and run-time library
• Allows identification of parallel sections and manages most details of threading
• A block is code wrapped in "^{ }", e.g., ^{ printf("I am a block"); }
• Blocks placed in dispatch queue and assigned to available thread in thread pool
when removed from queue
• Two types of dispatch queues:
• Serial – blocks removed in FIFO order; one queue per process, called the main queue
• Programmers can create additional serial queues within a program
• Concurrent – blocks removed in FIFO order, but several may be removed at a time
• Four system-wide concurrent queues divided by quality of service:
• QOS_CLASS_USER_INTERACTIVE
• QOS_CLASS_USER_INITIATED
• QOS_CLASS_UTILITY
• QOS_CLASS_BACKGROUND
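A small GCD sketch in C (Apple platforms, compiled with Clang's blocks support): a block is submitted to a system-wide concurrent queue, and a semaphore keeps main() alive until the block has run:

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    dispatch_queue_t q =
        dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0);
    dispatch_semaphore_t done = dispatch_semaphore_create(0);

    dispatch_async(q, ^{                 /* block runs on a pool thread */
        printf("I am a block\n");
        dispatch_semaphore_signal(done);
    });

    dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
    return 0;
}
```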
Programming with Threads: Intel TBB
• Template library for designing parallel C++ programs
• A serial for loop can be rewritten with TBB's parallel_for template, which splits the loop's iteration range across a pool of worker threads
Programming with Threads: Summary
• Flexible, but error-prone, since there is no protection between threads
• In C/C++
• Automatic variables are private to each thread
• Global variables and dynamically allocated memory are shared
• Need Synchronization!!!
Threading Issues: Semantics of fork() and exec()
• Does fork() duplicate only the calling thread or all threads?
• Some UNIXes have two versions of fork(), one for each semantics
• exec() typically works as usual: it replaces the entire process, including all of its threads
Threading Issues: Signal Handling
• UNIX systems use signals to notify a process of the occurrence of a particular event
• For a multi-threaded program, the question becomes: to which thread should the signal be delivered?
Threading Issues: Thread Cancellation
• Invoking thread cancellation requests cancellation, but actual cancellation depends on the state of the target thread
• Asynchronous cancellation terminates the target thread immediately, while deferred cancellation allows the target thread to periodically check whether it should be cancelled
Threading Issues: Thread Specific Data
• Thread-local storage (TLS) allows each thread to have its own copy of data (see the sketch below)
• Useful when you do not have control over the thread creation process (e.g., when using a thread pool)
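A small TLS sketch in C, assuming the GCC/Clang __thread storage class (C11 spells it _Thread_local): each thread increments its own copy of counter.

```c
#include <pthread.h>
#include <stdio.h>

static __thread int counter = 0;   /* one copy per thread */

static void *work(void *name) {
    for (int i = 0; i < 3; i++)
        counter++;                 /* touches only this thread's copy */
    printf("%s: counter = %d\n", (char *)name, counter);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, work, "A");
    pthread_create(&b, NULL, work, "B");
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;                      /* each thread prints counter = 3 */
}
```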
Threading Issues: Scheduler Activations
• Many-to-Many models require communication to maintain the appropriate
number of kernel threads allocated to the application
• Typically use an intermediate data structure between user and kernel threads –
lightweight process (LWP)
• Appears to be a virtual processor on which the process can schedule a user thread to run
• Each LWP attached to kernel thread
• How many LWPs to create?
Windows Threads
• Windows API – primary API for Windows applications
• Implements the one-to-one mapping, kernel-level
• Each thread contains
• A thread ID
• Register set representing state of processor
• Separate user and kernel stacks for when thread runs in user mode or kernel mode
• Private data storage area used by run-time libraries and dynamic link libraries
• The register set, stacks, and private storage area are known as the context of
the thread
Windows Threads
• The primary data structures of a thread include:
• ETHREAD (executive thread block) – includes pointer to process to which thread
belongs and to KTHREAD, in kernel space
• KTHREAD (kernel thread block) – scheduling and synchronization info, kernel-mode
stack, pointer to TEB, in kernel space
• TEB (thread environment block) – thread id, user-mode stack, thread-local storage,
in user space
Linux Threads
• Linux refers to them as tasks rather than threads
• clone() allows a child task to share the address space of the parent task (process)
• Flags control what is shared; a minimal sketch follows
• struct task_struct points to process data structures (shared or unique)
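A minimal clone() sketch in C (Linux-specific glibc wrapper; error handling omitted). The CLONE_VM flag makes the child share the parent's address space, so the parent observes the child's write:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared = 0;

static int child_fn(void *arg) {
    (void)arg;
    shared = 42;              /* visible to the parent because of CLONE_VM */
    return 0;
}

int main(void) {
    const size_t STACK_SIZE = 64 * 1024;
    char *stack = malloc(STACK_SIZE);

    /* The stack grows down on most architectures, so pass its top. */
    pid_t tid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD,
                      NULL);

    waitpid(tid, NULL, 0);              /* wait for the child task */
    printf("shared = %d\n", shared);    /* prints 42 */
    free(stack);
    return 0;
}
```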