L5-Concurrent_Data_Structures

The document discusses the design of concurrent data structures in modern C++, focusing on thread safety and synchronization techniques such as fine-grained vs. coarse-grained locking and lock-based vs. lock-free approaches. It highlights the importance of maintaining data invariants, avoiding race conditions, and enabling true concurrency while providing examples of thread-safe stacks and queues. Additionally, it explores the concepts of blocking and nonblocking data structures, including lock-free and wait-free algorithms, emphasizing their benefits and challenges.

Lecture 5 – Concurrent Data

Structures (in modern C++)


CS3211 Parallel and Concurrent Programming
Outline
• Design concurrent (thread-safe) data structures
• Fine-grained vs. coarse-grained synchronization
• Lock-based vs. lock-free
• Examples:
• Stack
• Queue

CS3211 L5 - Concurrent Data Structures 2


Design data structures for concurrency
• Goal
• Multiple threads can access the data structure concurrently
• Performing the same or distinct operations
• Each thread sees a self-consistent view of the data structure
• Designing thread-safe data structures
• No data is lost or corrupted
• All invariants are upheld
• No problematic race condition



Protect the data structure with a mutex
• Prevents true concurrent access to the data it protects
• Serialization: threads take turns accessing the data protected by the
mutex
• We want concurrent data structures that enable true concurrency
• Some data structures have more scope for true concurrency than others



Invariants
• Invariants – statements that are always true about a particular data
structure
• Often broken during an update on the data structure
• Example: this variable no_of_items contains the number of items in the list
• Use invariants to reason about program correctness



Delete a node from a doubly linked list
*Invariant is broken during the delete
a) Identify the node to delete: N.
b) Update the link from the node
prior to N to point to the node
after N.
c) Update the link from the node
after N to point to the node prior
to N.
d) Delete node N.



Building a thread-safe data structure
• Ensure that no thread can see a state where the invariants of the data
structure have been broken by the actions of another thread
• Take care to avoid race conditions inherent in the interface to the
data structure by providing functions for complete operations rather
than for operation steps
• Pay attention to how the data structure behaves in the presence of
exceptions to ensure that the invariants are not broken
• Minimize the opportunities for deadlock when using the data
structure by restricting the scope of locks and avoiding nested locks
where possible



Concurrency while calling functions
• Constructors and destructors require exclusive access to the data
structure
• It’s up to the users to ensure that the data structure is not accessed before
construction is complete or after destruction has started
• Data structures that support assignment, swap(), or copy construction:
• Decide whether these operations are safe to call concurrently with other
operations or whether they require the user to ensure exclusive access even
though the majority of functions for manipulating the data structure may be
called from multiple threads concurrently without any problems



Truly designing for concurrency
• The smaller the protected region, the fewer operations are serialized,
and the greater the potential for concurrency
• Provide the opportunity for concurrency to threads accessing a
thread-safe data structure
• If one thread is accessing the data structure through a particular function,
which functions are safe to call from other threads?



Thread-safe data structures
• Minimize the amount of serialization that must occur
• Lock at an appropriate granularity
• A lock should be held for only the minimum possible time needed to perform
the required operations
• Alternatives
• Multiple threads might perform one type of operation on the data structure
concurrently, whereas another operation requires exclusive access by a single
thread
• Use std::shared_mutex to allow concurrent access from threads that read the data
structure, but exclusive access for threads that modify the data structure
• Safe for multiple threads to access a data structure concurrently if they’re
performing different actions, whereas multiple threads performing
the same action would be problematic
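The std::shared_mutex idea above can be sketched as a reader/writer-locked lookup table. This is an illustrative example only — the class and member names (shared_map, find, update) are made up for this sketch, not taken from the slides:

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Many threads may hold the shared (reader) lock at once, but a writer
// takes the mutex exclusively, serializing only against modifications.
class shared_map {
    std::map<std::string, int> data_;
    mutable std::shared_mutex mut_;   // mutable so const readers can lock it
public:
    int find(const std::string& key, int not_found) const {
        std::shared_lock lock(mut_);          // shared: readers overlap
        auto it = data_.find(key);
        return it == data_.end() ? not_found : it->second;
    }
    void update(const std::string& key, int value) {
        std::unique_lock lock(mut_);          // exclusive: writers serialize
        data_[key] = value;
    }
};
```

Reads never block each other here; only updates take exclusive ownership, which suits read-mostly structures.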
Enable genuine concurrent access
• Can the scope of locks be restricted to allow some parts of an
operation to be performed outside the lock?
• Can different parts of the data structure be protected with different
mutexes?
• Do all operations require the same level of protection?
• How about operations on const objects?
• mutable keyword
• Can a simple change to the data structure improve the opportunities
for concurrency without affecting the operational semantics?



Lock-based concurrent data structures
• Ensure that the right mutex is locked when accessing the data and
that the lock is held for the minimum amount of time
• Data can’t be accessed outside the protection of the mutex lock
• There are no race conditions inherent in the interface
• Using multiple mutexes to protect separate parts of the structure
• Deadlocks are possible



S1: A thread-safe stack with one global mutex
• Safe for multiple threads to call the member functions concurrently
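A minimal sketch of such a stack, in the spirit of the listing in C++ Concurrency in Action (names here are illustrative):

```cpp
#include <exception>
#include <memory>
#include <mutex>
#include <stack>

struct empty_stack : std::exception {
    const char* what() const noexcept override { return "empty stack"; }
};

// S1: every member function locks the same mutex, so all operations
// on the stack are serialized. pop() combines top() and pop() into one
// operation to avoid the race inherent in a split top()/pop() interface.
template <typename T>
class threadsafe_stack {
    std::stack<T> data_;
    mutable std::mutex mut_;
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mut_);
        data_.push(std::move(value));
    }
    std::shared_ptr<T> pop() {
        std::lock_guard<std::mutex> lock(mut_);
        if (data_.empty()) throw empty_stack();
        auto res = std::make_shared<T>(std::move(data_.top()));
        data_.pop();
        return res;
    }
    bool empty() const {
        std::lock_guard<std::mutex> lock(mut_);
        return data_.empty();
    }
};
```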



Shared pointer std::shared_ptr
• Reference-counted smart pointer: keeps a count of references to the pointed-to data
• When a std::shared_ptr instance is copied, the number of references increases by 1
• When a std::shared_ptr instance is destructed, the number of references decreases by 1
• When the number of references reaches 0, std::shared_ptr automatically calls the
destructor on the object that it is pointing to
• Counting the references is thread safe
• Access to the pointed-to data is not thread safe
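The reference-counting behavior can be observed directly with use_count(). The helper function below is made up for this demonstration:

```cpp
#include <memory>

// Copying a shared_ptr bumps the reference count; when the copy is
// destroyed (on return), the count drops back down. The pointee is
// deleted only when the count reaches zero.
inline long count_after_copy(const std::shared_ptr<int>& p) {
    std::shared_ptr<int> copy = p;   // count goes up by 1
    return copy.use_count();         // counts p + copy
}
```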

S1: Concurrency in the thread-safe stack
• Safe for multiple threads to call the member functions concurrently
• Exception safe
• BUT the work is serialized for the stack data structure
• only one thread is ever doing any work in the stack data structure at a time
• Serialization limits the performance of an application that exhibits
significant contention on the stack
• No means of waiting for an item to be added
• A thread must periodically call empty(), or call pop() and catch the
empty_stack exception
• This consumes precious resources checking for data, or requires the user to
write external wait-and-notification code



Q1: A thread-safe queue with notification (one mutex)
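A sketch of Q1, following the interface discussed on the next slide (mut, data_cond, wait_and_pop are the names the slides refer to; the rest is illustrative):

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

// Q1: one mutex protects the std::queue; a condition variable lets
// consumers sleep until a producer pushes an item.
template <typename T>
class threadsafe_queue {
    mutable std::mutex mut;
    std::queue<T> data_queue;
    std::condition_variable data_cond;
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mut);
            data_queue.push(std::move(value));
        }
        data_cond.notify_one();   // wake one waiting consumer
    }
    void wait_and_pop(T& value) {
        std::unique_lock<std::mutex> lock(mut);
        data_cond.wait(lock, [this] { return !data_queue.empty(); });
        value = std::move(data_queue.front());
        data_queue.pop();
    }
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(mut);
        if (data_queue.empty()) return false;
        value = std::move(data_queue.front());
        data_queue.pop();
        return true;
    }
    bool empty() const {
        std::lock_guard<std::mutex> lock(mut);
        return data_queue.empty();
    }
};
```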



Q1: A thread-safe queue with notification (one mutex)
• The analysis for the stack applies here as well
• The wait_and_pop() functions are a solution to the problem of waiting
for a queue entry
• Exception safety
• If more than one thread is waiting when an entry is pushed onto the queue
(data_cond.wait), only one thread will be woken by the call
to data_cond.notify_one()
• But if that thread then throws an exception in wait_and_pop() (when the new
std::shared_ptr<> is constructed), none of the other threads will be woken
• Solutions:
• Replace notify_one() with data_cond.notify_all(), which will wake all the threads, but at the cost
of most of them then going back to sleep when they find that the queue is empty after all, or
• Have wait_and_pop() call notify_one() if an exception is thrown, or
• Move the std::shared_ptr<> initialization to the push() call and
store std::shared_ptr<> instances rather than direct data values (see next slide)



Q2: Push shared-pointers in the queue
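One possible shape for Q2, with the throwing allocation moved out from under the lock (class name is illustrative):

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

// Q2: the queue stores std::shared_ptr<T>, so the potentially throwing
// allocation happens in push() *before* the lock is taken, and
// wait_and_pop() only moves a pointer, which cannot throw.
template <typename T>
class threadsafe_queue_sp {
    mutable std::mutex mut;
    std::queue<std::shared_ptr<T>> data_queue;
    std::condition_variable data_cond;
public:
    void push(T value) {
        auto item = std::make_shared<T>(std::move(value));  // outside the lock
        {
            std::lock_guard<std::mutex> lock(mut);
            data_queue.push(std::move(item));
        }
        data_cond.notify_one();
    }
    std::shared_ptr<T> wait_and_pop() {
        std::unique_lock<std::mutex> lock(mut);
        data_cond.wait(lock, [this] { return !data_queue.empty(); });
        auto res = std::move(data_queue.front());  // pointer move: no throw
        data_queue.pop();
        return res;
    }
};
```

Because popping can no longer throw, notify_one() is now guaranteed to wake a thread that will actually complete its pop.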



Q2: Push shared-pointers in the queue
• Exception safe
• The allocation of the new instance can now be done outside the lock
in push()
• Beneficial for the performance of the queue
• Usage of one mutex limits the concurrency supported by this queue
• By using the standard container (std::queue<>) you now have one data item
that’s either protected or not
• For fine-grained locking you need to take control of the detailed
implementation of the data structure



A thread-safe queue using fine-grained locks

• Analyze the constituent parts and associate one mutex with each
distinct item
• Insert mutexes into the data structure itself, and thus we cannot simply make
use of the STL library anymore
• Our queue
• Push (enqueue) at the back
• Pop (dequeue) from the front



Q3: Single-threaded queue
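The single-threaded queue on this slide and the try_pop/push slides that follow might look like this (a sketch; names are illustrative):

```cpp
#include <memory>
#include <utility>

// Q3: a single-threaded singly linked queue. front owns the chain of
// nodes via unique_ptr; back is a raw observing pointer to the last node.
template <typename T>
class st_queue {
    struct node {
        T data;
        std::unique_ptr<node> next;
        explicit node(T d) : data(std::move(d)) {}
    };
    std::unique_ptr<node> front;
    node* back = nullptr;
public:
    void push(T value) {
        auto p = std::make_unique<node>(std::move(value));
        node* const new_back = p.get();
        if (back) back->next = std::move(p);   // append after old back
        else front = std::move(p);             // queue was empty
        back = new_back;
    }
    std::shared_ptr<T> try_pop() {
        if (!front) return nullptr;
        auto res = std::make_shared<T>(std::move(front->data));
        std::unique_ptr<node> old_front = std::move(front);
        front = std::move(old_front->next);
        if (!front) back = nullptr;
        return res;   // old_front's node is deleted on scope exit
    }
};
```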



try_pop



push



Q3: Single-threaded queue
• Uses std::unique_ptr<node> to manage the nodes
• Ensures that they (and the data they refer to) get deleted when they’re no
longer needed, without having to write an explicit delete
• Problems when adding a mutex each for front and back
• push() can modify both front and back
• push() and try_pop() access the next pointer of a node. If there’s a single
item in the queue:
• push() updates back->next, and try_pop() reads front->next
• front==back, so front->next and back->next are the same object, which
therefore requires protection
• You can’t tell if it’s the same object without reading both front and back, so you
now have to lock the same mutex in both push() and try_pop()
• Same serialization as before



Q4: Enabling concurrency by separating data
• Pre-allocate a dummy node with no data
• Ensure that there’s always at least one node in the queue to separate the
node being accessed at the front from that being accessed at the back
• Empty queue: front and back point to the dummy node
• No race on front->next and back->next
• Downside: adds an extra level of indirection, since the data must be stored by
pointer in order to allow the dummy nodes

(Figure: a queue with a dummy node, and an empty queue where front and back both point to the dummy node)



Q4: (Single-threaded queue) Enabling
concurrency by separating data
Pop

Push
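The pop and push of this Q4 variant (still single-threaded) might be sketched as follows; class and member names are illustrative:

```cpp
#include <memory>
#include <utility>

// Q4: a dummy node guarantees front is never null, and data is held by
// shared_ptr so the dummy node simply carries a null data pointer.
// push() now touches only back; try_pop() compares front against back.
template <typename T>
class st_dummy_queue {
    struct node {
        std::shared_ptr<T> data;      // null in the dummy node
        std::unique_ptr<node> next;
    };
    std::unique_ptr<node> front;
    node* back;
public:
    st_dummy_queue() : front(std::make_unique<node>()), back(front.get()) {}
    void push(T value) {
        auto new_data = std::make_shared<T>(std::move(value));
        auto new_dummy = std::make_unique<node>();  // becomes the new dummy
        back->data = std::move(new_data);           // old dummy now holds data
        node* const new_back = new_dummy.get();
        back->next = std::move(new_dummy);
        back = new_back;
    }
    std::shared_ptr<T> try_pop() {
        if (front.get() == back) return nullptr;    // only the dummy: empty
        auto res = std::move(front->data);
        front = std::move(front->next);
        return res;
    }
};
```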



What do you need to lock?
Q3: Queue without dummy node
• Pop: need to lock both front and back mutexes
• Push: need to lock both front and back mutexes; always check front==back

Q4: Queue with dummy node
• Pop: need to lock both front and back mutexes
• Push: front is not affected; need to lock only the back mutex


Q5: Q4 with locks (fine-grained thread-safe queue)
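A sketch of Q5: the Q4 dummy-node queue with one mutex per end, using the mut_front name the later slides refer to (the rest of the naming is illustrative):

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Q5: mut_back protects back; mut_front protects front. Thanks to the
// dummy node, push() and try_pop() work on different nodes whenever the
// queue is non-empty, so they can genuinely run concurrently.
template <typename T>
class fine_grained_queue {
    struct node {
        std::shared_ptr<T> data;
        std::unique_ptr<node> next;
    };
    std::mutex mut_front;
    std::mutex mut_back;
    std::unique_ptr<node> front;
    node* back;

    node* get_back() {                 // brief lock just to read back
        std::lock_guard<std::mutex> lock(mut_back);
        return back;
    }
public:
    fine_grained_queue() : front(std::make_unique<node>()), back(front.get()) {}
    void push(T value) {
        auto new_data = std::make_shared<T>(std::move(value));
        auto new_dummy = std::make_unique<node>();
        node* const new_back = new_dummy.get();
        std::lock_guard<std::mutex> lock(mut_back);   // only back locked
        back->data = std::move(new_data);
        back->next = std::move(new_dummy);
        back = new_back;
    }
    std::shared_ptr<T> try_pop() {
        std::unique_ptr<node> old_front;
        std::shared_ptr<T> res;
        {
            std::lock_guard<std::mutex> lock(mut_front);
            if (front.get() == get_back()) return nullptr;  // empty
            res = std::move(front->data);
            old_front = std::move(front);
            front = std::move(old_front->next);
        }   // old_front deleted here, outside mut_front
        return res;
    }
};
```

Note the lock ordering: try_pop() takes mut_front then briefly mut_back, while push() takes only mut_back, so no deadlock cycle is possible.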



Queue invariants
• back->next==nullptr
• back->data==nullptr
• front==back implies an empty list
• A single element list has front->next==back
• For each node x in the list, where x!=back, x->data points to an
instance of T and x->next points to the next node in the list.
x->next==back implies x is the last node in the list
• Following the next nodes from front will eventually yield back



Q6: An even more fine-grained lock-based queue



Q6: An even more fine-grained lock-based queue
• Line 3: Per-node mutex
• Synchronizes-with relationship between
push and pop threads
• Line 38: Additional scope introduced in
try_pop()
• Avoid use-after-free (UAF): node_lock unlocks early,
BEFORE we proceed to call delete on
old_node
• Delete is expensive → release mut_front
early



Lock-based data structures
• Mutexes are powerful mechanisms for ensuring that multiple threads
can safely access a data structure without encountering race
conditions or broken invariants
• The granularity of locking can affect the potential for true
concurrency



Lock-free concurrent data structures



Blocking data structures
• Algorithms and data structures that use mutexes, condition variables,
and futures to synchronize the data are called blocking data
structures and algorithms
• The application calls library functions that will suspend the execution
of a thread until another thread performs an action
• These library calls are termed blocking calls
• The thread can’t progress past this point until the block is removed
• Typically, the OS will suspend a blocked thread completely (and allocate its
time slices to another thread) until it’s unblocked by the appropriate action of
another thread, such as
• unlocking a mutex,
• notifying a condition variable, or
• making a future ready



Nonblocking data structures
• Data structures and algorithms that do not use blocking library
functions are said to be nonblocking
• Example: a spin lock is nonblocking, as it spins until the test_and_set succeeds
• Types of nonblocking data structures
• Obstruction-free: if all other threads are paused, then any given thread will
complete its operation in a bounded number of steps.
• Lock-free: if multiple threads are operating on a data structure, then after a
bounded number of steps one of them will complete its operation.
• Wait-free: every thread operating on a data structure will complete its
operation in a bounded number of steps, even if other threads are also
operating on the data structure.
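The spin lock mentioned above can be written in a few lines with std::atomic_flag. It is nonblocking (it never calls a blocking OS primitive) but not lock-free: a thread holding the flag still blocks everyone else's progress. The locked_count helper is made up here to demonstrate usage:

```cpp
#include <atomic>
#include <thread>

// Test-and-set spin lock.
class spinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;   // spin until test_and_set succeeds
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};

// Demonstration: two threads increment a shared counter under the lock;
// no increments are lost.
inline long locked_count(int iters_per_thread) {
    spinlock s;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < iters_per_thread; ++i) {
            s.lock();
            ++counter;
            s.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```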



Lock-free algorithms (Wikipedia)
• Lock-freedom allows individual threads to starve but guarantees
system-wide throughput
• An algorithm is lock-free if, when the program threads are run for a
sufficiently long time, at least one of the threads makes progress (for
some sensible definition of progress)
• All wait-free algorithms are lock-free



Lock-free algorithms
• There is some notion of progress
1. an operation is considered "completed" if the function returns
• Push operation is done (and thus progress is made) if the queue.push(...) function
returns
2. an operation is considered "completed" if its effects become visible
• for try_pop, the operation is done when the function returns, but for push, the operation
is done when the job becomes visible at pop
• If one or more threads get suspended, the non-suspended threads
must still be able to make progress by some deadline



Lock-free data structures
• More than one thread must be able to access the data structure
concurrently
• They do not have to be able to do the same operations
• If one of the threads accessing the data structure is suspended, the other
threads must still be able to complete their operations without waiting for the
suspended thread
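As a hedged sketch of what such a structure looks like, here is a Treiber-style lock-free stack driven by compare_exchange_weak. This simplified try_pop deliberately leaks popped nodes rather than deleting them, side-stepping the safe-memory-reclamation problem (naive deletion risks use-after-free and the ABA problem, which the guidelines slide addresses):

```cpp
#include <atomic>

// Lock-free stack: push and pop retry a CAS on head_ until they win.
// At least one contending thread always succeeds, so the structure is
// lock-free (though an individual thread may starve).
template <typename T>
class lock_free_stack {
    struct node {
        T data;
        node* next;
        explicit node(const T& d) : data(d), next(nullptr) {}
    };
    std::atomic<node*> head_{nullptr};
public:
    void push(const T& value) {
        node* const n = new node(value);
        n->next = head_.load();
        // Retry until no other thread changed head_ in between.
        while (!head_.compare_exchange_weak(n->next, n))
            ;
    }
    bool try_pop(T& out) {
        node* old = head_.load();
        while (old && !head_.compare_exchange_weak(old, old->next))
            ;
        if (!old) return false;
        out = old->data;
        // Deliberately leaked: safe reclamation needs hazard pointers,
        // reference counting, or a similar scheme.
        return true;
    }
};
```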



Wait-free data structures
• Lock-free algorithms with loops (using compare/exchange operations) can
result in one thread being subject to starvation
• If another thread performs operations with the “wrong” timing, that thread might
make progress while the first thread continually has to retry its operation
• Algorithms that can involve an unbounded number of retries because of
clashes with other threads are not wait-free
• Data structures that avoid this starvation problem are wait-free
• Writing wait-free data structures correctly is extremely hard
• It’s all too easy to end up writing what’s essentially a spin lock



Pros of lock-free data structures
• Enable maximum concurrency
• Some thread makes progress with every step
• Robustness
• If a thread dies partway through an operation on a lock-free data structure,
nothing is lost except that thread’s data; other threads can proceed normally
• You cannot exclude threads from accessing the data structure
• Ensure that the invariants are upheld or choose alternative invariants that can be upheld
• Pay attention to the ordering constraints you impose on the operations
• Ensure that changes become visible to other threads in the correct order



Cons of lock-free data structures
• Livelocks are possible
• Two threads each try to change the data structure, but for each thread, the
changes made by the other require the operation to be restarted, so both
threads loop and try again
• Lock-free techniques can decrease overall performance, even though they
reduce the time an individual thread spends waiting
• Atomic operations used for lock-free code can be much slower than non-
atomic operations
• The hardware must synchronize data between threads that access the same
atomic variables
• Memory contention and write propagation
• Cache ping-pong with multiple threads accessing the same atomic variables



Contention and cache ping pong
• If one of the threads modifies the data, this change then has to
propagate to the cache on the other core, which takes time
• Depending on the memory orderings used for the operations, this
modification may cause the second core to stop and wait for the
change to propagate through the memory hardware
• Extremely slow
• Memory contention increases with the increase in number of threads
• Accessing data from the same cache line within multiple threads
• Example: a mutex used by many threads
• False sharing can produce cache ping-pong as well, and it is more difficult to
identify.



Problem: false sharing
struct foo {
    int x;
    int y;
};

static struct foo f;

int sum_a(void)
{
    int s = 0;
    for (int i = 0; i < 1000000; ++i)
        s += f.x;
    return s;
}

void inc_b(void)
{
    for (int i = 0; i < 1000000; ++i)
        ++f.y;
}
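One common fix, sketched below: align each hot field to its own cache line so that sum_a's reads of x and inc_b's writes to y no longer share a line. The struct name foo_padded is made up for this example, and the 64-byte fallback is an assumption for platforms where the standard constant is unavailable:

```cpp
#include <cstddef>
#include <new>   // std::hardware_destructive_interference_size (C++17)

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t cache_line = std::hardware_destructive_interference_size;
#else
constexpr std::size_t cache_line = 64;   // typical x86 cache line; assumption
#endif

// Each member starts on its own cache line, eliminating the false
// sharing between a reader of x and a writer of y.
struct foo_padded {
    alignas(cache_line) int x;
    alignas(cache_line) int y;
};

static_assert(alignof(foo_padded) >= cache_line);
static_assert(sizeof(foo_padded) >= 2 * cache_line);
```

The trade-off is memory: the struct grows from 8 bytes to at least two cache lines.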
Guidelines for writing lock-free code
• Use std::memory_order_seq_cst for prototyping
• Use a lock-free memory reclamation scheme
• Use some method to keep track of how many threads are accessing a
particular object and delete each object when it is no longer referenced from
anywhere
• Recycle nodes
• Watch out for the ABA problem
• Include an ABA counter alongside the variable
• Prevalent when using free lists
• Identify busy-wait loops and help the other thread



Summary
• Safely and efficiently using synchronization primitives and atomics to
implement thread-safe data structures
• Lock-based
• Granularity level in synchronization
• Lock-free
• Difficult to get right
• Principles for design for concurrent data structures

• References: C++ Concurrency in Action, Second Edition


• Chapters 6.1, 6.2, 7.1, 7.3

CS3211 L5 - Concurrent Data Structures 45