STM User Guide
Programmers’ Guide
Published July 24, 2009
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
Please install the software package ".NET Framework Beta 1 enabled to use Software Transactional
Memory", which will allow you to experiment with a preliminary release of the .NET runtime and
framework that has transactional memory support. The following guide explains how to use the
package, how to write programs with STM, and how to contact us for help and feedback. We hope you
enjoy working with this offering and will help us drive improvements into future releases.
There are two installation packages. In addition to the STM-enabled version of the .NET Framework,
the second installation package contains a set of samples, documentation, and a VS2008 project
template for creating STM applications. This will get you going with STM!
We would very much like to hear from you about your experience with STM. Please feel free to
comment and ask questions on the STM Devlab MSDN forum.
http://go.microsoft.com/fwlink/?LinkID=158427
The goal of this document is to explain what STM is, why we chose this specific implementation, how to
run the provided samples, and how to write your own applications using STM.
This is both a programming guide and an introduction to transactional memory. If you want to jump
right into using STM, you will want to read the following sections first:
Section 2 “Hello, World!” This section demonstrates how to compile and run a basic STM
sample.
Section 4 “Atomic Block to the Rescue: Basic Concepts”. This section explains basic concepts of
STM.
Section 5 “Writing Correctly Synchronized Code”. This section explains how to write your code
such that it is correctly synchronized and doesn’t have race conditions.
For a deeper understanding of both STM in general and our implementation, you may want to read
through this manual in order; it is arranged to build up from concepts to details.
1.1 Caveats
1. This is an MSDN DevLab incubation release. There are no confirmed plans to ship STM as part of any
version of the .NET Framework. The license to use this software specifically prohibits developing
and deploying production software on this framework.
2. There are no performance optimizations. We have not spent much time optimizing the
performance of our changes. Further, NGEN is disabled. You cannot make meaningful performance
comparisons between programs executing on this framework and the same code running on another
version of the framework.
3. Alpha-level quality. We are a small team tackling a big problem. There are many known and many
unknown bugs and limitations. We have done the best that we can to ensure that all samples work
and that all features covered in this document work in their basic usage. Please report problems
through the MSDN forum for STM.NET. Although we will make an effort to answer users’ concerns
and issues, this software offering is not supported by Microsoft and its use is restricted as per the
terms in the End User License Agreement (EULA).
4. Modified .NET Framework 4 Beta 1 base. The current STM release is based on the .NET
Framework 4 Beta 1, but has been modified to provide STM and to work in a Visual Studio 2008
environment.
2 Hello, World!
Before we go into the fine details of STM, let’s go through a simple “Hello World” sample, STM style.
Start Visual Studio 2008. Choose File | New | Project; choose the Visual C# project type, and the My
Templates group. There you should see "TM Console Application". This is a custom template, installed on
your computer, that has the combination of settings required for building and debugging
STM applications using STM.NET. Select it, specify any name and location of your liking, and click OK.
When your new project opens, take a look at the Solution Explorer on the right side:
1. app.config tweaks the VS environment to use STM.NET. In case you're curious, this is what the file
contains (we will talk more about the dynamic checker in section 6.6):

<configuration>
  <runtime>
    <ZapDisable enabled="1" />
    <gcServer enabled="true" />
  </runtime>
</configuration>
2. The assembly references in your project contain versions of mscorlib.dll and System.Transactions.dll
which enable the use of STM. Use these references when developing STM-enabled applications.
We are now ready to code up our STM Hello World program. STM is about providing thread safety, so
our scenario will contain threads that are performing some tasks in a thread-unsafe manner.
We will demonstrate how to provide safety using STM.
In our scenario, there are two string fields. The program tries to maintain the invariant that these
two fields never reference strings with the same value; i.e., if field1 equals field2, then we have a
violation of the program's expectation.
Two threads will be constantly assigning string references to the fields and a third thread will be
periodically verifying that the program’s invariant is upheld.
Let’s start with a naïve and wrong program without locks or any other form of synchronization to
provide thread-safety. Copy the following code instead of Main, then build and run the program. You
should see a bunch of lines with the phrase “Violation!” in the output: this is the “watchdog” thread
noticing violations of the program’s invariant.
1 Main also has a mysterious-looking attribute assigned to it: [AtomicNotSupported]. You can safely ignore this for
now. We will discuss this attribute and other Atomic contract attributes in section 6.
public MyObject() { }

[AtomicNotSupported]
static void Main(string[] args) {
    MyObject obj = new MyObject();
    int completionCounter = 0, iterations = 1000;
    bool violations = false;
    // ... (the remainder of the sample, which spawns the two setter
    // threads and the watchdog thread, is elided here)
}
We could add locks or mutexes to provide thread safety, and if you want you can take a moment to do
that. Now let's try to use STM instead. STM will provide the needed isolation without us having to worry
about which locks to use and how to avoid deadlocks.
To add the needed isolation in the Hello World sample, please replace the Validate and SetString
methods of MyObject with the following code:
Instead of using locks, we used "Atomic.Do" to wrap access to shared state in what we refer to
as an "atomic block". When you see:

Atomic.Do(() => { <statements> });

try to squint at it and imagine that you are actually seeing this:

atomic { <statements> }
In order to provide an "atomic" keyword we would have had to change multiple language compilers, and
our work so far has concentrated on the libraries and the runtime. For this experimental release, we could
not provide the "atomic" keyword in the language, so we actually demarcate atomic blocks using a
try/catch(AtomicMarker). AtomicMarker is not a real exception that we ever throw, and the catch body
never executes.2
Build and run the sample now; there should be no violations anymore (although one race condition
remains in the code; have you noticed it?)3.
In this sample's case, you could fix that by adding a lock (this) statement around the racy code.
This is not an example where STM is particularly interesting, with the exception that you didn't have to
specify what to lock: it was inferred automatically for you by the system. We will talk more about the
traits of STM and how it differs from locks in the next section, an introduction to STM.
1. The “Call string conversion function on objects in variables windows” option is disabled.
2. The “Enable property evaluation and other implicit function calls” option is disabled.
3. The “Edit and Continue” feature is disabled as well.
Please make sure that these options remain disabled whenever you are debugging STM-enabled
programs. The screenshots below illustrate how to make sure these options are disabled.
Select Tools | Options | Debugging, and then inspect the general options and the Edit and
Continue sub-options, as illustrated below.
2 Sometimes you will have to put a 'throw' in the catch handler of AtomicMarker to keep the C# or VB.NET
compiler happy, since they are not aware that control flow never reaches those points. The 'throw', like any other
code you put there, will never be executed.
3 As you run the Hello World sample and other samples, you may notice that exceptions of type
AbortReexecException are reported as thrown in the debugger output window. These exceptions are internal to
the STM system. They are expected and do not indicate any problem in your environment.
That’s the bad news; the good news is that Moore’s law is by no means dead. The hardware industry is
still able to follow the path that Moore predicted. The density of transistors in an integrated circuit still
doubles every two years. The industry has, however, run out of effective ideas for turning those
transistors into faster processors. Therefore, they are using the extra transistors to create more
processors, building chip multiprocessors (CMP) with many processors on a single chip. Those chips are
also known as multi-cores, or many-cores.
This is difficult news for the software industry. We can no longer add new features to our applications
and rely on increasing hardware capacity to make ever more complex software perform acceptably. To
state things more positively, we can also no longer rely just on better hardware to bring applications
that were too expensive in the past to become feasible today (as was the case with, for example, speech
recognition). Now, to take advantage of future hardware capacity, we have to fundamentally change the
way we program, to exploit the parallel capacity of this new hardware.
Parallel programming is notoriously difficult. Researchers have been working for decades on systems to
make parallel programming easier, yet despite these efforts, parallel programming has not yet become
popular with all developers. Some amount of skepticism about whether we’ll succeed now is therefore
justified – but note that in the past parallel programming was a niche market, if only because of the
expense of multiprocessor computers. The hardware industry has the capability to produce chip
multiprocessors whose core counts double in time-frames similar to those we have been used to for
processor speeds – whether they will do so depends on whether anyone will buy them which, in turn,
depends on whether there is any software that utilizes their power to do interesting things that couldn’t
be reasonably done on a single-processor machine. If we succeed in creating such software, the virtuous
cycle of new hardware and software features and applications will continue.
In order to enable this virtuous cycle Microsoft is making an effort to make effective parallel
programming easier. There are many competing ideas on how this might be accomplished. Some are
fairly radical: “get everyone to use pure functional languages,” for example. Here we focus on one of
these ideas, Transactional Memory. Compared to some other ideas in this space, it is evolutionary rather
than revolutionary; programming with transactions will feel fairly similar to programming today, just
simplified.
We should be very clear that this is a different problem from either (a) designing a parallel algorithm, or
(b) coding that algorithm in a way that exposes parallelism. Task (a) is a conceptual task – given the
problem at hand, you must decide what subtasks can be run at the same time, and which may only be
run after others. Task (b) is a programming language issue – if you know that some set of subtasks
should be run in parallel, how do you express it in your programming language or libraries? This is the
problem that in .NET Framework 4 the new Task Parallel Library addresses – the TPL provides parallel
iteration constructs, where loop iterations are performed in parallel. The bodies of these iterations may
introduce new, nested parallelism.
When you’ve created parallel tasks, you have no problem until the tasks exhibit interference, where two
tasks access some shared object, and at least one of them writes to it. Perhaps the shared object has
some associated invariant (as in our Hello World example), and the writing transaction takes it from one
state satisfying this invariant to another – but via some intermediate state in which the invariant is
violated. Without some mechanism for preventing or handling interference, a reader might observe the
object in this invariant-violating intermediate state, with potentially disastrous results.
This is the current state of the art; this is what is available and recommended in languages like C# or
Visual Basic.NET. But there are many problems with using locks.
There are many possible locking conventions. Some objects may be protected by locking "this"; in other
cases, locking some single shared object may gate access to a large set of objects. In previous
versions of the .NET Framework it was considered a best practice to export a "SyncRoot" property, and
to require callers to lock the resulting object rather than locking "this". Sometimes different locks protect
different fields in the same object. A programmer striving to be careful in today's world faces this
question every time he or she accesses a field of a shared object in a parallel program: does the current
thread hold a lock that allows me to access that field? Failure to properly follow the locking protocol
could allow a race condition, which is the worst kind of bug: intermittent (and possibly extremely rare),
with consequences that may manifest only millions of instructions after the race occurs, and that may
cost millions of dollars to fix if not caught in time, before the software is deployed.
3.3.2 Deadlock
Deadlock is a fundamental problem in lock based programming. If thread 1 attempts to obtain lock A
then lock B, but thread 2 attempts to obtain B then A, then they can fall into a “deadly embrace”, in
which each holds the lock that the other is attempting to acquire. The solution to this problem is well-
known, too: have all threads acquire locks in the same order.
Often this dictum is more easily stated than followed. It is common for large multi-threaded applications
to grow home-made "lock-leveling" systems that try to enforce lock acquisition order: each lock is given a level,
and a dynamic checking mode ensures that every thread acquires locks in (say) ascending level order.
Such systems can be fairly effective in finding violations (at least to the extent that the system’s test
suite exercises all the possible orders of lock acquisition in parallel executions). And in many cases this
checking could even be done statically, with sufficient annotations and checking tools. But they don’t
really give the programmer much help when a violation is detected! Lock X protects data D, and I
change some method BB to access data D, and therefore acquire lock X to do so – but the method BB
I’m modifying is called by method FF, which holds lock Y, which we’d previously decided must be
acquired after X – what should I do? One common, sometimes disastrous error in such situations is to
modify FF to temporarily release lock Y, just for the duration of the call to BB. If FF held Y in order to
modify some object, this may allow that object to be visible in an inconsistent state in which its
invariants are violated.
In short, specifying a lock order is a difficult global problem, requiring the programmer to understand
interactions between completely separate parts of a program, including implementation details of
libraries that ought to be invisible to the caller. (Consider a container class used in a concurrent setting,
which exports a "Map" function that applies a delegate to all elements of the container. How does locking
that protects access to the container class interact with locking that might be required by the delegate?)
Further, lock-order schemes have generally been applied to systems with static locks, in which a lock
may be named statically. This works in some cases. However, object-oriented methodologies lead
naturally to more dynamic locking, in which there may be many instances of a class, each protected by
its own lock.
In all likelihood, composing the data structures in a safe and scalable manner is going to be as complicated
as building the composite data structure from scratch, thus losing all the advantages of software reuse
and componentization.
The basic idea for adding transactions to a language is the atomic block construct:
atomic { body }
The system would promise to execute body atomically and in isolation – or at least provide those
semantics. To a first approximation, programmers can think about an atomic block as if it acquires a
single unnamed lock shared by all atomic blocks – the result will be as if no other atomic block was
running while the current one was executing. Of course, the “as if” was very important in that statement
– the underlying implementation will provide significantly more concurrency.
It is important to note that “execute body atomically” means that everything (transitively) executed
within body is part of the transaction; if body makes calls, the called methods become part of the
transaction. Transactions are lexically scoped but their execution proceeds dynamically while in scope.
Similarly, deadlock becomes a non-issue: since there is only a single “lock” to acquire (when we think of
an atomic block as abstractly taking a single big lock), there is no question of lock order. (If you’re
worried about what happens when we attempt to start a transaction within a transaction, see the
explanation of transaction nesting below – the short answer is that only the outermost atomic block has
any effect, and the inner ones are, for most purposes, semantic no-ops.)
Getting rid of the accounting and deadlock issues makes parallel programming considerably simpler. The
only issues become:
Our current “atomic” block implementation does allow the user to abort a transaction. This is done by
letting an exception escape the boundaries of an atomic block. Thus, the following two sequences are
not equivalent:
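In sketch form (m_value, syncObj, and MyException are illustrative names):

// Left: an atomic block. The exception aborts the transaction, so the
// update to m_value is undone before the exception propagates.
atomic {
    m_value++;
    throw new MyException();
}

// Right: a lock. The lock is released as the exception unwinds, but the
// update to m_value remains visible to other threads.
lock (syncObj) {
    m_value++;
    throw new MyException();
}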
While these two samples are equivalent from an isolation perspective, the one on the left will result in no
change to the variables modified, while the one on the right will obviously externalize the side effects to
those variables as soon as the lock is released.
class BankAccount
{
    private int m_balance;

    public void ModifyBalance(int amount) {
        atomic {
            m_balance = m_balance + amount;
            if (m_balance < 0)
                throw new OverdraftException(m_balance, amount);
        }
    }
}
Note that if there's an overdraft, the balance modification is undone. Yes, this was deliberately and
cutely written to illustrate the failure atomicity feature: the consistency check is done after the balance
modification, and the modification is then undone when the transaction is aborted due to the exception
thrown. It's convenient to be able to notice an error at any point, and know that your tentative changes
will be discarded.
Obviously, we would like the balance transfer to be atomic: we want it to happen “all-or-nothing,” and
we don’t want a concurrent operation to “steal” the money from the “from” account.
We can conveniently achieve this with another, higher-level, transaction.
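For instance, a sketch of such a Transfer method, using the document's conceptual atomic syntax:

public static void Transfer(BankAccount from, BankAccount to, int amount)
{
    atomic {
        from.ModifyBalance(-amount);   // withdraw
        to.ModifyBalance(amount);      // deposit
    }
}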
Now our question is what does it mean to execute the nested transaction inside ModifyBalance, when
it’s being called from within the enclosing transaction in Transfer? We definitely do not want the effects
of the withdrawal to be globally visible as long as the entire transfer is not complete, so we get the
following rule for nested transactions:
From an isolation perspective, nested transactions are flattened into their parent.
On the other hand, a failure due to an overdraft in ModifyBalance doesn't necessarily mean that the
entire Transfer operation needs to fail. Perhaps Transfer can be coded such that it handles the
failure. For example:
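One possible shape, in sketch form (the TryTransfer name and the compensation logic are illustrative):

public static bool TryTransfer(BankAccount from, BankAccount to, int amount)
{
    bool ok = true;
    atomic {
        to.ModifyBalance(amount);
        try {
            from.ModifyBalance(-amount);
        }
        catch (OverdraftException) {
            // The nested transaction already undid the withdrawal;
            // compensate the earlier credit so there is no net change.
            to.ModifyBalance(-amount);
            ok = false;
        }
    }
    return ok;
}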
Atomic.Do(() => { <body> });

We feel this is a user-friendlier version of the following, albeit at a minor loss of performance; both
syntaxes provide the atomic-block functionality:

try { <body> }
catch (AtomicMarker) { }
The class Atomic resides in the System.TransactionalMemory namespace and it allows the usage of STM
with delegates, anonymous methods, closures, etc. Given what you know already about STM, the
implementation of Atomic.Do should appear trivial by now:
namespace System.TransactionalMemory {
    ...
    public static class Atomic {
        ...
        public static void Do(AtomicAction action) {
            try {
                action();
            }
            catch (AtomicMarker) {}
        }
    }
}
In general, STM implementations work by instrumenting the program's accesses to shared state. In our
implementation of STM, this instrumentation is done by the just-in-time compiler (the CLR JIT). When a
piece of code executes inside a transaction, the JIT generates a special version of the code that does the
right thing when objects are read or written.
Isolation is achieved by associating thin locks with objects, statics, etc. at runtime. Transactional locks
support optimistic concurrency control; i.e., instead of acquiring a lock pessimistically, a transaction may
inspect the lock, proceed with its computation, and then recheck that the lock has not changed state
before commit time.
As you can imagine the cost of all the instrumentation and fine-grained locking is significant. We are
currently measuring anywhere between 2x and 7x serial slowdown compared to lock-based code. There
are additional details in section 0, which discusses performance. It should be noted, though, that our
current STM implementation hasn’t undergone a significant amount of optimization. There is still a lot of
room for improvement and of course hardware support, if and when it becomes available, will
dramatically change the picture.
The next sample after "HelloWorld" is "BankAccount", which puts into code the discussion from the
previous section regarding the Atomic.Do API, transaction nesting, and failure atomicity. This is how the
Transfer method is defined in the sample:
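In sketch form, matching the description that follows (the sample credits the "to" account first):

public static void Transfer(BankAccount from, BankAccount to, int amount)
{
    Atomic.Do(() =>
    {
        to.ModifyBalance(amount);      // credit the "to" account first
        from.ModifyBalance(-amount);   // may throw OverdraftException
    });
}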
Recall that each ModifyBalance operation is in itself a small transaction; thus, if it fails, it leaves no
modifications behind. To further illustrate how failure atomicity works, we first credit the "to" account,
even though the other account may not have sufficient funds for the transfer. We then try to withdraw
the amount from the "from" account; if that fails, the exception aborts the enclosing transaction,
undoing the credit as well.
Very informally, a race condition within the context of STM is a situation where the same data can be
accessed in a conflicting manner both inside and outside transactions at the same time. “Conflicting
manner” means that at least one of the accesses is a write (reading a piece of data simultaneously inside
and outside a transaction is not considered a conflict).
Initially:
static int X = 0;
static bool X_Shared = false;
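In sketch form (the exact listing may differ; note that the consumer reads X inside its transaction
whether or not the flag is set):

// "Publisher" thread
1. X = 17;                    // non-transactional write
2. atomic {
3.     X_Shared = true;
4. }

// "Consumer" thread
1. int lx = 0; bool shared = false;
2. atomic {
3.     shared = X_Shared;
4.     lx = X;                // transactional read
5. }
6. if (shared) Console.WriteLine(lx);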
This program contains two pieces of shared data, or potentially shared data: the variables X and
X_Shared. It is very easy to see that X_Shared is correctly protected throughout the program; it is
always accessed within a transaction.
However when we examine accesses to X we see that the “publisher” thread writes to it outside of a
transaction while the “consumer” thread reads it inside a transaction. This still doesn’t necessarily mean
that the program is racy. For the program to be racy there needs to be a condition under which those
two conflicting accesses are concurrent.
How do we determine whether two accesses could be concurrent? We pretend that we are the OS
scheduler and the computer executing these pieces of code, and we observe whether at any given
moment we can get to a situation where both instructions are enabled for execution. If that’s the case,
then we have a race. In our concrete example, suppose the scheduler executes the publisher up till its
first line, the non-transactional write to X, and then lets the consumer enter its transaction: the
consumer's transactional read of X and the publisher's pending write to X are both enabled at the same
time. Thus, we reach the conclusion that this program has a race condition on X, since both threads have
conflicting operations on shared data. So this program is racy and may result in unexpected results.
Note that this observation has nothing to do with the hardware used, how many cores it has, how fast it
is, etc.
Many races are benign and never materialize. Some programmers deliberately code races into their
programs as a means of optimizing them. For example, the Double-Checked Locking (DCL) pattern is racy
under the above definition, but it may still work for locks and a given memory model. However, patterns
such as DCL are not supported under STM.NET.
Whether programs that are racy under the above definition of races work as expected or not depends on
a specification of the system referred to as the system's memory model. Memory models can be quite
non-portable, arcane, and difficult to reason about, and therefore we encourage our users to stick with
the definition of race freedom given here. This will save you a lot of nasty and hard-to-find
concurrency bugs in your code.
In addition to the general benefits of race-free design, race freedom matters even more when you're
working with STM. Previously in this guide you were told that an "atomic" block is very similar to a
locked region with a global lock. Well, that wasn't the entire truth: it is true only for race-free programs.
What if your program contains races? In that case the semantics of your program are still well defined,
but they are much harder to explain and understand, so we will not do so now. Further information on the
topic can be found in [11][12]. But let us repeat our advice: keep your program race free!
This is a safe practice, but it doesn't allow taking advantage of thread locality; i.e., perhaps only a single
thread has access to a BankAccount object for some period, during which it doesn't need to pay the cost of
transactions to access the object. Introducing a transaction in such a situation is wasteful, but safe.
The best example of unstructured parallelism is the Thread API. Once you create a thread, it is executing
independently of the code that has created it. You have to explicitly wait for the thread’s completion in
order to join the current activity with the parallel activity that was carried out by the thread.
The best example of structured parallelism is the Parallel.For API which takes care of spawning and
joining the parallel activities such that when the Parallel.For API returns, the programmer knows that all
parallel tasks have completed.
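A minimal sketch of this style (Parallel.For lives in System.Threading.Tasks in .NET 4; ProcessItem is a
hypothetical per-element operation):

using System.Threading.Tasks;

int[] results = new int[1000];
Parallel.For(0, results.Length, i =>
{
    results[i] = ProcessItem(i);   // iterations may run in parallel
});
// By the time Parallel.For returns, every iteration has completed, so
// results may now be read without further synchronization.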
With structured parallelism, you get very strong guarantees on task completion and can thus reason
about “publishing” data as you enter the API and “privatizing” the data as you exit it, all due to the
internal coordination and scoping of the API.
[Figure: two states of data, "Data under transaction control" and "Data under local thread control"; a
Privatize operation moves data from transactional to local control, a Publish operation moves it back,
and while local the data is accessed locally, without transactions.]
The act of publishing or privatizing is done through the modification of some state that is always accessed
in a synchronized manner, and that indicates the status of the data that is subject to moving between the
local and shared states.
Consider for example a data structure that contains many records. Usually the data structure is “online”
and threads inspect it using transactions. However once in a while there is a lengthy operation that
needs to take place that scans the entire data structure. The data structure is then brought “offline”
(privatized), it is then accessed in a local manner, and when this is done, it is published again. The
PrivatizePublish feature sample shows this pattern in code. Here is the piece of code that demonstrates
both privatization and publication:
// Class declarations
static int[] items = new int[DataStructureSize];
// Privatize
int[] localItems = null;
Atomic.Do(() =>
{
localItems = items;
items = null;
});
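The publication step is symmetric. In sketch form (the PrivatizePublish sample's exact code may differ):

// Publish
Atomic.Do(() =>
{
    items = localItems;
    localItems = null;
});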
Of course, this means that transactions that wish to inspect the array need to first confirm that it is
indeed shared. This is demonstrated in the following code snippet:
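In sketch form (again, the feature sample's exact code may differ):

Atomic.Do(() =>
{
    if (items != null)
    {
        // ... inspect or update items transactionally ...
    }
    // if items is null, the data has been privatized and nothing is done
});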
In this example, if the data is not available, nothing is done. Sometimes you must wait until the data is
available and then apply the operation. This is done in transactions using the “retry” operation, which
we will discuss in section 7.
5.2 Granularity
In the definition of race conditions introduced in this section, we said that it is illegal to access
the same data concurrently in a conflicting manner inside and outside of transactions. We spent a good
deal of time explaining what "concurrently" means, but we still have to define what "same data" means
here. This is where the question of data granularity comes into consideration.
The CLR memory model defines some rules regarding the atomicity of reads and writes of data (loads
and stores), as a function of the data types. For example, a load or a store of an integer or a reference is
always atomic: you'd never get an odd mixture of bits. However, with double-precision floating-point
values or other "big" data types, atomicity is not guaranteed, and you may observe a value that
is the result of an odd mixing of concurrent stores.
Similarly, STM.NET defines rules for the granularity of the accesses that are supported versus those that are
racy. Unsafe code and pointers can really muddy the waters. Suppose you have a static variable of type
int, and you obtain two byte* pointers into the integer, one pointing to the first byte of the variable
and the other pointing to the second byte. Then suppose the first byte is modified
within transactions, and the second byte is modified outside of transactions. Would this be a race? Do
the concurrent activities in this example modify the same piece of data?
Objects: conflicting concurrent access to any part of the same object is considered racy.
Static fields: conflicting concurrent access to any part of the same static field is considered racy.
Let’s consider some examples. Suppose we have the following two types:
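Hypothetical types of the intended shape (the original definitions are not reproduced here):

class C
{
    public int m_int;           // an instance field
    public S m_struct;          // a struct-typed (inline) field
    public static int s_int;    // a static field
}

struct S
{
    public int m_a;
    public int m_b;
}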
Also assume that we have created an instance of class C and that it’s referenced through the variable
myC. Now let’s consider pairs of accesses where at least one of the accesses is a write and furthermore
where one is transactional and the other isn’t and see whether they are considered racy or not:
4 It is important to note that the definitions we have chosen have far-reaching ramifications for performance.
In an ideal world we might have chosen the unit of granularity to be a single bit, or short of that, a single byte, but
that would make bookkeeping extremely costly.
5 But not thread-static! These variables are local to each thread.
Verifiable managed code lends itself to instrumentation in a straightforward manner, such that the JIT in
the CLR version of STM.NET is capable of automatically generating an atomic version of the code.
However, (some) unsafe code, p/invoke, and other forms of native interop are not "visible" to the JIT, and
thus atomic behavior cannot be automatically inserted in such cases. We will describe in section 10 the
mechanisms that are available for "taming" such code so that it can be used inside atomic blocks.
However, there are always going to be cases where isolation and failure atomicity cannot be provided,
or where it would be prohibitively complex or costly to do so. One must also be cognizant about adding
atomic support to public APIs, as this adds another dimension of supported usage that becomes part of
the API's contract and thus cannot easily be retracted in the future.
Thus, some services will be supported in atomic blocks, while others won't be; some other services will
be available only inside atomic blocks. How do we codify this such that we can check for errors? We do
so by using the atomic compatibility attributes:
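AtomicSupported. The code may be used both inside and outside of atomic blocks.
AtomicNotSupported. The code may only be used outside of atomic blocks.
AtomicRequired. The code may only be used inside atomic blocks.
AtomicUnchecked. The code is exempt from static contract checking; misuse surfaces, if at all, through
the dynamic checker.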
These contracts can be applied to assemblies, methods, fields, and delegate types, with the restrictions
captured in the following table (an 'X' marks an allowed combination):

               AtomicSupported   AtomicNotSupported   AtomicRequired   AtomicUnchecked
Assembly             X                  X                   X                 -
Method               X                  X                   X                 -
Field                X                  -                   X                 -
Delegate Type        X                  X                   X                 X
Once we have defined how contracts are assigned to these code elements, we can start considering which
actions are legal or illegal in the code that you write. Our overall goal is to devise a static annotation
system that warns the user at build time of errors such as calling a method that has the
AtomicNotSupported contract inside an atomic block, or calling a method that requires an atomic block
outside of an atomic block.
6 In this section, when we say inside or outside of atomic blocks, we mean dynamically so. For example, if method
F1 has an atomic block in which it calls method F2, and F2 invokes method F3, then F3 is "invoked inside an atomic
block": the one introduced by F1.
Non-extern methods that are not explicitly annotated with a compatibility contract follow the assembly-
level compatibility value.
Extern methods are by default AtomicNotSupported, regardless of the assembly-level contract. We will
discuss in section 10.2 methods of making extern methods supported, but let's assume for now that
they are all AtomicNotSupported. It is illegal to simply mark an extern method with AtomicSupported
without using one of the advanced attributes described in section 10.
Fields do not follow the assembly-level contract. The default contract for fields is always AtomicSupported.
Only fields of classes may have an explicit contract assigned to them. Fields of structs are always
considered AtomicSupported, and it is illegal to annotate them with a compatibility contract.
A compatibility attribute may be applied to a delegate type. The default for delegate types is
AtomicUnchecked.
On other occasions we know less about the state of the thread. When a method M has a contract of
AtomicSupported, we assert that M can be called either within transactions or outside of transactions.
Thus when we author the method, we can only safely do things that are allowed under both conditions.
We may not call AtomicNotSupported methods, since M may be invoked inside a transaction, and we
also may not call methods with an AtomicRequired contract, since M may be invoked outside of a
transaction.
So let’s formalize the concept of “what is known about the compatibility of a piece of code” as atomic
context. Any IL instruction in any IL method has a unique and unambiguous atomic context value
assigned to it from the following set:
AtomicContext.Always. Control flow will always reach the instruction while the thread is in an
atomic block.
AtomicContext.Never. Control flow will never reach the instruction while the thread is in an
atomic block.
AtomicContext.Unknown. Control may reach the instruction while the thread is in an atomic
block, but it may also reach the instruction when the thread is not within an atomic block.
In order to decide the atomic context of instructions, we need to look at the method body they are part
of, and at the effective compatibility value of the method; no other information is required.
The effective atomic compatibility value of the method determines the initial atomic context for the
method in a pretty straightforward manner: AtomicRequired corresponds to the Always context,
AtomicNotSupported to the Never context, and AtomicSupported to the Unknown context.
At this point in our guide we have introduced a single point where the context changes: entering an
atomic block, which switches the context to Always for the extent of the block.
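The resulting calling rules can be tabulated as follows (a reconstruction from the rules stated in this
section):

            AtomicRequired   AtomicNotSupported   AtomicSupported
Always            X                  -                   X
Never             -                  X                   X
Unknown           -                  -                   X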
An ‘X’ in the cell indicates that a method with the contract associated with the column can be called
inside the context associated with the row. So AtomicRequired can be called inside an Always context,
but not inside Never and Unknown contexts.
The rules for invoking a delegate are identical to those of method invocation and field access, except
that it is always valid to invoke a delegate with a compatibility value of AtomicUnchecked. In case the
method invoked doesn’t support the context it was invoked in, through the delegate, a dynamic
exception will be raised. We will talk more about dynamic contract checking in section 6.6.
[AtomicRequired]
public delegate void MyAction();
[AtomicSupported]
public void M1()
{
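    // Per the construction table below, a MyAction (an AtomicRequired
    // delegate) may be constructed over this AtomicSupported method.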
}
Following the same principle for all contexts and compatibility values, we arrive at the following
construction validity rules for delegates. Rows give the delegate type's contract; columns give the
contract of the method the delegate is constructed over; an 'X' marks a valid construction:

                     AtomicRequired   AtomicNotSupported   AtomicSupported
AtomicUnchecked            X                  X                   X
AtomicRequired             X                  -                   X
AtomicNotSupported         -                  X                   X
AtomicSupported            -                  -                   X
6.4 Polymorphism
Whenever virtual or interface methods bind themselves to a particular contract, derivations and
implementations of the given method must provide the contract that the base class or interface has
committed to. If a derived method is unable to provide the contract that the base obliged it to, then it
can choose to throw NotSupportedException in case it is invoked in an atomic context it doesn’t
support. AtomicRedirect, described in section 10.2, provides a good way of doing that in a statically
verifiable manner.
Any program that uses our STM system should adhere to these rules in order for it to work correctly.
If we have these three basic rules, then why did we feel the need to invent a more elaborate contract-
checking scheme? These three rules can be verified at runtime, since we always know the AtomicContext
in which the thread is currently executing and the atomic contract of the field or method that is
currently being accessed by the thread. However, due to language features such as polymorphism
and delegates, at compile time we do not always know these two pieces of information.
Our contract compatibility scheme helps fill this gap. It allows you to formally specify contracts on the
different language elements. Once these contracts are in place our static checking tool, TxCop, can verify
whether your assembly is correctly using transactions.
Our static checking scheme is conservative, so there is room for you to ignore some of the errors that are
generated. For instance, let's say you've implemented an interface method. The interface method has a
contract of AtomicSupported. Now, according to the polymorphism contract-checking rules, the
implementing method that you've written should also have the AtomicSupported contract, so that it can
be invoked in all the contexts in which the interface method may be invoked. However, you annotate the
method as AtomicRequired, since you expect it to always be invoked inside a transaction.
TxCop, our static checking tool, will flag the AtomicRequired contract on your method as an error.
However, given your understanding that this method will never be used outside of a transaction, it is
safe for you to execute your program.
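A sketch of the situation just described (hypothetical type names):

interface IWorker
{
    [AtomicSupported]
    void DoWork();
}

class TxWorker : IWorker
{
    // TxCop flags this annotation: the interface committed to
    // AtomicSupported. But if you know DoWork is only ever invoked
    // inside a transaction, the error can be safely ignored.
    [AtomicRequired]
    public void DoWork()
    {
        // ... work that assumes an ambient transaction ...
    }
}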
Broadly, our polymorphism errors serve as warnings and can be ignored when the programmer thinks it
is safe to do so. However, ignoring any of the other errors generated by our static checker is likely to
lead to runtime exceptions.
C:\Program Files\Microsoft.Net\STM>TxCop.exe /?
TxCop.exe
/assembly <assembly>
/pdb <symbols> (not required if pdb is colocated with the assembly)
/reference <reference or comma seperated list of reference> (short form /r)
/verbose specifies whether informational messages should be displayed
/exclude <warning type>
The /assembly option is used to specify the assembly or executable you want to run the static checker
on. The /pdb option is used to specify the location of symbols for the assembly you are checking.
After a run, TxCop generates a list of errors and prints them to the console.
An example of the output of running TxCop from the command line on an assembly is shown below. The
output gives two different categories of information.
The first two lines are informational messages (only displayed if /verbose is specified). These lines tell us
that TxCop inferred a certain contract for a delegate or closure; since delegates and closures cannot be
annotated by our contracts, which are nothing but .NET attributes, we infer their contract based on the
contract of the methods they invoke and the fields they access.
The last line gives an example of how a static checking error will look if TxCop is run from the command
line. This specific error tells us that the contract of a method or closure passed to a delegate type’s
constructor is not compatible with the contract of the delegate type.
To add a post-build event, open Visual Studio, load your solution, go to the Project menu, and click
Properties. This will bring up the properties pane. In the properties pane, click the 'Build Events'
option. At this point you will see two text boxes.
Enter the command to execute TxCop in the textbox that says ‘Post-build event command line’ or click
on ‘Edit Post-build’ and then enter the command in the pop-up window and then click on OK.
Now when you compile your solution, you will see a list of errors in the ‘Error List’ window in Visual
Studio.
We have reproduced a screen shot of an example of using TxCop with VS integration. The screen shot
highlights several things we have mentioned earlier in this section and shows the errors displayed in the
error list window as well. Clicking on the error will take you to the offending file and line. The underlying
framework on which TxCop is built is still not completely mature, which is why you may notice that
some of the line numbers are a little off at times.
The post-build event command line that is used for the Dictionary sample, and that was used to generate
the screen shot, is given below. Please use it as a reference when adding your own post-build event.
"$(ProgramFiles)\Microsoft.NET\STM\TxCop.exe" /assembly"$(TargetDir)$(TargetFileName)"
/r "$(ProgramFiles)\Microsoft.NET\STM\mscorlib.dll" /r
"$(Windir)\Microsoft.NET\Framework\v4.0.STMret\system.dll" /r
"$(Windir)\Microsoft.NET\Framework\v4.0.STMret\system.core.dll" /r
"$(Windir)\Microsoft.NET\Framework\v4.0.STMret\system.data.dll","$(Windir)\Microsoft.N
ET\Framework\v4.0.STMret\system.data.datasetextensions.dll" /r
There are four different strictness levels of the runtime checker: minimal, relaxed, strict, and highest.
The following table defines the violation types caught by the runtime checker at different strictness
levels:
Strictness level    STM violations caught by the runtime checker
Highest             Reports all violations of the Strict level and these additional non-critical ones:

7 Note that user's code is not allowed to add AtomicRequired or AtomicSupported to a native method, unless
AtomicSuppress is added too. Only native methods that are implemented by the CLR can have an
AtomicSupported or AtomicRequired annotation without an AtomicSuppress. Thus, any p/invoke in your code
MUST have an effective atomic compatibility value of AtomicNotSupported OR have an AtomicSuppress attribute.
One reason that we provide four strictness levels is that we want to accommodate different user
scenarios. Eventually, for products using the STM system, the runtime checker should run at the strict
level, to enforce the three basic rules of the STM system as much as possible for safety reasons. But
during the early adoption stages of the STM system, it is very likely that a programmer would like to
experimentally run her program inside transactions to see whether it is possible to transition it to
transaction-safe code. Or she may have transitioned all of her code to the transactional version and
marked it with the appropriate contracts, but she also needs to use a third-party library which does not
carry any STM contracts. The programmer may want to run the third-party library but still have the
contracts in her own code checked. This is one of the scenarios that make the relaxed level useful, since
it allows strict checking of assemblies that are "STM aware" and, at the same time, more lax checking of
legacy assemblies.
Users can configure how the runtime checker reports a violation. You may choose to throw an exception
(of type AtomicContractViolationException) whenever the runtime checker detects a violation. But
sometimes you may want to run the program once, gather all the error messages, fix them one by one,
and then try again; throwing an exception on every violation makes that diagnostic process inconvenient.
Therefore, we allow you to choose how to handle violations:
If the runtime checker detects a critical violation, it will always throw an exception, since such a violation
may result in incorrect program execution or a system crash. For harmless violations, the runtime
checker won't throw exceptions (except at the Highest level).
The app.config file in each of the samples that are provided contains an example setting for these
variables.
6.7.4 Example
Let’s say we have the following program8:
using System;
using System.TransactionalMemory;
using System.Runtime.CompilerServices;
class TestObject
{
    private int m_privateField;

    public TestObject()
    {
    }
[AtomicRequired]
[MethodImpl(MethodImplOptions.NoInlining)]
public void AtomicRequiredMethod()
{
m_privateField++;
}
[AtomicNotSupported]
[MethodImpl(MethodImplOptions.NoInlining)]
public void AtomicNotSupportedMethod()
{
Console.WriteLine("This method is not supported in transaction.");
}
[AtomicNotSupported]
public static void Main(string[] args)
{
TestObject obj = new TestObject();
try
{
// Start a transaction.
Atomic.Do(delegate
{
// Call a NotSupported method within the transaction.
obj.AtomicNotSupportedMethod();
});
}
catch (AtomicContractViolationException e)
{
Console.WriteLine(e.Message);
}
try
{
// Call a Required method outside the transaction.
obj.AtomicRequiredMethod();
}
catch (AtomicContractViolationException e)
{
Console.WriteLine(e.Message);
}
}
}
8 The NoInlining directives that are present in this example prohibit the JIT from inlining the given methods into
their callers. When a method is inlined, method-level compatibility checks will not be performed.
Further, let's assume we run the runtime checker at the strict level, with STMExceptionOnViolation
enabled and STMLogOnViolation disabled. We will get the following exception message:
Not the exact messages you would expect, right? This is because there are many compiler-generated
methods (to support the C# closure language feature, in this case) on which we can't put atomic
compatibility contracts. To solve this problem, you can run the runtime checker in the relaxed mode,
which will ignore all elements without annotations, or you can add an assembly-level AtomicSupported
attribute such as:
[assembly: AtomicSupported]
In general, the Parallel Extensions for .NET promote a programming style that is centered on non-
blocking tasks. When tasks obey the non-blocking style, the system can schedule them optimally on
just the right number of OS threads, saving a lot of memory otherwise dedicated to thread stacks and
eliminating unnecessary context switches.
Given all of the above, our recommendation for when you think you need to block inside a transaction
is: don’t! Take a step back and think how you can refactor the problem differently, using non-blocking
tasks and structured parallelism.
One example that is often used to explain blocking patterns is the consumer/producer pattern. In this
pattern there is a blocking queue. A producer wants to add elements to the queue, but needs to wait
while the queue is full; a consumer wants to remove elements, but needs to wait while the queue is empty.
How would such a pattern be coded with non-blocking tasks? You’d do that by scheduling the consumer
code of processing a removed element as a continuation to the production of the given element. For
example, using Microsoft’s Task Parallel Library, a continuation of a Task may be specified using the
ContinueWith API.
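A minimal sketch of this style (Item, ProduceItem, and ConsumeItem are hypothetical):

using System.Threading.Tasks;

Task<Item> production = Task.Factory.StartNew(() => ProduceItem());
// The consumption of the item runs only after its production completes.
Task consumption = production.ContinueWith(t => ConsumeItem(t.Result));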
However, we recognize that the world cannot switch instantly to this style of coding, and therefore we
do provide a mechanism for coordinating transactions, although we are unsure whether we will
continue to support it in the future. Your feedback on this question is highly appreciated.
STM.NET implements the "retry" coordination mechanism first described in 7.1. The semantics are
dead simple: if you hit a retry statement, the transaction rolls back and re-executes. E.g., consider
this example:
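In sketch form (assuming the retry operation is exposed as Atomic.Retry(); the actual sample may
differ):

class Mailbox
{
    private object m_item;

    public object Get()
    {
        object result = null;
        Atomic.Do(() =>
        {
            if (m_item == null)
                Atomic.Retry();   // roll back and wait for m_item to change
            result = m_item;
            m_item = null;
        });
        return result;
    }
}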
Suppose a thread calls Get() and m_item is null. In this case the retry statement is encountered, and
the transaction rolls back and re-executes. At this point we can imagine the reader is a little bit
anxious: is it just going to spin and re-execute, eating up my CPU and killing my battery?
While in general there are many ways to implement "retry", our implementation is actually pretty
precise, meaning it will only wake up a transaction once some object or static that it examined
before retrying has actually been changed by another transaction. This still doesn't mean that the
condition that caused it to retry will be satisfied when it is finally woken up, but it does greatly reduce
the amount of unnecessary spinning that is associated with "retry".
One central property of retry is that it rolls back everything done so far in the transaction. This means it is
not possible to externalize to the outside world why you're waiting and what it is that you wish to be done
by parallel activities, a pattern that is common with condition variables.
If you are familiar with APIs like Monitor.Wait/Pulse, you'll notice the conspicuous absence of an API to
"pulse" waiting transactions. This is because, with "retry", the system automatically detects, as
transactions commit, whether "interesting" state changes have occurred and which waiting transactions
should be woken up and re-executed.
However, the implementation has some optimization tricks that allow it to first retry just the inner
transactions and then progressively roll back all the way to the top, if necessary.
Initially:
static int X = 0;
static int Y = 0;
// Thread 1
atomic {
    X++;
    Y++;
}

// Thread 2
int lx, ly;
atomic {
    lx = X;
    ly = Y;
}
Debug.Assert(lx == ly);
If we think of “atomic” as implemented in terms of a global lock, it’s clear that the assert on thread #2
should never fire. However, in our implementation it is possible that thread 2 first fetches the value of X
and then thread 1 executes its transaction. Thread 2 is now "doomed", but it doesn't know that yet. It
proceeds to load the value of Y, which is one bigger than the value it fetched for X. When thread #2
reaches the end of the atomic block, it validates its reads; it will discover that the version of
the lock associated with X has changed since it first read X, and therefore it will roll back and re-execute.
Thus thread #2 will only exit the atomic block once it has consistently read both the values of X and Y
and therefore the assert will never fire. However, there will be intermediate states in which lx != ly.
These states are in general not visible to the program. The system maintains the illusion as if the atomic
block is really executing under a single global lock9.
// Thread 1
atomic {
    X++;
    Y++;
}

// Thread 2
int lx, ly;
atomic {
    lx = X;
    ly = Y;
    if (lx != ly) {
        // Can we get here?
        throw new AssertException();
    }
}
Again, under a global lock we would never see the exception thrown, and therefore we arrive at the
conclusion that a transaction can only abort if it is consistent; if it is not consistent, it will re-execute.
What does this mean in our single-global-lock model? Previously we said that the atomic block is
equivalent to a global lock under the following transformation:
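In sketch form (TheGlobalLock is a conceptual device shared by all atomic blocks, not a real object in
the system):

atomic { <body> }
// behaves as if it were:
lock (TheGlobalLock) { <body> }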
9 But remember: this illusion is only provided for race-free programs.
We will now refine this transformation such that it more accurately depicts the fact that a transaction
abort is a consistent event10:
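In sketch form, with lines numbered to match the references to lines #5 and #6 below (TheGlobalLock
is again purely conceptual, and line 5 is pseudo-code):

1. lock (TheGlobalLock) {
2.     try {
3.         <body>
4.     } catch (Exception e) {
5.         /* undo all side effects of <body> */
6.         throw e;
7.     }
8. }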
So essentially, an aborting transaction is identical to a read-only transaction that reads the same values as
the aborting transaction and then successfully commits. The values read by an aborted transaction can
therefore be communicated reliably outside of the atomic block by capturing them in the
exception that is thrown out. Here is yet another example demonstrating this:
// ...
static int X = 0;
static int Y = 0;
static int Z = 0;
10 While this is an accurate model, our system of course doesn't implement the model using a single global lock.
The section on performance tuning (section 0) contains a great deal of detail on how this model is actually
implemented in STM.NET.
Transactions #1 and #2 are equivalent. They both consistently read the values of the shared variables X
and Y and they result in no modification to any other variables.
If, before re-throwing the exception on line #6, we first undo all side effects in line #5, then that undoing
would potentially include the construction of the exception object thrown by the transaction; i.e., part
of the "undoing" would be reversing the effects of the initialization of the exception object, in the likely
case that it was constructed inside the transaction.
Thus, the above transformation can result in throwing a "weird" exception object. E.g., if we consider the
MyException object, its XValue and YValue fields would have their values reset to zero before the
exception is re-thrown.
We need a solution that "freezes" the exception object (and everything it refers to) in a
manner that allows resurfacing the "frozen" exception object, in its consistent state, outside of the
atomic block. Luckily, such an abstraction already exists in .NET for when exceptions have to be marshaled
across isolation boundaries, e.g. in cross-app-domain calls. What happens is that (at least conceptually)
the exception is serialized into a buffer that is passed from inside the transaction to the outside world. The
reading of the object graph for serialization is done transactionally, and thus it becomes part of the
validation criteria of the transaction; in other words, the serialized state of exceptions is always
consistent. The writes into the buffer are done through some "magic" such that they are not rolled back
when the rest of the side effects of the transaction are rolled back. Finally, when the transaction has
rolled back, the serialized exception state is deserialized and the exception object is re-thrown.
1. The exception object that you catch outside of the atomic block is not the same one that was
thrown inside the block; it is the result of a serialization/deserialization pass.
2. At the same time, the system reserves the right to introduce optimizations that allow
surfacing the same exception object. So the exception that you get out may or may not be the
same object you threw inside the block, but it is functionally identical (under
serialization/deserialization rules).
3. You thus must ensure that the exceptions you throw out of atomic blocks are serializable.
4. Be careful not to reference large amounts of data from your exception object, as this data will be
copied during marshaling.
In summary the best way to think about surfacing exceptions from atomic blocks is that it’s extremely
similar to throwing exceptions across app-domain calls.
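For instance, the OverdraftException used in the BankAccount sample could be written along these
lines (a sketch; the sample's actual definition may differ):

using System;
using System.Runtime.Serialization;

[Serializable]
public class OverdraftException : Exception
{
    public readonly int Balance;
    public readonly int Amount;

    public OverdraftException(int balance, int amount)
        : base("Insufficient funds.")
    {
        Balance = balance;
        Amount = amount;
    }

    // Custom state must take part in the serialization pass that
    // carries the exception out of the atomic block.
    protected OverdraftException(SerializationInfo info, StreamingContext context)
        : base(info, context)
    {
        Balance = info.GetInt32("Balance");
        Amount = info.GetInt32("Amount");
    }

    public override void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        base.GetObjectData(info, context);
        info.AddValue("Balance", Balance);
        info.AddValue("Amount", Amount);
    }
}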
Rolling back a transaction. Transaction rollback is a system mechanism that undoes all the
effects of the current transaction. It is employed in various situations, such as when a transaction
aborts, retries, or, sometimes, when it encounters contention on a lock and re-executes.
Aborting a transaction. A transaction is aborted when a thrown exception escapes the
boundaries of the atomic block and the transaction is in a consistent state.
Retrying a transaction. A transaction is retried when it encounters a retry statement. This
results in the rolling back of the transaction, waiting for changes and then re-executing the
transaction.
Re-executing a transaction. A transaction may roll back and re-execute multiple times, until it
is finally aborted or committed; these are the only two terminal states of a transaction.
In the traditional transactions world, e.g. in Microsoft's Distributed Transaction Coordinator (DTC), there
exists the notion of transactional resource managers. Transactional resource managers, or simply RMs,
manage a resource that participates in a transaction. As work proceeds against resources, the resource
managers keep tabs on what in particular has been done (so that it can be undone if the transaction
aborts) and on which locks are necessary and have been taken to ensure isolation (so that they are
released when the transaction completes, etc.). The transaction coordinator orchestrates all the
resource managers involved in a transaction such that they reach agreement on the outcome of
the transaction. More information about the topic of distributed transactions can be found in Jim Gray's
book on transaction processing [14] and other sources.
In this implementation of STM you can see both roles: STM is a resource manager, managing the state of
the .NET objects that are accessed during the transaction, and also a transaction coordinator that controls
the lifetime and outcome of the transaction, with STM being the only resource manager participating in
the transaction. When we set out to control entities beyond .NET memory, we once again reach for the
DTC/RM design.
Remember the BankAccount sample of section 0? It was handy to get failure atomicity from STM. But
that sample dealt only with .NET memory; that is not what happens with real bank accounts, which live
in durable databases. Database programmers have perfected over the decades the art of making the
operations in their code ACID (Atomic, Consistent, Isolated, and Durable). This is achieved by using
transaction coordinators and resource managers.
Windows has a pretty compelling story here: three transaction managers (the distributed MSDTC, the
kernel-mode KTM, and the managed LTM in the System.Transactions namespace) work with a group of
resource managers (e.g. SQL Server, MSMQ, Transactional NTFS (TxF), and the transactional registry).
With these, the user can get failure atomicity from any combination of database, networking, file, and
registry operations: either all of the operations happen, or none do. This makes programming much
easier, since it eliminates extremely error-prone processing of all possible outcomes.
It would be nice to add memory to this list. Actually, this is already achievable today by writing a custom
Volatile Resource Manager, a concept in LTM (the Lightweight Transaction Manager); but it requires you
to manually write a resource manager for every custom data structure or variable that you need to access
in a transaction: you need to provide isolation and failure atomicity, and provide mechanisms to avoid
deadlocks, etc. These are exactly the things that STM.NET provides, so STM seems to be a
great fit for “volatile resource management” or, in other words, transactional management
of in-memory state.
There are many applications that manage both durable resources and in-memory data. The memory can
represent not only transient data, but a longer-lived model, reference data, or a cache. It is important to
be sure that the in-memory data corresponds to the durable “truth”. STM, together with system
transactions, lets you combine such operations in a single atomic block:
Atomic.Do(() =>
{
    << memory operation >>
    << database operation >>
    << any other transacted operations, e.g. networking via MSMQ >>
});
Let us illustrate these capabilities using the BankAccount sample we covered in section 0: the accounts
will now live both in a database and in memory, and they need to be kept in sync. We would love to
show you the program with SQL, but we cannot do that yet. Though we have fully prototyped STM
integration with LTM/DTC, each resource manager library (in SQL’s case, ADO.NET) needs some
modifications before it can be used, and in our incubation effort we chose to first enable MSMQ
(Microsoft Message Queuing) for use in atomic blocks. With respect to transactional behavior, MSMQ is
not much different from SQL; some customers actually use MSMQ for storing durable data and
accessing it in a transactional manner. So here is the kind of code that you can run on STM.NET today:
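Here is a minimal sketch of such code, assuming a standard System.Messaging queue; the queue path, message format, and the m_balance and amount variables are illustrative:

using System.Messaging;

MessageQueue queue = new MessageQueue(@".\private$\BankAccount");

Atomic.Do(() =>
{
    // In-memory state: managed by STM.
    m_balance = m_balance + amount;
    // Durable state: the send enlists in the ambient transaction,
    // which promotes the STM transaction to LTM (and possibly MSDTC).
    queue.Send("balance=" + m_balance, MessageQueueTransactionType.Automatic);
});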
You are guaranteed that the balance amount is always the same in memory and in the queue, and you
didn’t have to write any additional code for that. Atomic here starts as a usual STM transaction
(relatively cheap). Then, when the MSMQ send operation is invoked, the STM transaction is promoted
to an LTM transaction (LTM, in its turn, may promote it to MSDTC). From that point on, LTM or DTC
manages the coordination of the transaction, and STM becomes a resource manager responsible for
memory only, the same way the sibling MSMQ resource manager is responsible for the durable data.
The installation contains two samples11 demonstrating the use of traditional transactions with STM:
TicTacToe and TraditionalTransactions. An interesting question is: how do atomic blocks compose
with system transactions?
11
Note: while running samples in the debugger, you may notice two exceptions that we handle internally:
HoistingException, which signals the need for promotion from STM to LTM, and SystemReexecException, which is
used on the slow path for a combined STM-system transaction. Users should never see these exceptions.
You cannot use an Atomic block inside of a top-level system transaction if there is a chance that the
code inside Atomic will call some other operations relying on System.Transactions. The reason for this
restriction is that System.Transactions does not support true nesting with partial failure atomicity:
system transactions are always flattened.
One more thing to note is that for integration with traditional transactions you have to use Atomic.Do
(and not try/catch with AtomicMarker).
Since Atomic can play a façade role for TransactionScope, we also allow passing through
TransactionScope options. Here is the API:
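Purely as a sketch of its shape (the parameter names below are assumptions, not the verified STM.NET signature), think of an Atomic.Do overload that forwards the System.Transactions options:

// A sketch, not the verified STM.NET signature.
public static void Do(
    Action action,
    TransactionScopeOption scopeOption,
    TransactionOptions transactionOptions);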
One more question you might have now is how all of this affects performance. The answer is that it is all
“pay for play”: if you don’t use integration with traditional transactions, your pure memory transactions
are not slowed down. On the other hand, if you do use durable transactional operations, their latency is
going to dominate the performance of your code anyway, such that the cost of adding STM on top of
your in-memory data structures and algorithms is hardly even measurable.
For legacy code and for I/O operations, however, we still have to present solutions that allow their
usage inside transactions. In this section we are finally going to explore this aspect of STM.NET.
The solutions we will explore run the gamut from very simple and crude, such as “punching through” the
transaction to emit debug output, to sophisticated and high fidelity, such as writing into a message
queue through integration with System.Transactions. When dealing with legacy I/O there isn’t one
correct answer. The “correct” answer depends on the nature of the resource that one wishes to
leverage inside transactions, whether it already has transactional semantics, and to what degree of
fidelity isolation and failure atomicity need to be provided.
We will start with primitive building blocks and work our way up the stack.
10.1 AtomicSuppress
[AtomicSupported]
[AtomicSuppress]
public static void PrintMe([AtomicMarshalReadonly] string msg)
{
    Console.WriteLine(msg);
}
In the sample we would like to use console output as a debugging aid, so we would like the messages
that we output to print regardless of whether the transaction commits or aborts, and no matter how
many times it is re-executed. (Debugging is one of the main uses of [AtomicSuppress].)
In order to do that we wrap Console.WriteLine in a method called PrintMe, which we decorate with
[AtomicSupported] and [AtomicSuppress]. As you can imagine, Console.WriteLine has a contract of
[AtomicNotSupported], as it really needs to drive output to the console, which is not at all a transactional
activity. PrintMe, on the other hand, has an explicit contract of [AtomicSupported]. Isn’t there a contract
violation here, with an [AtomicSupported] method calling an [AtomicNotSupported] method? The
answer is no. The [AtomicSuppress] on PrintMe instructs the system to put the transaction on hold;
therefore the body of PrintMe runs outside of the atomic block and can call any
[AtomicNotSupported] method it wishes. The onus is on the programmer to make sure this indeed
makes sense to do, and this is exactly why this feature is a little bit tricky.
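For example, a transaction that updates a balance can trace its progress through PrintMe. A minimal sketch (the m_balance field is illustrative):

Atomic.Do(() =>
{
    m_balance = m_balance + 1;
    // Runs outside of the transaction: the message is printed even if
    // the transaction later rolls back or re-executes.
    PrintMe("balance updated to " + m_balance);
});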
Every piece of information that you want to pass to the suppressed function must be marshaled out.
Does that sound familiar? You may recall that in section 8.2 we explained how exceptions were
marshaled outside of atomic blocks. Suppress uses a similar mechanism.
In the sample above we have created a string (referred to by the string reference variable ‘msg’). That
string object having been constructed inside a transaction cannot be accessed directly inside the
suppressed method; it must be marshaled to it first. This is achieved using the [AtomicMarshalReadonly]
attribute on the ‘msg’ parameter of the PrintMe method. The semantics of this attribute are the
following:
Clone as necessary (using the standard serialization infrastructure) the given object and the entire
object graph rooted at it, such that the resulting object graph can be read in the suppressed method.
This gives reasonable semantics if PrintMe is also called outside of a transaction: in that case we
don’t want to introduce any marshaling, and without marshaling it is still legal to read the original
object. So if you code PrintMe such that it only reads the parameter, it works both inside and
outside of transactions.
If you haven’t actually modified the object inside the transaction, the system may be able to give
you the same object to read in the suppressed method, eschewing the costly cloning process.
The system may be able to modify some objects in-place when they are modified inside
transactions. In that case, too, the system can pass the same object reference to the
suppressed method without additional cloning, which is again a performance win.
Note that marshaling is only necessary when you need to pass objects that were modified or created
within the transaction. If you have an object reference pointing to an object that is consistently only
updated outside of transactions, then no marshaling is necessary.
The [AtomicMarshalReadonly] attribute cannot be applied to parameters of type pointer or by-ref (ref
or out parameters in C#), nor is it valid to pass a by-ref or a pointer to memory that has been updated by
the transaction.
So far we have covered passing state from within the transaction into the suppressed method. What
about the other direction? Can state be created in the suppressed method and passed back to the
transaction? The answer is yes; this is a variant of the publication pattern (section 5.1.3). Here is an
example that demonstrates this:
[AtomicSupported, AtomicSuppress]
private static string LoadResourceString(
    [AtomicMarshalReadonly] string resourceId)
{
    // Heavy lifting here: load the resource dll, cache strings, etc.,
    // then return the requested resource string.
}
This method suppresses the transaction, loads a resource assembly, looks up the asked-for resource
using the string resourceId (which was marshaled in) and finally returns the result. The result is
then available for the transaction to inspect. In this case we are passing a string back into the
transaction, which is an immutable type, but in other cases mutable objects may be passed into the
transaction as well, and the transaction is free to modify them as long as they cease to be available
to non-transactional code; i.e., it is valid to do so as long as the publication pattern is implemented
correctly.
It is interesting to note that the operation LoadResourceString is logically side-effect free. It is true that
at some layer of abstraction it is not at all side-effect free: a resource assembly is loaded, caches are
populated, etc. But at a higher layer of abstraction, these are just implementation details of a
functional interface that maps resource IDs to resource strings in a fixed manner, like a mathematical
function that always provides the same output for a given input. It is common to use suppress around
such functional interfaces.
We also do not support starting new transactions from within suppressed methods since this may cause
a deadlock in the STM implementation.
10.2 AtomicRedirect
Sometimes it is useful to do different things depending on whether or not you’re in a transaction.
AtomicRedirect provides a mechanism to redirect control flow to an alternative method. For example,
suppose we have a class that doesn’t want to support transactions at all. However, the class overrides
Object.GetHashCode, and Object.GetHashCode has a contract of [AtomicSupported] that requires all
derivations to be [AtomicSupported] too. The derived class cannot change the contract on
Object.GetHashCode; this is out of its control. However, it can throw a NotSupportedException if the
GetHashCode method is ever called inside a transaction. This is how you would code this:
[AtomicSupported]
[AtomicRedirect("TxGetHashCode")] // sketch of the redirecting half
public override int GetHashCode()
{
    return base.GetHashCode(); // runs only outside of transactions
}

[AtomicRequired]
private int TxGetHashCode() // called instead, inside transactions
{
    throw new NotSupportedException();
}
//...
The [AtomicRedirect] attribute takes a single string parameter, which is the name of a private method in
the same class with the exact same number of arguments and the same return value. It must also be
static if the “redirected” method is static, or an instance method if the redirected method is an instance
method (i.e., they must also agree on the “this” parameter). The method that is the redirection target
must have an effective atomic compatibility value of either [AtomicSupported] or [AtomicRequired].
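Another use of [AtomicRedirect] is to detect whether you are running inside a transaction. Here is a sketch of the non-transactional half of such a pair, reconstructed from the rules above (the method name is illustrative):

[AtomicSupported]
[AtomicRedirect("TxGetIsInAtomicBlock")]
private static bool GetIsInAtomicBlock()
{
    return false; // runs only outside of transactions
}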
[AtomicRequired]
private static bool TxGetIsInAtomicBlock()
{
return true;
}
Now assume that we had provided IsInAtomicBlock as a static property on the Atomic class. Then we
could have coded GetHashCode in the following manner:

[AtomicSupported]
public override int GetHashCode()
{
    if (IsInAtomicBlock)
        throw new NotSupportedException();
    return SomeNotSupportedMethod();
}
//...
Which version is preferable? The latter version is indeed more concise, but note that it is not easily
verifiable as [AtomicSupported], since its body contains a call to SomeNotSupportedMethod. In other
words, [AtomicRedirect] provides a statically verifiable atomic context transition that allows static
checking of atomic contract compatibility.
One example of a problem that can ensue from using locks with atomic blocks is the possibility of a
deadlock. Consider this program:
// Thread 1
atomic {
    lock (obj) {
        // ...
    }
}

// Thread 2
lock (obj) {
    atomic {
        // ...
    }
}
Generally, you need to consider “atomic” as a distinct lock in your application lock hierarchy in order to
avoid deadlock. One simple way of ensuring that this principle is upheld is to use a pattern like the one
sketched below.
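One such discipline, purely as a sketch of the idea (our assumption, not a prescribed pattern): take locks only outside of atomic blocks, never inside them.

// Consistent order: take the lock first, then start the transaction.
// Since no thread ever acquires obj's lock while inside an atomic
// block, the cyclic wait shown above cannot arise.
lock (obj) {
    atomic {
        // ...
    }
}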
The problem with side-effecting actions is that optimistic concurrency control assumes potential re-
execution, and not every operation can be safely repeated. For example, if you print inside a transaction,
you don’t want to see multiple printouts in case the transaction gets re-executed. If you format the disk,
you don’t want it to happen at all if the transaction eventually aborts.
Since STM.NET integrates with traditional transactions, we have a chance to provide a solution. We
cannot offer magic, but we can offer a set of partial solutions which might help in the majority of
situations. Please note that our I/O proposal is based on LTM integration, which means you lose STM
nesting benefits, specifically partial abort of nested transactions. We provide multiple mechanisms:
Atomic.Do(() =>
{
    m_balance = m_balance + 1;
    Atomic.DoAfterCommit((Object o) =>
    {
        Console.WriteLine("m_balance= " + o);
    }, m_balance);
});
You can safely do anything inside of the delegate that does not require a transaction, since it is executed
outside of the transaction, at most once, after it has committed. But don’t forget that all the
data you might use, except for the specially transported context, will be in its post-transaction global
state; it is as if you had coded your actions right after the transaction, but used the saved context. For
instance, the example above gives you a simple tracing facility. Case 10 in the TraditionalTransactions
sample illustrates deferred actions.
Atomic.DoWithCompensation takes three parameters:
what to do immediately;
how to compensate for it, if needed;
a common context for both actions.
Both the immediateAction and the rollbackAction are executed under a suppressed context (the first
immediately, the second during rollback, if it happens), so you can use any operations inside them that
do not require a transaction; just beware of using any data other than that which was explicitly passed
via the context parameter.
Atomic.Do(() =>
{
    Atomic.DoWithCompensation(
        (Object o) => { Console.WriteLine("Immediate action : " + o); },
        (Object o) => { Console.WriteLine("Compensating action: " + o); },
        m_balance);
    m_balance = m_balance + 1;
});
Here is an example (see case 13 in the TraditionalTransactions sample): we implement a transactional
text-file appending operation. Please note how we abstracted away most of the STM/LTM details:
public class TxAppender : TransactionalOperation {
    private TextWriter m_tw;
    private List<string> m_lines = new List<string>();

    public TxAppender(TextWriter tw) : base() {
        m_tw = tw;
    }

    // The only method client code calls from inside atomic blocks
    public void Append(string line) {
        OnOperation(); // Registers the operation with the current Tx
        try {
            m_lines.Add(line);
        }
        catch (Exception e) {
            FailOperation();
            throw e;
        }
    }

    // Implementing virtual methods of TransactionalOperation
    public override void OnCommit() {
        foreach (string line in m_lines) {
            m_tw.WriteLine(line);
        }
        m_lines = new List<string>(); // Prepares for the next Tx
    }

    public override void OnAbort() {
        m_lines.Clear();
    }
}
And here is how you can use TxAppender inside an atomic block:
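A minimal usage sketch (the file name and the m_balance field are illustrative):

TxAppender log = new TxAppender(File.AppendText("audit.log"));

Atomic.Do(() =>
{
    m_balance = m_balance + 1;
    log.Append("balance is now " + m_balance);
});
// The line reaches the file only if the transaction commits;
// on abort, OnAbort discards the buffered lines instead.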
10.6 Summary
Introducing optimistic concurrency control into an existing program will never be seamless, since some
parts of it suddenly start to re-execute. But you can prepare resource managers for popular operations
ahead of time, as the TxAppender example demonstrates.
11 Performance
This section walks through the state of performance in the current release. We reveal some design and
implementation details to help the reader gain a better understanding of the performance behavior. We
also outline some performance optimization and tuning work that could be helpful in the future, and
demonstrate the potential performance gains we are likely to obtain by doing so.
All the performance data reported in this document is collected on a 24-way machine.
For the future, we have identified a series of compiler and runtime optimizations as well as code tuning
that we believe will reduce the sequential slowdown. Meanwhile, we also keep a close eye on the
relevant techniques exploited by the research community.
The following table shows the sequential slowdown of one benchmark on the current release. The
benchmark we use here is a thread-safe red-black tree data structure, on which we can perform
operations such as lookup, insert, and remove. This benchmark does not exploit many C# language
features, so it is only used to give the reader a rough idea of the sequential cost of a common data
structure on the current STM.NET release. For the measurement, each thread performs 10,000,000
operations. We report the average of 5 runs.
As shown by the above tables, for the read-only cases the STM version shows 3.9x and 3.3x sequential
slowdowns over the lock-based and ReaderWriterLockSlim-based versions, respectively. The updating
operations may contain writes, though they are still dominated by reads. It is clear we pay a higher cost
for writes than for reads, and the sequential slowdown is exacerbated when there are more writes. Note
that in this microbenchmark, all threads are essentially always executing transactions. Obviously, many
programs spend only a small fraction of their execution in synchronized regions, and such programs
would experience correspondingly smaller sequential slowdowns.
11.2 Scalability
This section provides information on the scalability characteristics of STM.NET. Generally speaking, STM
achieves great scalability, unrivaled by any other technology with the same level of usability and
composability, to the point where it is able to offset the sequential slowdown at relatively low core
counts. Section 11.4 provides detailed measurements in two scenarios, but just to illustrate the
point here, consider the performance curve below:
12
Use lock(this) {}.
13
Use the System.Threading.ReaderWriterLockSlim class, available since .NET 3.5.
// Thread 1
atomic {
    o.a = 1;
}

// Thread 2
atomic {
    localVar = o.b;
}
In the above example, thread 1 writes to field “a” of object “o” in a transaction. Concurrently, thread 2
reads field “b” of the same object “o”, also in a transaction. Though you may feel there is no conflict
between the two transactions, there is! Because they are accessing the same object “o” and thread 1 is
writing to a field, the two transactions conflict, and thus they will be serialized by the runtime. If this is
not the desired behavior, the programmer needs to consider redesigning the data structure, e.g., by
breaking up the object into two sub-objects, one containing the field ‘a’ and the other containing the
field ‘b’.
STM.NET handles array objects differently from normal objects, as will be explained in the next section.
When there is any form of conflict in the system, we employ a contention manager (CM) to
help decide what action to take next. Contention arises from conflicts detected when attempting to
acquire pessimistic locks (pessimistic reads and writes), and from conflicts detected when validating
optimistic reads as the transaction prepares to commit.
The contention manager is responsible for a variety of chores: identifying the transactions involved in
the conflict, determining what each transaction should do in response to the conflict, and helping each
transaction respond (i.e., by providing functionality to wait/rollback/re-execute an operation, etc.). How
the CM makes these decisions is referred to as the contention management policy. There are a large
number of policies possible, and many studied in various research papers [1], [2].
In this release, we provide an exponential-backoff-based contention manager. It is fairly simple: the
transaction that detects the conflict is the transaction that is re-executed, with exponential backoff
before re-execution. This CM also performs dynamic read-mode switching when re-executing a
transaction. When a transaction starts, it attempts to use optimistic read techniques for all read
operations. Once it detects a conflict and rolls back for re-execution:
if the transaction has a large read set, it is considered a large transaction and re-executes using
pessimistic reads;
if the transaction has re-executed many times (reaching a pre-defined threshold), it also
re-executes using pessimistic reads.
The dynamic read-mode switching increases the chance of commit for a large transaction or an old
transaction.
This release supports array concurrency in transactions; that is, our runtime allows multiple
transactions to access different chunks of the same array concurrently. At runtime, if the size of an array
is larger than a pre-defined threshold, it can be accessed concurrently by multiple transactions;
otherwise, it is treated as a normal object. The runtime is responsible for dividing the
array into multiple chunks, such that accesses to different chunks can proceed concurrently while
accesses to the same chunk are still serialized. Also note that the runtime divides the array into chunks
“logically”: a chunk consists of an integral number of elements, and an element will never be split
between two different chunks.
In this release, for a given array, the runtime uses array concurrency only if the size of the array (in
bytes) exceeds a pre-defined threshold. The current default in retail builds of STM.NET is 128 bytes.
You can change it through a configuration parameter: STMMinimumArraySizeForConcurrency. For
example, if your application only uses arrays as read-only inside transactions, you can set the threshold
to a very large value so that array concurrency is never used, and its overhead is thus avoided.
We would like to stress that even though we provide contention management for arrays at a more
granular level than the entire object, it is still considered a race to access any two elements of any array
in a conflicting manner, concurrently inside and outside transactions. This was discussed in section 5.2.
// Thread 1
x = 17;
atomic {
    x_shared = true;
}

// Thread 2
atomic {
    if (x_shared == true) {
        localVar = x;
    }
}
In the above example, thread 1 initializes “x” to 17, then publishes it by setting “x_shared” to “true”.
Thread 2 should read “x” as 17 if it sees that “x_shared” is true. This is the publication pattern. Since it is
impossible for the non-transactional write (x = 17) to run concurrently with the transactional read
(localVar = x), this program is race-free. As we mentioned in section 5, this pattern is supported by
STM.NET, since we provide global-lock equivalence for race-free programs. This non-racy publication
pattern works if the STM implementation validates reads and stamps writes using a version number per
object (which is a de-centralized and thus scalable mechanism). However, an optimizing compiler or a
programmer may change the above code to the following:
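A reconstruction of the transformed thread 2, based on the speculative read (tmp = x) discussed below:

// Thread 2, after the read of x is hoisted above the branch
atomic {
    tmp = x;                  // speculative read
    if (x_shared == true) {
        localVar = tmp;
    }
}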
The above example is normally referred to as “racy publication”. It is “racy” because the non-
transactional write (x = 17) can now happen concurrently with the transactional read (tmp = x). Since
this is a racy program, STM.NET is not obligated to support it (more accurately, STM.NET is not obligated
to provide single-lock equivalence in this case). However, it seems possible that an optimizing compiler
might transform the non-racy pattern into the racy pattern. It was therefore argued by some that the
STM system needs to provide the same semantic guarantee for both patterns.
In order to support the “racy publication” pattern, a global versioning mechanism is normally adopted.
For more detail, please refer to [3]. However, global versioning is a centralized mechanism and, as a
result, hurts scalability.
Recently, it has become clear that the CLR memory model does not and will not allow the above
transformation [4], [5]. The C++ memory model seems to be moving towards the same conclusion [6], [7].
As a result, the JIT compiler will not generate such a speculative read (tmp = x). On the other hand, our
STM programming model does not provide strong guarantees for racy programs. Therefore, in this
release, instead of the global versioning mechanism, we deploy a scalable algorithm using a version
number per object. Thus STM.NET does not suffer the scalability bottleneck of global versioning and still
ensures the safety of the non-racy publication pattern.
A related pattern is privatization, illustrated below:

// Thread 1
atomic {
    x_shared = false;
}
r1 = x;
r2 = x;
Debug.Assert(r1 == r2);

// Thread 2
atomic {
    if (x_shared == true) {
        x = 1;
    }
}
Basically, thread 1 privatizes “x” by setting “x_shared” to false. If thread 2 observes “x_shared” as false,
it does not perform any writes to “x”. This is not a racy pattern. The STM system should guarantee the
expected semantics in accordance with single-lock equivalence.
Currently, we are working on an approach that combines programming model support and compiler
static analysis to let many transactions skip the privatization barrier. It is expected to improve the
scalability of the system significantly. This optimization will partly rely on the user taking advantage of
AtomicRequired fields. Such fields cannot be “privatized”, and thus transactions that exclusively use
AtomicRequired fields can be exempted from the commit ticket protocol. We refer to this
optimization as the “TVar optimization” (“tvars”, or “transactional variables”, are the Haskell STM [15]
equivalent of AtomicRequired fields in STM.NET). By exploiting programming model support and
compiler analysis to skip the commit ticket protocol for transactions that only access
AtomicRequired fields [8], we expect that scalability will be significantly improved, especially when
the number of threads becomes large.
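As an illustration, marking a field [AtomicRequired] constrains it to transactional access only; a minimal sketch (the class is illustrative):

class Counter
{
    [AtomicRequired]     // may only be accessed inside transactions,
    private int m_count; // so it can never be privatized

    public void Increment()
    {
        Atomic.Do(() => { m_count = m_count + 1; });
    }
}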
The benchmark consists of two concurrent data structures:
A Phone Book, which maps a name to a phone number. Internally, it uses one concurrent
dictionary.
A Bidirectional Phone Book, which provides a bidirectional mapping between a name and a
phone number. Internally, it uses two concurrent dictionaries. Most of its public methods
involve accesses to both dictionaries; STM is used to ensure the atomicity and isolation of such
operations that access multiple concurrent data structures.
First, let us compare the scalability of the STM of this release (with commit ticketing) to the STM
with the TVar optimization (referred to as “TVar Opt” in the figures below). The two figures show that
the STM with the TVar optimization achieves better scalability when the number of threads is larger
than two. As the percentage of updating transactions increases, the difference becomes even more
pronounced.
11.4 STM vs. Reader-Writer Lock vs. Concurrency Data Structures (CDS)
This section compares the scalability of concurrent data structures synchronized using STM to ones
using other synchronization mechanisms.
We will continue to use the concurrent dictionary as the benchmark, comparing three different
implementations: an expertly hand-tuned concurrent data structure (CDS), a ReaderWriterLockSlim-
based version, and the STM-based versions. Let us first look at the performance results from the Phone
Book benchmark.
(Figures: throughput and scalability of the Phone Book benchmark.)
As expected, the expertly hand-tuned CDS version achieves the best throughput in all cases.
The ReaderWriterLockSlim-based version does not scale at all, even in the read-only cases. This
may be a little surprising to many readers. However, the implementation of
ReaderWriterLockSlim, even when acquiring a read lock, requires an update to a shared location
in the lock data structure (using interlocked hardware instructions). When many threads are
attempting to acquire the read lock frequently, there can be contention on the cache line
holding this shared location. When the critical section is not big enough to “space out” the cost
of the lock acquisitions, the resulting cache contention prevents scaling.
The STM-based versions have lower throughput than the CDS version because of the sequential
overhead. However, the STM version with the TVar optimization achieves similar or better
scalability than the CDS version. It is worth noting that the development of the STM-based
version of the benchmark was much easier than the development of the CDS version, and
required much less expertise. This is because the STM system automates fine-grained locking and
contention management for developers, who are not required to deal with the complexity of
using fine-grained locks.
For contrast, here is a fragment of the hand-written, lock-based version:

    } else {
        // We have tried the optimistic scheme
        // a few times, we switch to a
        // pessimistic mode by acquiring a lock
        rwls.EnterReadLock();
        try {
            // ...
14
In a straightforward implementation one would use the first dictionary to map persons’ names to their phone
numbers, and the other dictionary to map phone numbers back to names. The challenge would be to keep
those two data structures in sync while maintaining good scalability for concurrent access.
On the other hand, the STM-based version, below, is extremely simple, and the thread-safety tax is
limited to a single atomic block.

atomic {
    outPhone = name2phone.TryGetValue(name);
    outName = phone2name.TryGetValue(phone);
}
// Check results...
Let us take a look at the measured throughput and scalability for the Bidirectional Phone Book example.
(Figures: throughput and scalability of the Bidirectional Phone Book benchmark.)
15
Special care is taken for transactional accesses to readonly fields in the constructor.
When a transaction executes a write such as

x = value;

it dooms all other concurrent transactions that read the same “x”. However, in some applications it is
possible that “x” is already equal to “value” most of the time. In such cases, you can write the assignment
above as

if (x != value) x = value;

As a result, if “x” is equal to “value”, the STM only executes a transactional read instead of a write.
Therefore, the current transaction does not doom other transactions that read x. This optimization can
sometimes reduce runtime contention and improve scalability.
For example, assume we want to develop a concurrent deque based on a doubly linked list. The idea is
that one or more threads always push/pop items at one end (say, the “head”), while other threads always
push/pop items at the other end (say, the “tail”). Ideally, if the deque contains enough items, a
thread working on the head should not conflict with another thread working on the tail.
class Deque
{
    Node head;
    Node tail;
}
Unfortunately, since “head” and “tail” are fields of the same Deque object, a write to one of them
will conflict with accesses to the other field. This is definitely not what we want when we design the
deque data structure. The fix is to put each end in its own object:
class NodeHolder { public Node Node; } // one wrapper object per end (sketch)

class Deque
{
    private readonly NodeHolder head = new NodeHolder();
    private readonly NodeHolder tail = new NodeHolder();
}
In this way, the head and tail are held by two different objects. Therefore, there is no contention when
accessing them concurrently.
Consider a set implemented as a sorted singly linked list holding the integers 1, 3, 7, and 20.
Assume there is a transaction executing a lookup operation to find the integer 20. It needs to search from
the head of the linked list; therefore, by the time it reaches the node that contains 20, its read set
contains the nodes that hold 1, 3, 7, and 20.
Assume that at the same time there is another transaction that inserts the integer 2 into the set and
commits before the lookup transaction commits. To do so, it must modify the “next” pointer of the
node holding 1, so it writes to node 1. Note that node 1 is in the read set of the lookup transaction,
which is thus doomed and has to be rolled back and re-executed.
If we look at these two concurrent transactions at an abstract level, they do not conflict. However, they
conflict at the concrete implementation level, and are serialized by the runtime.
Unless advanced techniques, such as open nesting [9] or abstract nested transactions [10], are used,
STM doesn’t excel at such scenarios. However, in terms of scalability, it is worth noting that STM does
not perform worse than a single-lock-based version even in such scenarios.
Our advice would be to prefer data structures that, unlike a singly linked list, are more distributed and
contain fewer central “choke points”; trees and hashtables are good examples of such data structures.
11.5.5 ByRef
The CLR allows the use of managed and unmanaged pointers. Managed pointers are exposed in the C#
programming language as ref and out parameters; unmanaged pointers may only be used in unsafe
code. Transacting memory accesses made through such pointers raises two problems:
1. if we access memory via such a pointer, how do we decide whether the pointer is to a local
variable, a static variable, or a heap object? We need to know that, since we “transact” accesses
differently in each case; and,
2. if the pointer is into a heap object, how do we find the start of the object in order to acquire the
transactional lock that protects the object?
We solve these problems by doing extra static analysis to track information about the targets of
pointers, and transmitting this information for pointer/ref arguments from callers to callees. We’ve
worked hard to optimize this as much as possible, especially to ensure that this does not impose undue
cost when you’re not using transactions (as would be required if TM were ever to become part of a
production CLR). But there’s still some significant cost associated with transactional accesses via ref
arguments and pointers. In one experiment we compared two different signatures for a lookup
operation on a dictionary. In one case, the Value type in the KeyValue mapping is required to be an
object type, and mappings to null are not allowed, so that lookup can return null to indicate failure. The
other signature returns a bool to indicate success or failure of the query, and, on success, returns the
Value found as an out parameter. The latter style is currently 40% slower because of the overhead
associated with the ref parameter. We may well decrease this overhead in future releases.
The bottom line is that if you can avoid the usage of ref and out parameters in perf-sensitive code, then
do so. It will be more efficient to create and pass around “result objects” that can capture structures and
multiple result values, rather than to use out and ref parameters.
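A sketch of the result-object style (the names are hypothetical):

// Instead of: bool TryGetValue(TKey key, out TValue value)
// the dictionary can expose a small result object:
public sealed class LookupResult<TValue>
{
    public bool Found;
    public TValue Value;
}

// Inside the dictionary class:
public LookupResult<TValue> Lookup(TKey key)
{
    // ... perform the lookup and fill in a LookupResult ...
}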
Unsafe code is always discouraged, and with STM this is the case, too.
For more details on how we transact pointer-based accesses in STM, please refer to [16].
STM.NET provides additional events that will allow you to glean important information about the
dynamics of your program execution, find sources of contention and more. In this section we describe
the steps for using ETW tracing to get the trace of STM events. ETW is available only on Windows Vista
and later releases of Windows; we have only tested our use of ETW on the Windows Vista 32-bit edition.
In order to capture, parse, and view ETW events you will need to install the Windows Performance Tools
Kit. For details about this toolkit, please read the Windows Performance Tools Kit documentation
(MSDN). In particular, you need to use a command-line tool called xperf; please read Xperf Command
Line Tool in Detail (MSDN) and the Xperf Command-Line Reference.
In step 10, if you get any error/warning (such as loss of events) from xperf, please consult the same
Xperf references on how to use the tool.
16
On x64 systems, use %ProgramFiles(x86)% instead of %ProgramFiles%
The environment variable COMPLUS_STMEtwTraceLevel used in Step 5 controls what types of events
are traced.
COMPLUS_STMEtwTraceLevel    Events
1                           TXStart, TXCommit, TXRollback, TXRetry
2                           TXSleep, TXWakeup, and Level 1 events
3                           TXConflictOnWrite, TXConflictOnReadOpt, TXConflictOnPes,
                            TXDoomTx, TXFoundDoomed, and Level 1 and 2 events
17
TX pointer is an address pointing to the runtime transaction object, which manages the transaction state.
18
TxSiteId identifies a unique lexical transaction site.
19
This field in all ETW events is unused right now; it is always 0. Timestamp is the global version number
that the transaction reads when it starts.
20
The address of the object that causes the conflict.
21
The decision of the CM: 0 – try the read/write operation again, 1 – rollback the transaction, 2 – doom the enemy
transaction.