Fault Tolerance Via Idempotence
Fault Tolerance Via Idempotence
Fault Tolerance Via Idempotence
Abstract ing comes with its own pitfalls, such as process failures, imperfect
Building distributed services and applications is challenging due messaging, asynchrony and concurrency.
to the pitfalls of distribution such as process and communication Consider the prototypical bank account transfer service in
failures. A natural solution to these problems is to detect potential Fig. 1. The goal of the service is to transfer money between bank
failures, and retry the failed computation and/or resend messages. accounts, potentially in different banks. If the accounts belong to
Ensuring correctness in such an environment requires distributed different banks, ensuring that the transfer executes as an atomic
services and applications to be idempotent. (distributed) transaction is usually not feasible, and the natural way
In this paper, we study the inter-related aspects of process fail- of expressing this computation is as a workflow [10, 20] consisting
ures, duplicate messages, and idempotence. We first introduce a of two steps, a debit followed by a credit.
simple core language (based on λ-calculus) inspired by modern dis- What if the process executing the workflow fails in between the
tributed computing platforms. This language formalizes the notions debit and credit steps? A natural solution is to detect this failure
of a service, duplicate requests, process failures, data partitioning, and ensure that a different process completes the remaining steps
and local atomic transactions that are restricted to a single store. of the workflow. A challenging1 aspect of realizing this solution
We then formalize a desired (generic) correctness criterion for is figuring out whether the original process failed before or after
applications written in this language, consisting of idempotence completing a particular step (either debit or credit). If not done
(which captures the desired safety properties) and failure-freedom carefully, the debit or credit step may be executed multiple times,
(which captures the desired progress properties). leading to further correctness concerns. Services often rely on a
We then propose language support in the form of a monad that central workflow manager to manage process failures during the
automatically ensures failfree idempotence. A key characteristic of workflow (using distributed transactions).
our implementation is that it is decentralized and does not require Now consider a (seemingly) different problem. Messages sent
distributed coordination. We show that the language support can between the client initiating the transfer and the service may be lost.
be enriched with other useful constructs, such as compensations, The only option for a client, when it does not receive a response
while retaining the coordination-free decentralized nature of the within some reasonable time, is to resend its request. Yet the client
implementation. does not want the transfer to occur twice!
We have implemented the idempotence monad (and its variants) In this paper, we study process and communication failures in
in F# and C# and used our implementation to build realistic appli- the context of workflows. The seemingly different problems caused
cations on Windows Azure. We find that the monad has low runtime by process and communication failures are, in fact, inter-related.
overheads and leads to more declarative applications. Idempotence, a correctness criterion that requires the system to tol-
erate duplicate requests, is the key to handling both communication
Categories and Subject Descriptions D.4.5 [Operating Systems]: and process failures efficiently. Idempotence, when combined with
Reliability—Fault-tolerance; C.2.4 [Computer-Communication Net- retry, gives us the essence of a workflow, a fault tolerant composi-
works]: Distributed Systems—Client/server, Distributed applica- tion of atomic actions, for free without the need for distributed co-
tions ordination. In the transfer example, a fault tolerant account trans-
fer can be implemented without a central workflow manager if the
General Terms Reliability, Languages, Design debit and credit steps can be designed to be idempotent,
Keywords fault tolerance, idempotence, workflow, transaction, Formalizing Failfree Idempotence. In this paper, we introduce a
monad simple core language λFAIL , inspired by contemporary cloud plat-
forms. This language formalizes process failure, duplicate requests,
1. Introduction partitioned data, and local transactions. A local transaction pro-
Distributed computing is becoming mainstream. Several modern vides ACID guarantees but is restricted to access data within a sin-
platforms offer virtualized distributed systems at low entry cost gle partition (typically a single server). Computations in λFAIL are
with the promise of scaling out on demand. But distributed comput- like workflows, but without any fault-tolerance guarantees for the
composition (i.e., the computation may fail between transactions).
We then formalize a generic correctness criterion for applica-
tions written in λFAIL . A simple, powerful and tempting criterion is
that an application’s behavior in the presence of duplicate requests
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
and process failures should be indistinguishable from its behav-
for profit or commercial advantage and that copies bear this notice and the full citation ior in the absence of duplicate requests and failures. We formal-
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee. 1 Ingeneral, detecting failures perfectly in an asynchronous, message pass-
POPL’13, January 23–25, 2013, Rome, Italy. ing system is impossible [8]. Conservative failure detection can also lead to
Copyright c 2013 ACM 978-1-4503-1832-7/13/01. . . $10.00 the same problem of duplicated computation.
let process (request) = workflow monad [2, 4]. We find that the core business logic in these
match request with applications can be declaratively expressed using the monad. Our
| (“getBalance”, (branch, account)) → evaluation shows that performance overheads of using the monad
atomic branch {lookup account}
| (“transfer”, (fromBranch, fromAccount, toBranch, toAccount, amt) →
over hand-coded implementations are statistically insignificant.
atomic fromBranch { The rest of the paper is organized as follows. In Section 2, we
update fromAccount ((lookup fromAccount) − amt) introduce a language λFAIL and formalize duplicate requests and
}; process failures. We formalize what it means for a λFAIL application
atomic toBranch { to correctly tolerate duplicate requests and failures. In Section 3,
update toAccount ((lookup toAccount) + amt) we present the idempotence monad and show how it can be used to
}; tolerate duplicate requests as well as process failures. In Section 4,
“Transfer complete.” we describe extensions of the idempotence construct. In Section 5,
we evaluate the idempotence monad and our implementation from
Figure 1: A banking service example, in syntactically sugared the perspective of expressiveness, benefits and overheads. Section 6
λFAIL , that is neither idempotent nor fault-tolerant. discusses related work.
2. Failfree Idempotence
ize a slightly weaker, but more appropriate, correctness criterion, In this section we present a language λFAIL that distils essential el-
namely failure-freedom modulo message duplication. Informally, ements of distributed computing platforms such as Windows Azure
this criterion permits the system to send duplicate responses. This and formalize the concept of failfree idempotence.
weakening is appropriate from the perspective of composition: if
the recipient of the responses can also tolerate duplicate messages, 2.1 The Language λFAIL
then the sender is freed of the obligation to send the response ex-
actly once. Informal Overview. A λFAIL program e represents a service that
receives input requests and produces output responses. An input
Automating Idempotence. Next, we address the problem of auto- request v is processed by creating an agent to evaluate e v. When
matically ensuring idempotence for a service. We present our solu- the evaluation of e v terminates, producing a value v 0 , v 0 is sent
tion as a monad, the idempotence monad. We then show that idem- back as the response. Multiple input requests can be processed con-
potence, when coupled with a simple retry mechanism, provides a currently, and their evaluation can be interleaved. Shared, mutable,
“free” solution to the problem of tolerating process failures, guaran- persistent data is stored in tables.
teeing failure-freedom modulo message duplication. We then pro- Agents. An agent has its own internal state (captured by local
pose dedicated language support for idempotent computations. variables of the code). An agent may fail at any point in time. A
failure models problems such as hardware failure, software crashes
Decentralized Idempotence and Workflow. The idea underlying and reboots . Data stored in tables is persistent and is unaffected by
the idempotence monad is conceptually simple, but tedious to im- agent failures.
plement manually (i.e., without the monad). Given a unique iden- Tables. Tables are persistent maps. They provide primitives to
tifier associated with a computation, the monad essentially adds update the value bound to a key and lookup up the value asso-
logging and checking to each effectful step in the workflow to en- ciated with a key. The language provides a limited form of an
sure idempotance. An important characteristic of our implementa- atomic transaction, which enables a set of operations on the same
tion (of the monad) is that it is designed to work with contemporary table to be performed transactionally. Specifically, the construct
distributed storage systems such as key-value tables. Specifically, it “atomic t e” evaluates e in the context of table t (lookups and
does not assume the presence of dedicated storage for logs that can updates are permitted only on table t within e), guaranteeing iso-
be accessed atomically with each transaction. The monad reuses lation (no other agent can access t in the middle of e’s evaluation)
the underlying store (in this case a key-value table) to simulate a and the all-or-nothing property (no process failure can happen in
distinct address space for logging. the middle of e’s evaluation).
This leads to a decentralized implementation of idempotence
that does not require any centralized storage or any (distributed) Example. Fig. 1 presents a simple bank-account-transfer exam-
coordination between different stores. Thus, the implementation ple in syntactically sugared λFAIL . This example is neither idempo-
of idempotance preserves the decentralized nature of the under- tent (e.g., executing the same transfer request twice is not equiva-
lying computation. This, in turn, leads to a completely decentral- lent to executing it once) nor fault-tolerant (e.g., if the agent pro-
ized implementation of a (fault-tolerant) workflow, unlike tradi- cessing a transfer request fails in between the debit and credit
tional workflow implementations, which use a centralized work- steps).
flow coordinator and/or a centralized repository for runtime status
information. Syntax. Fig. 2 presents the syntax of λFAIL , which extends λ-
calculus with the primitive operations on tables explained above.
Extensions. We then extend the idempotence monad with other In the rest of the paper, we will use extensions such as natural
useful constructs, while preserving the decentralized nature of the numbers, arithmetic operators, and ordered pairs for brevity. These
construct. One extension allows the application to associate each can be encoded in the core language or added to it in the usual
transaction in a workflow with a compensating action. Another ex- fashion. We also use syntactic sugar, such as lookup e, where
tension allows the application to generate intermediate responses e 6∈ hVali, as shorthand, e.g., for (λx.lookup x)e.
to client requests and then asynchronously retry the requests on
client’s behalf. This idiom, especially useful in long running com- Semantic Domains Fig. 2 defines the semantic domains used in
putations, frees the client from having to track status of requests, the semantics of λFAIL . Let hVali denote the set of all basic values:
and leads to more responsive clients. these basically consist of function abstractions. (As usual, natural
numbers, string constants, ordered pairs of values, etc. can all be
Implementation. We have implemented the idempotence work- encoded within hVali or added to it.)
flow monad in F# targeting the Windows Azure platform. We have Let hValiopt represent the set of optional values (of the form
implemented several realistic applications using the idempotence NONE or SOME v). An element of ΣST (a map from hVali to
(a) Evaluation Contexts
E ::= [·] | E e | v E
(b) The Set of Evaluation Rules A used to define the standard semantics of λFAIL .
[INPUT] [OUTPUT]
v ∈ hVali vo ∈ hVali
in(v) out(vi ,vo )
he, µ, α, I, Oi ⇒ he, µ, α ] {v . (e v)}, I ∪ {v}, Oi he, µ, α ] {vi . vo }, I, Oi ⇒ he, µ, α, I, O ∪ {(vi , vo )}i
[NORMAL] [FAIL]
ω
hµ, ei hµ0 , e0 i
hp, µ, α ] {v . e}, I, Oi ⇒ hp, µ0 , α ] {v . e0 }, I, Oi hp, µ, α ] {v . e}, I, Oi ⇒ hp, µ, α, I, Oi
[ATOMIC] [UPDATE]
[LOOKUP]
ω∗
hµ[tn], e()i ht, vi, v ∈ hVali
ω (t,k,v)
hµ, atomic tn ei hµ[tn 7→ t], vi ht, update k vi ht[k 7→ (SOME v)], 0i ht, lookup ki ht, t[k]i
[CONTEXT] [LIFT]
[CONTEXT] [LIFT]
u [BETA]
ht, ei ht 0 , e0 i hµ, ei
ω
hµ0 , e0 i e → e0 e → e0
u ω
ht, E[e]i ht 0 , E[e0 ]i hµ, E[e]i hµ0 , E[e0 ]i ht, ei ht, e0 i hµ, ei hµ, e0 i (λx.e) v → e[v/x]
[RETRY]
v∈I
he, µ, α, I, Oi ⇒ he, µ, α ] {v . (e v)}, I, Oi
[IMATOMIC]
[IMRETURN]
hτ ,τv i
Γ |= e : τ Γ |= t : hτk , τv i Γ |= e : (unit k→ τ )!hτk , τv i
Γ |= imreturn e : IM τ Γ |= imatomic t e : IM τ
[IMBIND] [IMRUN]
Γ |= e2 : τ1 → IM τ2 Γ |= e1 : IM τ1 Γ |= e : τ1 → IM τ2
Γ |= imbind e1 e2 : IM τ2 Γ |= imrun e : τ1 →ID τ2
L EMMA 3.3. All executions of a well-typed IM program are oper- define the relation ' between ΣpA and ΣpI by hp, µ1 , α1 , I1 , O1 i '
ationally idempotent. hp, µ2 , α2 , I2 , O2 i iff I1 = I2 and µ1 = µ2 .
P ROOF S KETCH. Let us refer to a value of the form (0, k) (used as Note that in the above definition, the condition I1 = I2 im-
a key for a table lookup/update operation) as a system key. Consider plies that both configurations have received the same set of input
any execution of a well-typed IM program. It is easy to verify that requests. The condition µ1 = µ2 implies that the processing of
once the value associated with a system key (0, k) in any table is each input request v in both configurations have gone through iden-
set to be v, it is never subsequently modified. tical sequences of effectful steps so far. (This follows since every
ω ω0 effectful step is memoized and is part of the table state µ1 and µ2 .)
Consider any two execution-steps a 99K b and a0 99K b0 in the
execution (in that order) where a = a0 . The idempotence property T HEOREM 3.6. If p is a well-typed IM program, then ' is a weak
follows trivially whenever these steps are effect-free steps. The bisimulation between (ΣpA , ⇒A ) and (ΣpI , ⇒IDEAL ).
only effectful steps are produced by the evaluation of an imatomic
construct. Since a = a0 , the key value used in the evaluation of P ROOF S KETCH. Consider any σ1 ' σ2 . The interesting transi-
imatomic in both steps must be identical. The implementation tions are those that are effectful (involving an atomic operation) or
of imatomic guarantees that whatever value is produced by the produce an output. If either σ1 or σ2 can perform an interesting
evaluation of imatomic in the first step will be memoized with the transition, we must show that the other can perform an equivalent
given value. This guarantees that the second step will find this same transition (possibly after a sequence of silent transitions).
value in the table (thanks to the property described earlier). Hence, Consider the case when an agent v . e in σ1 can perform an
it will evaluate to the same value and will have no side-effects on effectful transition in the standard semantics. Let v . e0 be the state
the table. It follows that the execution is idempotent. of the same agent after its most recent imatomic evaluation. Then,
we must have some side-effect-free evaluation e0 → e1 · · · ek = e.
Consider configuration σ2 . By definition, we restrict our attention
T HEOREM 3.4. Any well-typed IM program is observationally to reachable states. Hence, there exists some execution in the ideal
idempotent. semantics that produces σ2 . This execution must have received in-
P ROOF S KETCH. Follows immediately from the previous two lem- put request v and produced an agent v . (p v). Consider the evalu-
mas. ation of this agent. The effectful steps in this agent’s evaluation in
the ideal semantics must have produced the same sequence of val-
As stated earlier, observational idempotence does not give us ues as in the evaluation in the standard semantics (since the mem-
any progress guarantees in the presence of failures. We now estab- oized values for these steps are the same in both σ1 and σ2 ). Thus,
lish progress guarantees in the form of a weak bisimulation. σ2 must have some agent of the form v . ei for some 0 ≤ i ≤ k.
This will produce, after zero or more silent transitions, the agent
D EFINITION 3.5. Let p be any λFAIL program. Let ΣpA denote the v . e that can then perform the same effectful transition in the ideal
set of configurations { σ ∈ Σ | hhpii ⇒∗A σ } that can be produced semantics as in the standard semantics.
by the execution of p under the standard semantics. Let ΣpI denote Consider the case when an agent v . e in σ2 can perform an ef-
the set of configurations { σ ∈ Σ | hhpii ⇒∗IDEAL σ } that can fectful transition in the ideal semantics. We can create a new agent
be produced by the execution of p under the ideal semantics. We v . (p v) using rule RETRY in the standard semantics. We can then
lation preserves types: translation of a well-typed λFAIL program
[x]V = x (satisfying the restrictions mentioned above) will produced a well-
[λx.e]V = λx.[e]I typed monadic program.
[x]I = imreturn x 3.4 Failfree Realization Via Idempotence
[λx.e]I = imreturn (λx.[e]I ) Given any λFAIL expression e, we can construct its monadic version
[n e]I = imbind [n]I (λx.[x e]I ) ,n 6∈ hVali em as explained above. The preceding results imply that em must
[v n]I = imbind [n]I (λx.[v x]I ) ,v ∈ hVali, n 6∈ hValibe failfree. We now establish a stronger result, namely that em is
[v v 0 ]I = [v]V [v 0 ]V ,v, v 0 ∈ hVali a failfree realization of e. (Failure-freedom is a weak correctness
[atomic x e]I = A
himatomic x [e] , fv i criterion. It indicates that a program’s behavior under the standard
where fv = {x} ∪ freevars(e) semantics is equivalent to it’s behavior under the ideal semantics.)
The notion of failure-freedom does let us simplify verifying cor-
[x]A = x rectness of a program by considering only its behavior under the
[λx.e]A = λx.[e]A ideal semantics. In particular, we have:
[e1 e2 ]A = [e1 ]A [e2 ]A T HEOREM 3.8. If p and q are weakly bisimilar under the ideal se-
[update e1 e2 ]A = imupdate [e1 ]A [e2 ]A mantics (i.e., if (hhpii, ⇒IDEAL ) ' (hhqii, ⇒IDEAL ), and p is failfree,
[lookup e]A = imlookup [e]A then p is a failfree realization of q.
P ROOF S KETCH. Follows as we can compose the two weak bisim-
he, {}i = e
ulations together.
he, {x} ] Y i = himbind x (λx.e), Y i
This theorem simplifies proving that a program is a failfree
Figure 7: Transforming λFAIL expressions into idempotent expres- realization of another. In particular, we have already seen that a
sions. monadic program is failfree (Theorem 3.7). Hence, to prove that em
is a failfree realization of e, it suffices to show a weak bisimilarity
between the ideal executions of a λFAIL program e and the ideal
duplicate the entire execution history of v . e (which is guaranteed executions of its monadic representation em .
to be the same in both semantics by the definition of '). Thanks
to the idempotence property, this duplicate execution will have no T HEOREM 3.9. Let em be the monadic representation of e. Then,
extra side-effects and will eventually produce the same effectful em is a failfree realization of e.
transition as in the ideal semantics. P ROOF S KETCH. As explained above, it is sufficient to relate eval-
uations of e and em under the ideal semantics. In this setting, it is
intuitively simple to see why the monadic program “em ” simulates
T HEOREM 3.7. A well-typed IM program is failfree. the given program e. The key distinction is that the monadic imple-
mentation uses imatomic to perform any effectful step. This step
3.3 Transforming λFAIL Programs To Idempotent Programs
will first check the memoized data to see if this step has already
We have seen that the idempotence monad lets us construct failfree executed. In an ideal execution, this check will always fail, and the
programs. Given any λFAIL -program e, we can construct its “equiv- monadic implementation then performs the same effectful step as
alent” monadic representation using standard techniques (which are the original program (and memoizes it). The check is guaranteed
conceptually straightforward, though the details are intricate). The to always fail because the keys (g, i) used in distinct executions
transformation is presented in Fig. 7 (based on the transformation of imatomic are distinct. The value of g will be different for exe-
in [17]). cutions corresponding to different inputs x and y. The value i will
We make a few simplifying assumptions (type restrictions) be different for different steps corresponding to the same input x.
about the source λFAIL program e in this transformation algorithm
(similar to those mentioned in Section 3.2). We assume that the
types of keys and values of all tables are pure types. We assume that 3.5 Idempotent Failfree Computations as a Language
intra-transaction expressions do not manipulate values of workflow Feature
type: e.g., atomic t {λx.(atomic s e)} is not allowed. We have now seen how idempotent failfree computations can be
The translation is defined using multiple translation functions. automatically realized using the idempotence monad. We now pro-
[v]V is a translation function that applies only to values (which pose a new language construct for idempotent failfree computa-
must be of the form x or λx.e). The translation function [e]I is the tions.
heart of the monadic transformation and can be applied to multi- The construct “idff v e” indicates that the computation of
transaction expressions. It uses the idempotence monad to sequence e should be failfree and should be idempotent with respect to v:
local transactions. The translation function [e]A is applied to intra- i.e., multiple (potentially concurrent) invocations of this construct
transaction expressions and it is used primarily to replace occur- with the same value v behaves as though only one of the invoca-
rences of update and lookup by imupdate and imlookup respec- tions executed. The above construct permits users to specify the
tively. Note that a single local transaction (i.e.intra-transaction ex- code fragment for which they desire automatic idempotence and
pression) evaluation is done using standard evaluation (without us- failure-freedom (provided by the compiler). This enables users to
ing any monad). The auxiliary function he, Xi is used to transform rely on other methods, perhaps application-specific, to ensure these
the monadic values used in the evaluation of multi-transactions to properties elsewhere. (For instance, some computation may be se-
standard values required in the evaluation of a single local transac- mantically idempotent already.) It also lets the users specify what
tion. should be used as computation identifier to detect duplicates, as
Finally, given a top-level λFAIL program e (which is assumed to illustrated below.
be a functional value of the form λx.e0 ), it’s monadic form em is We refer to this enhanced language as λIDFF . A formal seman-
defined to be imrun [e]V . As usual, it can be shown that the trans- tics of λIDFF appears in the appendix. We illustrate the meaning of
the first parameter of idff using the incorrect λFAIL example of Semantic Domains. The semantic domains for λIDWF are the
Fig. 1. We can wrap the idff construct around this example in the same as for λFAIL with minor extensions. The runtime expression
following two ways, with different semantics. Note that the input e ec is used to represent a workflow during its execution. Here,
in this example is a pair (reqId, req) consisting of a request-id as e represents the partially evaluated form of a workflow and ec
well as the actual request. Consider: represents the compensating action to be performed in case the
workflow needs to be aborted. Agents can also be of the form
f1 = λ(reqId , req). idff reqId (process (reqId , req))
id I ew ec , indicating an agent evaluating a workflow.
f2 = λ(reqId , req). idff (reqId , req) (process (reqId , req))
Semantics. The semantics of λIDWF is defined using a set of rules
The behavior of these two functions differ in the cases where consisting of all the rules in A used to define the semantics of λFAIL
multiple inputs arrive with the same request-id but differing in the except for FAIL and ATOMIC, plus the new rules presented in Fig. 8.
second parameter req. f1 treats such requests as the same and will The initiation of a workflow (rule IDWF-BEGIN) creates a new
process only one of them, while f2 will treat them as different agent of the form id I ew ec , provided no agent has already
requests and process them all. been created for id. This agent evaluates the workflow ew and
One of the subtle issues with the semantics and implementation tracks the compensating action ec to be performed in case of an
of the idff construct is the treatment of invocations that have the abort. Rule IDWF-END indicates how the computation proceeds
same id (first parameter) but have different expressions as the sec- once the workflow evaluation is complete (or if a previous work-
ond parameter. We take a simple approach with our semantics (that flow with the same id has already been initiated).
the effect is as if only the first invocation occurred). This has some The rules ATOMIC and ATOMIC-ABORT define the semantics of
implications for the underlying implementation (in the presence of transactions in a workflow. Informally, the expression atomic t ea ec
failures). One solution is to use a continuation-passing style com- is evaluated as follows. First, ea is evaluated atomically to produce
putation, and memoize the entire continuation in the memoization a value v. Then, as a side-effect, the compensating action is up-
step (rather than just the value computed for the step). dated to indicate that ec v should be evaluated as the first step
of the compensation before executing the original compensation.
4. Extensions Finally, the whole expression evaluates to v. Thus, note that the
value produced by the “atomic” action ea is available to the subse-
We have seen how idempotence can serve as the basis for fail- quent computation as well as the compensating action ec . When a
free composition of computations: essentially, a simple form of workflow is aborted (rules ATOMIC-ABORT and IDWF-ABORT), the
fault-tolerant workflow. In this section, we describe two extensions compensation expression is evaluated.
that enrich this construct, namely compensating actions and asyn- Rule ATOMIC of λFAIL is replaced by the pair of rules PURE-
chronous evaluation, which simplify writing applications. These ATOMIC and PURE-ATOMIC-ABORT, which describe the behavior of
concepts are not new, but what is interesting is that they can be a transaction that is not contained within a workflow. PURE-ATOMIC
integrated without affecting the light-weight, decentralized, nature describes the successful completion of a transaction, while PURE-
of our idempotence implementation. ATOMIC-ABORT describes the case where the transaction is aborted.
(The auxiliary relation used here is defined by the rules for
4.1 Compensating Actions λFAIL .)
The idff construct allows us to compose several transactions into
Compensation monad We now describe the compensation monad
an idempotent workflow that appears to execute exactly once with-
that can be used to realize workflows with compensating actions.
out process failures. However, the lack of isolation means that when
For simplicity, we describe an implementation of the compensat-
a transaction in the workflow is executed, its precondition may not
ing monad that focuses only on logical failures and compensations.
be satisfied and we may need to abort the workflow. For example,
We assume there are no duplicate requests or process failures. We
in the transfer example (Fig. 1), the debit step may succeed, but
can realize idempotent workflows with compensating actions by
we may be unable to complete the subsequent credit step because
composing this monad and the idempotence monad [13].
the account does not exist. One way of recovering from this failure
The compensation monad, shown in Fig. 9, is a combination
is to compensate for the debit by crediting the amount back to the
of the exception monad and the continuation passing style monad.
source account. If compensating actions are correct, the workflow
Transactions return a value of the form Value(v) (on successful
can guarantee all-or-nothing semantics i.e. either all or none of the
completion) or a special value Abort to indicate that the transaction
transactions in the workflow appear to execute.
was aborted. Workflows are represented as a function in continua-
We first formalize the desired semantics of workflows with com-
tion passing style. The helper function compensateWith associates
pensating actions. We present a language λIDWF which provides
a transaction a with a compensating action to construct a primi-
language constructs to associate transactions with compensating
tive workflow. compensateWith constructs a function in continua-
actions, and to declare logical failures. Finally, λIDWF supports
tion passing style. This function first evaluates a(). If a aborts, the
a construct idworkflow id e, which composes transactions with
whole transaction aborts and returns Abort. Otherwise, the contin-
compensating actions into a workflow. We present semantics of this
uation is evaluated using the value returned by the transaction. If
construct, and then show how this construct is realized using a com-
the continuation itself aborts (because one of the following trans-
pensation monad.
actions aborts), we evaluate the compensating action and return the
Syntax. λIDWF modifies and extends λFAIL in the following ways value Abort. The monad’s return simply lifts a value (with no com-
(see Fig. 8). atomic t ea ec extends the atomic construct of λFAIL pensation) into a workflow. The monadic bind is standard for con-
by specifying ec as the compensation for the atomic transaction ea . tinuation passing style computations. The function run shows how
abort indicates that a logical failure has occured and the work- to execute a workflow by passing it an empty continuation.
flow must be aborted. idworkflow id e represents an idempotent
workflow with identifier id, where e is the workflow consisting of 4.2 Asynchronous evaluation
a composition of atomic transactions with compensations. Expres- Workflows are commonly used to perform computations involving
sion of the form e ec arise only during evaluation and are not several transactions. Consequently, workflows are often long run-
source language constructs. ning with highly variable latencies. Large latencies accentuate the
(a) Syntax and Evaluation Context
[IDWF-END]
[IDWF-BEGIN] id I u ∈ α u ∈ hVali
v . E[idworkflow id w] ∈ α 6 ∃e0 .id I e0 ∈ α hp, µ, α ] {v . E[idworkflow id w]}, I, Oi ⇒
hp, µ, α, I, Oi ⇒ hp, µ, α ] {id I (w() 0)}, I, Oi hp, µ, α ] {v . E[u]}, I, Oi
[ATOMIC] [ATOMIC-ABORT]
[NORMAL]
ω ω∗ ω∗
0 0
hµ, ei hµ , e i hµ[tn], ea ()i ht, vi, v ∈ hVali hµ[tn], ea ()i ht, E[abort]i
ω
hp, µ, α ] {v I e}, I, Oi ⇒ hµ, E[atomic tn ea ec ] ei hµ, E[atomic tn ea ec ] ei
hp, µ0 , α ] {v I e0 }, I, Oi hµ[tn 7→ t], E[v] ((ec v); e)i hµ, E[abort] ei
[IDWF-ABORT] [CONTEXT]
e → e0
hµ, (E[abort]) ec i hµ, ec i hµ, E[e] ec i → hµ, E[e0 ] ec i
[PURE-ATOMIC] [PURE-ATOMIC-ABORT]
ω∗ ω∗
hµ[tn], ea ()i ht, vi, v ∈ hVali hµ[tn], ea ()i ht, E[abort]i
ω
hµ, atomic tn ea ec i hµ[tn 7→ t], vi hµ, atomic tn ea ec i hµ, 0i
let compensateWith a comp = queue, retrieve checkpoints, and continue evaluation, deleting the
fun f → match a() with checkpoint only when the workflow has been fully evaluated.
| Abort → Abort Supporting asynchronous evaluation requires some additional
| Value(b) → match (f b) with
| Abort → let = comp b in Abort
support from the platform and a minor change to the monad’s bind
| Value(c) → Value(c) function. We assume that the platform provides a channel that can
only be accessed by the idempotence monad (hence not exposed
let bind (v, f) = fun g → v (fun a → f a g) in λFAIL ). Messages can be sent to this channel using the function
let return a = compensateWith (fun () → a) (fun v → ()) send and received using the function recv. We assume the channel
let run a = a (fun x → Value(x)) supports the following handshake protocol for messages in order
to guarantee at-least-once processing of messages. In this protocol,
recv does not delete messages from the channel. Instead, an agent
Figure 9: The compensation monad that receives a message must acknowledge the message (using ack)
before the message is permanently deleted. If an agent crashes after
receiving a message and before acknowledging it, the message
problem of duplicate requests because clients cannot easily distin- reappears in the channel and may be processed by other agents.
guish between a long running workflow and one that has failed to (Windows Azure provides such a channel implementation, which
generate a response. While the idempotence monad guarantees cor- our implementation uses.)
rectness in such cases, idempotence does come at a performance The changes to the idempotence monad are illustrated in Fig-
cost (due to log lookups). A design pattern commonly used to re- ure 10. Instead of invoking the remainder of the workflow, the mod-
duce the number of duplicate requests is for the system to take over ified bind (Fig. 10) creates a closure for the rest of the workflow
the task of retrying (a part of) the request on behalf of the client and persists the closure in a special queue (“worklist”) (using the
asynchronously, and sending an intermediate response to the client, function send). Special agent processes (agent) retrieve the check-
typically with the transaction identifier. The client can use transac- points (using recv) and continue evaluating the rest of workflow.
tion identifier to periodically poll for the status of the request. If an agent manages to evaluate the workflow without failing, it
The idempotence monad can be extended to support asyn- deletes the workflow from the queue (using ack). In our implemen-
chronous evaluation as follows. At a programmer defined point tation, the choice of using asynchronous evaluation (and the partic-
in the evaluation of the workflow, we create a checkpoint. A check- ular step at which to create a checkpoint) is left to the programmer.
point is essentially a closure representing the rest of the work- We evaluate the benefits of these optimizations in Section 5.
flow, along with relevant state variables. The checkpoint is then
persisted, typically in a distributed queue (essentially a worklist).
Once the checkpoint has been created, an intermediate response
is sent to the client. A set of special agents periodically query the
let agent () = Hand- Molecules
while (true) coded
let (msgid, msg) = recv “worklist”in Banking 94 7296
let (f, params) = msg in Blogging 243 6484
let result = f params in Auction 107 7256
let = ack “worklist” (msgid,msg) in result Surveys 530 9004
Expense 378 6504
let bind (idf, f) =
fun (guid, tc) →
let (ntc, v) = idf (guid, tc)
Figure 12: Average message size (in bytes) of the hand-coded and
send “worklist” (f a (guid, ntc + 1)) workflow implementations.
Figure 10: The idempotence monad with asynchronous evaluation The first transaction checks if the bid is valid and then checks if
the bid beats the current maximum bid, in which case the bid is
recorded in a bids table. The second transaction marks the current
let saveSurvey response = idworkflow { winner and loosing bidders in a separate table. The web front-end is
do! addAtomic ”responses” response.Id response programmed to poll this table periodically for the latest bid status.
return! atomic{ Subsequent transactions update other tables such as an aggregate
let! summary = readAtomic ”summaries” response.SurveyName table that maintains the most frequently bid items, more frequent
do summary.Merge(response) bidders etc. With the idempotent workflow monad, it is easy to
return! writeAtomic ”summaries” response.SurveyName summary }}
guarantee that each bid is processed exactly once even if clients
retry with little change in complexity.
Figure 11: The save survey operation expressed as in idempotent
workflow. Overheads. There are two sources of overheads associated with
idempotent workflows compared to hand-coded implementations.
First, asynchronous evaluation requires serialization and deserial-
ization closures, which can be expensive. Compiler generated clo-
5. Evaluation sures tend to capture a lot more state than hand-coded implementa-
We have implemented the idempotence workflow monad and its tions. Idempotent workflows also add unnecessary logging to trans-
variants in C# and F#, targeting Windows Azure, a platform for actions that are already idempotent.
hosting distributed applications. Azure provides a key-value store Fig. 12 shows the average size of messages sent in hand-coded
with explicit support for data partitioning, where partitions are units and workflow monad based implementations. As expected, the
of serializability. In this section, we focus on evaluating the expres- workflow based implementation generates significantly larger mes-
siveness and performance overheads of the idempotence monad. sages. However, experiments on Windows Azure show that the size
of the message does not significantly influence the read/write la-
Sample applications We found several real world applications tency or throughput of channels [1], as confirmed by our experi-
whose core logic can be declaratively expressed as workflows. ments below.
We briefly describe an expense management application [4] and Next, we evaluate the overall performance overheads of using
a survey application [3]. Other applications we have implemented the idempotence monad relative to hand-coded implementations.
include applications for online auctions [2], blogging and banking. For each benchmark application, we evaluate three versions - the
original hand-coded version, and two monad based versions. The
Survey application. The Surveys application enables users to de- first version (synchronous) evaluates the workflow synchronously
sign a survey, publish the survey, and collect the results of surveys. in the context of the current agent and delegates the responsibility
The most performance critical scenario in this application is when a of retrying operations to the client. The second version uses asyn-
user submits a survey. This operation involves two steps, recording chronous evaluation.
the response and updating the survey summary with the response. We hosted each of the applications on Windows Azure and
For scalability, survey responses and summaries are stored in dif- ensured that each variant was assigned the same hardware re-
ferent data partitions. Figure 11 shows an implementation of this sources i.e. 5 virtual machines, each with a 4-core processor and
operation using workflows. The syntax here is from the actual F# 7 GB RAM. We assigned two virtual machines each for the front
implementation and differs from λFAIL in non-essential ways. The end agents that service browser requests and agents that interact
workflow is composed of two transactions, a transaction that saves with the storage system and one virtual machine for a background
the response and a transaction that updates the summary. In this agent uses for asynchronous/check-pointed evaluation. In the syn-
implementation, survey responses do not reflect in the summary chronous evaluation variant, we assigned an additional virtual ma-
immediately. In general, deferred writes are often acceptable as chine to the storage interaction agent. For each benchmark, we cre-
long as the writes will eventually appear to occur exactly once, a ated a workload that simulates users exercising performance crit-
guarantee workflows provide. The use of the idempotence monad ical scenarios in the application. For example, in the surveys ap-
guarantees that even if multiple requests are received with the same plication, the workload consists of a mix of surveys response re-
survey, the workflows appears to execute just once. quests and survey analysis requests. In each case, the workload is
biased towards write requests (e.g. 75% responses and 25% anal-
Auction application. The auction application allows users to bid ysis requests in the survey application) to ensure that molecular
for items on auction and track the status of their bids. Sellers can transactions are on the critical path. We ran the workload for 3 min-
register items for auction with a time limit and a minimum bid utes (with a 30 seconds warm-up period) and measured throughput
price. Auction sites are often high volume sites, and both latency (number of requests serviced per second) and latency (average re-
and scalability is important. The most critical operation in this ap- sponse time) for varying number of simultaneous users.
plication is the operation that processes new bids. This operation Fig. 13 shows the measured throughput and latency for the
can be expressed as a workflow composed of several transaction. surveys application. In the baseline version, both throughput and
Hand-coded Synchronous Asynchronous Hand-coded Synchronous Asynchronous
Throughput (Request/sec)
60 1.00
Latency (sec)
50 0.75
40
0.50
30
0.25
20
10 0.00
25 50 75 100 25 50 75 100
Number of concurrent users Number of concurrent users
Figure 13: Average throughput and latency for various versions of the survey application.
40 1.50
Latency (secs)
35 1.20
30 0.90
25 0.60
20 0.30
15 0.00
25 50 75 100 25 50 75 100
Number of concurrent users Number of concurrent users
Figure 14: Average throughput and latency for various versions of the auction application.
latency increase almost linearly with the number of users. All sented as a monad. This implementation technique is decentralized
monad based implementations closely follow this trend, suggesting and does not require distributed coordination. (c) Language con-
very little additional overhead due to our implementation. The structs for idempotent workflow.
throughput (60 operations/second) and latencies are along expected The need to move beyond atomic transactions to sequential
lines. The expected throughput of Azure tables is 500 reads/writes compositions of atomic transactions (i.e., workflows) motivated the
per second, whereas our “operation” is a workflow consisting of early work on Sagas and long running transactions [10, 20]. These
two transactions. constructs are weaker than distributed transactions and are gener-
The auction application (Fig. 14) has slightly different perfor- ally used to orchestrate processes that run for extended periods.
mance characteristics. The throughput of this application is lower Kamath et al [12] discusses several issues relating to the implemen-
than the surveys application and it decreases with load. This is tation of transaction managers in Sagas and workflows. A distin-
expected since the workflow is significantly longer. Synchronous guishing aspect of our work is that it exploits the fact that a service
evaluation achieves better throughput than the hand-coded imple- is required to be observationally idempotent (from its clients’ per-
mented (which is asynchronous) at almost the same latency. As spective) to simplify the internal implementation of the workflow.
expected, asynchronous evaluation improves latency, especially at In particular, this lets us avoid the need for distributed coordination
low loads, but at the cost of reduced throughput (up to 25% lower). with workflow managers.
At high loads, all monad based implementations perform better Modern web applications often exploit horizontal partitioning
than the hand-coded implementation. This is due to a few low level for scalability, which involves storing data in a number of differ-
optimizations (such as batching of requests) performed by our im- ent databases or non-relational data-stores. This leads to a need for
plementation. We leave a more detailed performance analysis and workflow-style computations even within a single intra-enterprise
optimization of the idempotence monad for future work. application. Since scalability is key (and the motivation for data
partitioning), conventional workflow engines are rarely considered
for use in this context. Instead, the programmer usually realizes
6. Related Work the workflow manually, exposing themselves to all the subtleties
Our work builds on previous literature in the topics of building of realizing such workflow correctly. Pritchett [16] describes pro-
reliable distributed systems and transactional workflow. Idempo- gramming methodologies for use in these scenarios. Our goal is to
tence has been widely and informally recognized as an important provide lightweight language mechanisms that can be used to real-
property of distributed systems. Our key contributions include: (a) ize idempotent workflows correctly in such scenarios.
A formal specification of the desired correctness properties, both Helland [11] explains in detail why idempotence is an essential
safety (idempotence) as well as progress (failure freedom). (b) An property for reliable systems.
automatic technique for guaranteeing failfree idempotence, pre-
Frolund et al. [9] define a correctness criteria known as X- (JACM), 32(2):374–382, 1985.
Ability for replicated services. A history is said to be x-able if it is [9] Svend Frolund and Rachid Guerraoui. X-Ability: A Theory of
equivalent to some history where every request is processed exactly Replication. Distributed Computing, 14, 2000.
once. Much like failfree idempotence, X-Ability is both a safety
[10] Hector Garcia-Molina and Kenneth Salem. Sagas. In Proc. of ICMD,
and liveness criteria. Our notion of failfree idempotence is general- pages 249–259, 1987.
izes X-Ability beyond replication requests to workflows. There are
also significant differences in our implementation techniques. [11] Pat Helland. Idempotence is not a medical condition. ACM Queue,
Our work is also related to open nested multi-level transac- 10(4):30–46, 2012.
tions [15, 19]. These two constructs share the use of compensat- [12] Mohan Kamath and Krithi Ramamritham. Correctness Issues in
ing actions, but are semantically different. Open nested transactions Workflow Management. Distributed Systems Engineering, 3(4):213,
provide a way for dealing with conflicts at a higher level of abstrac- 1996.
tion, which often leads to increased concurrency. [13] Sheng Liang, Paul Hudak, and Mark Jones. Monad Transformers and
Our basic setting is similar to Argus [14]. However, the con- Modular Interpreters. In In Proc. of POPL, pages 333–343, 1995.
struct that Argus provides programmers to deal with process fail- [14] Barbara Liskov. Distributed programming in argus. Communications
ures is a conventional transaction. As with sagas, we show that of ACM, 31:300–312, March 1988.
many applications can be expressed using workflows (as we cover [15] J. Eliot B. Moss. Nested Transactions: An Approach to Reliable
in Section 2) with compensating actions to compensate for the lack Distributed Computing, 1981.
of isolation. The transactor programming model [7] also provide
primitives for dealing with process failures in a distributed system. [16] Dan Pritchett. Base: An acid alternative. Queue, 6(3):48–55, May
2008.
However, there is no shared state in the transactor model. The prim-
itives provided by the transactor model (stabilize, checkpoint, and [17] Philip Wadler and Peter Thiemann. The Marriage of Effects and
rollback) are different from the primitives we study. Monads. ACM Trans. Comput. Log., 4(1):1–32, 2003.
Bruni et al [5] formalize compensating actions in an abstract [18] David Walker, Lester Mackey, Jay Ligatti, George A. Reis, and
language. Their formalism, however, does not explicitly model David I. August. Static Typing for a Faulty Lambda Calculus. In In
state. Their paper, in fact, suggests study of compensation in the ACM International Conference on Functional Programming, 2006.
context of imperative features (state, variables, control-flow con- [19] Gerhard Weikum and Hans-J. Schek. Concepts and Applications of
structs) as future work. Our work provides a compensation-based Multilevel Transactions and Open Nested Transactions. In Database
transactional construct as a library in a real language (F#) for a real Transaction Models for Advanced Applications, pages 515–553,
system (Azure), in addition to the theoretical treatment in the set- 1992.
ting of lambda-calculus with mutable shared state. [20] Gerhard Weikum and Gottfried Vossen. Transactional Information
Luis et. al [6] propose an abstract model of compensating ac- Systems: Theory, Algorithms, and the Practice of Concurrency
tions (based on process calculus) for reasoning about correctness Control. 2001.
for workflows that use compensating actions.
Faulty lambda calculus [18] is a programming language and
a type system for fault tolerance. However, λFAIL is addresses A. An Abstract Definition of Idempotence
process failures, while faulty lambda calculus addresses transient We now present a more abstract definition of idempotence in terms
data corruption errors. of histories.
Basics. The users of the system issue requests to the system. Let
Acknowledgments I denote a set of input messages (or requests). The system responds
We would like to acknowledge Siddharth Agarwal, Nirmesh Malviya, to requests with a response. Let O denote a set of output values. An
Khilan Gudka, and Dheeraj Singh for their contributions. output message or response is a pair (i, o) ∈ I × O indicating that
the output value o is produced in response to input request i.
A history π is a sequence e1 e2 · · · ek of requests and responses
References (i.e., each ei can be either a request or a response). We will restrict
[1] Azurescope: Benchmarking and Guidance for Windows Azure. our attention to histories that satisfy the simple property that every
http://azurescope.cloudapp.net/BenchmarkTestCases/ response corresponds to an earlier request in the history. We define
#4f2bdbcc-7c23-4c06-9c00-f2cc12d2d2a7, June 2011. a specification Φ to be a set of histories. In the sequel, we use a set
[2] Bid Now Sample. http://bidnow.codeplex.com, June 2011. of histories to specify desired safety properties of the system.
[3] The Tailspin Scenario. http://msdn.microsoft.com/ In the sequel, let q ∈ I range over requests and let r ∈ I × O
en-us/library/ff966486.aspx, June 2011. range over responses. Let α, β, and γ range over sequences of
[4] Windows Azure Patterns and Practices. http://wag.codeplex. requests and responses.
com/, 2011.
D EFINITION A.1. A specification Φ is said to be asynchronous if
[5] Roberto Bruni, Hernán Melgratti, and Ugo Montanari. Theoretical it satisfies the following properties.
Foundations for Compensations in Flow Composition Languages. In
Proceedings of POPL, pages 209–220, 2005. 1. αβqγ ∈ Φ ⇒ αqβγ ∈ Φ.
[6] Luis Caires, Carla Ferreira, and Hugo Vieira. A Process Calculus 2. αrβγ ∈ Φ ⇒ αβrγ ∈ Φ.
Analysis of Compensations. In Trustworthy Global Computing,
volume 5474 of Lecture Notes in Computer Science, pages 87–103. The above conditions are a natural restriction on specifications
2009. because of messaging delays that cannot be controlled. The above
[7] John Field and Carlos A. Varela. Transactors: A Programming Model property is also related to the notion of linearizability. Given any
for Maintaining Globally Consistent Distributed State in Unreliable sequential specification Φs , “linearizable Φs ” can be defined as the
Environments. In Proceedings of POPL, pages 195–208, 2005. smallest asynchronous specification that contains Φs . Every lin-
[8] M.J. Fischer, N.A. Lynch, and M.S. Paterson. Impossibility of earizable specification satifies the asynchronous property defined
Distributed Consensus with one Faulty Process. Journal of the ACM above, but not all asynchronous specifications are linearizable.
Idempotence. Two requests q1 and q2 (in the same history) are has been initiated. This has the effect of creating a new agent
said to be duplicates if q1 = q2 . Two responses (q1 , r1 ) and v I e. Rule IDFF-USE allows the computation idff v e to proceed
(q2 , r2 ) (in the same history) are said to be duplicates if q1 = q2 . once the created agent completes evaluation (as indicated by the
presence of an agent of the form v I w, where w is a value). The
D EFINITION A.2. A specification Φ is said to idempotent iff: same rule also applies to “duplicate” evaluations with the same id.
1. Duplicate requests have no effect: αqβqγ ∈ Φ iff αqβγ ∈ Φ.
2. Duplicates responses have the same value:
α(q, o1 )β(q, o2 )γ ∈ Φ ⇒ o1 = o2 .
3. Duplicate responses are allowed: αrβrγ ∈ Φ iff αrβγ ∈ Φ.
(The above definition is intended for asynchronous specifica-
tions. Hence, the asynchrony conditions have been used to simplify
the definition.)
Idempotence Closure. We define a history to be repetitionless
if it contains no duplicate requests or duplicate responses. We
define a specification Φ to be repetitionless if all histories in Φ are
repetitionless.
Given a repetitionless specification Φ, we define its idempo-
tence closure idem(Φ) to be the smallest specification that contains
Φ and is idempotent.
We now summarize our goal: given a program p that satisfies a
repetitionless specification Φ in the absence of process failures and
duplicate requests, construct a program p0 that satisfies idem(Φ)
even in the presence of process failures and duplicate requests.
Extension The above definitions can be generalized to permit
requests to be of the form (k, v), where k is a unique-identifier
(key) for an input request, and allowing responses to be of the
form (k, o), where k is the unique-identifier of the input request
for which the response is produced. Note that the above definition
does not capture the progress conditions of failure-freedom.