Fault Tolerance Via Idempotence

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

Fault Tolerance via Idempotence

G. Ramalingam and Kapil Vaswani


Microsoft Research, India
grama,kapilv@microsoft.com

Abstract ing comes with its own pitfalls, such as process failures, imperfect
Building distributed services and applications is challenging due messaging, asynchrony and concurrency.
to the pitfalls of distribution such as process and communication Consider the prototypical bank account transfer service in
failures. A natural solution to these problems is to detect potential Fig. 1. The goal of the service is to transfer money between bank
failures, and retry the failed computation and/or resend messages. accounts, potentially in different banks. If the accounts belong to
Ensuring correctness in such an environment requires distributed different banks, ensuring that the transfer executes as an atomic
services and applications to be idempotent. (distributed) transaction is usually not feasible, and the natural way
In this paper, we study the inter-related aspects of process fail- of expressing this computation is as a workflow [10, 20] consisting
ures, duplicate messages, and idempotence. We first introduce a of two steps, a debit followed by a credit.
simple core language (based on λ-calculus) inspired by modern dis- What if the process executing the workflow fails in between the
tributed computing platforms. This language formalizes the notions debit and credit steps? A natural solution is to detect this failure
of a service, duplicate requests, process failures, data partitioning, and ensure that a different process completes the remaining steps
and local atomic transactions that are restricted to a single store. of the workflow. A challenging1 aspect of realizing this solution
We then formalize a desired (generic) correctness criterion for is figuring out whether the original process failed before or after
applications written in this language, consisting of idempotence completing a particular step (either debit or credit). If not done
(which captures the desired safety properties) and failure-freedom carefully, the debit or credit step may be executed multiple times,
(which captures the desired progress properties). leading to further correctness concerns. Services often rely on a
We then propose language support in the form of a monad that central workflow manager to manage process failures during the
automatically ensures failfree idempotence. A key characteristic of workflow (using distributed transactions).
our implementation is that it is decentralized and does not require Now consider a (seemingly) different problem. Messages sent
distributed coordination. We show that the language support can between the client initiating the transfer and the service may be lost.
be enriched with other useful constructs, such as compensations, The only option for a client, when it does not receive a response
while retaining the coordination-free decentralized nature of the within some reasonable time, is to resend its request. Yet the client
implementation. does not want the transfer to occur twice!
We have implemented the idempotence monad (and its variants) In this paper, we study process and communication failures in
in F# and C# and used our implementation to build realistic appli- the context of workflows. The seemingly different problems caused
cations on Windows Azure. We find that the monad has low runtime by process and communication failures are, in fact, inter-related.
overheads and leads to more declarative applications. Idempotence, a correctness criterion that requires the system to tol-
erate duplicate requests, is the key to handling both communication
Categories and Subject Descriptions D.4.5 [Operating Systems]: and process failures efficiently. Idempotence, when combined with
Reliability—Fault-tolerance; C.2.4 [Computer-Communication Net- retry, gives us the essence of a workflow, a fault tolerant composi-
works]: Distributed Systems—Client/server, Distributed applica- tion of atomic actions, for free without the need for distributed co-
tions ordination. In the transfer example, a fault tolerant account trans-
fer can be implemented without a central workflow manager if the
General Terms Reliability, Languages, Design debit and credit steps can be designed to be idempotent,
Keywords fault tolerance, idempotence, workflow, transaction, Formalizing Failfree Idempotence. In this paper, we introduce a
monad simple core language λFAIL , inspired by contemporary cloud plat-
forms. This language formalizes process failure, duplicate requests,
1. Introduction partitioned data, and local transactions. A local transaction pro-
Distributed computing is becoming mainstream. Several modern vides ACID guarantees but is restricted to access data within a sin-
platforms offer virtualized distributed systems at low entry cost gle partition (typically a single server). Computations in λFAIL are
with the promise of scaling out on demand. But distributed comput- like workflows, but without any fault-tolerance guarantees for the
composition (i.e., the computation may fail between transactions).
We then formalize a generic correctness criterion for applica-
tions written in λFAIL . A simple, powerful and tempting criterion is
that an application’s behavior in the presence of duplicate requests
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
and process failures should be indistinguishable from its behav-
for profit or commercial advantage and that copies bear this notice and the full citation ior in the absence of duplicate requests and failures. We formal-
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee. 1 Ingeneral, detecting failures perfectly in an asynchronous, message pass-
POPL’13, January 23–25, 2013, Rome, Italy. ing system is impossible [8]. Conservative failure detection can also lead to
Copyright c 2013 ACM 978-1-4503-1832-7/13/01. . . $10.00 the same problem of duplicated computation.
let process (request) = workflow monad [2, 4]. We find that the core business logic in these
match request with applications can be declaratively expressed using the monad. Our
| (“getBalance”, (branch, account)) → evaluation shows that performance overheads of using the monad
atomic branch {lookup account}
| (“transfer”, (fromBranch, fromAccount, toBranch, toAccount, amt) →
over hand-coded implementations are statistically insignificant.
atomic fromBranch { The rest of the paper is organized as follows. In Section 2, we
update fromAccount ((lookup fromAccount) − amt) introduce a language λFAIL and formalize duplicate requests and
}; process failures. We formalize what it means for a λFAIL application
atomic toBranch { to correctly tolerate duplicate requests and failures. In Section 3,
update toAccount ((lookup toAccount) + amt) we present the idempotence monad and show how it can be used to
}; tolerate duplicate requests as well as process failures. In Section 4,
“Transfer complete.” we describe extensions of the idempotence construct. In Section 5,
we evaluate the idempotence monad and our implementation from
Figure 1: A banking service example, in syntactically sugared the perspective of expressiveness, benefits and overheads. Section 6
λFAIL , that is neither idempotent nor fault-tolerant. discusses related work.

2. Failfree Idempotence
ize a slightly weaker, but more appropriate, correctness criterion, In this section we present a language λFAIL that distils essential el-
namely failure-freedom modulo message duplication. Informally, ements of distributed computing platforms such as Windows Azure
this criterion permits the system to send duplicate responses. This and formalize the concept of failfree idempotence.
weakening is appropriate from the perspective of composition: if
the recipient of the responses can also tolerate duplicate messages, 2.1 The Language λFAIL
then the sender is freed of the obligation to send the response ex-
actly once. Informal Overview. A λFAIL program e represents a service that
receives input requests and produces output responses. An input
Automating Idempotence. Next, we address the problem of auto- request v is processed by creating an agent to evaluate e v. When
matically ensuring idempotence for a service. We present our solu- the evaluation of e v terminates, producing a value v 0 , v 0 is sent
tion as a monad, the idempotence monad. We then show that idem- back as the response. Multiple input requests can be processed con-
potence, when coupled with a simple retry mechanism, provides a currently, and their evaluation can be interleaved. Shared, mutable,
“free” solution to the problem of tolerating process failures, guaran- persistent data is stored in tables.
teeing failure-freedom modulo message duplication. We then pro- Agents. An agent has its own internal state (captured by local
pose dedicated language support for idempotent computations. variables of the code). An agent may fail at any point in time. A
failure models problems such as hardware failure, software crashes
Decentralized Idempotence and Workflow. The idea underlying and reboots . Data stored in tables is persistent and is unaffected by
the idempotence monad is conceptually simple, but tedious to im- agent failures.
plement manually (i.e., without the monad). Given a unique iden- Tables. Tables are persistent maps. They provide primitives to
tifier associated with a computation, the monad essentially adds update the value bound to a key and lookup up the value asso-
logging and checking to each effectful step in the workflow to en- ciated with a key. The language provides a limited form of an
sure idempotance. An important characteristic of our implementa- atomic transaction, which enables a set of operations on the same
tion (of the monad) is that it is designed to work with contemporary table to be performed transactionally. Specifically, the construct
distributed storage systems such as key-value tables. Specifically, it “atomic t e” evaluates e in the context of table t (lookups and
does not assume the presence of dedicated storage for logs that can updates are permitted only on table t within e), guaranteeing iso-
be accessed atomically with each transaction. The monad reuses lation (no other agent can access t in the middle of e’s evaluation)
the underlying store (in this case a key-value table) to simulate a and the all-or-nothing property (no process failure can happen in
distinct address space for logging. the middle of e’s evaluation).
This leads to a decentralized implementation of idempotence
that does not require any centralized storage or any (distributed) Example. Fig. 1 presents a simple bank-account-transfer exam-
coordination between different stores. Thus, the implementation ple in syntactically sugared λFAIL . This example is neither idempo-
of idempotance preserves the decentralized nature of the under- tent (e.g., executing the same transfer request twice is not equiva-
lying computation. This, in turn, leads to a completely decentral- lent to executing it once) nor fault-tolerant (e.g., if the agent pro-
ized implementation of a (fault-tolerant) workflow, unlike tradi- cessing a transfer request fails in between the debit and credit
tional workflow implementations, which use a centralized work- steps).
flow coordinator and/or a centralized repository for runtime status
information. Syntax. Fig. 2 presents the syntax of λFAIL , which extends λ-
calculus with the primitive operations on tables explained above.
Extensions. We then extend the idempotence monad with other In the rest of the paper, we will use extensions such as natural
useful constructs, while preserving the decentralized nature of the numbers, arithmetic operators, and ordered pairs for brevity. These
construct. One extension allows the application to associate each can be encoded in the core language or added to it in the usual
transaction in a workflow with a compensating action. Another ex- fashion. We also use syntactic sugar, such as lookup e, where
tension allows the application to generate intermediate responses e 6∈ hVali, as shorthand, e.g., for (λx.lookup x)e.
to client requests and then asynchronously retry the requests on
client’s behalf. This idiom, especially useful in long running com- Semantic Domains Fig. 2 defines the semantic domains used in
putations, frees the client from having to track status of requests, the semantics of λFAIL . Let hVali denote the set of all basic values:
and leads to more responsive clients. these basically consist of function abstractions. (As usual, natural
numbers, string constants, ordered pairs of values, etc. can all be
Implementation. We have implemented the idempotence work- encoded within hVali or added to it.)
flow monad in F# targeting the Windows Azure platform. We have Let hValiopt represent the set of optional values (of the form
implemented several realistic applications using the idempotence NONE or SOME v). An element of ΣST (a map from hVali to
(a) Evaluation Contexts

E ::= [·] | E e | v E

(b) The Set of Evaluation Rules A used to define the standard semantics of λFAIL .
[INPUT] [OUTPUT]
v ∈ hVali vo ∈ hVali
in(v) out(vi ,vo )
he, µ, α, I, Oi ⇒ he, µ, α ] {v . (e v)}, I ∪ {v}, Oi he, µ, α ] {vi . vo }, I, Oi ⇒ he, µ, α, I, O ∪ {(vi , vo )}i
[NORMAL] [FAIL]
ω
hµ, ei hµ0 , e0 i
 
hp, µ, α ] {v . e}, I, Oi ⇒ hp, µ0 , α ] {v . e0 }, I, Oi hp, µ, α ] {v . e}, I, Oi ⇒ hp, µ, α, I, Oi
[ATOMIC] [UPDATE]
[LOOKUP]
ω∗
hµ[tn], e()i  ht, vi, v ∈ hVali
ω (t,k,v) 
hµ, atomic tn ei hµ[tn 7→ t], vi ht, update k vi  ht[k 7→ (SOME v)], 0i ht, lookup ki  ht, t[k]i
[CONTEXT] [LIFT]
[CONTEXT] [LIFT]
u [BETA]
ht, ei  ht 0 , e0 i hµ, ei
ω
hµ0 , e0 i e → e0 e → e0
u ω  
ht, E[e]i  ht 0 , E[e0 ]i hµ, E[e]i hµ0 , E[e0 ]i ht, ei  ht, e0 i hµ, ei hµ, e0 i (λx.e) v → e[v/x]
[RETRY]
v∈I

he, µ, α, I, Oi ⇒ he, µ, α ] {v . (e v)}, I, Oi

(c) Additional Rules Used To Define the Ideal Semantics of λFAIL .


[UNIQUE-INPUT] [DUPLINPUT] [DUPLSEND]
v ∈ hVali, v 6∈ I v ∈ hVali, v ∈ I v∈O
in(v) in(v) out(v)
he, µ, α, I, Oi ⇒ he, µ, α ] {v . (e v)}, I ∪ {v}, Oi he, µ, α, I, Oi ⇒ he, µ, α, I, Oi he, µ, α, I, Oi ⇒ he, µ, α, I, Oi

Figure 3: Operational semantics of λFAIL .

The state of an executing program is represented by a system


x ∈ hIdentifieri
configuration he, µ, α, I, Oi ∈ Σ, where e is the program itself,
v ∈ hVali ::= x | λx .e
µ represents the values of all tables, α is the multi-set of currently
e ∈ hExpi ::= v | e e | atomic v1 v2 |
executing agents, I represents the set of all input requests received
update v1 v2 | lookup v
so far, and O represents the set of all responses produced so far. (A
response to a request vi is a pair of the form (vi , vo ) where vo is
tn ∈ hTableNamei = hVali
the result.) Let Σ represent the set of all configurations.
t ∈ ΣST = hVali 7→ hValiopt
Let ] denote the union operator for multi-sets.
µ ∈ ΣT = hTableNamei 7→ ΣST
hReqi = hVali
hRespi = hReqi × hVali Semantics. Fig. 3 presents an operational semantics for λFAIL as
(v . e) ∈ ΣA = hReqi × hExpi r
a labelled transition relation ⇒ on the set of configurations Σ. The
hReqi
Σ = hExpi × ΣT × Σ+ A ×2 × 2hRespi evaluation of a program p starts in the initial configuration hhpii ,
hp, µI , ∅, ∅, ∅i, where µI = λt.λk.NONE. (Initially, all tables map
Figure 2: The Syntax of λFAIL and its Semantic Domains. every key to a default value of NONE. We utilize a standard encoding
of optional values consisting of either NONE or SOME v.)
r
System Transitions. (⇒) As rule INPUT indicates, the arrival of
an input request v spawns a new agent to evaluate e v. As rule OUT-
hValiopt ) represents the value of a single table. An element µ ∈ ΣT PUT indicates, when an agent’s evaluation completes, the resulting
represents the values of all tables. value is sent back as a response. The labels on system transitions
As explained earlier, an agent represents a thread of computa- represent requests and responses. Rule NORMAL describes a nor-
tion spawned to process a given input request. The state of an agent mal system transition caused by a potentially effectful execution
is represented by a pair of the form v . e, where v represents the step performed by a single agent, described below. As the rule in-
input request being processed and e represents a partially evaluated dicates, the execution steps of different agents can be interleaved in
expression (and represents the local state of the agent). Let ΣA rep- a non-deterministic order. Rule FAIL indicates that an agent can fail
resent the set of all agent-states. at any point in time.
ω
Agent Transitions. ( ) Execution steps in the evaluation of a DUPLSEND. We define IDEAL to be A \ { FAIL, RETRY, INPUT} ∪
ω
single agent are described by the transition relation on ΣT × { UNIQUE-INPUT, DUPLINPUT, DUPLSEND}. We refer to ⇒IDEAL as
hExpi. A transition hµ, ei
ω 0 0
hµ , e i indicates that an agent expres- the ideal semantics for λFAIL .
sion e is transformed to an agent expression e0 , with the side-effect Observational Idempotence (Safety). We now consider two
of transforming the table-state from µ to µ0 . The label ω represents (well-studied) notions of behavioral equivalence in formalizing
a sequence of updates to a single table performed atomically in this our correctness criterion. Recall that hhpii denotes the initial con-
step. (This label, however, identifies an internal transition not visi- figuration in an execution of program p (which is the same under
ble externally, which is why the label is omitted in the correspond- both semantics). Given a labelled transition relation ⇒ on config-
ing system transition in rule NORMAL.) These transitions are of two urations, an execution of a program p (with respect to ⇒) is an
types: pure (standard λ-calculus evaluation, BETA), and effectful, r1
alternating sequence of states and labels, denoted σ0 ⇒ σ1 · · · σn ,
which take the form of atomic table operations. representing a sequence of transitions starting from the initial pro-
Atomic Table Operations. The expression “atomic t e” iden- gram state σ0 = hhpii. We say that the observed behavior obs(ξ)
tifies a set of operations to be performed on a single table t in of an execution ξ is the sequence of non- labels in ξ. Note that
an atomic and failfree fashion. Its semantics is defined by rule obs(ξ) is a sequence of (input) requests and (output) responses.
u
ATOMIC, which utilizes a transition relation  on atomic evalua- Specifically, it does not include updates to tables, which are inter-
tion configurations of the form ht, ei, which indicates the atomic nal transitions but not visible externally.
evaluation of an expression e at a table t. The labels on such tran-
sitions are either  or represent a single update to a table. Rules UP- D EFINITION 2.1. We say that a λFAIL program p is observationally
DATE/LOOKUP define the semantics of an update/lookup operation idempotent if for every execution ξ1 of p under the standard seman-
on a table. The rule ATOMIC indicates that no other execution step tics there exists an execution ξ2 of p under the ideal semantics such
interleaves with the evaluation of an atomic expression. Note that that obs(ξ1 ) = obs(ξ2 ).
the evaluation of an atomic expression cannot fail in the middle. In We present a simpler, more abstract, formalization of idempo-
other words, either all effects of the atomic expression evaluation tence in the appendix. However, the preceding definition in terms
happen or none does. of the ideal semantics will be useful in formalizing progress prop-
Retry. The rule RETRY models one of the key ingredients used erties, as below, and in proving correctness of our implementation.
to tolerate process failures, namely a retry mechanism. The rule
indicates that a request must be retried periodically. Typically, Failfree Idempotence (Progress). An observationally idempotent
retrying logic is built into clients. A client will resend a request if it program, by definition, gives no progress guarantees. Consider a
does not receive a response within a pre-defined amount of time. modified version of the account-transfer example that checks the
This basic scheme can be optimized, as discussed in Section 4: input request to determine if it is a duplicate request and processes
the system (application) can send an acknowledgement back to the it only if it is not a duplicate. This ensures that the program is
client, after which the client can stop resending a request and the idempotent. However, if the agent fails in between the debit and
application takes on the responsibility of retrying the request (to credit steps, we would still have a problem. This motivates the
ensure progress in the presence of failures). The system can exploit following stronger correctness condition, based on the notion of
various optimizations in implementing the retry logic, but rule weak bisimulation. A labelled transition system (S, ⇒) consists of
`
RETRY suffices for our purpose here. As we will soon see, the key a relation ⇒ ⊆ S × S for every ` ∈ hLabeli.
reason for adding the RETRY rule to the semantics is to formalize
a weakened progress guarantee (“progress modulo retries”) that is D EFINITION 2.2. A weak bisimulation between two labelled tran-
appropriate in the presence of failures. sition systems (Σ1 , ⇒1 ) and (Σ2 , ⇒2 ) is a relation ∼ ⊆ Σ1 × Σ2
such that for any σ1 ∼ σ2 we have:
2.2 Formalizing Failfree Idempotence `  ∗ `
1. σ1 ⇒1 σ10 ∧ ` 6=  ⇒ ∃σ20 , σ200 .σ2 ⇒2 σ200 ⇒2 σ20 ∧ σ10 ∼ σ20
We now formalize a natural correctness goal of any λFAIL program: `  ∗ `
namely, that it correctly handles process failures and duplicate 2. σ2 ⇒2 σ20 ∧` 6=  ⇒ ∃σ10 , σ100 .σ1 ⇒1 σ100 ⇒1 σ10 ∧ σ10 ∼ σ20
  ∗ 0
messages. We will later see how we can automatically ensure this 3. σ1 ⇒1 σ10 ⇒ 0 0
∃σ2 .σ2 ⇒2 σ2 ∧ σ1 ∼ σ2 0
  ∗
property for any program. We formalize this correctness criterion 4. σ2 ⇒2 σ20 ⇒ ∃σ10 , σ10 .σ1 ⇒1 σ10 ∧ σ10 ∼ σ20
as follows. We define an alternative semantics for λFAIL , which we
refer to as the ideal semantics, representing an idealized execution We will write (σ1 , Σ1 , ⇒1 ) ' (σ2 , Σ2 , ⇒2 ) to indicate that
platform. We then define a program to be “correct” iff its behavior there exists a weak bisimulation ∼ between (Σr1 , ⇒1 ) and (Σr2 , ⇒2
under the standard semantics is equivalent to its behavior under the ) under which σ1 ∼ σ2 , where Σri represents the set of states in Σi
ideal semantics. that are reachable from σi via a sequence of ⇒i transitions. We
will omit Σ1 and Σ2 in this notation if no confusion is likely.
Ideal Semantics. Let A denote the set of all rules defined in
Fig. 3(b). We will define the ideal semantics as a different labelled D EFINITION 2.3. A λFAIL program p is said to be failfree idempo-
transition relation on program configurations, by adding and re- tent iff (hhpii, ⇒A ) ' (hhpii, ⇒IDEAL ).
moving some rules to set A. Let ⇒S denote the labelled transition This definition requires a failfree idempotent program to pro-
relation on Σ induced by a given set of rules S. Thus, ⇒A is the vide progress guarantees: at any point in time, if the system can pro-
transition relation capturing the standard semantics of λFAIL . duce a response r under the ideal semantics, then the system should
We first omit rule FAIL eliminating process failures from the be capable of producing the same response r under the standard
ideal semantics. In the ideal semantics, we assume that all input semantics also. However, the inclusion of rule RETRY in the stan-
requests are distinct, by replacing the INPUT rule by the UNIQUE- dard semantics means that this progress guarantee holds provided
INPUT and DUPLINPUT rules. We also drop rule RETRY. Finally, requests are retried. Absolute progress guarantees are not possible
process failures make it hard to ensure that an application sends an since an agent may fail before it executes even its first step.
output message exactly once. A common approach to this problem
is to weaken the specification and permit the application to send the T HEOREM 2.4. A failfree idempotent program is also observation-
same output message more than once. We do this by adding rule ally idempotent.
3. Whenever an effectful step is executed, we simultaneously per-
let imreturn v = fun (guid, tc) → (tc, v) sistently record the fact that this step has executed and save the
value produced by this step.
let imatomic T f =
4. Every effectful step is modified to first check if this step has
fun (guid, tc) →
atomic T { already been executed. If it has, then the previously saved value
let key = (0, (guid, tc)) in (for this step) is used instead of executing the step again.
match lookup key with
| Some(v) → (v,tc+1)
Note that λFAIL does not have any non-deterministic construct
| None → (in a single agent’s evaluation). Non-deterministic constructs can
let v = f () in (update key v); (v,tc+1) be supported by treating them as an effectful step so that once a
} non-deterministic choice is made, any re-execution of the same step
makes the same choice.
let imbind (idf, f) =
fun (guid, tc) → Details. We now describe in detail how individual computation
let (ntc, v) = idf (guid, tc) in steps can be made idempotent and how these idempotent steps can
f v (guid, ntc) be composed together into idempotent computation. Our solution
is essentially a monad (Fig. 4).
let imrun idf x = We represent an idempotent computation as a function that takes
let (j,v) = idf x (x, 0) in v a tuple (guid , tc) as a parameter (where guid and tc represent an
identifier and a step number used to uniquely identify steps in a
let imupdate key val = update (1,key) val
computation) and returns a value along with a new step number.
let imlookup key = lookup (1,key)
We can “execute” an idempotent computation idf as shown by the
function imrun, using the function’s argument itself as the guid
Figure 4: The Idempotence Monad. value and an initial step number of 0.
The function imreturn (the monadic “return”) lifts primitive
values to idempotent computations. This transformation can be
Failfree Idempotent Realization. Failfree idempotence is a generic used for any pure (side-effect-free) expressions.
correctness property we expect of a λFAIL program. More generally, Side-effects (table operations) are permitted only inside the
the following definition combines this property with a “specifica- atomic construct. The function imatomic is used to make a local
tion” (provided as another λFAIL program q). transaction idempotent. Specifically, “imatomic T fm ” is an idem-
potent representation for “atomic T f”, where fm is the monadic
D EFINITION 2.5. A λFAIL program p is said to be a failfree idem- form of f constructed as described later. As explained above, this
potent realization of q iff (hhpii, ⇒A ) ' (hhqii, ⇒IDEAL ). is represented as a function that takes a pair (guid , tc) as a param-
eter and realizes the memoization strategy described above. The
3. Realizing Failfree Idempotence pair (guid , tc) is used as a unique identifier for this step. We check
whether this step has already executed. If so, we return the pre-
We now present a generic, application-independent, strategy that viously computed value. If not, we execute this computation step
can be used to ensure failfree idempotence. Informally, a function and memoize the computed value. In either case, the step number
is idempotent if multiple evaluations of the function on the same is incremented in this process and returned.
argument, potentially concurrently, behave the same as a single It is, however, critical to do all of the above steps atomically
evaluation of that function with the same argument. Consider the to ensure that even if two agents concurrently attempt to execute
example in Fig. 1. In this example, the parameter requestId serves the same computation, only one of them will actually execute
to distinguish between different transfer requests (and identify du- it. However, note that an atomic expression is allowed to access
plicate requests). This service can be made idempotent and failfree only a single table. Hence, the “memoized” information for this
by (a) using this identifier to log debit and credit operations, when- computation step must be stored in the same table that is accessed
ever the operations are performed, and (b) modifying the debit and by the computation step. However, we must keep our memoization
credit steps to check, using the log, if the steps have already been and book-keeping information logically distinct from the program’s
performed. This strategy can ensure that multiple (potentially par- own data stored in the same table. We achieve this by creating two
tial and concurrent) invocations of transfer with the same identifier distinct “address spaces” (of key values) for the table. We convert
have the same effect as a single invocation. a key value k used by our idempotence implementation to (0, k)
Manually ensuring idempotence is tedious and it introduces the and convert a key value k used by the program itself to (1, k).
possibility of various subtle bugs and in general makes implementa- The functions imupdate and imlookup do this wrapping for the
tion less comprehensible. We now describe a monad-based library user’s code. Thus, the expression f to be evaluated in the atomic
that realizes idempotence and failure-freedom in a generic way. transaction is transformed to its monadic representation fm by
replacing lookup and update in f by imlookup and imupdate
3.1 The Idempotence Monad
respectively.
The Intuition. A computation performed by a single agent con- We now consider how to compose individual computation steps
sists of a sequence of steps, “pure” as well as “effectful” ones in an idempotent computation (the monadic “bind” function). Con-
(namely, atomic local transactions which can read/update a single sider a computation of the form “let x = step1 in step2”
table.) We use the following strategy to make a computation idem- which is equivalent to “(λx.step2)step1”. We transform step1 to
potent: its monadic form, say idf. We transform step2 to its idempotent
form, say g. The function bind, applied to idf and λx.g, produces
1. Associate every computation instance (that we wish to make the idempotent form of the whole computation. (A formal and more
idempotent) with a unique identifier. complete description of the transformation is presented later.)
2. Associate every step in an idempotent computation with a The result of “imbind idf f” is defined as follows: it is an
unique number. idempotent function that takes a parameter (guid , tc), and invokes
idf (the first step). It uses the value and the step number returned by P ROOF S KETCH. Let ξ be an operationally idempotent execution
the idempotent function and invoke the second idempotent function of a program p under the standard semantics ⇒A . We show how
f . Thus the monad effectively threads the step number through the to construct an execution ξ 0 of p under the ⇒IDEAL semantics such
computation, incrementing it in every atomic transaction. that obs(ξ) = obs(ξ 0 ).
Note that a key characteristic of this implementation is that it is We first omit any transitions in ξ due to the FAIL rule since their
completely decentralized. When a transaction completes, the cur- labels are empty.
rent agent simply attempts to execute the next (idempotent) transac- We next omit the transitions corresponding to repeated execution-
tion in the workflow; no coordination with a centralized transaction steps:
ω
manager is required. In general, distributed coordination requires The key point to note here is that an execution-step a 99K b
the use of expensive blocking protocols such as 2 phase commit. In affects the (legality of) subsequent transitions in two ways: indi-
contrast, a workflow implementation based on this idea of idempo- rectly through the side-effects ω on the tables and directly through
tence coupled with (client-side) retry logic are non-blocking. The b (which may take part in subsequent transitions). A repeated
implementation does not have a single point-of-failure or single execution-step is redundant in any idempotent execution that has no
performance bottleneck, which can lead to increased scalability. failures (transitions due to the FAIL rule): it has no side-effects on
Also note that this implementation creates one log entry per the tables, and if the value b is subsequently used in a non-repeated
transaction. Once logging for idempotence is separated from the execution-step, then we can show that another agent identical to b
core business logic, it can be optimized in several ways. We can already exists in the configuration.
avoid redundant logging if the underlying storage engine maintains We then omit any transition due to the RETRY rule. We finally
its own log. Logging can be avoided if the transaction is (declared replace INPUT transitions corresponding to a duplicate and replace
to be) semantically idempotent. INPUT transitions corresponding to a non-duplicate input request by
a UNIQUE-INPUT transition.
Example. The monadic library lets us construct the monadic ver-
This leaves us with duplicate responses that may be produced
sion em of any λFAIL expression e, by using the monadic version
by the OUTPUT rule for the same input (due to multiple agents
of any primitive and the monadic bind in place of function appli-
that process it). We replace these transitions by corresponding
cation. E.g., the monadic version of the following code fragment
DUPLSEND transition.
(from the money transfer example)
This transformation produces an execution ξ 0 of p under the
atomic fb {update fa ((lookup fa) − amt)}; ⇒IDEAL semantics such that obs(ξ) = obs(ξ 0 ). 
atomic tb {update ta ((lookup ta) + amt)};
Our next goal is to show that all executions of programs written
is the following:
using the idempotence monad are operationally idempotent. How-
imbind ever, this claim requires the programs to be well-typed, as defined
(imatomic fb {imupdate fa ((imlookup fa) − amt)}) below.
{imatomic tb {imupdate ta ((imlookup ta) + amt)}}
where {e} is shorthand for λx.e, where x is not free in e. Well-Typed IM Programs. The idempotence monad can be used
to write monadic programs in the usual way. The monad may be
3.2 Idempotence Monad Programs Are Failfree thought of as defining a new language λIM , obtained from λFAIL
by replacing the keywords atomic, lookup, and update by the
In this section, we show that programs written using the idempo-
keywords imatomic, imlookup, imupdate, imbind, and imrun.
tence monad are failfree.
A λIM program can also be thought as a λFAIL program, using the
Idempotent Executions. Consider any execution of a λFAIL pro- definition of the monad in Fig. 4.
gram. We refer to any transition generated by rule NORMAL as an We first mention a few type restrictions used to simplify presen-
ω tation. We refer to λ-calculus terms as pure and their types as pure
execution-step and identify it by the triple (v . e) 99K (v . e0 ).
ω types. We assume that the types of keys and values of all tables are
Thus, an execution-step a 99K b represents a transition caused by pure types. We assume that the types of the input request and the
an agent a that transforms to an agent b, with a side-effect ω on output response are also pure types.
ω
the tables. An execution-step a 99K b, after an execution ξ, is said We use hτk , τv i to denote the type of a table with keys of type
to be a repeated execution-step if the execution ξ contains some τk and values of type τv . We assume that table names come from
ω0
execution-step of the form a 99K b0 . (Note that this does not sig- some fixed set of identifiers and that the typing environment maps
nify a cycle in the execution since the states of the tables could be these table names to their types.
different in the two corresponding configurations.) The non-pure terms can be classified into two kinds. The first
kind are expressions (such as imupdate k v), which are meant to
D EFINITION 3.1. An execution is said to be operationally idempo- be evaluated in the context of a single table (as part of a local trans-
ω ω0 action). Fig. 5 presents typing rules for intra-transaction expres-
tent iff for any two execution-steps a 99K b and a0 99K b0 in the
sions. The type τ !hτk , τv i signifies a term that can be evaluated in
execution (in that order), a = a0 implies that b = b0 and ω 0 is
the context of a table of type hτk , τv i, producing a value of type
empty.
τ . The second kind of expressions are the ones that use the idem-
Note that operational idempotence is a property that involves the potence monad, used to represent workflow computations which
side-effects on the tables, unlike observational idempotence. As we execute one or more local transactions. Fig. 6 presents typing rules
show below, it is a stronger property that can be used to establish for such expressions (which are the standard monad typing rules).
observational idempotence. In the sequel, we will use the term well-typed IM program to
refer to any expression e such that Γtn |= e : τi →ID τo , where τi
L EMMA 3.2. Any operationally idempotent execution ξ of a pro- and τo are, respectively, the types of input and output messages, and
gram p (in the standard semantics) is observationally equivalent to Γtn provides the typings of tables. This is essentialy a program of
some ideal execution ξ 0 of p (in the ideal semantics): i.e., obs(ξ) = the form imrun e0 , where e0 is constructed from the other monadic
obs(ξ 0 ). constructs.
[IMLOOKUP] [IMUPDATE] [AT-VAR]
Γ |= e : τk !hτk , τv i Γ |= e1 : τk !hτk , τv i Γ |= e2 : τv !hτk , τv i
Γ |= imlookup e : τv !hτk , τv i Γ |= imupdate e1 e2 : unit!hτk , τv i x : τ |= x : τ !hτk , τv i
[AT-LAMBDA] [AT-APPLY]
Γ, x : τ1 |= e : τ2 !hτk , τv i Γ |= e1 : (τ1
hτk ,τv i
→ τ2 )!hτk , τv i Γ |= e2 : τ1 !hτk , τv i
hτk ,τv i
Γ |= λx.e : (τ1 → τ2 !hτk , τv i) Γ |= e1 e2 : τ2 !hτk , τv i

Figure 5: A type system for intra-transaction expressions.

[IMATOMIC]
[IMRETURN]
hτ ,τv i
Γ |= e : τ Γ |= t : hτk , τv i Γ |= e : (unit k→ τ )!hτk , τv i
Γ |= imreturn e : IM τ Γ |= imatomic t e : IM τ
[IMBIND] [IMRUN]
Γ |= e2 : τ1 → IM τ2 Γ |= e1 : IM τ1 Γ |= e : τ1 → IM τ2
Γ |= imbind e1 e2 : IM τ2 Γ |= imrun e : τ1 →ID τ2

[VAR] [LAMBDA] [APPLY]


Γ, x : τ1 |= e : τ2 Γ, x : τ1 |= e : τ2
x : τ |= x : τ Γ |= λx.e : τ1 → τ2 Γ |= λx.e : τ1 → τ2

Figure 6: A type system for idempotence monad.

L EMMA 3.3. All executions of a well-typed IM program are oper- define the relation ' between ΣpA and ΣpI by hp, µ1 , α1 , I1 , O1 i '
ationally idempotent. hp, µ2 , α2 , I2 , O2 i iff I1 = I2 and µ1 = µ2 .
P ROOF S KETCH. Let us refer to a value of the form (0, k) (used as Note that in the above definition, the condition I1 = I2 im-
a key for a table lookup/update operation) as a system key. Consider plies that both configurations have received the same set of input
any execution of a well-typed IM program. It is easy to verify that requests. The condition µ1 = µ2 implies that the processing of
once the value associated with a system key (0, k) in any table is each input request v in both configurations have gone through iden-
set to be v, it is never subsequently modified. tical sequences of effectful steps so far. (This follows since every
ω ω0 effectful step is memoized and is part of the table state µ1 and µ2 .)
Consider any two execution-steps a 99K b and a0 99K b0 in the
execution (in that order) where a = a0 . The idempotence property T HEOREM 3.6. If p is a well-typed IM program, then ' is a weak
follows trivially whenever these steps are effect-free steps. The bisimulation between (ΣpA , ⇒A ) and (ΣpI , ⇒IDEAL ).
only effectful steps are produced by the evaluation of an imatomic
construct. Since a = a0 , the key value used in the evaluation of P ROOF S KETCH. Consider any σ1 ' σ2 . The interesting transi-
imatomic in both steps must be identical. The implementation tions are those that are effectful (involving an atomic operation) or
of imatomic guarantees that whatever value is produced by the produce an output. If either σ1 or σ2 can perform an interesting
evaluation of imatomic in the first step will be memoized with the transition, we must show that the other can perform an equivalent
given value. This guarantees that the second step will find this same transition (possibly after a sequence of silent transitions).
value in the table (thanks to the property described earlier). Hence, Consider the case when an agent v . e in σ1 can perform an
it will evaluate to the same value and will have no side-effects on effectful transition in the standard semantics. Let v . e0 be the state
the table. It follows that the execution is idempotent.  of the same agent after its most recent imatomic evaluation. Then,
we must have some side-effect-free evaluation e0 → e1 · · · ek = e.
Consider configuration σ2 . By definition, we restrict our attention
T HEOREM 3.4. Any well-typed IM program is observationally to reachable states. Hence, there exists some execution in the ideal
idempotent. semantics that produces σ2 . This execution must have received in-
P ROOF S KETCH. Follows immediately from the previous two lem- put request v and produced an agent v . (p v). Consider the evalu-
mas.  ation of this agent. The effectful steps in this agent’s evaluation in
the ideal semantics must have produced the same sequence of val-
As stated earlier, observational idempotence does not give us ues as in the evaluation in the standard semantics (since the mem-
any progress guarantees in the presence of failures. We now estab- oized values for these steps are the same in both σ1 and σ2 ). Thus,
lish progress guarantees in the form of a weak bisimulation. σ2 must have some agent of the form v . ei for some 0 ≤ i ≤ k.
This will produce, after zero or more silent transitions, the agent
D EFINITION 3.5. Let p be any λFAIL program. Let ΣpA denote the v . e that can then perform the same effectful transition in the ideal
set of configurations { σ ∈ Σ | hhpii ⇒∗A σ } that can be produced semantics as in the standard semantics.
by the execution of p under the standard semantics. Let ΣpI denote Consider the case when an agent v . e in σ2 can perform an ef-
the set of configurations { σ ∈ Σ | hhpii ⇒∗IDEAL σ } that can fectful transition in the ideal semantics. We can create a new agent
be produced by the execution of p under the ideal semantics. We v . (p v) using rule RETRY in the standard semantics. We can then
lation preserves types: translation of a well-typed λFAIL program
[x]V = x (satisfying the restrictions mentioned above) will produced a well-
[λx.e]V = λx.[e]I typed monadic program.
[x]I = imreturn x 3.4 Failfree Realization Via Idempotence
[λx.e]I = imreturn (λx.[e]I ) Given any λFAIL expression e, we can construct its monadic version
[n e]I = imbind [n]I (λx.[x e]I ) ,n 6∈ hVali em as explained above. The preceding results imply that em must
[v n]I = imbind [n]I (λx.[v x]I ) ,v ∈ hVali, n 6∈ hValibe failfree. We now establish a stronger result, namely that em is
[v v 0 ]I = [v]V [v 0 ]V ,v, v 0 ∈ hVali a failfree realization of e. (Failure-freedom is a weak correctness
[atomic x e]I = A
himatomic x [e] , fv i criterion. It indicates that a program’s behavior under the standard
where fv = {x} ∪ freevars(e) semantics is equivalent to it’s behavior under the ideal semantics.)
The notion of failure-freedom does let us simplify verifying cor-
[x]A = x rectness of a program by considering only its behavior under the
[λx.e]A = λx.[e]A ideal semantics. In particular, we have:
[e1 e2 ]A = [e1 ]A [e2 ]A T HEOREM 3.8. If p and q are weakly bisimilar under the ideal se-
[update e1 e2 ]A = imupdate [e1 ]A [e2 ]A mantics (i.e., if (hhpii, ⇒IDEAL ) ' (hhqii, ⇒IDEAL ), and p is failfree,
[lookup e]A = imlookup [e]A then p is a failfree realization of q.
P ROOF S KETCH. Follows as we can compose the two weak bisim-
he, {}i = e
ulations together. 
he, {x} ] Y i = himbind x (λx.e), Y i
This theorem simplifies proving that a program is a failfree
Figure 7: Transforming λFAIL expressions into idempotent expres- realization of another. In particular, we have already seen that a
sions. monadic program is failfree (Theorem 3.7). Hence, to prove that em
is a failfree realization of e, it suffices to show a weak bisimilarity
between the ideal executions of a λFAIL program e and the ideal
duplicate the entire execution history of v . e (which is guaranteed executions of its monadic representation em .
to be the same in both semantics by the definition of '). Thanks
to the idempotence property, this duplicate execution will have no T HEOREM 3.9. Let em be the monadic representation of e. Then,
extra side-effects and will eventually produce the same effectful em is a failfree realization of e.
transition as in the ideal semantics.  P ROOF S KETCH. As explained above, it is sufficient to relate eval-
uations of e and em under the ideal semantics. In this setting, it is
intuitively simple to see why the monadic program “em ” simulates
T HEOREM 3.7. A well-typed IM program is failfree. the given program e. The key distinction is that the monadic imple-
mentation uses imatomic to perform any effectful step. This step
3.3 Transforming λFAIL Programs To Idempotent Programs
will first check the memoized data to see if this step has already
We have seen that the idempotence monad lets us construct failfree executed. In an ideal execution, this check will always fail, and the
programs. Given any λFAIL -program e, we can construct its “equiv- monadic implementation then performs the same effectful step as
alent” monadic representation using standard techniques (which are the original program (and memoizes it). The check is guaranteed
conceptually straightforward, though the details are intricate). The to always fail because the keys (g, i) used in distinct executions
transformation is presented in Fig. 7 (based on the transformation of imatomic are distinct. The value of g will be different for exe-
in [17]). cutions corresponding to different inputs x and y. The value i will
We make a few simplifying assumptions (type restrictions) be different for different steps corresponding to the same input x. 
about the source λFAIL program e in this transformation algorithm
(similar to those mentioned in Section 3.2). We assume that the
types of keys and values of all tables are pure types. We assume that 3.5 Idempotent Failfree Computations as a Language
intra-transaction expressions do not manipulate values of workflow Feature
type: e.g., atomic t {λx.(atomic s e)} is not allowed. We have now seen how idempotent failfree computations can be
The translation is defined using multiple translation functions. automatically realized using the idempotence monad. We now pro-
[v]V is a translation function that applies only to values (which pose a new language construct for idempotent failfree computa-
must be of the form x or λx.e). The translation function [e]I is the tions.
heart of the monadic transformation and can be applied to multi- The construct “idff v e” indicates that the computation of
transaction expressions. It uses the idempotence monad to sequence e should be failfree and should be idempotent with respect to v:
local transactions. The translation function [e]A is applied to intra- i.e., multiple (potentially concurrent) invocations of this construct
transaction expressions and it is used primarily to replace occur- with the same value v behaves as though only one of the invoca-
rences of update and lookup by imupdate and imlookup respec- tions executed. The above construct permits users to specify the
tively. Note that a single local transaction (i.e.intra-transaction ex- code fragment for which they desire automatic idempotence and
pression) evaluation is done using standard evaluation (without us- failure-freedom (provided by the compiler). This enables users to
ing any monad). The auxiliary function he, Xi is used to transform rely on other methods, perhaps application-specific, to ensure these
the monadic values used in the evaluation of multi-transactions to properties elsewhere. (For instance, some computation may be se-
standard values required in the evaluation of a single local transac- mantically idempotent already.) It also lets the users specify what
tion. should be used as computation identifier to detect duplicates, as
Finally, given a top-level λFAIL program e (which is assumed to illustrated below.
be a functional value of the form λx.e0 ), it’s monadic form em is We refer to this enhanced language as λIDFF . A formal seman-
defined to be imrun [e]V . As usual, it can be shown that the trans- tics of λIDFF appears in the appendix. We illustrate the meaning of
the first parameter of idff using the incorrect λFAIL example of Semantic Domains. The semantic domains for λIDWF are the
Fig. 1. We can wrap the idff construct around this example in the same as for λFAIL with minor extensions. The runtime expression
following two ways, with different semantics. Note that the input e ec is used to represent a workflow during its execution. Here,
in this example is a pair (reqId, req) consisting of a request-id as e represents the partially evaluated form of a workflow and ec
well as the actual request. Consider: represents the compensating action to be performed in case the
workflow needs to be aborted. Agents can also be of the form
f1 = λ(reqId , req). idff reqId (process (reqId , req))
id I ew ec , indicating an agent evaluating a workflow.
f2 = λ(reqId , req). idff (reqId , req) (process (reqId , req))
Semantics. The semantics of λIDWF is defined using a set of rules
The behavior of these two functions differ in the cases where consisting of all the rules in A used to define the semantics of λFAIL
multiple inputs arrive with the same request-id but differing in the except for FAIL and ATOMIC, plus the new rules presented in Fig. 8.
second parameter req. f1 treats such requests as the same and will The initiation of a workflow (rule IDWF-BEGIN) creates a new
process only one of them, while f2 will treat them as different agent of the form id I ew ec , provided no agent has already
requests and process them all. been created for id. This agent evaluates the workflow ew and
One of the subtle issues with the semantics and implementation tracks the compensating action ec to be performed in case of an
of the idff construct is the treatment of invocations that have the abort. Rule IDWF-END indicates how the computation proceeds
same id (first parameter) but have different expressions as the sec- once the workflow evaluation is complete (or if a previous work-
ond parameter. We take a simple approach with our semantics (that flow with the same id has already been initiated).
the effect is as if only the first invocation occurred). This has some The rules ATOMIC and ATOMIC-ABORT define the semantics of
implications for the underlying implementation (in the presence of transactions in a workflow. Informally, the expression atomic t ea ec
failures). One solution is to use a continuation-passing style com- is evaluated as follows. First, ea is evaluated atomically to produce
putation, and memoize the entire continuation in the memoization a value v. Then, as a side-effect, the compensating action is up-
step (rather than just the value computed for the step). dated to indicate that ec v should be evaluated as the first step
of the compensation before executing the original compensation.
4. Extensions Finally, the whole expression evaluates to v. Thus, note that the
value produced by the “atomic” action ea is available to the subse-
We have seen how idempotence can serve as the basis for fail- quent computation as well as the compensating action ec . When a
free composition of computations: essentially, a simple form of workflow is aborted (rules ATOMIC-ABORT and IDWF-ABORT), the
fault-tolerant workflow. In this section, we describe two extensions compensation expression is evaluated.
that enrich this construct, namely compensating actions and asyn- Rule ATOMIC of λFAIL is replaced by the pair of rules PURE-
chronous evaluation, which simplify writing applications. These ATOMIC and PURE-ATOMIC-ABORT, which describe the behavior of
concepts are not new, but what is interesting is that they can be a transaction that is not contained within a workflow. PURE-ATOMIC
integrated without affecting the light-weight, decentralized, nature describes the successful completion of a transaction, while PURE-
of our idempotence implementation. ATOMIC-ABORT describes the case where the transaction is aborted.
(The auxiliary relation  used here is defined by the rules for
4.1 Compensating Actions λFAIL .)
The idff construct allows us to compose several transactions into
Compensation monad We now describe the compensation monad
an idempotent workflow that appears to execute exactly once with-
that can be used to realize workflows with compensating actions.
out process failures. However, the lack of isolation means that when
For simplicity, we describe an implementation of the compensat-
a transaction in the workflow is executed, its precondition may not
ing monad that focuses only on logical failures and compensations.
be satisfied and we may need to abort the workflow. For example,
We assume there are no duplicate requests or process failures. We
in the transfer example (Fig. 1), the debit step may succeed, but
can realize idempotent workflows with compensating actions by
we may be unable to complete the subsequent credit step because
composing this monad and the idempotence monad [13].
the account does not exist. One way of recovering from this failure
The compensation monad, shown in Fig. 9, is a combination
is to compensate for the debit by crediting the amount back to the
of the exception monad and the continuation passing style monad.
source account. If compensating actions are correct, the workflow
Transactions return a value of the form Value(v) (on successful
can guarantee all-or-nothing semantics i.e. either all or none of the
completion) or a special value Abort to indicate that the transaction
transactions in the workflow appear to execute.
was aborted. Workflows are represented as a function in continua-
We first formalize the desired semantics of workflows with com-
tion passing style. The helper function compensateWith associates
pensating actions. We present a language λIDWF which provides
a transaction a with a compensating action to construct a primi-
language constructs to associate transactions with compensating
tive workflow. compensateWith constructs a function in continua-
actions, and to declare logical failures. Finally, λIDWF supports
tion passing style. This function first evaluates a(). If a aborts, the
a construct idworkflow id e, which composes transactions with
whole transaction aborts and returns Abort. Otherwise, the contin-
compensating actions into a workflow. We present semantics of this
uation is evaluated using the value returned by the transaction. If
construct, and then show how this construct is realized using a com-
the continuation itself aborts (because one of the following trans-
pensation monad.
actions aborts), we evaluate the compensating action and return the
Syntax. λIDWF modifies and extends λFAIL in the following ways value Abort. The monad’s return simply lifts a value (with no com-
(see Fig. 8). atomic t ea ec extends the atomic construct of λFAIL pensation) into a workflow. The monadic bind is standard for con-
by specifying ec as the compensation for the atomic transaction ea . tinuation passing style computations. The function run shows how
abort indicates that a logical failure has occured and the work- to execute a workflow by passing it an empty continuation.
flow must be aborted. idworkflow id e represents an idempotent
workflow with identifier id, where e is the workflow consisting of 4.2 Asynchronous evaluation
a composition of atomic transactions with compensations. Expres- Workflows are commonly used to perform computations involving
sion of the form e ec arise only during evaluation and are not several transactions. Consequently, workflows are often long run-
source language constructs. ning with highly variable latencies. Large latencies accentuate the
(a) Syntax and Evaluation Context

x ∈ hIdentifieri; v ∈ hVali ::= x | λx.e


e ∈ hExpi ::= x | λx .e | e e | atomic vt va vc | idworkflow vi vw | abort | update v v | lookup v | e ec
ΣA = hExpi × hExpi + hExpi × hExpi
E ::= [·] | E e | v E | E e

(b) Evaluation Rules (in addition to A \ { ATOMIC })

[IDWF-END]
[IDWF-BEGIN] id I u ∈ α u ∈ hVali
v . E[idworkflow id w] ∈ α 6 ∃e0 .id I e0 ∈ α hp, µ, α ] {v . E[idworkflow id w]}, I, Oi ⇒


hp, µ, α, I, Oi ⇒ hp, µ, α ] {id I (w() 0)}, I, Oi hp, µ, α ] {v . E[u]}, I, Oi
[ATOMIC] [ATOMIC-ABORT]
[NORMAL]
ω ω∗ ω∗
0 0
hµ, ei hµ , e i hµ[tn], ea ()i  ht, vi, v ∈ hVali hµ[tn], ea ()i  ht, E[abort]i
ω 
hp, µ, α ] {v I e}, I, Oi ⇒ hµ, E[atomic tn ea ec ] ei hµ, E[atomic tn ea ec ] ei
hp, µ0 , α ] {v I e0 }, I, Oi hµ[tn 7→ t], E[v] ((ec v); e)i hµ, E[abort] ei

[IDWF-ABORT] [CONTEXT]

e → e0
hµ, (E[abort]) ec i hµ, ec i hµ, E[e] ec i → hµ, E[e0 ] ec i
[PURE-ATOMIC] [PURE-ATOMIC-ABORT]
ω∗ ω∗
hµ[tn], ea ()i  ht, vi, v ∈ hVali hµ[tn], ea ()i  ht, E[abort]i
ω 
hµ, atomic tn ea ec i hµ[tn 7→ t], vi hµ, atomic tn ea ec i hµ, 0i

Figure 8: Syntax and Semantics of λIDWF .

let compensateWith a comp = queue, retrieve checkpoints, and continue evaluation, deleting the
fun f → match a() with checkpoint only when the workflow has been fully evaluated.
| Abort → Abort Supporting asynchronous evaluation requires some additional
| Value(b) → match (f b) with
| Abort → let = comp b in Abort
support from the platform and a minor change to the monad’s bind
| Value(c) → Value(c) function. We assume that the platform provides a channel that can
only be accessed by the idempotence monad (hence not exposed
let bind (v, f) = fun g → v (fun a → f a g) in λFAIL ). Messages can be sent to this channel using the function
let return a = compensateWith (fun () → a) (fun v → ()) send and received using the function recv. We assume the channel
let run a = a (fun x → Value(x)) supports the following handshake protocol for messages in order
to guarantee at-least-once processing of messages. In this protocol,
recv does not delete messages from the channel. Instead, an agent
Figure 9: The compensation monad that receives a message must acknowledge the message (using ack)
before the message is permanently deleted. If an agent crashes after
receiving a message and before acknowledging it, the message
problem of duplicate requests because clients cannot easily distin- reappears in the channel and may be processed by other agents.
guish between a long running workflow and one that has failed to (Windows Azure provides such a channel implementation, which
generate a response. While the idempotence monad guarantees cor- our implementation uses.)
rectness in such cases, idempotence does come at a performance The changes to the idempotence monad are illustrated in Fig-
cost (due to log lookups). A design pattern commonly used to re- ure 10. Instead of invoking the remainder of the workflow, the mod-
duce the number of duplicate requests is for the system to take over ified bind (Fig. 10) creates a closure for the rest of the workflow
the task of retrying (a part of) the request on behalf of the client and persists the closure in a special queue (“worklist”) (using the
asynchronously, and sending an intermediate response to the client, function send). Special agent processes (agent) retrieve the check-
typically with the transaction identifier. The client can use transac- points (using recv) and continue evaluating the rest of workflow.
tion identifier to periodically poll for the status of the request. If an agent manages to evaluate the workflow without failing, it
The idempotence monad can be extended to support asyn- deletes the workflow from the queue (using ack). In our implemen-
chronous evaluation as follows. At a programmer defined point tation, the choice of using asynchronous evaluation (and the partic-
in the evaluation of the workflow, we create a checkpoint. A check- ular step at which to create a checkpoint) is left to the programmer.
point is essentially a closure representing the rest of the work- We evaluate the benefits of these optimizations in Section 5.
flow, along with relevant state variables. The checkpoint is then
persisted, typically in a distributed queue (essentially a worklist).
Once the checkpoint has been created, an intermediate response
is sent to the client. A set of special agents periodically query the
let agent () = Hand- Molecules
while (true) coded
let (msgid, msg) = recv “worklist”in Banking 94 7296
let (f, params) = msg in Blogging 243 6484
let result = f params in Auction 107 7256
let = ack “worklist” (msgid,msg) in result Surveys 530 9004
Expense 378 6504
let bind (idf, f) =
fun (guid, tc) →
let (ntc, v) = idf (guid, tc)
Figure 12: Average message size (in bytes) of the hand-coded and
send “worklist” (f a (guid, ntc + 1)) workflow implementations.

Figure 10: The idempotence monad with asynchronous evaluation The first transaction checks if the bid is valid and then checks if
the bid beats the current maximum bid, in which case the bid is
recorded in a bids table. The second transaction marks the current
let saveSurvey response = idworkflow { winner and loosing bidders in a separate table. The web front-end is
do! addAtomic ”responses” response.Id response programmed to poll this table periodically for the latest bid status.
return! atomic{ Subsequent transactions update other tables such as an aggregate
let! summary = readAtomic ”summaries” response.SurveyName table that maintains the most frequently bid items, more frequent
do summary.Merge(response) bidders etc. With the idempotent workflow monad, it is easy to
return! writeAtomic ”summaries” response.SurveyName summary }}
guarantee that each bid is processed exactly once even if clients
retry with little change in complexity.
Figure 11: The save survey operation expressed as in idempotent
workflow. Overheads. There are two sources of overheads associated with
idempotent workflows compared to hand-coded implementations.
First, asynchronous evaluation requires serialization and deserial-
ization closures, which can be expensive. Compiler generated clo-
5. Evaluation sures tend to capture a lot more state than hand-coded implementa-
We have implemented the idempotence workflow monad and its tions. Idempotent workflows also add unnecessary logging to trans-
variants in C# and F#, targeting Windows Azure, a platform for actions that are already idempotent.
hosting distributed applications. Azure provides a key-value store Fig. 12 shows the average size of messages sent in hand-coded
with explicit support for data partitioning, where partitions are units and workflow monad based implementations. As expected, the
of serializability. In this section, we focus on evaluating the expres- workflow based implementation generates significantly larger mes-
siveness and performance overheads of the idempotence monad. sages. However, experiments on Windows Azure show that the size
of the message does not significantly influence the read/write la-
Sample applications We found several real world applications tency or throughput of channels [1], as confirmed by our experi-
whose core logic can be declaratively expressed as workflows. ments below.
We briefly describe an expense management application [4] and Next, we evaluate the overall performance overheads of using
a survey application [3]. Other applications we have implemented the idempotence monad relative to hand-coded implementations.
include applications for online auctions [2], blogging and banking. For each benchmark application, we evaluate three versions - the
original hand-coded version, and two monad based versions. The
Survey application. The Surveys application enables users to de- first version (synchronous) evaluates the workflow synchronously
sign a survey, publish the survey, and collect the results of surveys. in the context of the current agent and delegates the responsibility
The most performance critical scenario in this application is when a of retrying operations to the client. The second version uses asyn-
user submits a survey. This operation involves two steps, recording chronous evaluation.
the response and updating the survey summary with the response. We hosted each of the applications on Windows Azure and
For scalability, survey responses and summaries are stored in dif- ensured that each variant was assigned the same hardware re-
ferent data partitions. Figure 11 shows an implementation of this sources i.e. 5 virtual machines, each with a 4-core processor and
operation using workflows. The syntax here is from the actual F# 7 GB RAM. We assigned two virtual machines each for the front
implementation and differs from λFAIL in non-essential ways. The end agents that service browser requests and agents that interact
workflow is composed of two transactions, a transaction that saves with the storage system and one virtual machine for a background
the response and a transaction that updates the summary. In this agent uses for asynchronous/check-pointed evaluation. In the syn-
implementation, survey responses do not reflect in the summary chronous evaluation variant, we assigned an additional virtual ma-
immediately. In general, deferred writes are often acceptable as chine to the storage interaction agent. For each benchmark, we cre-
long as the writes will eventually appear to occur exactly once, a ated a workload that simulates users exercising performance crit-
guarantee workflows provide. The use of the idempotence monad ical scenarios in the application. For example, in the surveys ap-
guarantees that even if multiple requests are received with the same plication, the workload consists of a mix of surveys response re-
survey, the workflows appears to execute just once. quests and survey analysis requests. In each case, the workload is
biased towards write requests (e.g. 75% responses and 25% anal-
Auction application. The auction application allows users to bid ysis requests in the survey application) to ensure that molecular
for items on auction and track the status of their bids. Sellers can transactions are on the critical path. We ran the workload for 3 min-
register items for auction with a time limit and a minimum bid utes (with a 30 seconds warm-up period) and measured throughput
price. Auction sites are often high volume sites, and both latency (number of requests serviced per second) and latency (average re-
and scalability is important. The most critical operation in this ap- sponse time) for varying number of simultaneous users.
plication is the operation that processes new bids. This operation Fig. 13 shows the measured throughput and latency for the
can be expressed as a workflow composed of several transaction. surveys application. In the baseline version, both throughput and
Hand-coded Synchronous Asynchronous Hand-coded Synchronous Asynchronous
Throughput (Request/sec)
60 1.00

Latency (sec)
50 0.75
40
0.50
30
0.25
20
10 0.00
25 50 75 100 25 50 75 100
Number of concurrent users Number of concurrent users

Figure 13: Average throughput and latency for various versions of the survey application.

Hand-coded Synchronous Asynchronous


Hand-coded Synchronous Asynchronous
Throughput (Request/sec)

40 1.50

Latency (secs)
35 1.20
30 0.90
25 0.60
20 0.30
15 0.00
25 50 75 100 25 50 75 100
Number of concurrent users Number of concurrent users

Figure 14: Average throughput and latency for various versions of the auction application.

latency increase almost linearly with the number of users. All sented as a monad. This implementation technique is decentralized
monad based implementations closely follow this trend, suggesting and does not require distributed coordination. (c) Language con-
very little additional overhead due to our implementation. The structs for idempotent workflow.
throughput (60 operations/second) and latencies are along expected The need to move beyond atomic transactions to sequential
lines. The expected throughput of Azure tables is 500 reads/writes compositions of atomic transactions (i.e., workflows) motivated the
per second, whereas our “operation” is a workflow consisting of early work on Sagas and long running transactions [10, 20]. These
two transactions. constructs are weaker than distributed transactions and are gener-
The auction application (Fig. 14) has slightly different perfor- ally used to orchestrate processes that run for extended periods.
mance characteristics. The throughput of this application is lower Kamath et al [12] discusses several issues relating to the implemen-
than the surveys application and it decreases with load. This is tation of transaction managers in Sagas and workflows. A distin-
expected since the workflow is significantly longer. Synchronous guishing aspect of our work is that it exploits the fact that a service
evaluation achieves better throughput than the hand-coded imple- is required to be observationally idempotent (from its clients’ per-
mented (which is asynchronous) at almost the same latency. As spective) to simplify the internal implementation of the workflow.
expected, asynchronous evaluation improves latency, especially at In particular, this lets us avoid the need for distributed coordination
low loads, but at the cost of reduced throughput (up to 25% lower). with workflow managers.
At high loads, all monad based implementations perform better Modern web applications often exploit horizontal partitioning
than the hand-coded implementation. This is due to a few low level for scalability, which involves storing data in a number of differ-
optimizations (such as batching of requests) performed by our im- ent databases or non-relational data-stores. This leads to a need for
plementation. We leave a more detailed performance analysis and workflow-style computations even within a single intra-enterprise
optimization of the idempotence monad for future work. application. Since scalability is key (and the motivation for data
partitioning), conventional workflow engines are rarely considered
for use in this context. Instead, the programmer usually realizes
6. Related Work the workflow manually, exposing themselves to all the subtleties
Our work builds on previous literature in the topics of building of realizing such workflow correctly. Pritchett [16] describes pro-
reliable distributed systems and transactional workflow. Idempo- gramming methodologies for use in these scenarios. Our goal is to
tence has been widely and informally recognized as an important provide lightweight language mechanisms that can be used to real-
property of distributed systems. Our key contributions include: (a) ize idempotent workflows correctly in such scenarios.
A formal specification of the desired correctness properties, both Helland [11] explains in detail why idempotence is an essential
safety (idempotence) as well as progress (failure freedom). (b) An property for reliable systems.
automatic technique for guaranteeing failfree idempotence, pre-
Frolund et al. [9] define a correctness criteria known as X- (JACM), 32(2):374–382, 1985.
Ability for replicated services. A history is said to be x-able if it is [9] Svend Frolund and Rachid Guerraoui. X-Ability: A Theory of
equivalent to some history where every request is processed exactly Replication. Distributed Computing, 14, 2000.
once. Much like failfree idempotence, X-Ability is both a safety
[10] Hector Garcia-Molina and Kenneth Salem. Sagas. In Proc. of ICMD,
and liveness criteria. Our notion of failfree idempotence is general- pages 249–259, 1987.
izes X-Ability beyond replication requests to workflows. There are
also significant differences in our implementation techniques. [11] Pat Helland. Idempotence is not a medical condition. ACM Queue,
Our work is also related to open nested multi-level transac- 10(4):30–46, 2012.
tions [15, 19]. These two constructs share the use of compensat- [12] Mohan Kamath and Krithi Ramamritham. Correctness Issues in
ing actions, but are semantically different. Open nested transactions Workflow Management. Distributed Systems Engineering, 3(4):213,
provide a way for dealing with conflicts at a higher level of abstrac- 1996.
tion, which often leads to increased concurrency. [13] Sheng Liang, Paul Hudak, and Mark Jones. Monad Transformers and
Our basic setting is similar to Argus [14]. However, the con- Modular Interpreters. In In Proc. of POPL, pages 333–343, 1995.
struct that Argus provides programmers to deal with process fail- [14] Barbara Liskov. Distributed programming in argus. Communications
ures is a conventional transaction. As with sagas, we show that of ACM, 31:300–312, March 1988.
many applications can be expressed using workflows (as we cover [15] J. Eliot B. Moss. Nested Transactions: An Approach to Reliable
in Section 2) with compensating actions to compensate for the lack Distributed Computing, 1981.
of isolation. The transactor programming model [7] also provide
primitives for dealing with process failures in a distributed system. [16] Dan Pritchett. Base: An acid alternative. Queue, 6(3):48–55, May
2008.
However, there is no shared state in the transactor model. The prim-
itives provided by the transactor model (stabilize, checkpoint, and [17] Philip Wadler and Peter Thiemann. The Marriage of Effects and
rollback) are different from the primitives we study. Monads. ACM Trans. Comput. Log., 4(1):1–32, 2003.
Bruni et al [5] formalize compensating actions in an abstract [18] David Walker, Lester Mackey, Jay Ligatti, George A. Reis, and
language. Their formalism, however, does not explicitly model David I. August. Static Typing for a Faulty Lambda Calculus. In In
state. Their paper, in fact, suggests study of compensation in the ACM International Conference on Functional Programming, 2006.
context of imperative features (state, variables, control-flow con- [19] Gerhard Weikum and Hans-J. Schek. Concepts and Applications of
structs) as future work. Our work provides a compensation-based Multilevel Transactions and Open Nested Transactions. In Database
transactional construct as a library in a real language (F#) for a real Transaction Models for Advanced Applications, pages 515–553,
system (Azure), in addition to the theoretical treatment in the set- 1992.
ting of lambda-calculus with mutable shared state. [20] Gerhard Weikum and Gottfried Vossen. Transactional Information
Luis et. al [6] propose an abstract model of compensating ac- Systems: Theory, Algorithms, and the Practice of Concurrency
tions (based on process calculus) for reasoning about correctness Control. 2001.
for workflows that use compensating actions.
Faulty lambda calculus [18] is a programming language and
a type system for fault tolerance. However, λFAIL is addresses A. An Abstract Definition of Idempotence
process failures, while faulty lambda calculus addresses transient We now present a more abstract definition of idempotence in terms
data corruption errors. of histories.

Basics. The users of the system issue requests to the system. Let
Acknowledgments I denote a set of input messages (or requests). The system responds
We would like to acknowledge Siddharth Agarwal, Nirmesh Malviya, to requests with a response. Let O denote a set of output values. An
Khilan Gudka, and Dheeraj Singh for their contributions. output message or response is a pair (i, o) ∈ I × O indicating that
the output value o is produced in response to input request i.
A history π is a sequence e1 e2 · · · ek of requests and responses
References (i.e., each ei can be either a request or a response). We will restrict
[1] Azurescope: Benchmarking and Guidance for Windows Azure. our attention to histories that satisfy the simple property that every
http://azurescope.cloudapp.net/BenchmarkTestCases/ response corresponds to an earlier request in the history. We define
#4f2bdbcc-7c23-4c06-9c00-f2cc12d2d2a7, June 2011. a specification Φ to be a set of histories. In the sequel, we use a set
[2] Bid Now Sample. http://bidnow.codeplex.com, June 2011. of histories to specify desired safety properties of the system.
[3] The Tailspin Scenario. http://msdn.microsoft.com/ In the sequel, let q ∈ I range over requests and let r ∈ I × O
en-us/library/ff966486.aspx, June 2011. range over responses. Let α, β, and γ range over sequences of
[4] Windows Azure Patterns and Practices. http://wag.codeplex. requests and responses.
com/, 2011.
D EFINITION A.1. A specification Φ is said to be asynchronous if
[5] Roberto Bruni, Hernán Melgratti, and Ugo Montanari. Theoretical it satisfies the following properties.
Foundations for Compensations in Flow Composition Languages. In
Proceedings of POPL, pages 209–220, 2005. 1. αβqγ ∈ Φ ⇒ αqβγ ∈ Φ.
[6] Luis Caires, Carla Ferreira, and Hugo Vieira. A Process Calculus 2. αrβγ ∈ Φ ⇒ αβrγ ∈ Φ.
Analysis of Compensations. In Trustworthy Global Computing,
volume 5474 of Lecture Notes in Computer Science, pages 87–103. The above conditions are a natural restriction on specifications
2009. because of messaging delays that cannot be controlled. The above
[7] John Field and Carlos A. Varela. Transactors: A Programming Model property is also related to the notion of linearizability. Given any
for Maintaining Globally Consistent Distributed State in Unreliable sequential specification Φs , “linearizable Φs ” can be defined as the
Environments. In Proceedings of POPL, pages 195–208, 2005. smallest asynchronous specification that contains Φs . Every lin-
[8] M.J. Fischer, N.A. Lynch, and M.S. Paterson. Impossibility of earizable specification satifies the asynchronous property defined
Distributed Consensus with one Faulty Process. Journal of the ACM above, but not all asynchronous specifications are linearizable.
Idempotence. Two requests q1 and q2 (in the same history) are has been initiated. This has the effect of creating a new agent
said to be duplicates if q1 = q2 . Two responses (q1 , r1 ) and v I e. Rule IDFF-USE allows the computation idff v e to proceed
(q2 , r2 ) (in the same history) are said to be duplicates if q1 = q2 . once the created agent completes evaluation (as indicated by the
presence of an agent of the form v I w, where w is a value). The
D EFINITION A.2. A specification Φ is said to idempotent iff: same rule also applies to “duplicate” evaluations with the same id.
1. Duplicate requests have no effect: αqβqγ ∈ Φ iff αqβγ ∈ Φ.
2. Duplicates responses have the same value:
α(q, o1 )β(q, o2 )γ ∈ Φ ⇒ o1 = o2 .
3. Duplicate responses are allowed: αrβrγ ∈ Φ iff αrβγ ∈ Φ.
(The above definition is intended for asynchronous specifica-
tions. Hence, the asynchrony conditions have been used to simplify
the definition.)
Idempotence Closure. We define a history to be repetitionless
if it contains no duplicate requests or duplicate responses. We
define a specification Φ to be repetitionless if all histories in Φ are
repetitionless.
Given a repetitionless specification Φ, we define its idempo-
tence closure idem(Φ) to be the smallest specification that contains
Φ and is idempotent.
We now summarize our goal: given a program p that satisfies a
repetitionless specification Φ in the absence of process failures and
duplicate requests, construct a program p0 that satisfies idem(Φ)
even in the presence of process failures and duplicate requests.
Extension The above definitions can be generalized to permit
requests to be of the form (k, v), where k is a unique-identifier
(key) for an input request, and allowing responses to be of the
form (k, o), where k is the unique-identifier of the input request
for which the response is produced. Note that the above definition
does not capture the progress conditions of failure-freedom.

B. The Semantics of λIDFF


We adapt the earlier operational semantics of λFAIL as shown in
Fig. 15 to define the semantics of λIDFF . We extend the definition
ΣA , the set of agents: previously, an agent was of the form v . e.
Now, an agent may now also be of the form v I e, indicating a
idff evaluation of expression e with identifier v.

Syntax Extension: e ∈ hExpi ::= · · · | idff v1 v2

Semantic Domain Changes: ΣA = hExpi×hExpi+hExpi×hExpi

Additional Evaluation Rules:


[IDFF-NEW]

v . E[idff id e] ∈ α 6 ∃e0 .id I e0 ∈ α



hp, µ, α, I, Oi ⇒ hp, µ, α ] {id I e()}, I, Oi
[IDFF-USE]
id I w ∈ α w ∈ hVali

hp, µ, α ] {v . E[idff id e]}, I, Oi ⇒ hp, µ, α ] {v . E[w]}, I, Oi
[NORMAL]
`
hµ, ei hµ0 , e0 i
hp, µ, α ] {v I e}, I, Oi ⇒ hp, µ0 , α ] {v I e0 }, I, Oi

Figure 15: The language λIDFF , defined via extensions to language


λFAIL . Rules IDFF-NEW and IDFF-USE must be duplicated for v I
E[idff id e] as well.

Rule IDFF-NEW handles the evaluation of the construct idff v e,


when no preceding idff computation with the same identifier v

You might also like