Markov Chain Blackjack
1 Introduction
Many statistical problems of practical interest are simply too complicated to explore analytically.
In these cases, researchers often turn to simulation techniques in order to evaluate the expected
outcomes. When approached creatively, however, these problems sometimes reveal a structure
that is consistent with a much simpler mathematical framework, possibly permitting an analytical
exploration of the problem. It is this situation that we have encountered with the game of blackjack.
Blackjack receives considerable attention from mathematicians and entrepreneurs alike, due to
its simple rules, its inherent random nature, and the abundance of “prior” information available to
an observant player. Indeed, many attempts have been made to propose card-counting systems that
exploit such information to the player’s advantage. Most card-counting systems include two aspects:
a method for monitoring the cards played to watch for favorable situations, and a strategy for
playing and betting depending on the current state of the game. Because blackjack is a complicated
game, attempts to actually calculate the expected gain from a particular system often rely on
simulation techniques [3, 5]. While such techniques may yield correct results, they may also fail to
explore the interesting mathematical properties of the game.
Despite the apparent complexity, there is a great deal of structure inherent in both the blackjack
rules and the card-counting systems. Exploiting this structure and elementary results from the
theory of Markov chains, we present a novel framework for analyzing the expected advantage of a
card-counting system entirely without simulation. The method presented here requires only a few,
mild simplifying assumptions, can account for many rule variations, and is applicable to a large
class of counting systems. As a specific example, we verify the reported advantage provided by one
of the earliest systems, the Complete Point-Count System, introduced by Harvey Dubner in 1963
and discussed in Edward Thorp’s famous book, Beat the Dealer [5, pp. 93-101]. While verifying
this analysis is satisfying, in our opinion the primary value of this work lies in the exposition of an
interesting mathematical framework for analyzing a complicated “real-world” problem.
2 Markov Chains

A Markov chain consists of a set of states together with a transition
matrix. The entry in row i and column j of the transition matrix is the probability of moving from
state i to state j in one step. States that only allow transitions back to themselves (with probability
1) are called absorbing states. If we raise the transition matrix to the n th power, entry (i, j) is the
probability of moving from state i to state j in exactly n steps. Certain classes of Markov chains will
converge to an equilibrium distribution as n gets large. This equilibrium represents the long-term
proportion of time that the chain spends in each state (independent of the starting state). Markov
chains which do converge to equilibrium are those that are irreducible and aperiodic. A chain is
irreducible if there exists a path of non-zero probability from every state to every other state. If a
chain is irreducible, it is also aperiodic if, when considering all possible paths from any state i back
to itself, the greatest common divisor of the path lengths is one.
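As a concrete illustration (ours, not part of the original analysis), the following Python sketch exercises these definitions on two toy chains: one with an absorbing state, and one irreducible and aperiodic chain whose matrix powers converge to an equilibrium.

```python
import numpy as np

# An absorbing chain: state 2 only transitions back to itself.
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.3, 0.4],
              [0.0, 0.0, 1.0]])
# Entry (i, j) of P^n is the probability of moving from i to j in exactly n steps.
print(np.linalg.matrix_power(P, 20)[0, 2])   # prob. of absorption by step 20, from state 0

# An irreducible, aperiodic chain: every row of Q^n approaches the same
# equilibrium distribution as n grows, independent of the starting state.
Q = np.array([[0.1, 0.9, 0.0],
              [0.5, 0.0, 0.5],
              [0.3, 0.3, 0.4]])
print(np.linalg.matrix_power(Q, 100))
```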
In this paper we employ two distinct approaches for Markov chain analysis of blackjack. First,
we develop a collection of Markov chains to model the play of a single hand, and we use absorbing
states to compute the player’s expected advantage on one hand. Second, we use a Markov chain
to model the progression of the unseen cards, and we analyze its equilibrium to calculate the long-
term proportion of time the player will have an advantage over the dealer. After further exploiting
the structure in the game to make the calculations tractable, we combine the results from our two
analyses to calculate the overall expected gain of a particular card-counting system.
For the remainder of this paper, we assume a standard set of blackjack rules and terminology.
For instance, we refer to a hand with an 11-valued Ace as soft and a hand with a 1-valued Ace (or
with no Aces) as hard. A box called the shoe holds the cards to be dealt. When a certain number
of cards have been played (roughly 3/4 of a shoe), the entire shoe is reshuffled after the next hand
is completed. The number ∆ of 52-card decks in the shoe is specified when relevant (∆ = 6 being
typical in many casinos).
The player places a bet B at the beginning of each hand. The player wins a profit of (3/2)B if
dealt a blackjack (a total of 21 on the initial 2 cards), as long as the dealer is not also dealt a
blackjack. If the dealer shows an Ace, the player may optionally place a side insurance bet of B/2
that the dealer holds a blackjack (returned with a profit of B on success). If neither the player nor
dealer hold blackjack, a profit of B is won when the player’s final total is higher than the dealer’s
(without exceeding 21). The player can elect to hit or stand at will, but the dealer must hit on any
total of 16 or less and stand otherwise. When holding the first two cards, a player may also elect
to double down, doubling the initial bet and drawing only one additional card. If the player’s first
two cards have the same value, the player can also split, dividing the cards into two separate hands
and placing an additional bet B.
3 Analyzing the Play of a Single Hand

Observe that the play of a blackjack hand depends only on the cards currently held and on the
distribution of cards remaining in the shoe; in this sense, the game has finite memory.
It is not surprising, given this observation, that Markov chains may be used to analyze the play
of a single hand. Roughly speaking, a state space could contain all possible totals for the player’s
hand, and transition probabilities could be assigned according to the distribution of cards in the
shoe. In this section, we develop this idea more fully, obtaining a framework for analyzing the play
of a single hand and ultimately computing the player’s advantage.
We assume that the cards remaining in the shoe are described by a probability distribution
D = {d(i) : i ∈ {2, . . . , 11}}, where d(i) denotes the probability that the next card drawn will be
valued i. We also assume that the player
plays according to some prescribed strategy S throughout the hand. A strategy is a collection of
deterministic decisions the player will make during the hand regarding hitting, standing, doubling,
and splitting, where each decision depends only on the player's total and the dealer's up-card.
While the strategy S may be chosen according to the player’s knowledge of the distribution D at
the beginning of the hand, we assume the strategy remains fixed throughout the play of one hand.
To compute a probability distribution among the dealer's possible outcomes, we need only
to find the distribution among the absorbing states. This is accomplished by constructing the
dealer's transition matrix D and computing D^17. Given that the dealer shows initial card γ, for
example, we obtain the distribution on the possible outcomes by examining row first_γ of the
matrix D^17. Note that because the matrix D depends on the shoe distribution D, the distribution
on the absorbing states is conditioned at this point on a particular deck distribution.
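As a rough sketch of this computation, the Python code below builds a dealer chain under two simplifications relative to the model above (hard totals only, with Aces counted as 1, and a fixed full-shoe card distribution); the state layout and names are our own, not the paper's.

```python
import numpy as np

# Simplified dealer chain: totals 0-16 (dealer must hit), 17-21 (dealer
# stands), and "bust"; the last six states are absorbing.
d = {v: 4 / 52 for v in range(2, 10)}    # card values 2-9
d[10] = 16 / 52                          # 10, J, Q, K
d[1] = 4 / 52                            # Ace, counted as 1 in this simplification

STATES = list(range(22)) + ['bust']
index = {s: k for k, s in enumerate(STATES)}

D = np.zeros((len(STATES), len(STATES)))
for t in range(22):
    if t >= 17:
        D[index[t], index[t]] = 1.0      # absorbing: dealer stands on 17-21
        continue
    for v, p in d.items():
        nxt = t + v if t + v <= 21 else 'bust'
        D[index[t], index[nxt]] += p
D[index['bust'], index['bust']] = 1.0    # absorbing: dealer busts

# Every walk is absorbed within 17 transitions, so row first_gamma of D^17
# gives the distribution over dealer outcomes for up-card gamma.
D17 = np.linalg.matrix_power(D, 17)
print(D17[index[10]])                    # outcome distribution, dealer shows a 10
```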
We construct a similar Markov chain to model the play of the player's hand. The state space
Ψ_P contains the following states:
• {first_i : i ∈ {2, . . . , 11}}: the player holds a single card, valued i, and will automatically be
dealt another.
• {twoHard_i : i ∈ {4, . . . , 19}}: the player holds two different cards for a hard total of i and
may hit, stand, or double.
• {twoSoft_i : i ∈ {12, . . . , 20}}: the player holds two different cards for a soft total of i and
may hit, stand, or double.
• {pair_i : i ∈ {2, . . . , 11}}: the player holds two cards, each of value i, and may hit, stand,
double, or split.
• {hard_i : i ∈ {5, . . . , 20}}: the player holds more than two cards for a hard total of i and may
hit or stand.
• {soft_i : i ∈ {13, . . . , 20}}: the player holds more than two cards for a soft total of i and may
hit or stand.
• {stand_i : i ∈ {4, . . . , 21}}: the player stands with the original bet and a total of i.
• {doubStand_i : i ∈ {6, . . . , 21}}: the player stands with a doubled bet and a total of i.
• {split_i : i ∈ {2, . . . , 11}}: the player splits a pair, each card valued i (modeled as an absorbing
state).
Note that different states with the same total often indicate that different options are available to
the player. In total, we obtain a state space with |Ψ_P| = 116.
Analysis of this Markov chain is similar to the dealer’s chain described above. The player’s
strategy dictates a particular move by the player (hit, stand, double, or split) for each of the
states; in terms of this Markov chain, the strategy dictates the allowable transitions. Subsequently,
transition probabilities are assigned according to the distribution D of cards in the shoe. In this
way, the transition matrix encodes both the player’s strategy and the shoe distribution.
Because the player’s total increases with each transition (except possibly once when a soft hand
transitions to hard), it follows that within 21 transitions, any random walk will necessarily reach
one of the absorbing states. Once again, it follows that row first_i of P_γ^21 will yield the distribution
on these states, assuming the player begins with card i. Averaging over all possible initial cards
i (and weighting by the probability d(i) that the starting card is i), we may compute the overall
distribution on the player’s absorbing states.
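In code, the averaging step might look as follows. Here build_player_chain is a hypothetical helper (not defined in this paper) that would assemble the player's transition matrix P_γ from the state space above, the strategy S, and the distribution d.

```python
import numpy as np

# Hypothetical helper: build_player_chain(gamma, d, S) is assumed to return the
# player's transition matrix P_gamma (encoding strategy S and distribution d)
# and a map from state names to row indices. Its implementation is not shown.
P_gamma, index = build_player_chain(gamma, d, S)

P21 = np.linalg.matrix_power(P_gamma, 21)   # every walk absorbs within 21 steps

# Weight row first_i by the probability d(i) that the player's first card is i.
overall = sum(d[i] * P21[index['first', i]] for i in range(2, 12))
```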
While the player’s analysis is similar to the dealer’s analysis (Section 3.2) in most respects, the
primary difference comes from the possibility of a split hand. We include a series of absorbing
states {split_i} for the event where the player elects to split a pair of cards i. Intuitively, we imagine
that the player then begins two new hands, each in the state first_i. To model the play of one of
these hands, we create another Markov chain (similar to the player’s chain described above), but
we construct this chain using the particular rules that govern the play of a post-split hand [5, 6].³
We reiterate that this result implicitly depends on the distribution D and the strategy S.

³For simplicity, we assume the player is limited to splitting only once, although the analysis is easily extended to
multiple splits.
4 Analyzing a Card-Counting System

Consider the following observations about the game:
• The player may vary the betting and playing strategy, while the dealer must play a fixed
strategy.
• When the shoe has relatively many high cards remaining, the dealer is at a disadvantage by
having to hit on totals 12–16.
• When the shoe has relatively few high cards remaining, the house typically has a small
advantage over the player, regardless of the playing strategy.
These observations are fundamental to most card-counting strategies and are also the basic reasons
why card-counting is not welcomed by many casinos. By counting cards as they are played, a player
can obtain partial knowledge about D and vary the playing and betting strategy accordingly. As a
result, card-counting can in fact give the player a long-term advantage over the house.
As one example, in this section we calculate the player’s overall advantage using a card-counting
system known as the Complete Point-Count System [5, pp. 93-101]. From the previous section, we
are able to compute the specific per-hand advantage of the player, given a distribution D and a strategy
S. To analyze a card-counting system, however, we must also be able to calculate the percentage of
time a player expects the shoe to be in favorable and unfavorable states. For this problem, we
again propose a Markov chain analysis. Although we focus on one particular card-counting system,
the spirit of our approach is quite general and applies to a large class of techniques.
After briefly explaining the Complete Point-Count System, we introduce the Markov chain
framework used to track the state of a shoe throughout a round of blackjack. We use the convergence
properties described in Section 2 to determine the long-term proportion of time that the shoe is in
favorable and unfavorable states.
In the Complete Point-Count System, the player tracks an ordered triple (L, M, H): the number
of low cards (values 2–6), medium cards (values 7–9), and high cards (values 10 and Ace) that
have been played from the shoe. Together with the number R of cards remaining, these counts
determine the High-Low Index,
$$\mathrm{HLI} = 100 \cdot \frac{L - H}{R}.$$
Thorp gives a series of tables to be used by the player in determining S as a function of the HLI [5,
p. 98]. The HLI gives an estimate of the condition of the shoe: when positive, the player generally
has an advantage and should bet high; when negative, the player generally has a disadvantage and
should bet low. Thorp offers one possible suggestion for varying bets [5, p. 96]:
$$B = \begin{cases} b & \text{if } -100 \le \mathrm{HLI} \le 2, \\ \dfrac{\mathrm{HLI}}{2}\, b & \text{if } 2 < \mathrm{HLI} \le 10, \\ 5b & \text{if } 10 < \mathrm{HLI} \le 100, \end{cases} \qquad (1)$$
where b is the player’s fundamental unit bet. It is important also to note that, although the
player’s advantage is presumably high when HLI > 10, Thorp recommends an upper limit on bets⁴
for practical reasons. If a casino suspects that a player is counting cards, it will often remove
that player from the game. Finally, Thorp also recommends taking the insurance bet when HLI > 8.

⁴In reality, the player can compute the HLI simply by keeping count of (L − H) and R; for simplicity of our
analysis, we continue to refer to an ordered triple.
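The betting rule (1) and the insurance threshold translate directly into code; the sketch below is ours, with b defaulting to one unit.

```python
def bet(hli, b=1.0):
    """Thorp's suggested bet (1) as a function of the High-Low Index."""
    if hli <= 2:            # -100 <= HLI <= 2
        return b
    if hli <= 10:           # 2 < HLI <= 10
        return (hli / 2.0) * b
    return 5.0 * b          # 10 < HLI <= 100: capped for practical reasons

def take_insurance(hli):
    """Thorp recommends the insurance side bet when HLI > 8."""
    return hli > 8
```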
Suppose that the player observes an ordered triple (L, M, H) prior to beginning a hand. From
the triple we may compute the HLI and obtain the strategy S. After making some mild simplifying
assumptions, we can use the techniques of Section 3 to compute the player’s expected advantage
on such a hand. First, we assume that D is uniform over each category: low, medium, and high.
To be precise, we set D as follows:
$$d(2) = \cdots = d(6) = \frac{1}{5} \cdot \frac{20\Delta - L}{R},$$
$$d(7) = d(8) = d(9) = \frac{1}{3} \cdot \frac{12\Delta - M}{R},$$
$$d(10) = \frac{4}{5} \cdot \frac{20\Delta - H}{R},$$
$$d(A) = \frac{1}{5} \cdot \frac{20\Delta - H}{R}.$$
Second, we assume that D does not change during the play of the hand. Third, we require that the
player fixes a strategy at the beginning of the hand (based on the HLI) — that is, the player does
not respond to changes in the HLI until the hand is complete. With these assumptions, we are
able to compute the player’s advantage for a hand beginning with any arbitrary triple (L, M, H).
If we were also able to determine the overall probability that the player begins a hand with triple
(L, M, H), we would be able to compute the overall long-term advantage simply by averaging over
all triples. We turn once again to Markov chains to find these probabilities.
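Under these assumptions, the HLI and the distribution D are simple functions of the triple (L, M, H); a small sketch of our own follows (card value 11 denotes the Ace).

```python
def hli(L, M, H, delta):
    """High-Low Index for a count triple (L, M, H) and a delta-deck shoe."""
    R = 52 * delta - (L + M + H)        # cards remaining in the shoe
    return 100 * (L - H) / R

def shoe_distribution(L, M, H, delta):
    """Distribution D under the uniform-within-category assumption above."""
    R = 52 * delta - (L + M + H)
    d = {}
    for v in range(2, 7):               # low cards: 20*delta per full shoe
        d[v] = (1 / 5) * (20 * delta - L) / R
    for v in range(7, 10):              # medium cards: 12*delta per full shoe
        d[v] = (1 / 3) * (12 * delta - M) / R
    d[10] = (4 / 5) * (20 * delta - H) / R   # tens are 4/5 of the high cards
    d[11] = (1 / 5) * (20 * delta - H) / R   # Aces (valued 11) are the rest
    return d
```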
We construct a Markov chain whose state space Σ contains one state for each possible triple
(L, M, H), where 0 ≤ L ≤ 20∆, 0 ≤ M ≤ 12∆, and 0 ≤ H ≤ 20∆, and where N = 52∆ denotes
the total number of cards in the shoe. Clearly |Σ| grows as N³. Table 1 shows the number of
states for some example shoe sizes.
Each state in Σ has two different types of transitions out: either a card can be played or the
shoe can be reshuffled (i.e., a transition back to the state (0, 0, 0)). The location of the reshuffle
point does not depend on which specific cards have been played, but only on how many cards have
been played. Let C_n ⊂ Σ be the set of all states in which n cards have been played, C_n = {(L, M, H) ∈
Σ : L + M + H = n}. We define ρ(n) as the probability of a reshuffle from any state in C_n.
Table 1: The size of the state space and transition matrices for Markov chains (Σ, X) and (Γ, Y)
with common shoe sizes.

∆   N     |Σ|         N × N    |Σ| × |Σ|
1   52    5733        2704     3.29 × 10⁷
2   104   42,025      10,816   1.7 × 10⁹
4   208   321,489     43,264   1.03 × 10¹¹
6   312   1,068,793   97,344   1.14 × 10¹²
Given that we are in state (L, M, H) and a reshuffle does not occur, the chain can transition to
(L + 1, M, H), (L, M + 1, H) and (L, M, H + 1) with probabilities equal to the probability that the
next card drawn is a low, medium or high card, respectively. To be more explicit, if the current
state is (L, M, H) ∈ C_n, the corresponding row of the transition matrix is given by
$$(X)_{(L,M,H),(a,b,c)} = \begin{cases} \frac{20\Delta - L}{R}\,(1 - \rho(n)) & \text{if } (a, b, c) = (L + 1, M, H), \\ \frac{12\Delta - M}{R}\,(1 - \rho(n)) & \text{if } (a, b, c) = (L, M + 1, H), \\ \frac{20\Delta - H}{R}\,(1 - \rho(n)) & \text{if } (a, b, c) = (L, M, H + 1), \\ \rho(n) & \text{if } (a, b, c) = (0, 0, 0), \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$
Note that some of these probabilities could be zero, but these are the only possible transitions.
In a typical casino situation, a dealer will normally play through most, but not all, of a shoe
before reshuffling. For example, a dealer may cut ∼75% into the shoe and play up to this point
before reshuffling. We model the reshuffle point as a (normalized) Laplacian random variable
centered around 0.75N, with support over the second half of the shoe. The variance is scaled with
the size of the shoe in order to keep a constant proportion of the shoe in a region with a high
probability of a reshuffle. Precisely, the probability of the reshuffle point being immediately after
the nth card is played is given by
$$\Pr[\text{reshuffle} = n] = \begin{cases} \frac{\kappa}{\sqrt{2\sigma^2}} \exp\left( \frac{-|n - 0.75N|}{\sqrt{\sigma^2/2}} \right) & \text{if } n \ge \lceil N/2 \rceil, \\ 0 & \text{otherwise,} \end{cases}$$
where σ² = N/10, and κ is a normalizing constant to make the distribution sum to one.
To calculate the probability ρ(n) of a reshuffle from a state in C n , we calculate the probability
that the reshuffle point is n conditioned on the event that the reshuffle point is at least n,
$$\rho(n) = \Pr[\text{reshuffle} = n \mid \text{reshuffle} \ge n] = \frac{\Pr[\text{reshuffle} = n]}{\sum_{m=n}^{N} \Pr[\text{reshuffle} = m]}. \qquad (3)$$
The probability distribution on the reshuffle location is shown in Figure 1(a), and the reshuffle
transition probabilities are shown in Figure 1(b).
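The reshuffle density and the conditional reshuffle probability ρ(n) of (3) can be computed directly; a sketch (the function names are ours):

```python
import math

def reshuffle_density(N):
    """Normalized Laplacian reshuffle-point density: centered at 0.75*N,
    variance sigma^2 = N/10, support on the second half of the shoe. The
    normalization absorbs the constant factor kappa / sqrt(2 sigma^2)."""
    sigma2 = N / 10
    half = math.ceil(N / 2)
    raw = [math.exp(-abs(n - 0.75 * N) / math.sqrt(sigma2 / 2)) if n >= half
           else 0.0 for n in range(N + 1)]
    total = sum(raw)
    return [p / total for p in raw]

def rho(n, density):
    """Probability (3) of a reshuffle from any state in C_n."""
    tail = sum(density[n:])
    return density[n] / tail if tail > 0 else 1.0

density = reshuffle_density(52)          # one deck
print(rho(39, density))                  # reshuffle probability with 39 cards played
```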
Now that we have constructed a Markov chain representing the play of cards from the shoe,
we must confirm that this chain will actually serve our purposes. By inspection, it is clear that
from any state, it is possible to reach any other state with some non-zero probability. Observing
the possible paths that start at the beginning of the shoe (state (0, 0, 0)) and then reshuffle back
to the beginning, we see that there is no periodic behavior in the chain. This is stated precisely in
the fact that two possible return times are ⌈N/2⌉ and ⌈N/2⌉ + 1, and gcd(⌈N/2⌉, ⌈N/2⌉ + 1) = 1.

Figure 1: (a) Reshuffle point density for 1 deck. (b) State reshuffle transition probability ρ(n) for
1 deck. (c) Equilibrium µ for column Markov chain (Γ, Y).
The Markov chain representing the changing state of the shoe is therefore irreducible [4, p. 11]
and aperiodic [4, pp. 40-41]. As described in Section 2, such a chain converges to an equilibrium
distribution that describes the long-term proportion of the time the chain will spend in each state.
In principle, this equilibrium distribution π can be computed as
$$\pi = (1, 1, \ldots, 1)(I - X + E)^{-1},$$
where I and E are the |Σ| × |Σ| identity and ones matrices, respectively [4, p. 40, Exercise 1.7.5].
In Table 1 we see that even for a one-deck shoe, X would have 33 million entries. Storing X as a
matrix of 8-byte, double-precision floating point numbers would require approximately 263 MB
of memory; for a two-deck shoe, X would require approximately 13.6 GB. Aside
from the issue of memory usage, one would also need to invert a matrix of this size. We find ourselves
in an interesting dilemma: although we have perfect knowledge of the transition matrix, we are
prevented from dealing with X as a whole. In this section we describe how we further exploit the
structure of the game and the playing strategies to make the problem tractable.
Looking more carefully at the Markov chain we have constructed, we find a great deal of
structure. The form of the chain is clearer in a graphical representation. Imagine that we
arrange all of the states so that states representing the same number of cards played (i.e., belonging
to the same set C_n) are in the same column, and each column represents one more card played
than in the previous column (depicted graphically in Figure 2). Note that when a card is played,
a state in C_n can only move to a state in C_{n+1} or the (0, 0, 0) state. Only the states in C_n for
n ≥ ⌈N/2⌉ are capable of causing a reshuffle (transitioning back to the (0, 0, 0) state), and each
state in C_n reshuffles with the same probability ρ(n). Columns closer to the midpoint of the shoe
contain more states, and the first and last columns taper down to one state each.
Starting at state (0, 0, 0), a walk will take one particular path from left to right, moving one
column with each step and always taking exactly ⌈N/2⌉ steps to reach the midpoint. After the
midpoint, the states can also reshuffle at each step with a probability ρ(n) that depends only on
the current column, and not on either the path taken up to that point or the exact current state
within the column. Essentially, this structure allows us to separate the calculation of π into two
components: how much relative time is spent in each column, and within each individual column,
what proportion of time is spent in each state. To determine the relative time spent in each column,
we create a reduced chain with state space Γ and transition matrix Y , where each column in Figure
2 corresponds to one state in Γ (depicted in Figure 3). The transition matrix is given by
$$(Y)_{n,m} = \begin{cases} 1 - \rho(n) & \text{if } m = n + 1, \\ \rho(n) & \text{if } m = 0, \\ 0 & \text{otherwise.} \end{cases}$$

Figure 2: Graphical depiction of the full state space Σ. Each state represents an ordered triple
(L, M, H) denoting the number of low, medium, and high cards that have been played from the shoe.

Figure 3: Graphical depiction of the reduced column state space Γ. Each state represents the
number of cards played (i.e., one column of the state space depicted in Figure 2).
It is clear that this chain is also irreducible and aperiodic, and therefore will converge to a unique
equilibrium µ. The equilibrium of the reduced chain (Γ, Y ) represents the relative proportion of
time that the original chain (Σ, X) spends in each column of Figure 2. This is stated precisely as
$$\mu(n) = \sum_{k \in C_n} \pi(k).$$
Figure 4: Equilibria π for 1 and 4 deck shoes (a and b, respectively) representing the long-term
proportion of time spent at each High-Low Index (HLI).
Figure 1(c) shows µ for ∆ = 1. It is important to note that the dimension of the reduced column-
space chain is much smaller than the original chain, with |Γ| = O(N) and |Σ| = O(N³). Even in
the case when ∆ = 6, one can easily calculate
$$\mu = (1, 1, \ldots, 1)(I - Y + E)^{-1}.$$
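For example, µ can be computed in a few lines, reusing reshuffle_density() and rho() from the sketch above; the layout of Y here (states 0 through N counting cards played) is our own reading of the reduced chain.

```python
import numpy as np

# Equilibrium of the reduced column chain (Gamma, Y). States 0..N count the
# cards played since the last reshuffle.
N = 52                                   # one deck
density = reshuffle_density(N)

Y = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    r = rho(n, density)
    Y[n, 0] += r                         # reshuffle: back to column 0
    if n < N:
        Y[n, n + 1] += 1 - r             # otherwise one more card is played

I = np.eye(N + 1)
E = np.ones((N + 1, N + 1))
mu = np.ones(N + 1) @ np.linalg.inv(I - Y + E)
print(mu.sum())                          # equilibrium mass sums to one
```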
Using the easily calculated µ, we can exploit the relationship between the two Markov chains to
calculate π. First, it is critical to note that because |C_0| = 1, the equilibrium distributions on the
initial states of both chains are equal: π((0, 0, 0)) = µ(0). To calculate all other elements of π we
make two key observations. First, the properties of the equilibrium distribution [4, p. 33] allow one
to relate the values of the equilibrium distribution to each other through the expression π = πX.
Second, in our particular Markov chain (Σ, X), states in C_n only have input transitions coming
from states in C_{n−1} (i.e., for all m ∈ C_n, (X)_{k,m} = 0 if k ∉ C_{n−1}). These two observations together give
us that for any m ∈ C_n, the following relation holds:
$$\pi(m) = \sum_{k \in C_{n-1}} \pi(k)\,(X)_{k,m}. \qquad (4)$$
Using only knowledge of π((0, 0, 0)), (4) can be used to calculate π(m) for any m ∈ C_1. Iterating
this process we calculate π(m) for any m ∈ C_2, and so on, until we have completely calculated π.
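The iteration is straightforward to implement; continuing the sketches above (reusing mu, density, and rho), a version for a single deck might look as follows.

```python
# Calculate pi column by column via (4) for a single deck (delta = 1).
delta = 1
N = 52 * delta
lows, meds, highs = 20 * delta, 12 * delta, 20 * delta

pi = {(0, 0, 0): mu[0]}                  # |C_0| = 1, so pi((0,0,0)) = mu(0)
for n in range(N):
    r = rho(n, density)
    column = [(s, p) for s, p in pi.items() if sum(s) == n]   # states in C_n
    for (L, M, H), p in column:
        R = N - n                        # cards remaining in the shoe
        # The three "card played" transitions out of (L, M, H), per (2).
        for nxt, count in (((L + 1, M, H), lows - L),
                           ((L, M + 1, H), meds - M),
                           ((L, M, H + 1), highs - H)):
            if count > 0:
                pi[nxt] = pi.get(nxt, 0.0) + p * (count / R) * (1 - r)

# Sanity check: mass in column n matches the reduced-chain equilibrium mu(n).
n = 10
assert abs(sum(p for s, p in pi.items() if sum(s) == n) - mu[n]) < 1e-9
```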
The technique described here takes advantage of the rich structure in the chain to calculate π
exactly and analytically, using only local knowledge of X and the inversion of an N × N matrix. The
algorithm can calculate the equilibrium when ∆ = 6 in under an hour. Equilibria calculated
through this method for ∆ ∈ {1, 4} are shown in Figure 4. Because the equilibrium would be
difficult to plot in the three-dimensional state space Σ, states with the same HLI are combined and
the equilibria are plotted versus HLI.
A natural question is how quickly a Markov chain converges to its equilibrium, addressed
often by bounding |(X^n)_{i,j} − π(j)| as a function of n. For our blackjack analysis, the answer to
this question roughly reflects how long a player must play before observing “typical” behavior.
By once again exploiting the column structure of the Markov chain (Σ, X) we are able to derive
an interesting relation. Suppose n > N + 1, and let i, j ∈ Σ be ordered triples. We define c_i to be
the column index of triple i (i.e., i ∈ C_{c_i}). We have shown that
$$\left| (X^n)_{i,j} - \pi(j) \right| \le \left| \left( Y^{\,n - c_j} \right)_{c_i,\,0} - \mu(0) \right|.$$
Not surprisingly, the convergence of (Σ, X) is closely related to the convergence of the smaller
chain (Γ, Y ), which could possibly simplify the derivation of convergence bounds using the standard
techniques. A useful topic for future research would involve extending such bounds to model the
variance of a player’s average profit over many hands.
5 Analysis
We present in this section our analysis of the Complete Point-Count System. Because the betting
is not constant in this system, it is important now to distinguish between the player’s advantage
and the player’s expected gain. As defined in Section 3, the player’s advantage a is the expected
profit on a hand that begins with a unit bet. We define the player’s expected gain g to be the
dollar amount one expects to profit from each hand when betting according to (1) with b = $1.
These quantities are related on a single hand by the expression g = B · a.
Figure 5: Player advantage vs. HLI (curves shown with and without insurance). (a) Scatter plot
of advantage for individual (L, M, H) triples. (b) Weighted average over (L, M, H) triples using
equilibrium π to determine advantage at a specific HLI. (c) Thorp's result, reprinted from [5, p. 97],
permission pending.
Table 2: Overall expected gain for a player using the Complete Point-Count System (with insurance).

∆   Expected gain g (dollars per hand, b = $1)
1   0.0296
2   0.0129
4   0.0030
Figure 6: (a) Relative time spent with different player advantages (∆ = 4). (b) Relative time spent
with different expected gains.
6 Conclusions
Blackjack is a complicated game, where subtle changes in rules or playing strategy can have a
surprising impact on the player’s advantage. As with many other complex real-world systems,
blackjack analysis is typically done via simulation. Aside from being somewhat unsatisfying
mathematically, simulation can make it difficult to explore a large number of rule and strategy
variations, and it rarely provides insight into the underlying reasons for the results.
We have presented an analysis framework for card-counting systems that operates completely
without simulation. This framework is based on the observation that certain aspects of the game of
blackjack have finite memory and are well modeled by discrete Markov chains. By using techniques
from Markov chain analysis, we developed a framework for calculating the player’s long-term ex-
pected gain and exercised this technique on the well-known Complete Point-Count System. By
using our method, one could further investigate detailed aspects of the game. For example, by
determining the reasons underlying particular advantages, a player could assess rule variations and
make strategy adjustments accordingly.⁶
As in any case of trying to match the real world to a tidy mathematical framework, some sim-
plifying assumptions were required. These assumptions introduce inaccuracies into the analysis,
but we have attempted to make only the mildest assumptions required. Even then, a direct appli-
cation of Markov chains produced a problem that was technically well-defined but computationally
intractable. It was only through further exploring the structure of the system that we were able
to reduce the complexity of the calculations. More accuracy could be achieved if we worked to
reduce the impact of our assumptions, for example by keeping track of card counts over more than
three categories. However, increasing the complexity in this way would quickly lead us back into a
problem that is computationally impractical.
In this work, Markov chains have been a powerful tool for exploiting the inherent structure
that exists both in the game of blackjack and in card-counting strategies. Though blackjack
is only a casino game, our exercise illustrates the power of Markov chains in analyzing
complex systems. The continued development of more advanced techniques for Markov chain
analysis should only extend the utility of Markov chains for more complicated, real-world problems.
Acknowledgments. The authors are grateful to Dr. Robin Forman for discussions that en-
couraged this analysis. Figure 5(c) is reproduced from [5, p. 97], permission pending.
References
[1] P. A. Griffin. The Theory of Blackjack: The Compleat Card Counter's Guide to the Casino
Game of 21. Huntington Press, Las Vegas, NV, USA, 1996.
[2] M. Jerrum. Mathematical foundations of the Markov chain Monte Carlo method. In M. Habib,
C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors, Probabilistic Methods for Algorithmic
Discrete Mathematics. Springer-Verlag, 1998.
[3] M. H. Millman. A statistical analysis of casino blackjack. The American Mathematical Monthly,
90(7):431–436, Aug.–Sep. 1983.

[4] J. R. Norris. Markov Chains. Cambridge University Press, Cambridge, UK, 1997.
[5] E. O. Thorp. Beat the Dealer: A Winning Strategy for the Game of Twenty-One. Vintage, New
York, 1966.
[6] S. Wong. Professional Blackjack. Pi Yee Press, La Jolla, CA, USA, 1994.
⁶For any reader interested in further experimentation or analyzing alternative systems, our Matlab code will soon
be publicly available in a Blackjack Markov Chain Toolbox.