
Probabilité et Simulation

PolyTech INFO4 (Grenoble) – 2024-2025 – Practical Sessions


Last updated: 2024-10-02. Info: davide.legacci@univ-grenoble-alpes.fr

1. Theory recap 11.9.24


• Jet set (sample space) Ω = finite set of possible outcomes 𝜔
• Probability on Ω = set of weights 𝑃 (𝜔) ∈ ℝ on each 𝜔 ∈ Ω such that
‣ 𝑃 (𝜔) > 0 ∀𝜔 ∈ Ω
‣ ∑𝜔∈Ω 𝑃 (𝜔) = 1
• Event 𝐴 ⊆ Ω = subset of the jet set
• Complementary event 𝐴𝑐 = Ω ∖ 𝐴
• The cardinality of a set 𝑆 is denoted by |𝑆|
• Uniform probability of the event 𝐴
𝑃 (𝐴) = ∑_{𝜔∈𝐴} 𝑃 (𝜔) = ∑_{𝜔∈𝐴} 1/|Ω| = |𝐴|/|Ω| (1)

• Probability of a union of two events
𝑃 (𝐴 ∪ 𝐵) = 𝑃 (𝐴) + 𝑃 (𝐵) − 𝑃 (𝐴 ∩ 𝐵) (2)

• Binomial theorem
(𝑥 + 𝑦)^𝑛 = ∑_{𝑗=0}^{𝑛} choose(𝑛, 𝑗) 𝑥^(𝑛−𝑗) 𝑦^𝑗 (3)

1.1. Counting
• Number of permutations of 𝑘 elements:
‣ Number of ways to order 𝑘 elements
‣ Only order matters
𝑃𝑘 = 𝑘! (4)

• Number of dispositions (arrangements) of 𝑘 elements out of 𝑛 (𝑘 ≤ 𝑛):


‣ Number of ways to choose and order 𝑘 elements out of 𝑛
‣ Order and elements matter
‣ Number of injections 𝑓 : {1, …, 𝑘} → {1, …, 𝑛}
𝐷𝑛,𝑘 = 𝑛(𝑛 − 1)⋯(𝑛 − 𝑘 + 1)  (𝑘 factors)  = 𝑛!/(𝑛 − 𝑘)! (5)

• Number of combinations of 𝑘 elements out of 𝑛 (𝑘 ≤ 𝑛):


‣ Number of ways to choose 𝑘 elements out of 𝑛
‣ Only elements matter
‣ Number of subsets of cardinality 𝑘 of a set of cardinality 𝑛
‣ Number of dispositions divided by the number of permutations
𝐶𝑛,𝑘 = 𝐷𝑛,𝑘 / 𝑃𝑘 = 𝑛! / (𝑘! (𝑛 − 𝑘)!) = choose(𝑛, 𝑘) (6)
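These quantities can be evaluated directly with Python's standard library; a minimal sketch (the values of 𝑛 and 𝑘 are arbitrary):

# Permutations, dispositions (arrangements) and combinations with the math module (Python >= 3.8).
from math import factorial, perm, comb

n, k = 5, 3
print(factorial(k))  # P_k = k! = 6
print(perm(n, k))    # D_{n,k} = n!/(n-k)! = 60
print(comb(n, k))    # C_{n,k} = choose(n, k) = 10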

Exercises
Exercise 1.1 (Handshakes and kisses)
There are 𝑓 girls and 𝑔 boys in a room. Boys exchange handshakes, girls exchange kisses, boys
and girls exchange kisses. How many kisses in total?

The number of kisses exchanged among girls is the number of subsets of cardinality 2 of a set of
cardinality 𝑓, that is choose(𝑓, 2) = 𝑓(𝑓 − 1)/2. Or, think that each girl gives 𝑓 − 1 kisses, and one needs a factor
of one half to avoid double counting.
For the number of kisses exchanged between boys and girls: the first girl gives 𝑔 kisses, the second
girl gives 𝑔 kisses, and so on, so we have 𝑓𝑔 in total.

number of kisses = 𝑓(𝑓 − 1)/2 + 𝑓𝑔 (7)

Exercise 1.2 (Throwing a die) Throw a fair die with 𝑓 faces 𝑛 times. What is the probability of never
getting the same result twice?

Let 𝒩 = {1, …, 𝑛} and ℱ = {1, …, 𝑓}. The jet set is


Ω = {𝜔 = (𝜔1 , …, 𝜔𝑛 ) : 𝜔𝑖 ∈ ℱ for all 𝑖 ∈ 𝒩} = ℱ𝑛 (8)

with cardinality

|Ω| = |ℱ𝑛 | = |ℱ|𝑛 = 𝑓 𝑛 (9)

The event we’re looking at is

𝐴 = {𝜔 ∈ Ω : 𝜔𝑖 ≠ 𝜔𝑗 for all 𝑖 ≠ 𝑗 ∈ 𝒩} (10)


Clearly if 𝑛 > 𝑓 then 𝑃 (𝐴) = 0. Let 𝑛 ≤ 𝑓. The (uniform) probability of the event 𝐴 is 𝑃 (𝐴) = |𝐴|/|Ω|,
with
|𝐴| = # of ways to choose and order 𝑛 elements out of 𝑓
    = 𝑓(𝑓 − 1)⋯(𝑓 − 𝑛 + 1) = 𝑓!/(𝑓 − 𝑛)! (11)

𝑃 (𝐴) = 𝑓! / (𝑓^𝑛 (𝑓 − 𝑛)!) (12)
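Equation 12 can be cross-checked with a quick Monte Carlo estimate; a minimal sketch, assuming for illustration a die with 𝑓 = 10 faces thrown 𝑛 = 4 times:

# Monte Carlo check of P(no face is repeated in n throws of a fair f-sided die).
import random
from math import factorial

f, n, trials = 10, 4, 100_000
hits = sum(
    len(set(random.randrange(f) for _ in range(n))) == n  # all n results distinct
    for _ in range(trials)
)
exact = factorial(f) / (f**n * factorial(f - n))  # Equation 12
print(hits / trials, exact)  # the two numbers should be close (exact = 0.504 here)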

Exercise 1.3 (Birthday paradox) What is the probability that at least 2 people out of 𝑛 have the
same birthday? (Assume uniform birth probability and a year with 𝑦 days.)

Quick solution

𝑃 (𝐴) = 1 − 𝑃 (no two people have the same birthday)   [cf. Exercise 1.2]
      = 1 − 𝑦! / (𝑦^𝑛 (𝑦 − 𝑛)!) (13)

Formal solution Let 𝒩 = {1, …, 𝑛} and 𝒴 = {1, …, 𝑦} with 𝑛 ≤ 𝑦. The jet set is

Ω = set of possible birthday assignments of the 𝑛 people
  = {𝜔 = (𝜔1 , …, 𝜔𝑛 ) : 𝜔𝑖 ∈ 𝒴 for all 𝑖 ∈ 𝒩} = 𝒴^𝑛 (14)

where 𝜔𝑖 is the birthday of the 𝑖-th person. The cardinality of the jet set is

|Ω| = |𝒴𝑛 | = |𝒴|𝑛 = 𝑦𝑛 (15)

The event we’re looking at is

𝐴 = {𝜔 ∈ Ω : ∃𝑖 ≠ 𝑗 ∈ 𝒩 : 𝜔𝑖 = 𝜔𝑗 } (16)

Note that this is the complementary event to the event defined in Equation 10 of Exercise 1.2. Thus we
can compute its probability as
𝑃 (𝐴) = 1 − 𝑃 (𝐴𝑐 ) (17)

in agreement with Equation 13.

Figure 1: Birthday paradox probability. Code available.
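The figure itself is not reproduced here; a minimal sketch of the computation behind it, assuming a year of 𝑦 = 365 days:

# P(at least two of n people share a birthday), Equation 13, computed iteratively.
y = 365
p_distinct = 1.0
for n in range(1, 61):
    # probability that the n-th birthday avoids the n-1 previous ones
    p_distinct *= (y - (n - 1)) / y
    if n in (10, 23, 41, 60):
        print(n, round(1 - p_distinct, 3))  # the collision probability exceeds 1/2 at n = 23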

Exercise 1.4 (Same birthday as the prof) What is the probability that at least 1 student out of 𝑛
has the same birthday as the prof? (Assume uniform birth probability and a year with 𝑦 days.)

Quick solution

𝑃 (𝐴) = 1 − 𝑃 (nobody has the prescribed birth date) = 1 − ((𝑦 − 1)/𝑦)^𝑛 (18)

Formal solution 1 As above 𝒩 = {1, …, 𝑛} and 𝒴 = {1, …, 𝑦} with 𝑛 ≤ 𝑦. The jet set is Ω = 𝒴𝑛+1 ,
that is the set of possible birthdays of 𝑛 + 1 people, the (𝑛 + 1)-th being the prof. Its cardinality is
|Ω| = 𝑦𝑛+1 . The event we’re looking at is

𝐴 = {𝜔 ∈ Ω : ∃𝑖 ∈ 𝒩 : 𝜔𝑖 = 𝜔𝑛+1 } (19)

with complementary event

𝐴𝑐 = {𝜔 ∈ Ω : 𝜔𝑖 ≠ 𝜔𝑛+1 ∀𝑖 ∈ 𝒩} (20)

As usual 𝑃 (𝐴) = 1 − 𝑃 (𝐴𝑐 ) = 1 − |𝐴𝑐 |/|Ω|, with

|𝐴𝑐 | = 𝑦 ⋅ (𝑦 − 1)^𝑛   (𝑦 choices for the prof, 𝑦 − 1 for each student) (21)

So 𝑃 (𝐴) = 1 − 𝑦(𝑦 − 1)^𝑛 / 𝑦^(𝑛+1) = 1 − ((𝑦 − 1)/𝑦)^𝑛, in agreement with Equation 18.
Formal solution 2 Using the probability of the complementary event is often the smartest way to
proceed, but for the sake of completeness let’s see how to get the same result directly. Let 𝐴𝑗 be the
event “exactly 𝑗 students out of 𝑛 have the same birthday as the prof”. The event we look at then is
𝐴 = ⊔𝑗∈𝒩 𝐴𝑗 (22)

with probability (cf Equation 2)


𝑃 (𝐴) = ∑_{𝑗∈𝒩} 𝑃 (𝐴𝑗 ) = (∑_{𝑗∈𝒩} |𝐴𝑗 |) / |Ω| (23)

The cardinality of 𝐴𝑗 is
|𝐴𝑗 | = 1⋯1 ⋅ (𝑦 − 1)⋯(𝑦 − 1) ⋅ 𝑦 ⋅ choose(𝑛, 𝑗)
       [𝑗 factors of 1, 𝑛 − 𝑗 factors of 𝑦 − 1, 𝑦 choices for the prof, choose(𝑛, 𝑗) ways to choose the 𝑗 students]
     = 𝑦 (𝑦 − 1)^(𝑛−𝑗) choose(𝑛, 𝑗) (24)

By an application of the binomial theorem (Equation 3) and a short manipulation,


∑_{𝑗=1}^{𝑛} |𝐴𝑗 | = 𝑦 ∑_{𝑗=1}^{𝑛} choose(𝑛, 𝑗) (𝑦 − 1)^(𝑛−𝑗) = 𝑦 (𝑦^𝑛 − (𝑦 − 1)^𝑛 ) (25)

which leads back to Equation 18.

2. Theory recap 18.9.24


2.1. Conditional probability
• Conditional probability
𝑃 (𝐴 | 𝐵) = 𝑃 (𝐴 ∩ 𝐵) / 𝑃 (𝐵)   if 𝑃 (𝐵) ≠ 0 (26)

• not really defined if 𝑃 (𝐵) = 0, cf. [1] p. 427.


• often used as
𝑃 (𝐴 ∩ 𝐵) = 𝑃 (𝐴 | 𝐵)𝑃 (𝐵) (27)
• Conditional probability and complementary event (proof: simple exercise.)
𝑃 (𝐴 | 𝐵) + 𝑃 (𝐴𝑐 | 𝐵) = 1 (28)
• Total probability theorem
𝑃 (𝐴) = 𝑃 (𝐴 | 𝐵)𝑃 (𝐵) + 𝑃 (𝐴 | 𝐵𝑐 )𝑃 (𝐵𝑐 ) (29)
• Bayes theorem

𝑃 (𝐴 ∩ 𝐵) = 𝑃 (𝐵 ∩ 𝐴) ⇒ 𝑃 (𝐴 | 𝐵)𝑃 (𝐵) = 𝑃 (𝐵 | 𝐴)𝑃 (𝐴) (30)

See this notebook for an example of Bayes theorem in action.
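The linked notebook is not reproduced here; as a stand-alone illustration, a minimal sketch of Equations 29 and 30 on a made-up testing scenario (all numerical values are illustrative only):

# Bayes: P(disease | positive test) from the prior and the test characteristics.
p_d = 0.01        # prior P(disease), illustrative
p_pos_d = 0.99    # P(positive | disease), illustrative
p_pos_nd = 0.05   # P(positive | no disease), illustrative
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)  # total probability, Equation 29
p_d_pos = p_pos_d * p_d / p_pos               # Bayes theorem, Equation 30
print(round(p_d_pos, 3))  # about 0.167: a positive test is far from a certain diagnosis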

2.2. Independent events


Let Ω be equipped with a probability 𝑃 .
• two events 𝐴, 𝐵 ⊆ Ω are said to be independent if
𝑃 (𝐴 ∩ 𝐵) = 𝑃 (𝐴)𝑃 (𝐵) (31)
• 𝑛 events 𝐴1 , …, 𝐴𝑛 are said to be independent if

𝑃 (∩_{𝑖∈𝐼} 𝐴𝑖 ) = ∏_{𝑖∈𝐼} 𝑃 (𝐴𝑖 )   for all 𝐼 ⊆ {1, …, 𝑛} (32)
• pairwise independence does not imply independence of all events!

Exercises
Exercise 2.1 (Heads or tails) Toss of 2 coins, Ω = {PP, PF, FP, FF} (P = pile/tails, F = face/heads). This
space is equipped with the uniform probability. Consider the events:
• 𝐴 = the first coin shows P
• 𝐵 = the second coin shows F
• 𝐶 = the two coins show the same result
Questions:
• Are 𝐴 and 𝐵 independent?
• Are 𝐴, 𝐵 and 𝐶 independent?

Figure 2: Pairwise independence does not imply independence of all events!
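A minimal sketch that enumerates Ω and checks both questions by brute force (the event definitions mirror the exercise):

# Enumerate the 4 equiprobable outcomes and test independence of A, B, C.
from itertools import product

omega = list(product("PF", repeat=2))  # PP, PF, FP, FF, each with probability 1/4

def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == "P"   # first coin shows P
B = lambda w: w[1] == "F"   # second coin shows F
C = lambda w: w[0] == w[1]  # the two coins agree

print(prob(lambda w: A(w) and B(w)) == prob(A) * prob(B))  # True: A and B are independent
print(prob(lambda w: A(w) and C(w)) == prob(A) * prob(C))  # True: pairwise independence holds
print(prob(lambda w: A(w) and B(w) and C(w))
      == prob(A) * prob(B) * prob(C))                      # False: A, B, C are not independent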

Exercise 2.2 (Defective mechanical parts) Among 10 mechanical parts, 4 are defective. Two parts are
drawn successively at random from the lot (without replacement). What is the probability that both
parts are good?

Solution 1 Let 𝐴𝑖 be the event “the 𝑖-th drawn piece is good”, with 𝑖 ∈ {1, 2}. We need the probability
of the event 𝐴2 ∩ 𝐴1 . By definition of conditional probability,

𝑃 (𝐴2 ∩ 𝐴1 ) = 𝑃 (𝐴2 | 𝐴1 ) 𝑃 (𝐴1 ) = (5/9) ⋅ (6/10) = 1/3. (33)

Solution 2 The jet set is the set of subsets of cardinality 2 of a set of cardinality 10, so |Ω| = choose(10, 2).
The event we consider is the set of subsets of cardinality 2 of a set of cardinality 6 (the good pieces), so
|𝐴| = choose(6, 2). Then

𝑃 (𝐴) = choose(6, 2) / choose(10, 2) = (6 ⋅ 5)/(10 ⋅ 9) = 1/3. (34)
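Both solutions can be cross-checked by simulating the draws without replacement; a minimal sketch with 6 good and 4 defective parts:

# Simulate drawing 2 parts without replacement from a lot of 6 good + 4 defective parts.
import random

lot = ["good"] * 6 + ["defective"] * 4
trials = 100_000
both_good = sum(
    all(part == "good" for part in random.sample(lot, 2))  # sample = without replacement
    for _ in range(trials)
)
print(both_good / trials)  # close to 1/3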

Exercise 2.3 (Betting on cards) We have three cards:


• a red card with both faces red;
• a white card with both faces white;
• a mixed card with a red face and a white face.
One of the three cards is drawn at random and one of the faces of this card (also chosen at
random) is exposed. This face is red. You are asked to bet on the color of the hidden face. Do you
choose red or white?

Intuitive solution The cards are 𝑅𝑅, 𝑅𝑊 , 𝑊 𝑊 with 𝑊 for white and 𝑅 for red. Call 𝑅𝑅 the
“red” card, 𝑊 𝑊 the “white” card, and 𝑊 𝑅 the “mixed” card. Since we observe a red face, the white
card cannot be on the table. There are three possibilities left: 1. we’re observing a face of the red card
(in which case the hidden face is red); 2. we are observing the other face of the red card (in which
case the hidden face is red); 3. we are observing the red face of the mixed card (in which case the
hidden face is white). So the hidden face is red 2 out of 3 times.
Formal solution The jet set contains the possible outcomes of a sequence of two events: 1. draw a
card (out of three), and 2. observe a face (out of two). Denote by 𝑅 a red face and by 𝑊 a white face,
and denote by a subscript 𝑜 the observed face, and by a subscript ℎ the hidden face. The possible
outcomes then are
Ω = {𝑅ℎ ∩ 𝑅𝑜 , 𝑅ℎ ∩ 𝑊𝑜 , 𝑊ℎ ∩ 𝑅𝑜 , 𝑊ℎ ∩ 𝑊𝑜 } (35)

where the first entry indicates the hidden face, and the second entry indicates the observed face. For
example, 𝑊ℎ ∩ 𝑅𝑜 is the event “the hidden face is white and the observed face is red”, and similarly for
the others.
In this formulation, every element in the jet set is the intersection of two (dependent) events of the
type 1. a face is hidden, and 2. a face is observed. Note that the event 𝑊ℎ ∩ 𝑅𝑜 is equivalent to the
event “the mixed card is drawn, and the red face is observed.” Under this second point of view, each
outcome in Ω is the intersection of two (dependent) events of the type 1. a card is drawn, and 2. a
face is observed. Denoting the event “draw the red card” by 𝑟, the event “draw the white card” by 𝑤,
and the event “draw the mixed card” by 𝑚, the jet set is equivalently
Ω = {𝑟 ∩ 𝑅𝑜 , 𝑚 ∩ 𝑊𝑜 , 𝑚 ∩ 𝑅𝑜 , 𝑤 ∩ 𝑊𝑜 } (36)

This formulation helps to understand that the probability on Ω is not uniform. The probabilities of
the events in Ω are computed by Equation 27:
𝑃 (𝑅ℎ ∩ 𝑅𝑜 ) = 𝑃 (𝑟 ∩ 𝑅𝑜 ) = 𝑃 (𝑟 | 𝑅𝑜 ) 𝑃 (𝑅𝑜 ) (37)

However, we do not know the probabilities on the right hand side. As a simple trick, remember that
𝑃 (𝐴 ∩ 𝐵) = 𝑃 (𝐵 ∩ 𝐴), so we can turn this around:
𝑃 (𝑅ℎ ∩ 𝑅𝑜 ) = 𝑃 (𝑅𝑜 ∩ 𝑟) = 𝑃 (𝑅𝑜 | 𝑟) 𝑃 (𝑟) = 1 ⋅ (1/3) = 2/6 (38)

𝑃 (𝑅ℎ ∩ 𝑊𝑜 ) = 𝑃 (𝑊𝑜 ∩ 𝑚) = 𝑃 (𝑊𝑜 | 𝑚) 𝑃 (𝑚) = (1/2) ⋅ (1/3) = 1/6 (39)

𝑃 (𝑊ℎ ∩ 𝑅𝑜 ) = 𝑃 (𝑅𝑜 ∩ 𝑚) = 𝑃 (𝑅𝑜 | 𝑚) 𝑃 (𝑚) = (1/2) ⋅ (1/3) = 1/6 (40)

𝑃 (𝑊ℎ ∩ 𝑊𝑜 ) = 𝑃 (𝑊𝑜 ∩ 𝑤) = 𝑃 (𝑊𝑜 | 𝑤) 𝑃 (𝑤) = 1 ⋅ (1/3) = 2/6 (41)

Now by Equation 26 and using these probabilities,


𝑃 (𝑊ℎ | 𝑅𝑜 ) = 𝑃 (𝑊ℎ ∩ 𝑅𝑜 ) / 𝑃 (𝑅𝑜 ) = 𝑃 (𝑊ℎ ∩ 𝑅𝑜 ) / [𝑃 (𝑅ℎ ∩ 𝑅𝑜 ) + 𝑃 (𝑊ℎ ∩ 𝑅𝑜 )] = 1/3 (42)

𝑃 (𝑅ℎ | 𝑅𝑜 ) = 𝑃 (𝑅ℎ ∩ 𝑅𝑜 ) / 𝑃 (𝑅𝑜 ) = 𝑃 (𝑅ℎ ∩ 𝑅𝑜 ) / [𝑃 (𝑅ℎ ∩ 𝑅𝑜 ) + 𝑃 (𝑊ℎ ∩ 𝑅𝑜 )] = 2/3 = 1 − 𝑃 (𝑊ℎ | 𝑅𝑜 ) (43)

where the last line follows from Equation 28 and gives directly the answer. So in conclusion, given
the fact that we observe a red face, the hidden face is also red with probability 2/3.
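A minimal simulation of the drawing procedure, conditioning on a red observed face (the card labels are illustrative):

# Three cards: estimate P(hidden face is red | observed face is red).
import random

cards = [("R", "R"), ("W", "W"), ("R", "W")]  # red card, white card, mixed card
red_observed = hidden_red = 0
for _ in range(100_000):
    card = random.choice(cards)                # draw one of the three cards uniformly
    observed, hidden = random.sample(card, 2)  # expose one of its two faces uniformly
    if observed == "R":
        red_observed += 1
        hidden_red += (hidden == "R")
print(hidden_red / red_observed)  # close to 2/3, as in Equation 43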

Exercise 2.4 (Russian roulette) You are playing two-person Russian roulette with a revolver
featuring a rotating cylinder with six bullet slots. Each time the gun is triggered, the cylinder
rotates by one slot. Two bullets are inserted one next to the other into the cylinder, which is then
randomly positioned. Your opponent is the first to place the revolver against her temple. She
presses the trigger and… she stays alive. With a great display of honor, she offers to let you spin the
cylinder again at random before you fire in turn. What do you decide?

The bullets occupy the positions 𝑥 and 𝑥 + 1 mod 6:


Ω = {12, 23, 34, 45, 56, 61} (44)

Say the revolver shoots from position 1. The event “the first player dies” is

die1 = {12, 61} (45)

so 𝑃 (die1 ) = 1/3 and 𝑃 (live1 ) = 2/3. We need to compute

𝑃 (die2 | live1 ) = 𝑃 (die2 ∩ live1 ) / 𝑃 (live1 ) (46)

Since the cylinder rotates after being triggered we have die2 = {56, 61} and die2 ∩ live1 = {56}, so
𝑃 (die2 | live1 ) = (1/6) / (2/3) = 1/4 < 𝑃 (die1 ). So you do not re-spin the cylinder.
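A minimal simulation comparing the two options for the second player, conditioned on the first player surviving (the slot numbering is an arbitrary convention; by symmetry it does not affect the result):

# Russian roulette: P(second player dies | first survives), with and without re-spinning.
import random

def death_probability(respin, trials=200_000):
    deaths = survivors = 0
    for _ in range(trials):
        x = random.randrange(6)
        loaded = {x, (x + 1) % 6}                    # two adjacent loaded slots
        if 0 in loaded:                              # first shot fires slot 0: player 1 dies
            continue
        survivors += 1
        slot = random.randrange(6) if respin else 5  # fresh spin, or cylinder advanced by one
        deaths += slot in loaded
    return deaths / survivors

print(death_probability(respin=False))  # close to 1/4
print(death_probability(respin=True))   # close to 1/3: re-spinning is the worse choice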

3. Theory recap 26.9.24


3.1. Probability measure
The relevant references are [2] p. 11 and [1], pp. 22 and 160.
Definition 3.1.1 (sigma-field) Let Ω be any set. A 𝜎-field 𝒜 on Ω is a collection of subsets of Ω
that¹
1. is closed under complement: if 𝐴 ∈ 𝒜 then 𝐴𝑐 ∈ 𝒜;
2. contains the whole set: Ω ∈ 𝒜;
3. is closed under countable union: if 𝐴1 , 𝐴2 , … is a countable family of sets of 𝒜 then their union
∪_{𝑖=1}^{∞} 𝐴𝑖 is also in 𝒜.

A subset of Ω that is in 𝒜 is called an event.


Definition 3.1.2 (Measure) Given a set Ω and a 𝜎-algebra 𝒜 on Ω, a measure 𝜇 is a function
𝜇 : 𝒜 → ℝ≥0 (47)

such that
1. 𝜇(∅) = 0
2. countable additivity (also called 𝜎-additivity) is fulfilled, namely the measure of a disjoint
countable union of sets in 𝒜 is the sum of their measures:
𝜇(⨆_{𝑖=1}^{∞} 𝐴𝑖 ) = ∑_{𝑖=1}^{∞} 𝜇(𝐴𝑖 ). (48)

Definition 3.1.3 (Probability measure) Given a set Ω and a 𝜎-algebra 𝒜 on Ω, a probability measure
𝑃 is a measure (in the sense above) with the additional requirement that
𝑃 (Ω) = 1. (49)
• Note that this implies that 𝑃 (𝐴) ≤ 1 for all events 𝐴 ∈ 𝒜.
• A triple (Ω, 𝒜, 𝑃 ) where 𝒜 is a 𝜎-algebra on Ω and 𝑃 is a probability measure is called probability
space.

Take-away Putting it all together, a probability measure 𝑃 : 𝒜 → [0, 1] on a space Ω is a


function from a “well-behaved” collection of subsets of Ω (the 𝜎-field) to [0, 1], such that
𝑃 (∅) = 0, 𝑃 (Ω) = 1, and fulfilling countable additivity.

3.2. Discrete random variables


Definition 3.2.1 (Discrete random variable) Given a probability space (Ω, 𝒜, 𝑃 ), a discrete random
variable 𝑋 is a function 𝑋 : Ω → 𝐹 such that
1. 𝐹 is a countable set;
2. the level sets of 𝑋 are events, that is
{𝑋 = 𝑥} ≔ {𝜔 ∈ Ω : 𝑋(𝜔) = 𝑥} ∈ 𝒜 for all 𝑥 ∈ 𝐹 (50)
• clearly, {𝑋 = 𝑥} = ∅ ∈ 𝒜 for all 𝑥 ∈ 𝐹 ∖ Im(𝑋)

¹In French, this collection is called a tribu on Ω. The term 𝜎-algebra is also used – and is more common in the context of pure
analysis, cf. [3] – whereas the term 𝜎-field is more common in the context of probability theory, cf. [1].

• the second property guarantees that 𝑃 {𝑋 = 𝑥} is well-defined for all 𝑥 ∈ 𝐹 , which allows for the
following definition:
Definition 3.2.2 (Distribution of a discrete random variable) The distribution (or law) of a random
variable 𝑋 is the function 𝜇 : 𝐹 → [0, 1] defined by
𝜇(𝑥) = 𝑃 {𝑋 = 𝑥} for all 𝑥 ∈ 𝐹 . (51)

• two discrete random variables 𝑋 and 𝑌 taking values resp. in 𝐹 and 𝐺 are independent if
𝑃 {𝑋 = 𝑥, 𝑌 = 𝑦} = 𝑃 {𝑋 = 𝑥}𝑃 {𝑌 = 𝑦} for all 𝑥 ∈ 𝐹 , 𝑦 ∈ 𝐺 (52)
• it is understood that {𝑋 = 𝑥, 𝑌 = 𝑦} is a shorthand for the event
{𝜔 ∈ Ω : 𝑋(𝜔) = 𝑥} ∩ {𝜔 ∈ Ω : 𝑌 (𝜔) = 𝑦} ∈ 𝒜. (53)
• the definition generalises to collections of DRVs, see Section 2.2.3 in [2].

Take-away A discrete random variable is a function on Ω with countable range. Think of it as an


experiment with a random outcome. Its law, or distribution, gives the probability to observe each
of the possible (countable) values of the random variable.

Take-away
• (Ω, 𝒜, 𝑃 ) with 𝑃 : 𝒜 → [0, 1] and 𝑃 (Ω) = 1
• 𝑋 : Ω → 𝐹 countable, with {𝑋 = 𝑥} ∈ 𝒜 for all 𝑥 ∈ 𝐹
• 𝜇 : 𝐹 → [0, 1] such that 𝜇(𝑥) = 𝑃 {𝑋 = 𝑥}

3.3. Standard discrete distributions


3.3.1. Bernoulli ℬ(𝑝)
• The Bernoulli distribution models a random experiment which has two possible outcomes.
• More precisely, the Bernoulli distribution is the distribution of a discrete random variable 𝑋 that
can assume only values in 𝐹 = {0, 1}.
• The distribution is parametrized by 𝑝 ∈ [0, 1], and we write 𝑋 ∼ ℬ(𝑝).
𝜇 : 𝐹 → [0, 1]
1 ⟼ 𝑝
0 ⟼ 1 − 𝑝
𝑥 ⟼ 𝑝^𝑥 (1 − 𝑝)^(1−𝑥) (54)

3.3.2. Binomial ℬ(𝑛, 𝑝)


• Distribution of the discrete random variable 𝑋 = 𝑋1 + … + 𝑋𝑛 , where the 𝑋𝑖 -s are independent
Bernoulli variables of parameter 𝑝 ∈ [0, 1].
• 𝐹 = {0, …, 𝑛}; 𝑘 ∈ 𝐹 is the value of the sum
𝜇 : 𝐹 → [0, 1]
𝑘 ⟼ choose(𝑛, 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘) (55)

3.3.3. Poisson 𝒫(𝜆)


• probability of observing a given number of independent events occurring at constant rate 𝜆 > 0
• 𝐹 = ℕ; 𝑛 ∈ 𝐹 is number of observed events

𝜇 : 𝐹 → [0, 1]
𝑛 ⟼ 𝑒^(−𝜆) 𝜆^𝑛 / 𝑛! (56)

3.3.4. Geometric 𝒢(𝑝)


• Index of the first success in a sequence of independent 𝑝-Bernoulli trials.
• 𝐹 = ℕ∗ ; 𝑘 ∈ 𝐹 is the index of the first successful trial
𝜇 : 𝐹 → [0, 1]
𝑘 ⟼ 𝑝(1 − 𝑝)^(𝑘−1) (57)
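All four laws are available in numpy's random generator; a minimal sketch comparing empirical means with the theoretical values listed in Section 4.1 (the parameter values are arbitrary):

# Sample the four standard discrete laws and compare empirical and theoretical means.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, size = 10, 0.3, 2.5, 100_000

print(rng.binomial(1, p, size).mean(), p)      # Bernoulli B(p): mean p
print(rng.binomial(n, p, size).mean(), n * p)  # Binomial B(n, p): mean n*p
print(rng.poisson(lam, size).mean(), lam)      # Poisson P(lambda): mean lambda
print(rng.geometric(p, size).mean(), 1 / p)    # Geometric G(p), support {1, 2, ...}: mean 1/p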

3.4. Useful stuff


• Vandermonde’s identity
∑_{𝑘1 =0}^{𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘 − 𝑘1 ) = choose(𝑛1 + 𝑛2 , 𝑘) (58)

Exercises
Exercise 3.1 (Sum of independent binomial distributions) Let 𝑋𝑖 ∼ ℬ(𝑛𝑖 , 𝑝) with 𝑖 ∈ {1, 2} be
independent discrete random variables following the binomial law. Find the law of 𝑋1 + 𝑋2 .

Hint: cf. the derivation of the binomial distribution, [2] p. 16.


The laws 𝜇𝑖 : 𝐹𝑖 = {0, …, 𝑛𝑖 } → [0, 1] of the two variables are given by
𝜇𝑖 (𝑘𝑖 ) = 𝑃 (𝑋𝑖 = 𝑘𝑖 ) = choose(𝑛𝑖 , 𝑘𝑖 ) 𝑝^(𝑘𝑖 ) (1 − 𝑝)^(𝑛𝑖 −𝑘𝑖 ) (59)

The law of 𝑋1 + 𝑋2 takes values in 𝐹 = {0, …, 𝑛1 + 𝑛2 } and for all 𝑘 ∈ 𝐹 is given by

𝜇(𝑘) = 𝑃 (𝑋1 + 𝑋2 = 𝑘)
     = 𝑃 (⨆_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} {𝑋1 = 𝑘1 , 𝑋2 = 𝑘2 })
     = ∑_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} 𝑃 (𝑋1 = 𝑘1 ) 𝑃 (𝑋2 = 𝑘2 )   by countable additivity and independence
     = ∑_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} 𝜇1 (𝑘1 ) 𝜇2 (𝑘2 )   by definition of the laws (60)
     = ∑_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘2 ) 𝑝^(𝑘1 +𝑘2 ) (1 − 𝑝)^(𝑛1 +𝑛2 −𝑘1 −𝑘2 )
     = 𝑝^𝑘 (1 − 𝑝)^(𝑛1 +𝑛2 −𝑘) ∑_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘2 )

Let’s focus on the sum. For each fixed 𝑘1 ∈ 𝐹1 , 𝑘2 is constrained to be 𝑘 − 𝑘1 . Furthermore, in order
for 𝑘2 to be ≥ 0, 𝑘1 can be at most equal to 𝑘. So the constraints

𝑘1 ∈ {0, …, 𝑛1 }
𝑘2 ∈ {0, …, 𝑛2 } (61)
𝑘1 + 𝑘2 = 𝑘

can be replaced by the constraints

𝑘1 ∈ {0, …, 𝑘}
𝑘2 = 𝑘 − 𝑘1 (62)

namely

∑_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘2 ) = ∑_{𝑘1 =0}^{𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘 − 𝑘1 ) = choose(𝑛1 + 𝑛2 , 𝑘) (63)

where the second step follows by Vandermonde’s identity.


Remark 3.4.1 Note that it is correct to have 𝑘1 running from 0 to 𝑘:
• If 𝑘 ≤ 𝑛1 , 𝑘1 can be at most 𝑘 so that 𝑘2 = 𝑘 − 𝑘1 ≥ 0, so we have directly Equation 63.
• If 𝑘 > 𝑛1 , we have

∑_{𝑘𝑖 ∈𝐹𝑖 , 𝑘1 +𝑘2 =𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘2 ) = ∑_{𝑘1 =0}^{𝑛1 } choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘 − 𝑘1 )
   = ∑_{𝑘1 =0}^{𝑛1 } choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘 − 𝑘1 ) + ∑_{𝑘1 =𝑛1 +1}^{𝑘} choose(𝑛1 , 𝑘1 ) choose(𝑛2 , 𝑘 − 𝑘1 ) (64)

since each summand in the second sum is zero², and we get again Equation 63.
So in conclusion
𝜇(𝑘) = choose(𝑛1 + 𝑛2 , 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛1 +𝑛2 −𝑘) (65)
namely 𝑋1 + 𝑋2 ∼ ℬ(𝑛1 + 𝑛2 , 𝑝).
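A minimal numerical check of Equation 65, obtained by convolving the two binomial laws as in Equation 60 (the parameters are arbitrary):

# Verify that the convolution of B(n1, p) and B(n2, p) is the law B(n1 + n2, p).
from math import comb

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n1, n2, p = 4, 6, 0.3
for k in range(n1 + n2 + 1):
    conv = sum(
        binom_pmf(n1, p, k1) * binom_pmf(n2, p, k - k1)
        for k1 in range(0, min(k, n1) + 1)
        if k - k1 <= n2
    )
    assert abs(conv - binom_pmf(n1 + n2, p, k)) < 1e-12
print("convolution matches B(n1 + n2, p)")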

Exercise 3.2 (Sum of independent Poisson distributions) Let 𝑋𝑖 ∼ 𝒫(𝜆𝑖 ) with 𝑖 ∈ {1, 2} be
independent discrete random variables following the Poisson law. Find the law of 𝑋1 + 𝑋2 .

Hint: c.f. previous exercise and binomial theorem.


Analogously to before, with 𝑖 ∈ {1, 2}, we look for the law 𝜇 : ℕ → [0, 1] given by

𝜇(𝑛) = 𝑃 (𝑋1 + 𝑋2 = 𝑛) = ∑_{𝑛𝑖 ∈ℕ, 𝑛1 +𝑛2 =𝑛} 𝜇1 (𝑛1 ) 𝜇2 (𝑛2 )
     = ∑_{𝑛𝑖 ∈ℕ, 𝑛1 +𝑛2 =𝑛} 𝑒^(−𝜆1 ) (𝜆1^(𝑛1 ) / 𝑛1 !) ⋅ 𝑒^(−𝜆2 ) (𝜆2^(𝑛2 ) / 𝑛2 !) (66)

As before we replace the constraints by 𝑛1 ∈ {0, …, 𝑛} and 𝑛2 = 𝑛 − 𝑛1 , so

²Recall that choose(𝑎, 𝑏) = 0 if 𝑏 > 𝑎.

𝜇(𝑛) = 𝑒^(−(𝜆1 +𝜆2 )) ∑_{𝑛1 =0}^{𝑛} (𝜆1^(𝑛1 ) / 𝑛1 !) ⋅ (𝜆2^(𝑛−𝑛1 ) / (𝑛 − 𝑛1 )!) ⋅ (𝑛!/𝑛!)
     = (𝑒^(−(𝜆1 +𝜆2 )) / 𝑛!) ∑_{𝑛1 =0}^{𝑛} choose(𝑛, 𝑛1 ) 𝜆1^(𝑛1 ) 𝜆2^(𝑛−𝑛1 ) (67)
     = 𝑒^(−(𝜆1 +𝜆2 )) (𝜆1 + 𝜆2 )^𝑛 / 𝑛!
So 𝑋1 + 𝑋2 ∼ 𝒫(𝜆1 + 𝜆2 ).

Exercise 3.3 (Min of independent geometric distributions) Let 𝑋𝑖 ∼ 𝒢(𝑝𝑖 ) with 𝑖 ∈ {1, 2} be
independent DRVs following the geometric law. Find the law of min{𝑋1 , 𝑋2 }.

Hint: set up the problem in terms of inequalities.


Let 𝑍 = min{𝑋1 , 𝑋2 }. We look for the law 𝜇 : ℕ∗ → [0, 1] such that

𝜇(𝑘) = 𝑃 (𝑍 = 𝑘)
     = 𝑃 (𝑍 ≥ 𝑘) − 𝑃 (𝑍 ≥ 𝑘 + 1)
     = 𝑃 (𝑋1 ≥ 𝑘, 𝑋2 ≥ 𝑘) − 𝑃 (𝑋1 ≥ 𝑘 + 1, 𝑋2 ≥ 𝑘 + 1) (68)
     = 𝑃 (𝑋1 ≥ 𝑘)𝑃 (𝑋2 ≥ 𝑘) − 𝑃 (𝑋1 ≥ 𝑘 + 1)𝑃 (𝑋2 ≥ 𝑘 + 1)

Let’s drop the subscript for a moment. For a DRV 𝑋 ∼ 𝒢(𝑝) and for 𝑘 ∈ ℕ∗ we need

𝑃 (𝑋 ≥ 𝑘) = 𝑃 (⨆_{𝑖≥𝑘} {𝑋 = 𝑖})
          = ∑_{𝑖≥𝑘} 𝑃 (𝑋 = 𝑖)
          = ∑_{𝑖≥𝑘} 𝑝(1 − 𝑝)^(𝑖−1)
          = 𝑝(1 − 𝑝)^(𝑘−1) (1 + (1 − 𝑝) + (1 − 𝑝)^2 + …) (69)
          = 𝑝(1 − 𝑝)^(𝑘−1) ∑_{𝑗=0}^{∞} (1 − 𝑝)^𝑗
          = 𝑝(1 − 𝑝)^(𝑘−1) ⋅ (1/𝑝)
          = (1 − 𝑝)^(𝑘−1)

Plugging this into Equation 68 we get


𝜇(𝑘) = 𝑃 (𝑋1 ≥ 𝑘)𝑃 (𝑋2 ≥ 𝑘) − 𝑃 (𝑋1 ≥ 𝑘 + 1)𝑃 (𝑋2 ≥ 𝑘 + 1)
= (1 − 𝑝1 )𝑘−1 (1 − 𝑝2 )𝑘−1 − (1 − 𝑝1 )𝑘 (1 − 𝑝2 )𝑘 (70)
= (1 − 𝑝1 )𝑘−1 (1 − 𝑝2 )𝑘−1 [1 − (1 − 𝑝1 )(1 − 𝑝2 )]

Let 𝛼 = (1 − 𝑝1 )(1 − 𝑝2 ) and 𝛽 = 1 − 𝛼 = 𝑝1 + 𝑝2 − 𝑝1 𝑝2 , so 𝜇(𝑘) = 𝛽(1 − 𝛽)𝑘−1 , i.e. 𝑍 ∼ 𝒢(𝛽).
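A minimal simulation check of this result (the parameters are arbitrary):

# Check that min(X1, X2), with X_i ~ G(p_i) independent, behaves like G(p1 + p2 - p1*p2).
import numpy as np

rng = np.random.default_rng(1)
p1, p2, size = 0.2, 0.35, 200_000
z = np.minimum(rng.geometric(p1, size), rng.geometric(p2, size))
beta = p1 + p2 - p1 * p2
print(z.mean(), 1 / beta)     # empirical vs theoretical mean of a geometric law
print((z == 1).mean(), beta)  # empirical vs theoretical value of mu(1) = beta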

4. Theory recap 2.10.24
Recall from last week that

Take-away
• (Ω, 𝒜, 𝑃 ) with 𝑃 : 𝒜 → [0, 1] and 𝑃 (Ω) = 1
• 𝑋 : Ω → 𝐹 countable, with {𝑋 = 𝑥} ∈ 𝒜 for all 𝑥 ∈ 𝐹
• 𝜇 : 𝐹 → [0, 1] such that 𝜇(𝑥) = 𝑃 {𝑋 = 𝑥}

4.1. Expected value of a discrete random variable


In this section when we say “𝑋 is a RV” we mean “𝑋 : Ω → 𝐹 ⊂ ℝ is a discrete random variable
with real values.”
Definition 4.1.1 (Expected value) A RV is integrable if ∑𝑥∈𝐹 |𝑥|𝑃 (𝑋 = 𝑥) < +∞, and in this case
its expected value 𝔼(𝑋) is the real number

𝔼(𝑋) ≔ ∑_{𝑥∈𝐹} 𝑥 𝑃 (𝑋 = 𝑥). (71)

Proposition 4.1.2 (Linearity of expectation)


𝔼(𝑋 + 𝑎𝑌 ) = 𝔼(𝑋) + 𝑎𝔼(𝑌 ) (72)

Some properties of the expected value of a RV:


• 𝔼(constant) = constant
• Sufficient condition, positivity, monotonicity: see [2] pag. 20.
Some examples:
• 𝑋 ∼ ℬ(𝑝) ⇒ 𝔼(𝑋) = 𝑝
• 𝑋 ∼ ℬ(𝑛, 𝑝) ⇒ 𝔼(𝑋) = 𝑛𝑝 (immediate by linearity from the above)
• 𝑋 ∼ 𝒫(𝜆) ⇒ 𝔼(𝑋) = 𝜆
• 𝑋 ∼ 𝒢(𝑝) ⇒ 𝔼(𝑋) = 1/𝑝
Theorem 4.1.3 (Expectation of a function) Let 𝑋 : Ω → 𝐹 ⊂ ℝ be a RV, and consider some function
𝑓 : 𝐹 → ℝ. Then

𝔼(𝑓(𝑋)) = ∑_{𝑥∈𝐹} 𝑓(𝑥) 𝑃 (𝑋 = 𝑥) (73)

whenever defined (see [2] Th. 2.3.6 for details).


Proposition 4.1.4 (Expectation and independence) Let 𝑋, 𝑌 be independent RVs and 𝑓, 𝑔 two functions
on their values such that all the expectations are well-defined (i.e. all the random variables are integrable).
Then
𝔼(𝑓(𝑋)𝑔(𝑌 )) = 𝔼(𝑓(𝑋))𝔼(𝑔(𝑌 )) (74)

4.2. Variance of a discrete random variable


Definition 4.2.1 (Variance) A RV 𝑋 is called square integrable if 𝑋 2 is integrable, that is if
∑𝑥∈𝐹 𝑥2 𝑃 (𝑋 = 𝑥) < +∞, and in this case

Var(𝑋) ≔ 𝔼[(𝑋 − 𝔼[𝑋])2 ] (75)


• If 𝑋 is square integrable then the variance is well defined, cf [2] Remark 2.3.11

• The variance is a measure of the spread (dispersion) of a random variable around its expected
value
The two following properties of the variance are very useful for concrete calculations:
Lemma 4.2.2 (Variance as difference of expectations)

Var(𝑋) = 𝔼(𝑋 2 ) − 𝔼(𝑋)2 (76)

The variance is in general not linear:


Lemma 4.2.3 (Variance after scaling and shifting)

Var(𝑎𝑋 + 𝑏) = 𝑎2 Var(𝑋) for all 𝑎, 𝑏 ∈ ℝ (77)

Proof Exercise; both lemmas follow by linearity of the expectation. □


Proposition 4.2.4 (Variance and independence) Let (𝑋𝑖 )𝑖∈{1,…,𝑛} be a family of square integrable
random variables. Then their sum is square integrable, and if the 𝑋𝑖 are independent then
Var(∑_{𝑖=1}^{𝑛} 𝑋𝑖 ) = ∑_{𝑖=1}^{𝑛} Var(𝑋𝑖 ) (78)

Some examples:
• 𝑋 ∼ ℬ(𝑝) ⇒ Var(𝑋) = 𝑝(1 − 𝑝), see Exercise 4.3 and proof.
• 𝑋 ∼ ℬ(𝑛, 𝑝) ⇒ Var(𝑋) = 𝑛𝑝(1 − 𝑝) (immediate by Proposition 4.2.4)
• 𝑋 ∼ 𝒫(𝜆) ⇒ Var(𝑋) = 𝜆, see proof.

Exercises
Exercise 4.1 (Lost messages) On a telecommunication channel, the number of messages arriving in 𝑇
time units can be modelled by a DRV ∼ 𝒫(𝜆𝑇 ). Each message has a loss probability equal to 𝑝,
independent of the other messages. Find the probability that the number of lost messages in 𝑇 units
of time is equal to 𝑙.

Without loss of generality rescale 𝜆 ← 𝜆𝑇 . We need to find the discrete random variable 𝐿 whose
range {0, 1, 2, …} ∋ 𝑙 contains the possible numbers 𝑙 of lost messages in one time unit. The
probability 𝑃 (𝐿 = 𝑙) to lose 𝑙 messages is then by definition given by the law of 𝐿.
Let 𝑋𝑖 be the DRV for the event “the 𝑖-th message is lost”. Since each message is lost with probability
𝑝, 𝑋𝑖 ∼ ℬ(𝑝) for all 𝑖 ∈ {1, 2, …}.
Let 𝐿𝑎 = ∑𝑎𝑖=1 𝑋𝑖 be the DRV whose range Im(𝐿𝑎 ) = {0, 1, …, 𝑎} ∋ 𝑙 contains the numbers 𝑙 of
possible lost messages out of 𝑎 arrived. Since 𝐿𝑎 is the sum of 𝑎 independent 𝑝-Bernoulli DRVs, 𝐿𝑎
follows the binomial distribution:
𝐿𝑎 ∼ ℬ(𝑎, 𝑝). (79)

Finally, let 𝐴 be the DRV estimating the number of arrived messages 𝑎 ∈ {0, 1, …} in one time unit;
we are given that 𝐴 ∼ 𝒫(𝜆).
The law of 𝐿 is given by

𝑃 (𝐿 = 𝑙) = 𝑃 [⨆_{𝑎=𝑙}^{∞} {𝐿𝑎 = 𝑙} ∩ {𝐴 = 𝑎}], (80)

that is, we look at the disjoint union of all the events in which, given 𝑎 arrived messages, 𝑙 are lost.
By countable additivity and independence,

𝑃 (𝐿 = 𝑙) = ∑_{𝑎=𝑙}^{∞} 𝑃 (𝐿𝑎 = 𝑙) 𝑃 (𝐴 = 𝑎) (81)

Now 𝐿𝑎 follows a binomial distribution and 𝐴 follows a Poisson distribution, so



𝑃 (𝐿 = 𝑙) = ∑_{𝑎=𝑙}^{∞} choose(𝑎, 𝑙) 𝑝^𝑙 (1 − 𝑝)^(𝑎−𝑙) 𝑒^(−𝜆) (𝜆^𝑎 / 𝑎!) ⋅ (𝜆^𝑙 / 𝜆^𝑙 )
          = 𝜆^𝑙 𝑝^𝑙 𝑒^(−𝜆) (1/𝑙!) ∑_{𝑎=𝑙}^{∞} (1 − 𝑝)^(𝑎−𝑙) 𝜆^(𝑎−𝑙) / (𝑎 − 𝑙)!
          = 𝜆^𝑙 𝑝^𝑙 𝑒^(−𝜆) (1/𝑙!) ∑_{𝑗=0}^{∞} (𝜆 − 𝜆𝑝)^𝑗 / 𝑗! (82)
          = 𝜆^𝑙 𝑝^𝑙 𝑒^(−𝜆) (1/𝑙!) 𝑒^(𝜆−𝜆𝑝)
          = ((𝜆𝑝)^𝑙 / 𝑙!) 𝑒^(−𝜆𝑝) .
So 𝐿 ∼ 𝒫(𝜆𝑝).
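A minimal simulation of the thinning argument (the parameters are arbitrary):

# Lost messages: arrivals ~ Poisson(lam), each arrived message lost independently with prob p.
import numpy as np

rng = np.random.default_rng(2)
lam, p, size = 4.0, 0.25, 200_000
arrived = rng.poisson(lam, size)         # number of arrived messages per time window
lost = rng.binomial(arrived, p)          # each arrived message is lost with probability p
print(lost.mean(), lost.var(), lam * p)  # a Poisson(lam*p) law has mean = variance = lam*p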

Exercise 4.2 (Poisson expectation) Let 𝑁 ∼ 𝒫(𝜆). Find 𝔼(𝑋), where 𝑋 ≔ 1/(𝑁 + 1).

By Theorem 4.1.3 we have

𝔼(𝑋) = ∑_{𝑛=0}^{∞} (1/(𝑛 + 1)) 𝑒^(−𝜆) 𝜆^𝑛 / 𝑛! (83)

Multiply and divide by 𝜆 and shift the running index (using ∑_{𝑛=0}^{∞} 𝜆^(𝑛+1) /(𝑛 + 1)! = 𝑒^𝜆 − 1) to get
𝔼(𝑋) = (1 − 𝑒^(−𝜆)) / 𝜆.

Exercise 4.3 (Archery) An archer shoots 𝑛 arrows at a target. The shots are independent, and
each shot hits the target with probability 𝑝. Let 𝑋 be the random variable “number of times the
target is hit”.
1. What is the law of 𝑋?
2. What is the expectation of 𝑋?
3. What is the value of 𝑝 that maximises the variance of 𝑋?
The archer bets on his result. He gets 𝑔 euros when he hits the target, and loses 𝑙 euros when he
misses it. Let 𝑌 be the random variable that represents the net gain of the archer at the end of the
𝑛 shots.
4. What is the expectation of 𝑌 ?
5. What is the relation between 𝑔 and 𝑙 that guarantees the archer an expected gain of zero?

1. 𝑋 is the sum of 𝑛 independent 𝑝-Bernoulli variables, hence 𝑋 ∼ ℬ(𝑛, 𝑝) (binomial distribution).


2. We have to compute the expectation of a binomial random variable 𝑋 = 𝑋1 + … + 𝑋𝑛 , where
each 𝑋𝑖 is a Bernoulli variable. Since expectations are linear we can compute the expectation of
the Bernoulli variables, and sum them:
𝔼(𝑋𝑖 ) = 1 ⋅ 𝑝 + 0 ⋅ (1 − 𝑝) = 𝑝 (84)

𝔼(𝑋1 + … + 𝑋𝑛 ) = 𝔼(𝑋1 ) + … + 𝔼(𝑋𝑛 ) = 𝑛𝑝 (85)

For example, if 𝑝 = 0.5 and 𝑛 = 10, this means that the archer expects to hit the target 5 times.
3. Let’s compute the variance of a Bernoulli and a binomial variable by Equation 76:

𝔼(𝑋𝑖2 ) = 12 ⋅ 𝑝 + 02 ⋅ (1 − 𝑝) = 𝑝 (86)

Var(𝑋𝑖 ) = 𝔼(𝑋𝑖2 ) − 𝔼(𝑋𝑖 )2 = 𝑝(1 − 𝑝) (87)

By independence, Var(𝑋1 + … + 𝑋𝑛 ) = Var(𝑋1 ) + … + Var(𝑋𝑛 ) = 𝑛𝑝(1 − 𝑝). To find the value


of 𝑝 that maximises the variance differentiate: 𝑛(1 − 2𝑝) = 0 ⇒ 𝑝 = 0.5.
4. Let 𝑌𝑖 = gain of 𝑖-th shot. Then 𝔼(𝑌𝑖 ) = 𝑔𝑝 − 𝑙(1 − 𝑝), and
𝔼(𝑌 ) = 𝔼(𝑌1 + … + 𝑌𝑛 ) = 𝑛[𝑔𝑝 − 𝑙(1 − 𝑝)] (88)

For example if 𝑛 = 10, 𝑔 = 1, 𝑙 = 2, we have 𝔼(𝑌 ) = 30𝑝 − 20; and if furthermore 𝑝 = 0.5 then
𝔼(𝑌 ) = −5.
5. To find the relation between 𝑔 and 𝑙 required to have an expected gain of zero, solve the
equation 𝔼(𝑌 ) = 0 to get

𝑔/𝑙 = (1 − 𝑝)/𝑝. (89)

Thus as the probability 𝑝 to hit the target goes to zero, a very big ratio 𝑔/𝑙 is required to guarantee an
expected gain of zero; vice versa, 𝑔/𝑙 becomes infinitely small as 𝑝 → 1. At 𝑝 = 0.5, as one would
expect, 𝑔 = 𝑙.
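A minimal simulation of the archer's score and net gain, using the illustrative values 𝑛 = 10, 𝑝 = 0.5, 𝑔 = 1, 𝑙 = 2 from the text:

# Archery: empirical behaviour of X ~ B(n, p) and of the net gain Y = g*X - l*(n - X).
import numpy as np

rng = np.random.default_rng(3)
n, p, g, l, size = 10, 0.5, 1.0, 2.0, 100_000
hits = rng.binomial(n, p, size)
gain = g * hits - l * (n - hits)
print(hits.mean(), n * p)                      # expected number of hits, Equation 85
print(hits.var(), n * p * (1 - p))             # variance, maximal at p = 0.5
print(gain.mean(), n * (g * p - l * (1 - p)))  # expected gain, Equation 88: -5 here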

Exercise 4.4 (Smart betting?) Let 𝑋𝑛 ∼ ℬ(𝑝 = 1/2), 𝑛 = 0, 1, 2, … be independent Bernoulli
variables. You bet 𝑏𝑛 = 2^(𝑛−1) on the outcome of 𝑋𝑛 if 𝑋0 = 𝑋1 = 𝑋2 = … = 𝑋𝑛−1 , and zero
otherwise. If you bet and you win, you receive 𝑔𝑛 = 2𝑏𝑛 .
1. What is the expected amount of money bet?
2. What is the expected gain?

1. Let’s write down the process explicitly:


• 𝑋0 is drawn → get nothing, and bet 𝑏1 on 𝑋1 .
• 𝑋1 is drawn → get 𝑔1 with probability 𝑝. If 𝑋1 = 𝑋0 , bet 𝑏2 on 𝑋2 , else stop the game.
• 𝑋2 is drawn → get 𝑔2 with probability 𝑝. If 𝑋2 = 𝑋0 , bet 𝑏3 on 𝑋3 , else stop the game.
• …
• 𝑋𝑛−1 is drawn → get 𝑔𝑛−1 with probability 𝑝. If 𝑋𝑛−1 = 𝑋0 , bet 𝑏𝑛 on 𝑋𝑛 , else stop the game.
• …
Then the expected amount of money bet 𝑀 is
𝔼(𝑀 ) = 𝑏1 𝑃 (𝑋1 ≠ 𝑋0 ) +
+(𝑏1 + 𝑏2 ) 𝑃 (𝑋1 = 𝑋0 , 𝑋2 ≠ 𝑋0 ) +
+(𝑏1 + 𝑏2 + 𝑏3 )𝑃 (𝑋1 = 𝑋0 , 𝑋2 = 𝑋0 , 𝑋3 ≠ 𝑋0 ) + (90)
+… +
+(𝑏1 + … + 𝑏𝑛 )𝑃 (𝑋1 = 𝑋0 , …, 𝑋𝑛−1 = 𝑋0 , 𝑋𝑛 ≠ 𝑋0 ) + …
Let 𝐵𝑛 = ∑_{𝑘=1}^{𝑛} 𝑏𝑘 = ∑_{𝑘=1}^{𝑛} 2^(𝑘−1) = ∑_{𝑙=0}^{𝑛−1} 2^𝑙 = (1 − 2^𝑛 )/(1 − 2) = 2^𝑛 − 1. Then

𝔼(𝑀 ) = ∑_{𝑛=1}^{∞} 𝐵𝑛 ⋅ (1/2^𝑛 ) = ∑_{𝑛=1}^{∞} (2^𝑛 − 1)/2^𝑛 = +∞. (91)

The idea is that the expected bet amount diverges even if one stops betting at some point.
2. This is for the standard doubling scenario, without the constraint of betting only when all previous
outcomes are the same.
After 𝑛 bets the total amount bet is 𝐵𝑛 = ∑_{𝑘=1}^{𝑛} 𝑏𝑘 = 2^𝑛 − 1. If I win the 𝑛-th bet, the net win
equals the initial stake:

2𝑏𝑛 − 𝐵𝑛 = 2^𝑛 − (2^𝑛 − 1) = +1. (92)

Now the probability to lose the first 𝑛 bets and win at the (𝑛 + 1)-th is (1 − 𝑝)^𝑛 𝑝, so the expected net
gain is also equal to the initial stake:

𝔼(Net gain) = ∑_{𝑛=0}^{∞} (1 − 𝑝)^𝑛 𝑝 ⋅ (+1) = 𝑝 ⋅ (1/𝑝) = 1. (93)

For more details: see the Martingale betting system.
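A minimal simulation of the standard doubling strategy of point 2 (the cap on the number of rounds is an artificial safeguard, not part of the exercise):

# Doubling strategy: bet 2^(n-1) at round n until the first win; the net gain is then +1.
import random

def play_once(p=0.5, max_rounds=60):
    total_bet = 0
    for n in range(1, max_rounds + 1):
        bet = 2 ** (n - 1)
        total_bet += bet
        if random.random() < p:                    # win: receive twice the last bet
            return 2 * bet - total_bet, total_bet  # net gain (= +1), total money staked
    return -total_bet, total_bet                   # never won within the cap (very unlikely)

plays = [play_once() for _ in range(100_000)]
gains = [g for g, _ in plays]
stakes = [s for _, s in plays]
print(sum(gains) / len(gains))  # close to +1, the expected net gain of Equation 93
print(max(stakes))              # occasionally huge: the stake needed has no finite expectation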

Bibliography
[1] P. Billingsley, Probability and Measure. John Wiley & Sons, 2012.
[2] B. Jourdain, Probabilités et statistique pour l'ingénieur. 2018.
[3] E. H. Lieb and M. Loss, Analysis, 2nd ed. in Graduate Studies in Mathematics 14. American
Mathematical Society, 2001.
