
Slide set 3: Conditional Probability

∙ Conditional probability
∙ Chain rule
∙ Sequential calculation of probabilities
∙ Law of total probability
∙ Bayes rule
∙ Independence
∙ Summary

© Copyright Abbas El Gamal


Conditional probability

∙ We are often interested in the probability of event A given that event B occurred
∙ Examples:
– Alice tests positive for a certain disease; what is the probability that she has this disease?
– A 1 is read out of a memory chip; what is the probability that a 1 was stored?

∙ Definition: The conditional probability of A given B (with P(B) ≠ 0) is:


P(A | B) = P(A ∩ B)/P(B), also written as P(A, B)/P(B)
∙ Interpretation: Given that B occurred, the only outcomes of A that can occur are those in A ∩ B
Dividing by P(B) then normalizes the probability that A ∩ B occurs;
we call the result the conditional probability of A given B

 / 
Example

∙ Roll a 4-sided die twice, so the sample space is {1, 2, 3, 4}²
Assume each sample point has probability 1/16.
Let B be the event that the minimum of the two die rolls is 2, and
A be the event that the maximum of the two die rolls is 3
Find P(A | B)
P(A | B) = P(A, B)/P(B)

P(B) = 5/16,  P(A, B) = 2/16

P(A | B) = (2/16)/(5/16) = 2/5

Compared to the unconditional P(A) = 5/16

[Figure: the 4 × 4 grid of outcomes, 1st roll vs. 2nd roll, with the outcomes in A and B marked]

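A minimal Monte Carlo sketch in Python can confirm these values for the events as stated (B = {min of the two rolls is 2}, A = {max of the two rolls is 3}):

```python
import random

# Estimate P(A | B) for two rolls of a fair 4-sided die:
# B = {min of the two rolls is 2}, A = {max of the two rolls is 3}.
random.seed(1)
trials, count_b, count_ab = 200_000, 0, 0
for _ in range(trials):
    x, y = random.randint(1, 4), random.randint(1, 4)
    if min(x, y) == 2:
        count_b += 1
        if max(x, y) == 3:
            count_ab += 1

print(count_b / trials)    # ≈ P(B) = 5/16 ≈ 0.3125
print(count_ab / count_b)  # ≈ P(A | B) = 2/5 = 0.4
```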
 / 
P(⋅ | B) is a probability measure

∙ For a given B, P(⋅ | B) constitutes a probability measure on Ω


– Positivity: P(A | B) ≥ 0
– Normalization: P(Ω | B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1
– Additivity: if A₁, A₂ are disjoint events, then

P(A₁ ∪ A₂ | B) = P((A₁ ∪ A₂) ∩ B)/P(B) = P((A₁ ∩ B) ∪ (A₂ ∩ B))/P(B)
= (P(A₁ ∩ B) + P(A₂ ∩ B))/P(B)
= P(A₁ | B) + P(A₂ | B)
∙ For the 4-sided die example:
– The unconditional probability measure assigns probability 1/16 to each outcome
– Conditioned on B, the conditional probability measure assigns probability 1/5
to each of (2, 2), (2, 3), (2, 4), (3, 2), (4, 2), and zero to all other outcomes

[Figure: the 4 × 4 grid of outcomes, 1st roll vs. 2nd roll, with the five outcomes in B highlighted]
 / 
Chain rule
∙ Probability measures are often specified via conditional probabilities
∙ Two methods to calculate probability of an event from conditional probabilities:
– the sequential method, and
– a “divide and conquer” method using the law of total probability

∙ Both use the chain rule:


By the definition of conditional probability, P(A ∩ B) = P(A | B) P(B)
Suppose that A₁, A₂, . . . , Aₙ are events. Applying this repeatedly,

P(A₁ ∩ ⋯ ∩ Aₙ) = P(A₁ ∩ ⋯ ∩ Aₙ₋₁) ⋅ P(Aₙ | A₁ ∩ ⋯ ∩ Aₙ₋₁)
= P(A₁ ∩ ⋯ ∩ Aₙ₋₂) ⋅ P(Aₙ₋₁ | A₁ ∩ ⋯ ∩ Aₙ₋₂) ⋅ P(Aₙ | A₁ ∩ ⋯ ∩ Aₙ₋₁)
⋮
= P(A₁) ⋅ P(A₂ | A₁) ⋅ P(A₃ | A₁ ∩ A₂) ⋯ P(Aₙ | A₁ ∩ ⋯ ∩ Aₙ₋₁)

In product form: P(A₁ ∩ ⋯ ∩ Aₙ) = ∏ᵢ₌₁ⁿ P(Aᵢ | A₁, A₂, . . . , Aᵢ₋₁), where A₀ = Ω

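As a concrete check, here is a minimal Python sketch of the chain rule, using the card-drawing example that appears later in this slide set (three draws without replacement from a 52-card deck, Aᵢ = “no heart on draw i”):

```python
from math import prod

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1, A2).
# After i non-heart draws, 39 - i of the remaining 52 - i cards are non-hearts.
print(prod((39 - i) / (52 - i) for i in range(3)))  # ≈ 0.4135
```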
 / 
Sequential calculation of probabilities

1. Construct a tree description of the sample space for the sequential experiment
2. Assign conditional probabilities to the corresponding edges
3. By the chain rule, the probability of an outcome is the product of the
conditional probabilities along the path from the root to the outcome’s leaf node
[Tree diagram: the root branches to A₁ (edge probability P(A₁)) and A₁ᶜ (edge probability P(A₁ᶜ)); each of these nodes then branches on A₂. The four leaves are:]

P(A₁, A₂) = P(A₁) P(A₂ | A₁)
P(A₁, A₂ᶜ) = P(A₁) P(A₂ᶜ | A₁)
P(A₁ᶜ, A₂) = P(A₁ᶜ) P(A₂ | A₁ᶜ)
P(A₁ᶜ, A₂ᶜ) = P(A₁ᶜ) P(A₂ᶜ | A₁ᶜ)

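A minimal Python sketch of this procedure for a two-event tree; the probability values here are hypothetical placeholders, not from the slides:

```python
# Sequential (tree) method: each leaf probability is the product of the
# conditional probabilities along its root-to-leaf path (chain rule).
# Hypothetical edges: P(A1) = 0.3, P(A2 | A1) = 0.6, P(A2 | A1c) = 0.2.
p_a1 = 0.3
p_a2_given = {"A1": 0.6, "A1c": 0.2}

leaves = {}
for first, p_first in (("A1", p_a1), ("A1c", 1 - p_a1)):
    p_a2 = p_a2_given[first]
    for second, p_second in (("A2", p_a2), ("A2c", 1 - p_a2)):
        leaves[first, second] = p_first * p_second

print(leaves)                # the four leaf probabilities
print(sum(leaves.values()))  # sanity check: sums to 1.0
```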
 / 
Diagnosing a disease

∙ Let A be the event that a person has a certain disease and
B be the event that he tested positive for the disease. Assume that
P(A) = 0.01, P(B | A) = 0.99, and P(B | Aᶜ) = 0.01. What is the probability of:
– False negative (misdiagnosis), i.e., P(A ∩ Bᶜ)?
– False positive (false diagnosis), i.e., P(Aᶜ ∩ B)?
∙ We use the sequential method:

[Tree diagram: the root branches on A, then each node branches on B; the leaves are:]

P(A, B) = P(A) P(B | A) = 0.01 × 0.99 = 0.0099
P(A, Bᶜ) = 0.01 × 0.01 = 0.0001 (misdiagnosis)
P(Aᶜ, B) = 0.99 × 0.01 = 0.0099 (false diagnosis)
P(Aᶜ, Bᶜ) = P(Aᶜ) P(Bᶜ | Aᶜ) = 0.99 × 0.99 = 0.9801
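The two error probabilities, computed directly from the tree (a small sketch using the parameter values above):

```python
# Leaves of the disease-testing tree: P(A) = 0.01, P(B|A) = 0.99, P(B|Ac) = 0.01.
p_a, p_b_given_a, p_b_given_ac = 0.01, 0.99, 0.01

false_negative = p_a * (1 - p_b_given_a)   # P(A, Bc) = 0.0001 (misdiagnosis)
false_positive = (1 - p_a) * p_b_given_ac  # P(Ac, B) = 0.0099 (false diagnosis)
print(false_negative, false_positive)
```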
 / 
Example

∙ Cards: Three cards are randomly drawn (without replacement) from
a 52-card deck. What is the probability of not drawing a heart?
∙ We can use counting to find this probability:

– The sample space for picking 3 cards has C(52, 3) elements
– The number of ways of picking 3 cards with no heart is C(39, 3), since 52 − 13 = 39 cards are not hearts
– The probability of no hearts is C(39, 3)/C(52, 3) = (39 × 38 × 37)/(52 × 51 × 50) ≈ 0.41

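The same count in one line of Python:

```python
from math import comb

# Counting method: comb(39, 3) no-heart hands out of comb(52, 3) possible hands.
print(comb(39, 3) / comb(52, 3))  # ≈ 0.4135
```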
 / 
Example

∙ We can also use the sequential method

∙ Let Aᵢ be the event “not heart” at draw i = 1, 2, 3; we want P(A₁ ∩ A₂ ∩ A₃)
We construct a (partial) tree with conditional probabilities on its edges:

[Partial tree: the root branches to A₁ with P(A₁) = 39/52 (and to A₁ᶜ with 13/52); A₁ branches to A₂ with P(A₂ | A₁) = 38/51; A₂ branches to A₃ with P(A₃ | A₁, A₂) = 37/50]

P(A₁, A₂, A₃) = P(A₁) P(A₂ | A₁) P(A₃ | A₁, A₂) = (39/52) × (38/51) × (37/50) ≈ 0.41
 / 
Recap
∙ Conditional probability of A given B (with P(B) ≠ 0) is: P(A | B) = P(A ∩ B)/P(B)
P(⋅ | B) is a probability measure
∙ Chain rule: For two events, P(A ∩ B) = P(A | B) P(B)
In general: P(A₁ ∩ ⋯ ∩ Aₙ) = ∏ᵢ₌₁ⁿ P(Aᵢ | A₁, A₂, . . . , Aᵢ₋₁), where A₀ = Ω

∙ The sequential method for computing probabilities


∙ Up next:
– Computing probabilities via the law of total probability
– Bayes rule (basis for Bayesian statistical inference)

 / 
Law of total probability

∙ Let A₁, . . . , Aₙ be events that partition Ω, i.e., they are disjoint and ∪ᵢ₌₁ⁿ Aᵢ = Ω

Then for any event B: P(B) = ∑ᵢ₌₁ⁿ P(Aᵢ ∩ B) = ∑ᵢ₌₁ⁿ P(Aᵢ) P(B | Aᵢ)

Remark: this law also holds for n = ∞
∙ Chess tournament. Alice plays one of 3 classes of opponents:
– P(Class 1) = 0.5, P(Alice wins | Class 1) = 0.3
– P(Class 2) = 0.25, P(Alice wins | Class 2) = 0.4
– P(Class 3) = 0.25, P(Alice wins | Class 3) = 0.5

What is the probability that Alice wins the game?


∙ By the law of total probability,

P(Alice wins) = ∑ᵢ₌₁³ P(Class i, Alice wins) = ∑ᵢ₌₁³ P(Class i) P(Alice wins | Class i)
= 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5 = 0.375

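The same computation as a weighted sum in Python, using the class probabilities assumed above:

```python
# Law of total probability: P(win) = sum of P(Class i) * P(win | Class i).
classes = [(0.5, 0.3), (0.25, 0.4), (0.25, 0.5)]  # (P(Class i), P(win | Class i))
print(sum(p_class * p_win for p_class, p_win in classes))  # 0.375
```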
 / 
Binary communication link

∙ Consider a communication link in which a 0 or a 1 is sent and a 0 or a 1 is received

Assume that 0 is sent with probability 0.8 (and 1 with probability 0.2)
The link is noisy: if a 0 is sent, a 1 is received with probability 0.1, and
if a 1 is sent, a 0 is received with probability 0.2
What is the probability that the received bit is in error?
∙ We can represent the link model by a probability transition diagram:

[Transition diagram: bit sent on the left, bit received on the right; P(0) = 0.8 with P(0 | 0) = 0.9 and P(1 | 0) = 0.1; P(1) = 0.2 with P(0 | 1) = 0.2 and P(1 | 1) = 0.8]
 / 
Binary communication link

∙ We can represent the link model by a probability transition diagram:

[Transition diagram, repeated: bit sent on the left, bit received on the right; P(0) = 0.8 with P(0 | 0) = 0.9 and P(1 | 0) = 0.1; P(1) = 0.2 with P(0 | 1) = 0.2 and P(1 | 1) = 0.8]

∙ Define the events: A = {0 is sent}, B = {0 is received}, E = {received bit is in error}

∙ To find the probability of error, we use the law of total probability:

P(E) = P(A) P(E | A) + P(Aᶜ) P(E | Aᶜ)
= P(A) P(Bᶜ | A) + P(Aᶜ) P(B | Aᶜ)
= 0.8 × 0.1 + 0.2 × 0.2 = 0.12

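A minimal numeric check, using the link parameters assumed above:

```python
# Error probability of the binary link via the law of total probability:
# P(E) = P(0 sent) P(flip | 0 sent) + P(1 sent) P(flip | 1 sent).
p0, p_flip_given_0, p_flip_given_1 = 0.8, 0.1, 0.2
print(p0 * p_flip_given_0 + (1 - p0) * p_flip_given_1)  # 0.12
```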
 / 
Bayes rule
∙ Let A₁, A₂, . . . , Aₙ be nonzero-probability events (the causes) that partition Ω,
and let B be a nonzero-probability event (the effect/observation)
∙ Given the a priori probabilities P(Aᵢ) and
the conditional probabilities P(B | Aᵢ), i = 1, . . . , n,
find the a posteriori probabilities P(Aⱼ | B), j = 1, . . . , n
∙ From the definition of conditional probability,

P(Aⱼ | B) = P(B, Aⱼ)/P(B) = P(Aⱼ) P(B | Aⱼ)/P(B)

By the law of total probability,

P(B) = ∑ᵢ₌₁ⁿ P(Aᵢ) P(B | Aᵢ)

Substituting, we obtain Bayes rule:

P(Aⱼ | B) = P(Aⱼ) P(B | Aⱼ) / ∑ᵢ₌₁ⁿ P(Aᵢ) P(B | Aᵢ), for j = 1, 2, . . . , n

[Portrait: Thomas Bayes (1701–1761)]

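A small Python sketch of Bayes rule over a partition; as a demo it reuses the disease-testing numbers from the earlier example, with the partition {A, Aᶜ}:

```python
# Bayes rule: posterior P(Aj | B) from priors P(Ai) and likelihoods P(B | Ai).
def posterior(priors, likelihoods, j):
    p_b = sum(p * l for p, l in zip(priors, likelihoods))  # law of total probability
    return priors[j] * likelihoods[j] / p_b

# Partition {A, Ac} with P(A) = 0.01, P(B | A) = 0.99, P(B | Ac) = 0.01.
print(posterior([0.01, 0.99], [0.99, 0.01], 0))  # P(A | B)  = 0.5
print(posterior([0.01, 0.99], [0.01, 0.99], 0))  # P(A | Bc) ≈ 0.0001
```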
 / 
Disease diagnosis

∙ Recall that A is the event a person has the disease and B is the event that he tests
positive, where P(A) = 0.01, P(B | A) = 0.99, P(B | Aᶜ) = 0.01
What is the probability that the person has the disease given he tested positive?
∙ We want to find P(A | B), so we use Bayes rule:

P(A | B) = P(A) P(B | A) / (P(A) P(B | A) + P(Aᶜ) P(B | Aᶜ))
= (0.01 × 0.99) / (0.01 × 0.99 + 0.99 × 0.01) = 0.5

Versus the a priori probability of 0.01, so the test increased the probability a lot
But the resulting diagnosis is only as good as flipping a fair coin, even though the test is 99% accurate!
What test accuracy would be needed to achieve a correct diagnosis with high probability?

 / 
Disease diagnosis

∙ Recall that A is the event a person has the disease and B is the event that he tests
positive, where P(A) = 0.01, P(B | A) = 0.99, P(B | Aᶜ) = 0.01
What is the probability that the person has the disease given he tested negative?
∙ We want to find P(A | Bᶜ), so we again use Bayes rule:

P(A | Bᶜ) = P(A) P(Bᶜ | A) / (P(A) P(Bᶜ | A) + P(Aᶜ) P(Bᶜ | Aᶜ))
= (0.01 × 0.01) / (0.01 × 0.01 + 0.99 × 0.99) ≈ 0.0001

Versus the a priori probability of 0.01, so the test decreased the probability a lot

 / 
Independence

∙ Sometimes knowing that B occurred doesn’t change the probability of A, i.e.,

P(A | B) = P(A)

In this case we say that the two events are statistically independent
∙ Examples:
– Flip a fair coin and get an H; the probability of an H on the next flip is still 1/2
– You are told Alice was born on 1/1; the probability that Bob was born on 1/1 is still 1/365

∙ From the definition of conditional probability,

P(A | B) = P(A, B)/P(B)

Hence, an equivalent definition of independence is

P(A, B) = P(A) P(B)

It also follows that P(B | A) = P(B)

 / 
Independence

∙ Suppose A and B are independent, i.e., P(A, B) = P(A) P(B);
are Aᶜ and Bᶜ also independent?
∙ Intuitively, the answer should be YES
∙ Let’s verify:

P(Aᶜ, Bᶜ) = P((A ∪ B)ᶜ)   (De Morgan’s law)
= 1 − P(A ∪ B)
= 1 − (P(A) + P(B) − P(A, B))   (inclusion-exclusion)
= 1 − (P(A) + P(B) − P(A) P(B))   (independence)
= (1 − P(A)) (1 − P(B)) = P(Aᶜ) P(Bᶜ),

hence Aᶜ and Bᶜ are indeed independent


∙ Two experiments are said to be independent if for every event A from the first
experiment and B from the second experiment, P(A, B) = P(A) P(B)

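A quick numeric check of the derivation above, with hypothetical values P(A) = 0.3 and P(B) = 0.6:

```python
# Verify P(Ac, Bc) = P(Ac) P(Bc) for independent A and B.
p_a, p_b = 0.3, 0.6                        # hypothetical probabilities
p_union = p_a + p_b - p_a * p_b            # inclusion-exclusion with independence
print(1 - p_union, (1 - p_a) * (1 - p_b))  # both 0.28
```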
 / 
Independence for more than two events

∙ In general, A₁, . . . , Aₙ are mutually independent if for every subset S ⊆ {1, . . . , n},

P(⋂ᵢ∈S Aᵢ) = ∏ᵢ∈S P(Aᵢ)

∙ For example, for 3 events A₁, A₂, A₃ this requires:

P(A₁, A₂) = P(A₁) P(A₂),
P(A₁, A₃) = P(A₁) P(A₃),
P(A₂, A₃) = P(A₂) P(A₃), and
P(A₁, A₂, A₃) = P(A₁) P(A₂) P(A₃)

 / 
Independence for more than two events
∙ Note that P(A₁, . . . , Aₙ) = ∏ᵢ₌₁ⁿ P(Aᵢ) alone is not sufficient for independence

∙ Example: Roll two fair dice independently. Define the events

A = {1st die = 1, 2, or 3}, B = {1st die = 3, 4, or 5}, C = {sum of the outcomes = 9}

Are A, B, and C independent?

∙ Since the dice are fair, P(A) = P(B) = 1/2

The event C = {(3, 6), (4, 5), (5, 4), (6, 3)}. Since the two rolls are independent,
P(C) = 4/36 = 1/9

Since A ∩ B ∩ C = {(3, 6)}, P(A, B, C) = 1/36 = P(A) P(B) P(C)

But P(A, B) = P(1st die = 3) = 1/6 ≠ 1/4 = P(A) P(B), thus A, B, and C are not independent!
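An exhaustive check over the 36 equally likely outcomes confirms both claims:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] in (1, 2, 3)  # 1st die is 1, 2, or 3
B = lambda o: o[0] in (3, 4, 5)  # 1st die is 3, 4, or 5
C = lambda o: o[0] + o[1] == 9   # sum of outcomes is 9

print(prob(lambda o: A(o) and B(o) and C(o)) == prob(A) * prob(B) * prob(C))  # True
print(prob(lambda o: A(o) and B(o)), prob(A) * prob(B))  # 1/6 vs 1/4
```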
 / 
Independence for more than two events

∙ Also, independence of subsets does not necessarily imply independence

∙ Example: Flip a fair coin twice independently. Define the events:
– A: first toss is H
– B: second toss is H
– C: first and second tosses have different outcomes

Are A, B, C independent?
∙ Clearly, A, B are independent, A, C are independent, and B, C are independent

But P(A, B, C) = 0 ≠ P(A) P(B) P(C) = 1/8,

since if you know A and B occurred, you know that C could not have occurred

 / 
Coupon collector problem
∙ A food company includes a baseball player card in each cereal box it sells.
Assume there are k players and each box contains a randomly and independently chosen
card of one of them (with replacement). A fan buys n cereal boxes;
what is the probability that she gets Babe Ruth’s card?
– Let Eᵢ be the event that she gets Ruth’s card in cereal box i ∈ {1, 2, . . . , n}
We want to find P(∪ᵢ₌₁ⁿ Eᵢ)
– Since the card selection is random and with replacement, P(Eᵢ) = 1/k
– This alone is not enough information to find the probability of the union
– Let’s instead find the probability of the complementary event P(∩ᵢ₌₁ⁿ Eᵢᶜ) (De Morgan’s law)
– Since the Eᵢ are independent, so are the Eᵢᶜ, hence

P(∩ᵢ₌₁ⁿ Eᵢᶜ) = ∏ᵢ₌₁ⁿ P(Eᵢᶜ) = ∏ᵢ₌₁ⁿ (1 − P(Eᵢ)) = (1 − 1/k)ⁿ

– Hence, P(∪ᵢ₌₁ⁿ Eᵢ) = 1 − P(∩ᵢ₌₁ⁿ Eᵢᶜ) = 1 − (1 − 1/k)ⁿ
For example, if k = 100 and we want P(∪ᵢ₌₁ⁿ Eᵢ) = 0.5, then n ≈ 69 (see the sketch below)
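A short sketch of this calculation, with k = 100 and target probability 0.5 as in the example above:

```python
from math import ceil, log

# P(at least one Ruth card in n boxes) = 1 - (1 - 1/k)^n; solve for n.
k, target = 100, 0.5
n = ceil(log(1 - target) / log(1 - 1 / k))  # smallest n reaching the target
print(n, 1 - (1 - 1 / k) ** n)  # 69, ≈ 0.500
```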
 / 
Summary

∙ Conditional probability captures the effect of prior information on the probability of an event


∙ The chain rule provides two important methods for computing probabilities:
– Sequential (tree) method
– Law of total probability method

∙ Bayes rule, basis of statistical inference


∙ Independence of events

 / 
