MT131 Tutorial - 5 Discrete Probability 2


7. Discrete Probability
Introduction to Discrete Probability

Probability Theory

Bayes’ Theorem

Summary
 Introduction to Discrete Probability
 Probabilities of Complements and Unions of Events
 Probabilistic Reasoning
 Assigning Probabilities
 Probabilities of Complements and Unions of Events
 Conditional Probability
 Independence
 Bernoulli Trials and the Binomial Distribution
 Bayes’ Theorem
 Generalized Bayes’ Theorem
 Bayesian Spam Filters
Introduction to Discrete Probability
 An experiment is a procedure that yields one of a given set of possible
outcomes.
 The sample space of the experiment is the set of possible outcomes.
 An event is a subset of the sample space.

Consider the experiment of rolling one die; the sample space is

S1 = {1, 2, 3, 4, 5, 6}.
• E1 = {1}: the simple event that a 1 occurs.
• E2 = {2, 4, 6}: the compound event that an even number occurs.

If we are interested only in whether the number is even or odd, the sample
space is simply
S2 = {even, odd}.
Pierre-Simon Laplace
(1749-1827)

Introduction to Discrete Probability


Definition: If S is a finite sample space of equally likely outcomes, and E
is an event, that is, a subset of S, then the probability of E is
p(E) = |E|/|S|.
 For every event E, we have 0 ≤ p(E) ≤ 1. This follows directly from the
definition, because 0 ≤ |E| ≤ |S| implies 0 ≤ p(E) = |E|/|S| ≤ |S|/|S| = 1.

Example: A coin is tossed twice. What is the probability that at least 1
head occurs?
Solution: The sample space for this experiment is
S = {HH, HT, TH, TT}.
If E represents the event of at least 1 head occurring, then
E = {HH, HT, TH} and
p(E) = 1/4 + 1/4 + 1/4 = 3/4.
Applying Laplace’s Definition
Example: An urn contains 4 blue balls and 5 red balls. What
is the probability that a ball chosen from the urn is blue?
Solution: The probability that the ball chosen is blue is 4/9 since
there are nine possible outcomes, and four of these produce
a blue ball.

Example: What is the probability that when two dice are rolled, the sum
of the numbers on the two dice is 7?
Solution: By the product rule there are 6² = 36 possible outcomes. Six of
these sum to 7. Hence, the probability of obtaining a 7 is 6/36 = 1/6.
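To make the counting concrete, here is a small brute-force check of this example (an illustrative sketch, not part of the original slides):

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two dice
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 7]

print(len(favorable), len(outcomes))    # 6 36
print(len(favorable) / len(outcomes))   # 0.1666... = 1/6
```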
Applying Laplace’s Definition
Example: In a lottery, a player wins a large prize when they pick four digits that
match, in correct order, four digits selected by a random mechanical process.
What is the probability that a player wins the prize?
Solution: By the product rule there are 10⁴ = 10,000 ways to pick four digits.
Since there is only 1 way to pick the correct digits, the probability of winning
the large prize is 1/10,000 = 0.0001.

Example: A smaller prize is won if only three digits are matched. What is the
probability that a player wins the small prize?
Solution: If exactly three digits are matched, one of the four digits must be
incorrect and the other three digits must be correct. For the digit that is
incorrect, there are 9 possible choices. Hence, by the sum rule, there are a total of
36 possible ways to choose four digits that match exactly three of the winning
four digits. The probability of winning the small prize is 36/10,000 = 9/2500 = 0.0036.
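The count of 36 near misses can also be verified by enumeration; the sketch below assumes an arbitrary hypothetical winning combination, since any fixed choice gives the same count:

```python
from itertools import product

winning = (1, 2, 3, 4)   # hypothetical winning digits; any fixed choice works

# Count 4-digit picks that match the winning digits in exactly three positions
near_misses = sum(
    1
    for pick in product(range(10), repeat=4)
    if sum(p == w for p, w in zip(pick, winning)) == 3
)

print(near_misses, near_misses / 10**4)   # 36 0.0036
```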
Applying Laplace’s Definition
Example: There are many lotteries that award prizes to
people who correctly choose a set of six numbers out of the
first n positive integers, where n is usually between 30 and
60. What is the probability that a person picks the correct
six numbers out of 40?
Solution: The number of ways to choose six numbers out of
40 is
C(40, 6) = 40!/(34!6!) = 3,838,380.
Hence, the probability of picking a winning combination is
1 / 3,838,380 ≈ 0.00000026.

Applying Laplace’s Definition
Example: What is the probability that the numbers 11, 4, 17, 39, and
23 are drawn in that order from a bin with 50 balls labeled with the
numbers 1, 2, …, 50 if
a) The ball selected is not returned to the bin.
b) The ball selected is returned to the bin before the next ball is
selected.
Solution: Use the product rule in each case.
a) Sampling without replacement: The probability is
1/254,251,200 since there are 50 ∙ 49 ∙ 48 ∙ 47 ∙ 46 =
254,251,200 ways to choose the five balls in order.
b) Sampling with replacement: The probability is 1/50⁵ =
1/312,500,000 since 50⁵ = 312,500,000.
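Both counts can be checked directly; this is a minimal sketch using Python's standard library:

```python
import math

# Ordered draws of 5 balls from 50 without replacement: 50 * 49 * 48 * 47 * 46
without_replacement = math.perm(50, 5)
# Ordered draws of 5 balls from 50 with replacement: 50**5
with_replacement = 50 ** 5

print(without_replacement, 1 / without_replacement)   # 254251200 ...
print(with_replacement, 1 / with_replacement)          # 312500000 ...
```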
The Probability of Complements and
Unions of Events
Theorem 1: Let E be an event in sample space S. The probability of the event
Ē = S − E,
the complementary event of E, is given by

p(Ē) = 1 − p(E).

Theorem 2: Let E1 and E2 be events in the sample space S.
Then
p(E1 ∪ E2) = p(E1) + p(E2) − p(E1 ∩ E2).

The Probability of Complements and
Unions of Events
Example: A sequence of 10 bits is chosen randomly. What is
the probability that at least one of these bits is 0?
Solution: Let E be the event that at least one of the 10 bits is
0. Then Ē is the event that all of the bits are 1s. The size of
the sample space S is 2¹⁰. Hence,
p(E) = 1 − p(Ē) = 1 − 1/2¹⁰ = 1 − 1/1024 = 1023/1024.
The Probability of Complements and
Unions of Events
Example: What is the probability that a positive integer
selected at random from the set of positive integers not
exceeding 100 is divisible by either 2 or 5?
Solution: Let E1 be the event that the integer is divisible by 2
and E2 be the event that it is divisible by 5. Then the event
that the integer is divisible by 2 or 5 is E1 ∪ E2, and E1 ∩ E2 is
the event that it is divisible by both 2 and 5, i.e., by 10.
It follows that:
p(E1 ∪ E2) = p(E1) + p(E2) − p(E1 ∩ E2)
= 50/100 + 20/100 − 10/100 = 60/100 = 3/5.

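A one-line sanity check of this count (illustrative only):

```python
# Count positive integers up to 100 divisible by 2 or by 5
count = sum(1 for n in range(1, 101) if n % 2 == 0 or n % 5 == 0)
print(count, count / 100)   # 60 0.6
```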
Assigning Probabilities
Laplace’s definition assumes that all outcomes are equally likely. Now
we introduce a more general definition of probabilities that avoids
this restriction.
Let S be a sample space of an experiment with a finite number of
outcomes. We assign a probability p(s) to each outcome s, so that:
i. 0 ≤ p(s) ≤ 1 for each s ∈ S
ii. ∑s∈S p(s) = 1

The function p from the set of all outcomes of the sample space S is
called a probability distribution.

Assigning Probabilities
Example: What probabilities should we assign to the
outcomes H (heads) and T (tails) when a fair coin is
flipped? What probabilities should be assigned to these
outcomes when the coin is biased so that heads comes up
twice as often as tails?
Solution: For a fair coin, we have p(H) = p(T) = 1/2.
For a biased coin, we have p(H) = 2p(T).
Because p(H) + p(T) = 1, it follows that
2p(T) + p(T) = 3p(T) = 1.
Hence, p(T) = 1/3 and p(H) = 2/3.

Uniform Distribution
Definition: Suppose that S is a set with n elements. The
uniform distribution assigns the probability 1/n to each
element of S. (Note that we could have used Laplace’s
definition here.)

Example: Consider again the coin flipping example, but with


a fair coin. Now p(H) = p(T) = 1/2.

Probability of an Event
Definition: The probability of the event E is the sum of the
probabilities of the outcomes in E.

Note that now no assumption is being made about the distribution.

Example
Example: Suppose that a die is biased so that 3 appears twice
as often as each other number, but that the other five
outcomes are equally likely. What is the probability that an
odd number appears when we roll this die?
Solution: We want the probability of the event E = {1, 3, 5}.
We have p(3) = 2/7 and
p(1) = p(2) = p(4) = p(5) = p(6) = 1/7.
Hence,
p(E) = p(1) + p(3) + p(5) = 1/7 + 2/7 + 1/7 = 4/7.

Probabilities of Complements and
Unions of Events
Complements: p(Ē) = 1 − p(E) still holds, since each
outcome is in either E or Ē, but not both, so
∑s∈S p(s) = 1 = p(E) + p(Ē).

Unions: p(E1 ∪ E2) = p(E1) + p(E2) − p(E1 ∩ E2)
also still holds under the new definition.

Theorem: If E1, E2, … is a sequence of pairwise disjoint events
in a sample space S, then
p(E1 ∪ E2 ∪ ⋯) = p(E1) + p(E2) + ⋯.
Conditional Probability
Definition: Let E and F be events with p(F) > 0. The conditional
probability of E given F, denoted by p(E|F), is defined as:
p(E|F) = p(E ∩ F)/p(F).

Example: A bit string of length four is generated at random so that each
of the 16 bit strings of length 4 is equally likely. What is the probability
that it contains at least two consecutive 0s, given that its first bit is a 0?
Solution: Let E be the event that the bit string contains at least two
consecutive 0s, and F be the event that the first bit is a 0.
Since E ∩ F = {0000, 0001, 0010, 0011, 0100}, p(E ∩ F) = 5/16.
Because 8 bit strings of length 4 start with a 0, p(F) = 8/16 = 1/2.
Hence,
p(E|F) = p(E ∩ F)/p(F) = (5/16)/(1/2) = 5/8.
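A brute-force verification of this conditional probability (a small illustrative sketch):

```python
from itertools import product

strings = ["".join(bits) for bits in product("01", repeat=4)]

F = [s for s in strings if s[0] == "0"]    # first bit is 0
E_and_F = [s for s in F if "00" in s]      # also contains two consecutive 0s

print(len(E_and_F), len(F))        # 5 8
print(len(E_and_F) / len(F))       # 0.625 = 5/8
```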
Conditional Probability
Example: What is the conditional probability that a family
with two children has two boys, given that they have at
least one boy. Assume that each of the possibilities BB, BG,
GB, and GG is equally likely where B represents a boy and G
represents a girl.
Solution: Let E be the event that the family has two boys and
let F be the event that the family has at least one boy. Then
E = {BB}, F = {BB, BG, GB}, and E ∩ F = {BB}.
It follows that p(F) = 3/4 and p(E ∩ F) = 1/4.
Hence,
p(E|F) = p(E ∩ F)/p(F) = (1/4)/(3/4) = 1/3.
Independence
Definition: The events E and F are independent if and only if
p(E ∩ F) = p(E) p(F).

Example: Suppose E is the event that a randomly generated bit string of
length four begins with a 1 and F is the event that this bit string
contains an even number of 1s. Are E and F independent if the 16 bit
strings of length four are equally likely?
Solution: There are eight bit strings of length four that begin with a 1,
and eight bit strings of length four that contain an even number of 1s.
Since the number of bit strings of length 4 is 16,
p(E) = p(F) = 8/16 = 1/2.
Since E ∩ F = {1111, 1100, 1010, 1001}, p(E ∩ F) = 4/16 = 1/4.
We conclude that E and F are independent, because
p(E ∩ F) = 1/4 = (1/2)(1/2) = p(E) p(F).
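A quick enumeration checking this independence claim (illustrative sketch):

```python
from itertools import product
from fractions import Fraction

strings = ["".join(bits) for bits in product("01", repeat=4)]
total = len(strings)

pE = Fraction(sum(s[0] == "1" for s in strings), total)               # begins with 1
pF = Fraction(sum(s.count("1") % 2 == 0 for s in strings), total)     # even number of 1s
pEF = Fraction(sum(s[0] == "1" and s.count("1") % 2 == 0 for s in strings), total)

print(pE, pF, pEF, pEF == pE * pF)   # 1/2 1/2 1/4 True
```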
Independence
Example: Assume (as in the previous example) that each of
the four ways a family can have two children (BB, GG, BG,
GB) is equally likely. Are the events E, that a family with
two children has two boys, and F, that a family with two
children has at least one boy, independent?
Solution: Because E = {BB}, p(E) = 1/4. We saw previously
that p(F) = 3/4 and p(E ∩ F) = 1/4. The events E and F
are not independent since
p(E) p(F) = 3/16 ≠ 1/4 = p(E ∩ F).

Independence
Example: A small town has one fire engine and one
ambulance available for emergencies. The probability that
the fire engine is available when needed is 0.98, and the
probability that the ambulance is available when called is
0.92. In the event of an injury resulting from a burning
building, find the probability that both the ambulance and
the fire engine will be available, assuming they operate
independently.
Solution: Let A and B represent the respective events that
the fire engine and the ambulance are available. Then
p(A ∩ B) = p(A) p(B) = (0.98)(0.92) = 0.9016.

James Bernoulli
(1654 – 1705)
Bernoulli Trials

Definition: Suppose an experiment can have only two possible
outcomes, e.g., the flipping of a coin or the random generation of a bit.
o Each performance of the experiment is called a Bernoulli
trial.
o One outcome is called a success and the other a failure.
o If p is the probability of success and q the probability of
failure, then p + q = 1.
Many problems involve determining the probability of k
successes when an experiment consists of n mutually
independent Bernoulli trials.
Bernoulli Trials
Example: A coin is biased so that the probability of heads is
2/3. What is the probability that exactly four heads occur
when the coin is flipped seven times?
Solution: There are 2⁷ = 128 possible outcomes. The number of
ways four of the seven flips can be heads is C(7, 4). The
probability of each outcome with exactly four heads is (2/3)⁴(1/3)³,
since the seven flips are independent. Hence, the probability
that exactly four heads occur is
C(7, 4) (2/3)⁴ (1/3)³ = (35 ∙ 16)/3⁷ = 560/2187.

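This binomial probability can be computed exactly; a small sketch using exact fractions:

```python
from fractions import Fraction
from math import comb

p = Fraction(2, 3)                       # probability of heads
prob = comb(7, 4) * p**4 * (1 - p)**3    # C(7,4) p^4 q^3

print(prob)          # 560/2187
print(float(prob))   # ≈ 0.256
```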
Probability of k Successes in n
Independent Bernoulli Trials
Theorem 2: The probability of exactly k successes in n
independent Bernoulli trials, with probability of success p
and probability of failure q = 1 − p, is
C(n, k) pᵏqⁿ⁻ᵏ.
We denote by b(k; n, p) the probability of k successes in n
independent Bernoulli trials with p the probability of
success. Viewed as a function of k, b(k; n, p) is the binomial
distribution. By Theorem 2,
b(k; n, p) = C(n, k) pᵏqⁿ⁻ᵏ.

The Famous Birthday Problem*
Example: The puzzle of finding the number of people needed in a
room to ensure that the probability of at least two of them having
the same birthday is more than ½ has a surprising answer, which
we now find.
Solution: We assume that all birthdays are equally likely and that
there are 366 days in the year. First, we find the probability pₙ that
all n people have different birthdays; the probability that at least
two of them have the same birthday is then 1 − pₙ.
Now, imagine the people entering the room one by one.

The Famous Birthday Problem
(Continued)*

The probability that the birthday of the second person is different from that of the
first is 365/366.

The probability that the birthday of the third person is different from the other two,
when these have two different birthdays, is 364/366.

In general, the probability that the jth person has a birthday different from the
birthdays of those already in the room, assuming that these people all have different
birthdays, is (366 − (j − 1))/366 = (367 − j)/366.
Hence, pₙ = (365/366)(364/366) ∙∙∙ ((367 − n)/366).

Therefore, 1 − pₙ = 1 − (365/366)(364/366) ∙∙∙ ((367 − n)/366).

Checking various values of n computationally tells us that for n = 22, 1 − pₙ ≈
0.475, and for n = 23, 1 − pₙ ≈ 0.506. Consequently, a minimum of 23 people
is needed so that the probability that at least two of them have the same
birthday is greater than 1/2.

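A short computation reproducing these values (an illustrative sketch, using the 366 equally likely birthdays assumed above):

```python
def prob_shared_birthday(n: int, days: int = 366) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for j in range(2, n + 1):
        p_all_different *= (days - (j - 1)) / days
    return 1 - p_all_different

print(round(prob_shared_birthday(22), 3))   # ≈ 0.475
print(round(prob_shared_birthday(23), 3))   # ≈ 0.506
```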
Motivation for Bayes’ Theorem*
Bayes’ theorem allows us to use probability to answer
questions such as the following:
o Given that someone tests positive for having a particular
disease, what is the probability that they actually do
have the disease?
o Given that someone tests negative for the disease, what
is the probability that they in fact do have the disease?
Bayes’ theorem has applications to medicine, law, artificial
intelligence, engineering, and many diverse other areas.

Thomas Bayes
(1702-1761)

Bayes' Theorem*

Bayes' Theorem: Suppose that E and F are events from a sample space S
such that p(E) ≠ 0 and p(F) ≠ 0. Then:
p(F|E) = p(E|F) p(F) / [p(E|F) p(F) + p(E|F̄) p(F̄)].
Example: We have two boxes. The first box contains two green balls and
seven red balls. The second contains four green balls and three red
balls. Bob selects one of the boxes at random. Then he selects a ball
from that box at random. If he has selected a red ball, what is the
probability that it came from the first box?
o Let E be the event that Bob has chosen a red ball and F be the event that
Bob has chosen the first box.
o By Bayes' theorem, the probability that Bob has picked the first box is:
p(F|E) = p(E|F) p(F) / [p(E|F) p(F) + p(E|F̄) p(F̄)]
= (7/9)(1/2) / [(7/9)(1/2) + (3/7)(1/2)] = 49/76 ≈ 0.645.
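A tiny numeric check of this calculation (sketch only):

```python
from fractions import Fraction

p_F = Fraction(1, 2)                  # Bob picks box 1
p_red_given_F = Fraction(7, 9)        # box 1: 2 green, 7 red
p_red_given_notF = Fraction(3, 7)     # box 2: 4 green, 3 red

p_F_given_red = (p_red_given_F * p_F) / (
    p_red_given_F * p_F + p_red_given_notF * (1 - p_F)
)
print(p_F_given_red, float(p_F_given_red))   # 49/76 ≈ 0.645
```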
Applying Bayes’ Theorem*
Example: Suppose that one person in 100,000 has a
particular disease. There is a test for the disease that gives a
positive result 99% of the time when given to someone
with the disease. When given to someone without the
disease, 99.5% of the time it gives a negative result. Find
a) The probability that a person who tests positive has the
disease.
b) The probability that a person who tests negative does
not have the disease.
Should someone who tests positive be worried?

Applying Bayes’ Theorem (Continued)*
Solution: Let D be the event that the person has the disease,
and E be the event that this person tests positive. We need to
compute p(D|E) from p(D), p(E|D), p(E|D̄), and p(D̄).
Here p(D) = 1/100,000 = 0.00001, p(D̄) = 0.99999, p(E|D) = 0.99,
and p(E|D̄) = 0.005. By Bayes' theorem,
p(D|E) = p(E|D) p(D) / [p(E|D) p(D) + p(E|D̄) p(D̄)]
= (0.99)(0.00001) / [(0.99)(0.00001) + (0.005)(0.99999)] ≈ 0.002.
Can you use this formula to explain why the resulting probability is
surprisingly small? (The disease is so rare that false positives from the
overwhelmingly healthy population far outnumber the true positives.)

So, don't worry too much if your test for this disease comes back positive.
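The same calculation, written as a small reusable sketch; the function name and parameters are illustrative, not from the slides:

```python
def posterior_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_d = sensitivity            # p(E | D)
    p_pos_given_not_d = 1 - specificity    # p(E | not D)
    num = p_pos_given_d * prevalence
    den = num + p_pos_given_not_d * (1 - prevalence)
    return num / den

print(posterior_positive(0.00001, 0.99, 0.995))   # ≈ 0.00198
```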
Applying Bayes’ Theorem*
What if the result is negative? Let Ē be the event that the test is
negative, so p(Ē|D) = 1 − 0.99 = 0.01 and p(Ē|D̄) = 0.995.
The probability you have the disease if you test negative is
p(D|Ē) = p(Ē|D) p(D) / [p(Ē|D) p(D) + p(Ē|D̄) p(D̄)]
= (0.01)(0.00001) / [(0.01)(0.00001) + (0.995)(0.99999)] ≈ 0.0000001.

So, it is extremely unlikely you have the disease if you test negative.
Generalized Bayes’ Theorem*
Generalized Bayes' Theorem: Suppose that E is an event
from a sample space S and that F1, F2, …, Fn are mutually
exclusive events such that F1 ∪ F2 ∪ ⋯ ∪ Fn = S.
Assume that p(E) ≠ 0 and p(Fi) ≠ 0 for i = 1, 2, …, n. Then
p(Fj|E) = p(E|Fj) p(Fj) / [p(E|F1) p(F1) + ⋯ + p(E|Fn) p(Fn)].
Bayesian Spam Filters*
How do we develop a tool for determining whether an email is
likely to be spam?
Suppose we have an initial set B of spam messages and a set G of non-
spam messages. We can use this information along with Bayes'
law to predict the probability that a new email message is spam.
 We look at a particular word w, and count the number of times
that it occurs in B and in G; nB(w) and nG(w).
o Estimated probability that a spam message contains w is: p(w) =
nB(w)/|B|
o Estimated probability that a message that is not spam contains w
is: q(w) = nG(w)/|G|

Bayesian Spam Filters (Continued)*
Let S be the event that the message is spam, and E be the
event that the message contains the word w.
Using Bayes' rule,
p(S|E) = p(E|S) p(S) / [p(E|S) p(S) + p(E|S̄) p(S̄)].
Assuming that it is equally likely that an arbitrary message is spam
or not spam, i.e., p(S) = p(S̄) = 1/2, this simplifies to
p(S|E) = p(E|S) / [p(E|S) + p(E|S̄)].
(Note: if we have data on the frequency of spam messages, we can
obtain a better estimate for p(S).)
Using our empirical estimates p(w) for p(E|S) and q(w) for p(E|S̄),
r(w) = p(w) / [p(w) + q(w)]
estimates the probability that the message is spam. We can class the
message as spam if r(w) is above a threshold.
Bayesian Spam Filters*
Example: We find that the word “Rolex” occurs in 250 out of
2000 spam messages and in 5 out of 1000 non-spam messages.
Estimate the probability that an incoming message containing
“Rolex” is spam. Suppose our threshold for rejecting the email is 0.9.
Solution: p(Rolex) = 250/2000 = 0.125 and q(Rolex) = 5/1000 = 0.005.
r(Rolex) = p(Rolex)/[p(Rolex) + q(Rolex)] = 0.125/(0.125 + 0.005) ≈ 0.962.
Since r(Rolex) exceeds the threshold of 0.9, we class the message as
spam and reject the email!

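A minimal sketch of this single-word estimate; the function and variable names are illustrative:

```python
def spam_score(n_spam_with_word: int, n_spam: int,
               n_ham_with_word: int, n_ham: int) -> float:
    """r(w) = p(w) / (p(w) + q(w)), assuming p(S) = 1/2."""
    p = n_spam_with_word / n_spam   # estimate of p(E | S)
    q = n_ham_with_word / n_ham     # estimate of p(E | not S)
    return p / (p + q)

print(round(spam_score(250, 2000, 5, 1000), 3))   # ≈ 0.962
```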
Bayesian Spam Filters using Multiple Words*
Accuracy can be improved by considering more than one
word as evidence.
Consider the case where E1 and E2 denote the events that
the message contains the words w1 and w2 respectively.
We make the simplifying assumption that the events are
independent, and again we assume that p(S) = 1/2. Then
r(w1, w2) = p(w1) p(w2) / [p(w1) p(w2) + q(w1) q(w2)].
Bayesian Spam Filters using Multiple Words*
Example: We have 2000 spam messages and 1000 non-spam
messages. The word “stock” occurs 400 times in the spam
messages and 60 times in the non-spam. The word
“undervalued” occurs in 200 spam messages and 25 non-spam.
Estimate the probability that a message containing both words is spam.
Solution: p(stock) = 400/2000 = 0.2, q(stock) = 60/1000 = 0.06,
p(undervalued) = 200/2000 = 0.1, q(undervalued) = 25/1000 = 0.025.
r(stock, undervalued) = (0.2)(0.1) / [(0.2)(0.1) + (0.06)(0.025)]
= 0.02/0.0215 ≈ 0.930.
If our threshold is 0.9, we class the message as spam and reject it.
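A sketch extending the single-word score to several words under the independence assumption; names are illustrative:

```python
from math import prod

def spam_score_multi(word_stats: list[tuple[int, int]], n_spam: int, n_ham: int) -> float:
    """r(w1..wk) = prod p(wi) / (prod p(wi) + prod q(wi)), assuming p(S) = 1/2.

    word_stats holds (count in spam, count in non-spam) per word."""
    p = prod(s / n_spam for s, _ in word_stats)
    q = prod(h / n_ham for _, h in word_stats)
    return p / (p + q)

# "stock": 400/2000 spam, 60/1000 ham; "undervalued": 200/2000 spam, 25/1000 ham
print(round(spam_score_multi([(400, 60), (200, 25)], 2000, 1000), 4))   # ≈ 0.9302
```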
Bayesian Spam Filters using Multiple Words*
In general, the more words we consider, the more accurate
the spam filter. With the independence assumption, if we
consider k words w1, …, wk, then
r(w1, …, wk) = [p(w1) ∙∙∙ p(wk)] / [p(w1) ∙∙∙ p(wk) + q(w1) ∙∙∙ q(wk)].

We can further improve the filter by considering pairs of words
as a single block, or certain types of character strings.
