Chapter 4: Probability Distributions: 4.1 Random Variables
An event can be associated with a single value of the random variable, or it can be
associated with a range of values of the random variable.
P(A) = P(X = x_i)   or   P(A) = P(x_l ≤ X ≤ x_u)
An event could also be described by other sets of values of the random variable.
If x_i, i = 1, 2, ⋯, N are all the possible values of the random variable associated with the sample space, then

Σ_{i=1}^{N} P(X = x_i) = 1
e.g. Each (composite) outcome consists of 3 ratings (M, P, C). Let M_1, P_1 and C_1 be the preferred ratings. Let X be the function that assigns to each outcome the number of preferred ratings that outcome possesses.
Since each outcome has a probability, we can compute the probability of getting each value x = 0, 1, 2, 3 of the function X:

x | P(X = x)
3 | 0.03
2 | 0.29
1 | 0.50
0 | 0.18

[Tree diagram: each outcome (M_i, P_j, C_k) is listed with its probability and the number of preferred ratings it possesses; summing the outcome probabilities by that count gives the table above.]
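As a minimal sketch of this bookkeeping (the rating-tree probabilities above are only partially legible, so the fully specified case of counting heads in 3 fair coin tosses is used as the illustrative data; everything else is standard Python), the following aggregates outcome probabilities by the value of the random variable:

```python
from itertools import product
from collections import defaultdict

# Each outcome of 3 fair coin tosses has probability (1/2)^3 = 1/8.
# The random variable X assigns to each outcome its number of heads.
outcomes = list(product("HT", repeat=3))
prob_of_outcome = {o: (1/2)**3 for o in outcomes}

# Aggregate outcome probabilities by the value of X, giving P(X = x).
dist = defaultdict(float)
for o, p in prob_of_outcome.items():
    x = o.count("H")          # value of the random variable for this outcome
    dist[x] += p

for x in sorted(dist):
    print(f"P(X = {x}) = {dist[x]:.3f}")   # 0.125, 0.375, 0.375, 0.125
```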
Random variables X can be classified by the number of values x they can assume.
The two common types are
discrete random variables with a finite or countably infinite number of values
continuous random variables having a continuum of values for x
This is the problem of identifying the probability distribution for a random variable.
The probability distribution of a discrete random variable X can be listed as a table of the
possible values x together with the probability P(X = x) for each
e.g. 𝑥1 | 𝑃(𝑋 = 𝑥1 )
𝑥2 | 𝑃(𝑋 = 𝑥2 )
𝑥3 | 𝑃(𝑋 = 𝑥3 )
…
It is standard notation to refer to the values P(X = x) of the probability distribution by f(x)
f(x) ≡ P(X = x)
The probability distribution always satisfies the conditions
f(x) ≥ 0   and   Σ_{all x} f(x) = 1
e.g. f(x) = (x − 2)/2 for x = 1, 2, 3, 4
e.g. f(x) = x²/25 for x = 0, 1, 2, 3, 4
(Check whether each can serve as a probability distribution: the first cannot, since f(1) = −1/2 < 0; the second cannot, since its values sum to 30/25 ≠ 1.)
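A minimal sketch of this check (exact fractions are used to avoid rounding issues; the function and support names are illustrative):

```python
from fractions import Fraction

def is_valid_pmf(f, support):
    """Check f(x) >= 0 for all x in the support and that the values sum to 1."""
    values = [f(x) for x in support]
    return all(v >= 0 for v in values) and sum(values) == 1

f1 = lambda x: Fraction(x - 2, 2)        # candidate: f(x) = (x-2)/2, x = 1..4
f2 = lambda x: Fraction(x * x, 25)       # candidate: f(x) = x^2/25,  x = 0..4

print(is_valid_pmf(f1, range(1, 5)))     # False: f(1) = -1/2 < 0
print(is_valid_pmf(f2, range(0, 5)))     # False: values sum to 30/25, not 1
```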
Since the probability distribution for a discrete random variable is a tabular list, it can
also be represented as a histogram, the probability histogram.
For a discrete random variable, the height of the bar at value x is f(x); the width of the bar carries no meaning. The probability histogram is commonly drawn either with touching bins (left) or in Pareto style (right, also referred to as a bar chart).
From f(x) we can also compute the cumulative distribution function F(x) = P(X ≤ x) and plot it in the ways learned in Chapter 2 (with consideration that the x-axis is not continuous but discrete).
[Figure: F(x) for the number of preferred ratings.]
In probability theory and statistics, the Bernoulli distribution, named after Swiss scientist
Jacob Bernoulli, is a discrete probability distribution, which takes value 1 with success
probability 𝑝 and value 0 with failure probability 𝑞 = 1 − 𝑝 . So if X is a random variable
with this distribution, we have:
P(X = 1) = p;   P(X = 0) = q = 1 − p.
We can refer to the ordered sequence of length n as a series of n repeated trials, where each trial produces a result that is either “success” or “failure”. We are interested in the random variable that reports the number x of successes in n trials.
If someone caught not wearing a seatbelt began to warn oncoming cars approaching the roadblock, then P(A_i ∩ A_j) ≠ P(A_i) · P(A_j) for all i, j pairs, and we would also not be dealing with Bernoulli trials.
Note that in our definition of Bernoulli trials the number of trials n is fixed in advance
All Bernoulli trials of length n have the same probability distribution!!!!
(a consequence of the assumptions behind the definition of Bernoulli trials)
Probability Distribution (number of successes in n = 3 Bernoulli trials with p = ½):

x    | 0   | 1   | 2   | 3
f(x) | 1/8 | 3/8 | 3/8 | 1/8

where f(x) = C(3, x) (½)^x (1 − ½)^(3−x) for x = 0, 1, 2, 3.
From this example, we see that the binomial probability distribution, which governs the number of successes x in Bernoulli trials of length n, is:

f(x) ≡ b(x; n, p) = C(n, x) p^x (1 − p)^(n−x)        (BPD)

Note: 1. The term on the RHS of (BPD) is the x’th term of the binomial expansion of (p + (1 − p))^n,

i.e.  (p + (1 − p))^n = Σ_{x=0}^{n} C(n, x) p^x (1 − p)^(n−x)

which also proves that

Σ_{x=0}^{n} C(n, x) p^x (1 − p)^(n−x) = 1^n = 1
a) “s” = “at least 1/3” (i.e. 1/3 or greater) “f” = “less than 1/3”
P(Ai) = p = 0.6
Assume c) of Bernoulli trial assumptions holds.
Then f(4) = b(4; 5, 0.6) = C(5, 4) · 0.6^4 · 0.4^1

(b) We want f(4) + f(5) = b(4; 5, 0.6) + b(5; 5, 0.6) = C(5, 4) · 0.6^4 · 0.4^1 + C(5, 5) · 0.6^5 · 0.4^0
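A minimal numerical check of these two values, using only the (BPD) formula above (the helper name is illustrative):

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# (a) exactly 4 successes in 5 trials with p = 0.6
print(binom_pmf(4, 5, 0.6))                           # 0.2592
# (b) at least 4 successes in 5 trials
print(binom_pmf(4, 5, 0.6) + binom_pmf(5, 5, 0.6))    # ~0.337
```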
Examples of binomial distribution
Cumulative binomial probability distribution
B(x; n, p) ≡ Σ_{k=0}^{x} b(k; n, p)        (CBPD)

b(x; n, p) = B(x; n, p) − B(x − 1; n, p)
e.g. (a batch of 20 hard drives, each claimed to need repair within 12 months with probability 0.10)
1.0 − B(4; 20, 0.10) = 0.0432 is the probability of seeing 5 or more hard drives requiring repair in 12 months.
This says that in only about 4% of all year-long periods (i.e. in roughly 1 year out of 25) should one see 5 or more hard drives needing repair. The fact that we saw this happen in the very first year makes us suspicious of the manufacturer's claim (but does NOT prove that the manufacturer's claim is wrong !!!!!!!)
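A minimal check of the 0.0432 figure, summing the binomial pmf as in (CBPD):

```python
from math import comb

def binom_cdf(x, n, p):
    """B(x; n, p) = sum_{k=0}^{x} b(k; n, p)"""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Probability of 5 or more repairs among 20 drives when each needs repair with p = 0.10
print(1.0 - binom_cdf(4, 20, 0.10))   # ~0.0432
```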
Shape of binomial probability histograms
e.g. b(x; 5, p)
b(x; n, p) will always be positively skewed for p < 0.5 (tail on the positive side),
will always be negatively skewed for p > 0.5 (tail on the negative side),
and is symmetric for p = 0.5.
4.3 Hypergeometric probability distribution
In Bernoulli trials, one can get “s” with probability p and “f” with probability 1−p in every
trial (i.e. Bernoulli trials can be thought of as “sample with replacement”)
Consider a variation of the problem, in which there are total of only a outcomes available
that are successes (have RV values = “s”) and N − a outcomes that are failures. (e.g. there
are N radios, a of them are defective and N − a of them work.)
We want to run n trials, (e.g. in each trial we pick a radio), but outcomes are sampled
without replacement (that is, once a radio is picked, it is no longer available to be picked
again).
As we run each trial, we assume that whatever outcomes are left, whether having RV value “s” or “f”, have the same chance of being selected in the next trial (i.e. we are assuming classical probability – where the chance of picking a particular value of a RV is in proportion to the number of outcomes that have that RV value).
Thus, for x ≤ a, the probability of getting x successes in n trials, when a of the N available outcomes are successes, is (here P(m, k) = m!/(m − k)! counts permutations and C(m, k) = m!/(k!(m − k)!) counts combinations)

f(x) = [ C(n, x) · P(a, x) · P(N−a, n−x) ] / P(N, n)

i.e.

f(x) = [ n!/((n−x)! x!) · a!/(a−x)! · (N−a)!/((N−a) − (n−x))! ] / [ N!/(N−n)! ]

     = [ a!/((a−x)! x!) · (N−a)!/(((N−a) − (n−x))! (n−x)!) ] / [ N!/((N−n)! n!) ]

     = C(a, x) · C(N−a, n−x) / C(N, n),

the hypergeometric probability distribution

h(x; n, a, N) = C(a, x) · C(N−a, n−x) / C(N, n),     x = 0, 1, 2, …, a;   n ≤ N
e.g. PC has 20 identical car chargers, 5 are defective. PC will randomly ship 10. What is
the probability that 2 of those shipped will be defective?
h(2; 10, 5, 20) = C(5, 2) · C(15, 8) / C(20, 10) = (10 · 6435) / 184756 ≈ 0.348
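A minimal check of this value directly from the hypergeometric formula (the helper name is illustrative):

```python
from math import comb

def hypergeom_pmf(x, n, a, N):
    """h(x; n, a, N) = C(a, x) C(N-a, n-x) / C(N, n)"""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

# 20 chargers, 5 defective, ship 10: probability exactly 2 shipped are defective
print(hypergeom_pmf(2, 10, 5, 20))   # ~0.348
```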
e.g. redo using 100 car chargers and 25 defective
h(2; 10, 25, 100) = C(25, 2) · C(75, 8) / C(100, 10) = 0.292

e.g. approximate this using the binomial distribution

b(2; 10, p ≈ 25/100) = C(10, 2) · 0.25² · 0.75⁸ = 0.282

The hypergeometric distribution h(x; n, a, N) approaches the binomial distribution b(x; n, p = a/N) in the limit N → ∞.
4.4 The Mean and Variance of a Probability Distribution

Recall the sample mean of n values from Chapter 2:

x̄ = (Σ_{i=1}^{n} x_i) / n = Σ_{i=1}^{n} x_i · (1/n)

We can view each term in the RHS as x_i · f(x_i), where f(x_i) = 1/n is the probability associated with each value (each value appears once in the list, and each is equally likely).
By analogy, the mean of a probability distribution f(x) is defined as

μ = Σ_{all x} x · f(x)
e.g. Mean value for the probability distribution of the number of heads obtained in 3
flips of a coin.
There are 2³ = 8 outcomes. The RV “number of heads in 3 flips” has 4 possible values, 0, 1, 2, and 3 heads, having probabilities f(0) = 1/8; f(1) = 3/8; f(2) = 3/8; f(3) = 1/8.
Therefore the mean value is

μ = 0 · (1/8) + 1 · (3/8) + 2 · (3/8) + 3 · (1/8) = 3/2
e.g. PC has 20 identical car chargers, 5 are defective. PC will randomly ship 10. On average (over many trials of shipping 10), how many defective car chargers will be included in the order?
We want the mean of h(x; 10, 5, 20). The mean of the hypergeometric distribution h(x; n, a, N) is μ = n · a/N, so here μ = 10 · 5/20 = 2.5
Recall from Chapter 2 that the sum of the sample deviations Σ_{i=1}^{n} (x_i − x̄) = 0.
Therefore, in analogy to the sample variance defined in Chapter 2, we define the variance of the probability distribution f(x) as

σ² = Σ_{all x} (x − μ)² · f(x)

σ = sqrt(σ²) = sqrt( Σ_{all x} (x − μ)² · f(x) )
The variance of the binomial distribution b(x; n, p) is

σ² = n · p · (1 − p) = μ · (1 − p),   where μ = n p is its mean.

The variance of the hypergeometric distribution h(x; n, a, N) is

σ² = n · (a/N) · (1 − a/N) · (N − n)/(N − 1)

where the finite-population factor (N − n)/(N − 1) → 1 as N → ∞ (recovering the binomial variance with p = a/N).

e.g. The standard deviation for the number of defective car chargers in shipments of 10 is

σ = sqrt( 10 · (5/20) · (1 − 5/20) · (20 − 10)/(20 − 1) ) = sqrt(75/76) = 0.99
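A minimal sketch verifying both the mean and the standard deviation of this hypergeometric example, once directly from the definition of σ² and once from the closed-form formula:

```python
from math import comb, sqrt

def hypergeom_pmf(x, n, a, N):
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

n, a, N = 10, 5, 20
support = range(0, min(n, a) + 1)
mu = sum(x * hypergeom_pmf(x, n, a, N) for x in support)
var = sum((x - mu)**2 * hypergeom_pmf(x, n, a, N) for x in support)

print(mu)                                              # 2.5  (= n*a/N)
print(sqrt(var))                                       # ~0.99
print(sqrt(n * (a/N) * (1 - a/N) * (N - n) / (N - 1))) # same, via the formula
```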
The moments of a probability distribution
The k’th moment about the origin (usually just called the k’th moment) of a probability distribution is defined as

μ'_k = Σ_{all x} x^k · f(x)

Note: the mean of a probability distribution is the 1’st moment (about the origin).
The k’th moment about the mean is defined as

μ_k = Σ_{all x} (x − μ)^k · f(x)

Notes:
the 1’st moment about the mean, μ_1 = 0
the 2’nd moment about the mean, μ_2, is the variance
the 3’rd moment about the mean, scaled as μ_3/σ³, is the skewness (describes the symmetry)
the 4’th moment about the mean, scaled as μ_4/σ⁴, is the kurtosis (describes the “peakedness”)
Note:

σ² = Σ_{all x} (x − μ)² · f(x) = Σ_{all x} (x² − 2xμ + μ²) f(x)
   = Σ_{all x} x² f(x) − 2μ Σ_{all x} x f(x) + μ² Σ_{all x} f(x) = μ'_2 − 2μ² + μ² = μ'_2 − μ²
e.g. Consider the R.V. which is the number of points obtained on a single roll of a die.
The R.V. has values 1,2,3,4,5,6. What is the variance of the probability distribution behind
this RV?
The probability distribution is f(x) = 1/6 for each x.
Therefore the mean is

μ = 1 · (1/6) + 2 · (1/6) + 3 · (1/6) + 4 · (1/6) + 5 · (1/6) + 6 · (1/6) = (6 · 7)/(2 · 6) = 7/2

The second moment about the origin is

μ'_2 = 1² · (1/6) + 2² · (1/6) + 3² · (1/6) + 4² · (1/6) + 5² · (1/6) + 6² · (1/6) = 91/6

Therefore σ² = 91/6 − 49/4 = 35/12
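A minimal check of this calculation using exact fractions and the shortcut σ² = μ'_2 − μ²:

```python
from fractions import Fraction

support = range(1, 7)
f = {x: Fraction(1, 6) for x in support}           # fair die: f(x) = 1/6

mu       = sum(x * f[x] for x in support)          # 1st moment about the origin
mu2_orig = sum(x**2 * f[x] for x in support)       # 2nd moment about the origin
var      = mu2_orig - mu**2                        # sigma^2 = mu_2' - mu^2

print(mu, mu2_orig, var)    # 7/2  91/6  35/12
```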
4.5 Chebyshev’s Theorem
Theorem 4.1
If a probability distribution has mean μ and standard deviation σ, then the probability of getting a value that deviates from μ by at least kσ is at most 1/k²,
i.e. the probability of getting a result x such that |x − μ| ≥ kσ satisfies P(|X − μ| ≥ kσ) ≤ 1/k²
Chebyshev’s theorem quantifies the statement that the probability of getting a result x
decreases as x moves further away from μ
Chebyshev’s theorem holds for all probability distributions, but it works better for some
than for others (gives a “sharper” estimate).
4.6 Poisson distribution
Recall the binomial distribution

b(x; n, p) = C(n, x) p^x (1 − p)^(n−x)

The Poisson distribution is

f(x; λ) = λ^x e^(−λ) / x!     for x = 0, 1, 2, 3, …
As derived, the Poisson distribution describes the probability distribution for an infinite (in practice very large) number n of Bernoulli trials when the probability of success p in each trial is vanishingly small (in practice very small), with λ = n·p held fixed.
Since the Poisson random variable can take a countably infinite number of values, we have to technically modify the third axiom (property) that probabilities must obey to include such sample spaces. The third axiom stated that the probability function is an additive set function. The appropriate modification is
Axiom 3′ If A_1, A_2, A_3, ⋯ is a countably infinite sequence of mutually exclusive events in S, then

P(A_1 ∪ A_2 ∪ A_3 ∪ ⋯) = P(A_1) + P(A_2) + P(A_3) + ⋯
Note that the Poisson distribution satisfies Σ_{all x} f(x; λ) = 1.
Proof (using the Taylor series expansion of e^λ):

Σ_{x=0}^{∞} λ^x e^(−λ) / x! = e^(−λ) Σ_{x=0}^{∞} λ^x / x! = e^(−λ) e^λ = 1
The cumulative Poisson distribution F(x; λ) = Σ_{k=0}^{x} f(k; λ) is tabulated for select values of x and λ in Appendix B (Table 2).
e.g. 5% of bound books have defective bindings. What is the probability that 2 out of 100
books will have defective bindings using (a) the binomial distribution, (b) the Poisson
distribution as an approximation
(a) b(2; 100, 0.05) = C(100, 2) · 0.05² · 0.95⁹⁸ = 0.081

(b) λ = 0.05 · 100 = 5.   f(2; 5) = 5² e⁻⁵ / 2! = 0.084
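A minimal check of both numbers, the exact binomial value and its Poisson approximation (helper names are illustrative):

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

# Probability that 2 of 100 books have defective bindings (p = 0.05)
print(binom_pmf(2, 100, 0.05))        # ~0.081  (exact, binomial)
print(poisson_pmf(2, 100 * 0.05))     # ~0.084  (Poisson approximation, lambda = 5)
```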
e.g. There are 3,840 generators. The probability is 1/1,200 that any one will fail in a year.
What is the probability of finding 0, 1, 2, 3, 4, … failures in any given year
λ = 3840 /1200 = 3.2. We want the probabilities f(0; 3.2), f(1; 3.2), f(2; 3.2) etc.
Using the property 𝑓 𝑥; λ = 𝐹 𝑥; λ − 𝐹 𝑥 − 1; λ we can compute these probabilities
from Table 2 Appendix B
x 0 1 2 3 4 5 6 7 8
𝑓 𝑥; 3.2 0.041 0.130 0.209 0.223 0.178 0.114 0.060 0.028 0.011
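A minimal sketch reproducing this table directly from the Poisson pmf rather than from Table 2 (small differences in the last digit can arise from the table's rounding):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x; lambda) = lambda^x e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 3840 / 1200   # 3.2 expected generator failures per year
for x in range(9):
    print(x, round(poisson_pmf(x, lam), 3))   # matches the f(x; 3.2) row above
```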
The mean value for the Poisson probability distribution is 𝝁 = 𝝀
Proof for the mean:

μ = Σ_{x=0}^{∞} x · λ^x e^(−λ) / x! = λ e^(−λ) Σ_{x=1}^{∞} λ^(x−1) / (x − 1)!

Let y = x − 1:

μ = λ e^(−λ) Σ_{y=0}^{∞} λ^y / y! = λ e^(−λ) e^λ = λ
The average λ is usually approximated by running many long (but finite) trials.
e.g. An average of 1.3 gamma rays per millisec is recorded coming from a radioactive
substance. Assuming the RV “number of gamma rays per millisec” has a probability
distribution that is Poisson (aka, is a Poisson process), what is the probability of seeing 1
or more gamma rays in the next millisec
λ = 1.3. Want P(X ≥ 1) = 1.0 − P(X = 0) = 1.0 − 1.3⁰ e^(−1.3) / 0! = 1.0 − e^(−1.3) = 0.727
4.7 Poisson Processes
For a Poisson process producing successes at an average rate of α per unit time, the number of successes in a time interval of length T is Poisson distributed with λ = α·T.
e.g. Successes occur at an average rate of α = 6 per unit time.
(a) For T = 1: λ = 6 · 1 = 6. Therefore f(4; 6) = 6⁴ e⁻⁶ / 4! = 0.134
(b) For T = 2: λ = 6 · 2 = 12. Therefore f(10; 12) = 12¹⁰ e⁻¹² / 10! = F(10; 12) − F(9; 12) = 0.105
e.g. a process generates 0.2 imperfections per minute. Find probabilities of
(a) 1 imperfection in 3 minutes
(b) at least 2 imperfections in 5 minutes
(c) at most 1 imperfection in 15 minutes
(a) λ = 0.2 · 3 = 0.6. Want f(1; 0.6) = F(1; 0.6) − F(0; 0.6) = 0.6 e^(−0.6) ≈ 0.329
(b) λ = 0.2 · 5 = 1.0. Want P(X ≥ 2) = 1 − F(1; 1.0) = 1 − 2e^(−1) ≈ 0.264
(c) λ = 0.2 · 15 = 3.0. Want P(X ≤ 1) = F(1; 3.0) = 4e^(−3) ≈ 0.199
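A minimal sketch checking all three parts, using only the cumulative Poisson distribution defined above (the helper name is illustrative):

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """F(x; lambda) = sum_{k=0}^{x} lambda^k e^(-lambda) / k!"""
    return sum(lam**k * exp(-lam) / factorial(k) for k in range(x + 1))

# alpha = 0.2 imperfections per minute, lambda = alpha * T
print(poisson_cdf(1, 0.6) - poisson_cdf(0, 0.6))   # (a) exactly 1 in 3 min, ~0.329
print(1.0 - poisson_cdf(1, 1.0))                    # (b) at least 2 in 5 min, ~0.264
print(poisson_cdf(1, 3.0))                          # (c) at most 1 in 15 min, ~0.199
```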
4.8 The Geometric and Negative Binomial Distributions
Consider the sample space whose outcomes are countably infinite sequences of Bernoulli trials
(i.e. the three Bernoulli assumptions hold).
In particular, “s” occurs with probability p and “f” with probability 1 − p.
We want to know the probability that the first success occurs on the x’th trial.
[Tree diagram: A_x denotes the event that the first success occurs on trial x, with
P(A_1) = p, P(A_2) = p(1 − p), P(A_3) = p(1 − p)², P(A_4) = p(1 − p)³, P(A_5) = p(1 − p)⁴, P(A_6) = p(1 − p)⁵, …]
Since the sum of the probabilities of all outcomes must equal 1, from the diagram we see that

P(A_1) + P(A_2) + P(A_3) + P(A_4) + ⋯ = p + p(1 − p) + p(1 − p)² + p(1 − p)³ + ⋯
= Σ_{x=1}^{∞} p(1 − p)^(x−1) = 1
Let the sample space consist of outcomes each of which consists of infinitely countable
Bernoulli trials. Let p be the probability of success in each Bernoulli trial. Then the
geometric probability distribution
g(x; p) = p(1 − p)^(x−1),    x = 1, 2, 3, 4, …
describes the probability that the first success occurs on the x’th trial.
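A minimal sketch of the geometric distribution (p = 0.5 is an illustrative choice, e.g. waiting for the first head when tossing a fair coin; the truncation point 200 is arbitrary):

```python
from math import isclose

def geom_pmf(x, p):
    """g(x; p) = p (1-p)^(x-1): first success on trial x."""
    return p * (1 - p)**(x - 1)

p = 0.5
# probability that the first success occurs on trial 1, 2, 3, 4, 5
print([geom_pmf(x, p) for x in range(1, 6)])   # 0.5, 0.25, 0.125, ...
# the probabilities over all x sum to 1 (series truncated at a large x here)
print(isclose(sum(geom_pmf(x, p) for x in range(1, 200)), 1.0))   # True
```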
To have exactly r successes in x trials, the r’th success has to occur on trial x, and the
previous 𝑟 − 1 successes have to occur in the previous 𝑥 − 1 trials.
Therefore the probability that the r’th success occurs on the x’th trial must be
f(x) = (probability of r − 1 successes in x − 1 trials) × (probability of “s” on trial x)
     = b(r − 1; x − 1, p) · p
     = C(x − 1, r − 1) p^(r−1) (1 − p)^(x−r) · p = C(x − 1, r − 1) p^r (1 − p)^(x−r)

This is the negative binomial probability distribution

f(x) = C(x − 1, r − 1) p^r (1 − p)^(x−r)     for x = r, r + 1, r + 2, …
As C(n, k) = C(n, n − k), the negative binomial probability distribution can also be written

f(x) = C(x − 1, x − r) p^r (1 − p)^(x−r)

It can be shown that C(x − 1, x − r) = (−1)^(x−r) C(−r, x − r), explaining the name “negative” binomial distribution.
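A minimal sketch of the negative binomial pmf (r = 3 and p = 0.5 are illustrative values; the truncation point 200 is arbitrary):

```python
from math import comb

def neg_binom_pmf(x, r, p):
    """P(r-th success occurs on trial x) = C(x-1, r-1) p^r (1-p)^(x-r)"""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 3, 0.5
print(neg_binom_pmf(5, r, p))   # C(4,2) * 0.5^3 * 0.5^2 = 6/32 = 0.1875
# (if scipy is available, scipy.stats.nbinom.pmf(x - r, r, p) gives the same value,
#  since it counts failures before the r-th success rather than total trials)
# sanity check: probabilities over x = r, r+1, ... sum to 1
print(sum(neg_binom_pmf(x, r, p) for x in range(r, 200)))   # ~1.0
```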
Recap:
Sample space: outcomes are Bernoulli trials of fixed length n. Probability of “s” is p.
Probability of getting x successes in the n trials is given by the binomial distribution
𝑏 𝑥; 𝑛, 𝑝 , 𝑥 = 0,1, 2, 3, … , 𝑛
Sample space: outcomes are Bernoulli trials of countably infinite length. Probability of “s” is p.
Probability of getting the first success on the x’th trial is given by the geometric
distribution
𝑔 𝑥; 𝑝 , 𝑥 = 1, 2, 3, 4, … .
Sample space: Time recordings of a random process occurring over a continuous time interval
T. The random process produces only “s” or “f”.
Let 𝛼 denote the average number of “s” produced per unit time. Further assume
1. probability of “s” during small time interval ∆𝑡 is α∆𝑡
2. probability of more than one ‘s” in ∆𝑡 is negligible
3. probability of “s” in a later ∆𝑡 is independent of what occurs earlier
Then: Probability of x successes during time interval T is given by the Poisson distribution
𝑓 𝑥; λ where λ = 𝛼𝑇
4.9 The Multinomial Distribution
We assume:
1) Each trial has k possible distinct outcomes, type 1, type 2, type 3, …., type k
2) Outcome type i occurs with probability p_i for each trial, where Σ_{i=1}^{k} p_i = 1
3) The outcomes for different trials are independent.
(i.e. we assume “multinomial Bernoulli” trials.)
In the n trials, we want to know the probability 𝑓(𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 ) that there are
𝑥1 outcomes of type 1
𝑥2 outcomes of type 2
…
𝑥𝑘 outcomes of type k
where Σ_{i=1}^{k} x_i = n
For fixed values of 𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 , there are
C(n, x_1) · C(n − x_1, x_2) · C(n − x_1 − x_2, x_3) ⋯ C(n − x_1 − x_2 − ⋯ − x_{k−1}, x_k) = n! / (x_1! x_2! x_3! ⋯ x_k!)
outcomes that have these k values.
(AMS 301 students will recognize this as 𝑃(𝑛; 𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑘 ), the number of ways to
arrange n objects, when there are 𝑥1 of type 1, 𝑥2 of type 2, … , and 𝑥𝑘 of type k )
Since each such arrangement occurs with probability p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k}, the multinomial distribution is

f(x_1, x_2, …, x_k) = [ n! / (x_1! x_2! ⋯ x_k!) ] p_1^{x_1} p_2^{x_2} ⋯ p_k^{x_k}

e.g. (with n = 8 trials and type probabilities p_1 = 0.3, p_2 = 0.5, p_3 = 0.2) We want

f(2, 5, 1) = [ 8! / (2! 5! 1!) ] (0.3)² (0.5)⁵ (0.2)¹ = 0.0945
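A minimal check of this example from the multinomial formula (the helper name is illustrative):

```python
from math import factorial

def multinomial_pmf(xs, ps):
    """f(x1,...,xk) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk, with n = sum(xs)."""
    n = sum(xs)
    coeff = factorial(n)
    for x in xs:
        coeff //= factorial(x)
    prob = coeff
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

# 8 trials with type probabilities 0.3, 0.5, 0.2; want 2, 5 and 1 outcomes of each type
print(multinomial_pmf([2, 5, 1], [0.3, 0.5, 0.2]))   # ~0.0945
```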
4.10 Generating discrete random variables that obey different probability distributions
Consider the RV “number of heads in 3 tosses of a coin”.
The probability distribution for this RV is
x 0 1 2 3
f(x) 1/8=0.125 3/8=0.375 3/8=0.375 1/8=0.125
F(x) 0.125 0.500 0.875 1.000
[Plot of the cumulative distribution F(x), showing how the interval [0, 1) is split at F(0), F(1), F(2) into pieces assigned to the values x_1 = 0, x_2 = 1, x_3 = 2, x_4 = 3.]
i.e. generate equally likely three-digit random numbers 000–999; then
all the numbers 000 – 124 are assigned the RV value 0
all the numbers 125 – 499 are assigned the RV value 1
all the numbers 500 – 874 are assigned the RV value 2
all the numbers 875 – 999 are assigned the RV value 3
Table 7 in Appendix B presents a long list of the integers 0, …, 9 generated with equal-
likelihood. One can use the table to randomly generate lists of 1-digit, 2-digit, 3-digit,
etc. outcomes (by taking non-overlapping combinations and starting in different places)
e.g. RV = number cars arriving at a toll booth per minute
x 0 1 2 3 4 5 6 7 8 9
f(x) 0.082 0.205 0.256 0.214 0.134 0.067 0.028 0.010 0.003 0.001
F(x) 0.082 0.287 0.543 0.757 0.891 0.958 0.986 0.996 0.999 1.000
[Plot of the cumulative distribution F(x) for the toll-booth example, with the interval [0, 1) split at F(0), F(1), F(2), F(3), F(4), … to assign random numbers to the values x = 0, 1, 2, 3, 4, …]
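A minimal sketch of this generation scheme for the toll-booth distribution, assuming only the f(x) table above; Python's random number generator stands in for the three-digit entries one would read from Table 7:

```python
import random
from bisect import bisect_left

# Probability distribution for the number of cars arriving per minute (table above)
f = [0.082, 0.205, 0.256, 0.214, 0.134, 0.067, 0.028, 0.010, 0.003, 0.001]

# Cumulative distribution F(x), scaled to three-digit integer thresholds 0..999
cum, thresholds = 0.0, []
for p in f:
    cum += p
    thresholds.append(round(cum * 1000) - 1)   # e.g. F(0)=0.082 -> numbers 000-081 map to x=0

def sample():
    """Draw an equally likely 3-digit number and map it to a value of x."""
    r = random.randrange(1000)                 # stands in for 3 digits from Table 7
    return bisect_left(thresholds, r)

random.seed(0)
draws = [sample() for _ in range(10000)]
print([draws.count(x) / len(draws) for x in range(10)])   # should be close to f
```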
Classical probability versus frequentist probability
Recall: classical probability counts outcomes and assumes all outcomes occur with equal
likelihood. Frequentist probability measures the frequency of occurrence of outcomes
from past “experiments”.
Frequentist probability:
distinct dice: the (unordered) outcome {1,2} has measured probability 2/36, in agreement with classical probability.
identical dice: the (unordered) outcome {1,2} has measured probability 2/36 (!!), in disagreement with classical probability.
For identical dice, the classical view of probability assumes all 21 (unordered) outcomes occur with equal probability. This is not what occurs in practice: in practice, each of the (unordered) outcomes {i, j} with i ≠ j occurs more frequently than the outcomes {i, i}.
Why is the frequentist approach correct? Clearly the frequency of getting unordered outcomes cannot depend on the color of the dice being thrown (i.e. the color of the dice cannot affect frequency of occurrence). Thus two identical dice must generate outcomes with the same frequencies as two differently-colored dice.
Note: that is not to say that the classical probability view is completely wrong. The classical view correctly counts the number of different outcomes in each case (identical and different dice). However, it computes probability incorrectly for the identical case.
The frequentist view concentrates on assigning probabilities to each outcome. In
the frequentist view, the number of outcomes for two identical dice is still 21, but the
probabilities assigned to i,i and i,j outcomes are different.
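A minimal simulation illustrating the point (the number of rolls and the seed are arbitrary): when the two dice are recorded as an unordered pair, the outcome {1,2} is observed about twice as often as {1,1}.

```python
import random
from collections import Counter

random.seed(1)
rolls = 100_000
counts = Counter()
for _ in range(rolls):
    pair = tuple(sorted((random.randint(1, 6), random.randint(1, 6))))
    counts[pair] += 1        # unordered outcome, as for two identical dice

# {1,2} occurs with frequency ~2/36, {1,1} with frequency ~1/36,
# even though each is a single "outcome" when the dice are identical.
print(counts[(1, 2)] / rolls, 2/36)    # ~0.0556
print(counts[(1, 1)] / rolls, 1/36)    # ~0.0278
```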