Gallery of Continuous Random Variables
Class 5, 18.05
Jeremy Orloff and Jonathan Bloom
1 Learning Goals
1. Be able to give examples of what uniform, exponential and normal distributions are used
to model.
2. Be able to give the range and pdf’s of uniform, exponential and normal distributions.
2 Introduction
Here we introduce a few fundamental continuous distributions. These will play important
roles in the statistics part of the class. For each distribution, we give the range, the pdf,
the cdf, and a short description of situations that it models. These distributions all depend
on parameters, which we specify.
As you look through each distribution do not try to memorize all the details; you can always
look those up. Rather, focus on the shape of each distribution and what it models.
Although it comes towards the end, we call your attention to the normal distribution. It is
easily the most important distribution defined here.
When we studied discrete random variables we learned, for example, about the Bernoulli(𝑝)
distribution. The probability 𝑝 used to define the distribution is called a parameter and
Bernoulli(𝑝) is called a parametrized distribution. For example, tosses of a fair coin follow a
Bernoulli distribution where the parameter 𝑝 = 0.5. When we study statistics one of the
key questions will be to estimate the parameters of a distribution. For example, if I have
a coin that may or may not be fair then I know it follows a Bernoulli(𝑝) distribution, but
I don’t know the value of the parameter 𝑝. I might run experiments and use the data to
estimate the value of 𝑝.
As another example, the binomial distribution Binomial(𝑛, 𝑝) depends on two parameters
𝑛 and 𝑝.
In the following sections we will look at specific parametrized continuous distributions.
The applet https://mathlets.org/mathlets/probability-distributions/ allows you
to visualize the pdf and cdf of these distributions and to dynamically change the parameters.
3 Uniform distribution
1. Parameters: 𝑎, 𝑏.
2. Range: [𝑎, 𝑏].
3. Notation: uniform(𝑎, 𝑏).
4. pdf: 𝑓(𝑥) = 1/(𝑏 − 𝑎) for 𝑎 ≤ 𝑥 ≤ 𝑏.
5. cdf: 𝐹(𝑥) = (𝑥 − 𝑎)/(𝑏 − 𝑎) for 𝑎 ≤ 𝑥 ≤ 𝑏.
6. Models: Situations where all outcomes in the range have equal probability (more precisely, all outcomes have the same probability density).
Graphs: pdf and cdf for the uniform(𝑎, 𝑏) distribution. The pdf is the constant 1/(𝑏 − 𝑎) on [𝑎, 𝑏]; the cdf rises linearly from 0 at 𝑥 = 𝑎 to 1 at 𝑥 = 𝑏.
Example 1. 1. Suppose we have a tape measure with markings at each millimeter. If we
measure (to the nearest marking) the length of items that are roughly a meter long, the
rounding error will be uniformly distributed between -0.5 and 0.5 millimeters.
2. Many board games use spinning arrows (spinners) to introduce randomness. When spun,
the arrow stops at an angle that is uniformly distributed between 0 and 2𝜋 radians.
3. In most pseudo-random number generators, the basic generator simulates a uniform distribution, and all other distributions are constructed by transforming the basic generator (a small R sketch of this idea follows below).
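Here is a small R sketch of points 1 and 3. It is only an illustration: the half-width 0.25 and the rate 0.5 below are arbitrary choices, not values from the notes. punif gives the uniform cdf, and applying the inverse of a target cdf to runif output (the inverse transform method) converts uniform random numbers into random numbers from another distribution, here the exponential distribution of the next section.

# Point 1: P(rounding error in [-0.25, 0.25]) for a uniform(-0.5, 0.5) error
punif(0.25, -0.5, 0.5) - punif(-0.25, -0.5, 0.5)   # = 0.5

# Point 3: transform uniform(0,1) samples into exponential(0.5) samples
# by applying the inverse of the exponential cdf F(x) = 1 - exp(-0.5*x)
u <- runif(10000)
x <- -log(1 - u) / 0.5
mean(x)            # should be close to 1/0.5 = 2, the mean of exponential(0.5)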
4 Exponential distribution
1. Parameter: 𝜆 (the rate).
2. Range: [0, ∞).
3. Notation: exponential(𝜆) or exp(𝜆).
4. pdf: 𝑓(𝑥) = 𝜆e^(−𝜆𝑥) for 𝑥 ≥ 0.
5. cdf: 𝐹(𝑥) = 1 − e^(−𝜆𝑥) for 𝑥 ≥ 0; equivalently, the right tail probability is 𝑃(𝑋 > 𝑥) = e^(−𝜆𝑥).
6. Models: The waiting time for a continuous process to change state.
Example 2. If I step out to 77 Mass Ave after class and wait for the next taxi, my waiting
time in minutes is exponentially distributed. We will see that in this case 𝜆 is the rate at which taxis pass, i.e. the average number of taxis that pass per minute, so 1/𝜆 is my average waiting time in minutes.
Example 3. The exponential distribution models the waiting time until an unstable isotope
undergoes nuclear decay. In this case, the value of 𝜆 is related to the half-life of the isotope.
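To make the relationship precise: since 𝑃(𝑋 > 𝑡) = e^(−𝜆𝑡), the half-life 𝑡 is the time at which e^(−𝜆𝑡) = 1/2, so 𝜆 = ln(2)/(half-life). A quick R check (the 10-minute half-life below is made up purely for illustration):

half_life <- 10                      # made-up half-life, in minutes
lambda <- log(2) / half_life         # rate parameter of the exponential distribution
pexp(half_life, rate = lambda)       # P(decay within one half-life) = 0.5
pexp(2 * half_life, rate = lambda)   # P(decay within two half-lives) = 0.75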
Memorylessness: There are other distributions that also model waiting times, but the
exponential distribution has the additional property that it is memoryless. Here’s what this
means in the context of Example 2: suppose that the probability that a taxi arrives within
the first five minutes is 𝑝. If I wait five minutes and no taxi has arrived, then the
probability that a taxi arrives within the next five minutes is still 𝑝. That is, my previous
wait of 5 minutes has no impact on the length of my future wait!
By contrast, suppose I were to instead go to Kendall Square subway station and wait for
the next inbound train. Since the trains are coordinated to follow a schedule (e.g., roughly
12 minutes between trains), if I wait five minutes without seeing a train then there is a far
greater probability that a train will arrive in the next five minutes. In particular, waiting
time for the subway is not memoryless, and a better model would be the uniform distribution
on the range [0,12].
The memorylessness of the exponential distribution is analogous to the memorylessness
of the (discrete) geometric distribution, where having flipped 5 tails in a row gives no
information about the next 5 flips. Indeed, the exponential distribution is precisely the
continuous counterpart of the geometric distribution, which models the waiting time for a
discrete process to change state. More formally, memoryless means that the probability of
waiting 𝑡 more minutes is independent of the amount of time already waited. In symbols,
𝑃(𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑠) = 𝑃(𝑋 > 𝑡) for all 𝑠, 𝑡 ≥ 0.
To verify this, note that 𝑃(𝑋 > 𝑠 + 𝑡 and 𝑋 > 𝑠) = 𝑃(𝑋 > 𝑠 + 𝑡), since the event ‘waited at least 𝑠 minutes’ contains the event ‘waited at least 𝑠 + 𝑡 minutes’. Therefore the formula for conditional probability gives
𝑃(𝑋 > 𝑠 + 𝑡 | 𝑋 > 𝑠) = 𝑃(𝑋 > 𝑠 + 𝑡)/𝑃(𝑋 > 𝑠) = e^(−𝜆(𝑠+𝑡))/e^(−𝜆𝑠) = e^(−𝜆𝑡) = 𝑃(𝑋 > 𝑡).
Here 𝑃(𝑋 > 𝑠 + 𝑡) = e^(−𝜆(𝑠+𝑡)) is the right tail probability given above.
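We can also see memorylessness numerically in R using pexp (the rate 0.2 per minute and the 5-minute window are arbitrary choices for illustration):

lambda <- 0.2                                      # arbitrary rate, taxis per minute
p_tail <- function(t) 1 - pexp(t, rate = lambda)   # P(X > t) = exp(-lambda*t)
p_tail(5)                                          # P(X > 5)
p_tail(5 + 5) / p_tail(5)                          # P(X > 10 | X > 5): the same value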
Graphs: pdf and cdf for the exponential(𝜆) distribution. The pdf 𝜆e^(−𝜆𝑥) starts at height 𝜆 at 𝑥 = 0 and decays exponentially; the cdf rises from 0 toward 1.
5 Normal distribution
In 1809, Carl Friedrich Gauss published a monograph introducing several notions that have
become fundamental to statistics: the normal distribution, maximum likelihood estimation,
and the method of least squares (we will cover all three in this course). For this reason,
the normal distribution is also called the Gaussian distribution, and it is by far the most
important continuous distribution.
1. Parameters: 𝜇, 𝜎.
2. Range: (−∞, ∞).
3. Notation: 𝑁(𝜇, 𝜎²) (note that the second parameter is the variance 𝜎², not 𝜎).
4. pdf: 𝑓(𝑥) = (1/(𝜎√(2𝜋))) e^(−(𝑥−𝜇)²/(2𝜎²)).
5. cdf: 𝐹(𝑥) has no formula in terms of elementary functions; it is computed numerically, e.g. with pnorm in R, or looked up in tables.
The standard normal distribution 𝑁(0, 1) has mean 0 and variance 1. We reserve 𝑍 for a standard normal random variable, 𝜙(𝑧) = (1/√(2𝜋)) e^(−𝑧²/2) for the standard normal density, and Φ(𝑧) for the standard normal cdf.
Note: we will define mean and variance for continuous random variables next time. They have the same interpretations as in the discrete case. As you might guess, the normal distribution 𝑁(𝜇, 𝜎²) has mean 𝜇, variance 𝜎², and standard deviation 𝜎.
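As a quick sanity check of these interpretations in R (note that rnorm takes the standard deviation 𝜎, not the variance; the values 𝜇 = 8 and 𝜎² = 0.5 match one of the densities in the figure below):

set.seed(2022)
x <- rnorm(100000, mean = 8, sd = sqrt(0.5))   # samples from N(8, 0.5); sd = sqrt(variance)
mean(x)                                        # close to mu = 8
var(x)                                         # close to sigma^2 = 0.5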
Here are some graphs of normal distributions. Note that they are shaped like a bell curve.
Note also that as 𝜎 increases they become more spread out.
The bell curve: First we show the standard normal probability density and cumulative
distribution functions. Below that is a selection of normal densities. Notice that each graph is centered on its mean, and the bigger the variance, the more spread out the curve.
Graphs: the standard normal pdf 𝜙(𝑧) and cdf Φ(𝑧), followed by a selection of normal pdfs 𝑁(𝜇, 𝜎²) with different means and variances.
Notation note. In the figure above we use our notation 𝑁(𝜇, 𝜎²). So, for example, 𝑁(8, 0.5) has variance 0.5 and standard deviation 𝜎 = √0.5 ≈ 0.7071.
To make approximations it is useful to remember the following rule of thumb for three
approximate probabilities from the standard normal distribution:
within 1 ⋅ 𝜎 ≈ 68%
within 2 ⋅ 𝜎 ≈ 95%
within 3 ⋅ 𝜎 ≈ 99%
Graph: the standard normal density with the central regions from −𝜎 to 𝜎, −2𝜎 to 2𝜎, and −3𝜎 to 3𝜎, containing approximately 68%, 95%, and 99% of the probability respectively.
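The rule of thumb applies to any normal distribution once distances from the mean are measured in units of 𝜎. For example, for 𝑁(8, 0.5), i.e. 𝜎 = √0.5, we can check the 68% figure using the R function pnorm described below:

mu <- 8
sigma <- sqrt(0.5)
pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)   # approximately 0.68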
Symmetry calculations
We can use the symmetry of the standard normal distribution about 𝑧 = 0 to make some
calculations.
Example 4. The rule of thumb says 𝑃 (−1 ≤ 𝑍 ≤ 1) ≈ 0.68. Use this to estimate Φ(1).
Solution: Φ(1) = 𝑃 (𝑍 ≤ 1). In the figure, the two tails (in blue) have combined area
1 − 0.68 = 0.32. By symmetry the left tail has area 0.16 (half of 0.32), so 𝑃 (𝑍 ≤ 1) ≈
0.68 + 0.16 = 0.84.
Graph: the standard normal density split at 𝑧 = −1 and 𝑧 = 1. The central region 𝑃(−1 ≤ 𝑍 ≤ 1) ≈ 0.68 is made of two halves of area 0.34 each; the tails 𝑃(𝑍 ≤ −1) and 𝑃(𝑍 ≥ 1) each have area ≈ 0.16.
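In R, the function pnorm(𝑥, 𝜇, 𝜎) computes the cdf of the 𝑁(𝜇, 𝜎²) distribution, so it can be used to find these probabilities exactly. Here are some examples run at the R console: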
pnorm(0,0,1)
[1] 0.5
pnorm(1,0,2)
[1] 0.6914625
pnorm(1,0,1) - pnorm(-1,0,1)
[1] 0.6826895
pnorm(5,0,5) - pnorm(-5,0,5)
[1] 0.6826895
# Of course z can be a vector of values
pnorm(c(-3,-2,-1,0,1,2,3),0,1)
[1] 0.001349898 0.022750132 0.158655254 0.500000000 0.841344746 0.977249868 0.998650102
Note: The R function pnorm(𝑥, 𝜇, 𝜎) takes the standard deviation 𝜎, whereas our notation 𝑁(𝜇, 𝜎²) for the normal distribution uses the variance 𝜎².
Here’s a table of values of Φ(𝑧) with fewer decimal places of accuracy:
𝑧:     −2      −1      0       0.3     0.5     1       2       3
Φ(𝑧):  0.0228  0.1587  0.5000  0.6179  0.6915  0.8413  0.9772  0.9987
6 Other distributions
In 18.05, we only have time to work with a few of the many wonderful distributions that are
used in probability and statistics. We hope that after this course you will feel comfortable
learning about new distributions and their properties when you need them. Wikipedia is
often a great starting point.
The Pareto distribution is one common, beautiful distribution that we will not have time
to cover in depth.
1. Parameters: 𝑚 > 0 and 𝛼 > 0.
2. Range: [𝑚, ∞).
3. pdf: 𝑓(𝑥) = 𝛼𝑚^𝛼/𝑥^(𝛼+1), for 𝑥 ≥ 𝑚.
4. cdf: 𝐹(𝑥) = 1 − 𝑚^𝛼/𝑥^𝛼, for 𝑥 ≥ 𝑚.
5. Models: The Pareto distribution models a power law, where the probability that an event occurs varies as a power of some attribute of the event. Many phenomena follow a power law, such as the size of meteors, income levels across a population, and population levels across cities. See Wikipedia for loads of examples:
https://en.wikipedia.org/wiki/Pareto_distribution#Applications
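Base R has no built-in Pareto functions, but the cdf above is simple enough to code directly, and we can sample from it with the same inverse transform idea mentioned for pseudo-random number generators. This is only a sketch: the function names ppareto and rpareto and the parameter values 𝑚 = 1, 𝛼 = 2 are our own choices for illustration.

# Pareto(m, alpha) cdf: F(x) = 1 - (m/x)^alpha for x >= m, and 0 for x < m
ppareto <- function(x, m, alpha) ifelse(x < m, 0, 1 - (m / x)^alpha)

# Sampling by inverse transform: solve u = 1 - (m/x)^alpha for x, with u ~ uniform(0,1)
rpareto <- function(n, m, alpha) m * (1 - runif(n))^(-1 / alpha)

ppareto(2, m = 1, alpha = 2)        # P(X <= 2) = 1 - (1/2)^2 = 0.75
x <- rpareto(10000, m = 1, alpha = 2)
mean(x <= 2)                        # fraction of samples <= 2; should be close to 0.75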
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.