Commit f46622b

Merge pull request #289 from QuantEcon/review_prob
Updates to prob_dist lecture
2 parents 6db9dcd + f51796e commit f46622b

1 file changed

+82
-11
lines changed
lectures/prob_dist.md

Lines changed: 82 additions & 11 deletions
@@ -23,7 +23,7 @@ kernelspec:
2323

2424
## Outline
2525

26-
In this lecture we give a quick introduction to data and probability distributions using Python
26+
In this lecture we give a quick introduction to data and probability distributions using Python.
2727

2828
```{code-cell} ipython3
2929
:tags: [hide-output]
@@ -42,7 +42,7 @@ import seaborn as sns
4242

4343
## Common distributions
4444

45-
In this section we recall the definitions of some well-known distributions and show how to manipulate them with SciPy.
45+
In this section we recall the definitions of some well-known distributions and explore how to manipulate them with SciPy.
4646

4747
### Discrete distributions
4848

@@ -61,7 +61,7 @@ $$ \mathbb P\{X = x_i\} = p(x_i) \quad \text{for } i= 1, \ldots, n $$
6161
The **mean** or **expected value** of a random variable $X$ with distribution $p$ is
6262

6363
$$
64-
\mathbb E X = \sum_{i=1}^n x_i p(x_i)
64+
\mathbb{E}[X] = \sum_{i=1}^n x_i p(x_i)
6565
$$
6666

6767
Expectation is also called the *first moment* of the distribution.
@@ -71,15 +71,15 @@ We also refer to this number as the mean of the distribution (represented by) $p
7171
The **variance** of $X$ is defined as
7272

7373
$$
74-
\mathbb V X = \sum_{i=1}^n (x_i - \mathbb E X)^2 p(x_i)
74+
\mathbb{V}[X] = \sum_{i=1}^n (x_i - \mathbb{E}[X])^2 p(x_i)
7575
$$
7676

7777
Variance is also called the *second central moment* of the distribution.
7878

7979
The **cumulative distribution function** (CDF) of $X$ is defined by
8080

8181
$$
82-
F(x) = \mathbb P\{X \leq x\}
82+
F(x) = \mathbb{P}\{X \leq x\}
8383
= \sum_{i=1}^n \mathbb 1\{x_i \leq x\} p(x_i)
8484
$$
8585

@@ -157,6 +157,75 @@ Check that your answers agree with `u.mean()` and `u.var()`.
157157
```
158158

159159

160+
#### Bernoulli distribution
161+
162+
Another useful (and more interesting) distribution is the Bernoulli distribution
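For reference, a Bernoulli random variable takes the value 1 with probability $\theta$ and 0 with probability $1 - \theta$, so its mean is $\theta$ and its variance is $\theta(1 - \theta)$. The following is a minimal sketch using `scipy.stats.bernoulli`; the value `θ = 0.4` and the name `b` are illustrative assumptions rather than anything taken from the lecture.

```python
import scipy.stats

θ = 0.4  # hypothetical success probability, chosen only for illustration
b = scipy.stats.bernoulli(θ)

# Mean should equal θ and variance should equal θ(1 - θ)
print(b.mean(), b.var())

# PMF at the two support points, 0 and 1
print(b.pmf(0), b.pmf(1))
```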
163+
164+
We can import the uniform distribution on $S = \{1, \ldots, n\}$ from SciPy like so:
165+
166+
```{code-cell} ipython3
167+
n = 10
168+
u = scipy.stats.randint(1, n+1)
169+
```
170+
171+
172+
Here's the mean and variance
173+
174+
```{code-cell} ipython3
175+
u.mean(), u.var()
176+
```
177+
178+
The formula for the mean is $(n+1)/2$, and the formula for the variance is $(n^2 - 1)/12$.
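As a quick check of these closed-form expressions, the sketch below compares them with the values SciPy reports for $n = 10$. The names `mean_formula` and `var_formula` are introduced here only for illustration.

```python
import scipy.stats

n = 10
u = scipy.stats.randint(1, n + 1)  # uniform distribution on {1, ..., n}

# Closed-form expressions for the discrete uniform distribution
mean_formula = (n + 1) / 2
var_formula = (n**2 - 1) / 12

# Each pair should agree up to floating-point error
print(mean_formula, u.mean())
print(var_formula, u.var())
```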
179+
180+
181+
Now let's evaluate the PMF
182+
183+
```{code-cell} ipython3
184+
u.pmf(1)
185+
```
186+
187+
```{code-cell} ipython3
188+
u.pmf(2)
189+
```
190+
191+
192+
Here's a plot of the probability mass function:
193+
194+
```{code-cell} ipython3
195+
fig, ax = plt.subplots()
196+
S = np.arange(1, n+1)
197+
ax.plot(S, u.pmf(S), linestyle='', marker='o', alpha=0.8, ms=4)
198+
ax.vlines(S, 0, u.pmf(S), lw=0.2)
199+
ax.set_xticks(S)
200+
plt.show()
201+
```
202+
203+
204+
Here's a plot of the CDF:
205+
206+
```{code-cell} ipython3
207+
fig, ax = plt.subplots()
208+
S = np.arange(1, n+1)
209+
ax.step(S, u.cdf(S))
210+
ax.vlines(S, 0, u.cdf(S), lw=0.2)
211+
ax.set_xticks(S)
212+
plt.show()
213+
```
214+
215+
216+
The CDF jumps up by $p(x_i)$ at $x_i$.
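One way to see the jump sizes numerically is to difference the CDF across consecutive support points and compare the result with the PMF. This is only a sketch for the $n = 10$ uniform case; the name `jumps` is an illustrative choice.

```python
import numpy as np
import scipy.stats

n = 10
u = scipy.stats.randint(1, n + 1)  # uniform distribution on {1, ..., n}
S = np.arange(1, n + 1)

# Size of the CDF jump at each support point x_i: F(x_i) - F(x_i - 1)
jumps = u.cdf(S) - u.cdf(S - 1)

# The jump at x_i should match the probability mass p(x_i)
print(np.allclose(jumps, u.pmf(S)))
```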
217+
218+
219+
```{exercise}
220+
:label: prob_ex2
221+
222+
Calculate the mean and variance for this parameterization (i.e., $n=10$)
223+
directly from the PMF, using the expressions given above.
224+
225+
Check that your answers agree with `u.mean()` and `u.var()`.
226+
```
227+
228+
160229

161230
#### Binomial distribution
162231

@@ -170,7 +239,7 @@ Here $\theta \in [0,1]$ is a parameter.
170239

171240
The interpretation of $p(i)$ is: the number of successes in $n$ independent trials with success probability $\theta$.
172241

173-
(If $\theta=0.5$, this is "how many heads in $n$ flips of a fair coin")
242+
(If $\theta=0.5$, $p(i)$ is the probability of $i$ heads in $n$ flips of a fair coin)
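In the fair-coin case every one of the $2^n$ head/tail sequences is equally likely, so the probability of $i$ heads reduces to $\binom{n}{i} / 2^n$. The sketch below checks this counting argument against `scipy.stats.binom`; the choice $n = 10$ and the name `counting` are assumptions made for illustration.

```python
from math import comb

import numpy as np
import scipy.stats

n, θ = 10, 0.5  # hypothetical number of flips; θ = 0.5 is the fair coin
u = scipy.stats.binom(n, θ)

i = np.arange(0, n + 1)

# Number of sequences with exactly k heads, divided by all 2^n sequences
counting = np.array([comb(n, k) for k in i]) / 2**n

# The counting argument should reproduce the binomial PMF
print(np.allclose(u.pmf(i), counting))
```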
174243

175244
The mean and variance are
176245

@@ -215,12 +284,12 @@ plt.show()
215284

216285

217286
```{exercise}
218-
:label: prob_ex2
287+
:label: prob_ex3
219288
220289
Using `u.pmf`, check that our definition of the CDF given above calculates the same function as `u.cdf`.
221290
```
222291

223-
```{solution-start} prob_ex2
292+
```{solution-start} prob_ex3
224293
:class: dropdown
225294
```
226295

@@ -304,7 +373,7 @@ The definition of the mean and variance of a random variable $X$ with distributi
304373
For example, the mean of $X$ is
305374

306375
$$
307-
\mathbb E X = \int_{-\infty}^\infty x p(x) dx
376+
\mathbb{E}[X] = \int_{-\infty}^\infty x p(x) dx
308377
$$
309378

310379
The **cumulative distribution function** (CDF) of $X$ is defined by
@@ -328,7 +397,7 @@ This distribution has two parameters, $\mu$ and $\sigma$.
328397

329398
It can be shown that, for this distribution, the mean is $\mu$ and the variance is $\sigma^2$.
330399

331-
We can obtain the moments, PDF, and CDF of the normal density as follows:
400+
We can obtain the moments, PDF and CDF of the normal density as follows:
332401

333402
```{code-cell} ipython3
334403
μ, σ = 0.0, 1.0
@@ -659,7 +728,7 @@ x.mean(), x.var()
659728

660729

661730
```{exercise}
662-
:label: prob_ex3
731+
:label: prob_ex4
663732
664733
Check that the formulas given above produce the same numbers.
665734
```
@@ -700,6 +769,7 @@ The monthly return is calculated as the percent change in the share price over e
700769
So we will have one observation for each month.
701770

702771
```{code-cell} ipython3
772+
:tags: [hide-output]
703773
df = yf.download('AMZN', '2000-1-1', '2023-1-1', interval='1mo' )
704774
prices = df['Adj Close']
705775
data = prices.pct_change()[1:] * 100
@@ -777,6 +847,7 @@ Violin plots are particularly useful when we want to compare different distribut
777847
For example, let's compare the monthly returns on Amazon shares with the monthly return on Apple shares.
778848

779849
```{code-cell} ipython3
850+
:tags: [hide-output]
780851
df = yf.download('AAPL', '2000-1-1', '2023-1-1', interval='1mo' )
781852
prices = df['Adj Close']
782853
data = prices.pct_change()[1:] * 100
