Lecture 14 1756137910 231018 104530
Correlation Coefficient
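Correlation measures the linear association between two variables X and Y, and is defined by the equation:
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\sigma_X^2 \sigma_Y^2}} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{(\sigma_X \sigma_Y)^2}} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$
The numerator is the covariance, and the equality for it was found by solving Q3 of Assignment #3:
$$\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \cdots = E(XY) - \mu_X \mu_Y$$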
It has no units of measurement, but captures the relative distance from the mean (e.g., ).
∗
$-1 \le \rho_{X,Y} \le 1$
$\rho_{X,Y} = 1$: Perfect positive linear relationship between X and Y.
$\rho_{X,Y} = -1$: Perfect negative linear relationship between X and Y.
$\rho_{X,Y} = 0$: No linear relationship between X and Y.
Independent X and Y implies $\rho_{X,Y} = 0$.
Does $\rho_{X,Y} = 0$ imply independent X and Y? Why?
We will talk about correlation coefficient more in detail when we get into inferential statistics.
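One way to explore the question above is a small exact calculation. The sketch below uses a hypothetical pair that is not from the lecture: X uniform on {-1, 0, 1} and Y = X². It computes the covariance (the numerator of the correlation coefficient) and compares one joint probability with the product of its marginals.

```python
from fractions import Fraction

# Hypothetical example (not from the lecture): X uniform on {-1, 0, 1}, Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)  # P(X = x) for each x in the support

e_x = sum(p * x for x in support)           # E(X)
e_y = sum(p * x**2 for x in support)        # E(Y) = E(X^2)
e_xy = sum(p * x * x**2 for x in support)   # E(XY) = E(X^3)
cov = e_xy - e_x * e_y                      # Cov(X, Y) = E(XY) - mu_X * mu_Y

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = p        # X = 0 forces Y = 0, so P(X=0, Y=0) = P(X=0)
p_product = p * p  # P(X=0) * P(Y=0), since P(Y=0) = P(X=0)

print("Cov(X, Y) =", cov)
print("P(X=0, Y=0) =", p_joint, "but P(X=0)P(Y=0) =", p_product)
```

In this example the covariance is zero, so the correlation is zero, yet the joint probability does not factor. This suggests that zero correlation rules out only a linear relationship, not dependence in general.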
14.1 Affine Transformation of Gaussian Distribution Functions (Exercise #1)
We continue from example 1 from Section 10.5 of Lecture Note #10.
Then, we know we can take the derivative of $F_Y$ to compute the PDF $f_Y(y)$:
$$f_Y(y) = \frac{d[F_Y(y)]}{dy} = \frac{1}{|a|} f_X\left(\frac{y-b}{a}\right)$$
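Now, recall that what's inside the parentheses of the function is the variable:
$$f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y-b}{a}\right) = \frac{1}{|a|}\left[\frac{1}{\sigma_X\sqrt{2\pi}}\, e^{-\frac{\left(\frac{y-b}{a}-\mu_X\right)^2}{2\sigma_X^2}}\right]$$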
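We manipulate what's outside of $e^{(\cdot)}$ first (the part inside the big brackets):
$$f_Y(y) = \frac{1}{|a|}\,\frac{1}{\sigma_X\sqrt{2\pi}}\, e^{(\cdot)}$$
$$= \frac{1}{\sqrt{a^2}}\,\frac{1}{\sigma_X}\,\frac{1}{\sqrt{2\pi}}\, e^{(\cdot)}$$
$$= \frac{1}{\sqrt{(a\sigma_X)^2}}\,\frac{1}{\sqrt{2\pi}}\, e^{(\cdot)}$$
$$= \frac{1}{(|a|\sigma_X)\sqrt{2\pi}}\, e^{(\cdot)}$$
Note that $\sqrt{(a\sigma_X)^2} = |a|\sigma_X$, which stays positive even when $a < 0$.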
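Next, we manipulate the exponent:
$$e^{-\frac{\left(\frac{y-b}{a}-\mu_X\right)^2}{2\sigma_X^2}} \Rightarrow$$
$$-\frac{\left(\frac{y-b}{a}-\mu_X\right)^2}{2\sigma_X^2} = -\frac{\left((y-b)-a\mu_X\right)^2}{2a^2\sigma_X^2}$$
$$= -\frac{(y-b-a\mu_X)^2}{2a^2\sigma_X^2}$$
$$= -\frac{\left(y-(b+a\mu_X)\right)^2}{2a^2\sigma_X^2} \Rightarrow$$
$$e^{-\frac{\left(y-(b+a\mu_X)\right)^2}{2a^2\sigma_X^2}}$$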
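Again, let's compare the parts concerning $e^{(\cdot)}$ for $f_X(x)$ and $f_Y(y)$:
$$f_X(x) = [\cdot]\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}} \qquad f_Y(y) = [\cdot]\, e^{-\frac{\left(y-(b+a\mu_X)\right)^2}{2a^2\sigma_X^2}}$$
$$f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}} \qquad f_Y(y) = \frac{1}{(|a|\sigma_X)\sqrt{2\pi}}\, e^{-\frac{\left(y-(b+a\mu_X)\right)^2}{2a^2\sigma_X^2}}$$
The terms highlighted in yellow ($\sigma_X$ in $f_X$ and $|a|\sigma_X$ in $f_Y$) indicate the standard deviation, so the variance of Y is $a^2\sigma_X^2$.
The terms highlighted in green ($\mu_X$ in $f_X$ and $b + a\mu_X$ in $f_Y$) indicate the mean.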
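The comparison can also be checked numerically. The sketch below evaluates the change-of-variable form $(1/|a|) f_X((y-b)/a)$ and a normal PDF with mean $b + a\mu_X$ and standard deviation $|a|\sigma_X$ at a few points. The parameter values and helper names are illustrative assumptions, not values from the lecture.

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def transformed_pdf(y, a, b, mu_x, sigma_x):
    """Change-of-variable form for Y = aX + b: (1/|a|) * f_X((y - b) / a)."""
    return normal_pdf((y - b) / a, mu_x, sigma_x) / abs(a)

# Illustrative values only (a is negative on purpose to exercise |a|).
a, b, mu_x, sigma_x = -2.0, 3.0, 1.0, 0.5

# Largest gap between the two expressions over a handful of test points.
max_gap = max(
    abs(transformed_pdf(y, a, b, mu_x, sigma_x)
        - normal_pdf(y, b + a * mu_x, abs(a) * sigma_x))
    for y in [-4.0, -1.0, 0.0, 1.0, 2.5, 6.0]
)
print("largest difference:", max_gap)
```

With a negative $a$ the two expressions still agree up to floating-point rounding, which is why the standard deviation is written with $|a|$ rather than $a$.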
Solution to Q1
Part (A): What is $p_X(5)$?
Simple answer would be $p_X(5) = 0$, since $5 \notin B \equiv \{0,1,2,3,4\}$.
A more detailed answer would be confirming that result using the properties of probability:
$$\sum_{x \in B} p_X(x) = 1$$
$$\Rightarrow \sum_{x \in B} \frac{2x+1}{25} = \frac{1+3+5+7+9}{25} = \frac{25}{25} = 1$$
$$\Rightarrow p_X(x) = 0 \quad \forall x \notin B.$$
…
Part (D): $P(2 \le X < 4) = p_X(2) + p_X(3)$.
Part (E): $P(X > -10) = P(-10 < X < 0) + P(0 \le X \le 4) + P(X > 4)$.
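The parts above can be checked with exact arithmetic. This is a minimal sketch that assumes the PMF is $p_X(x) = (2x+1)/25$ on $B = \{0,1,2,3,4\}$, as the sum in Part (A) indicates; the function and variable names are illustrative.

```python
from fractions import Fraction

B = [0, 1, 2, 3, 4]  # support of X, as in Part (A)

def pmf(x):
    """p_X(x) = (2x + 1) / 25 on B and 0 elsewhere (assumed from the sum in Part (A))."""
    return Fraction(2 * x + 1, 25) if x in B else Fraction(0)

total = sum(pmf(x) for x in B)               # should be 1
part_a = pmf(5)                              # 5 is outside B
part_d = pmf(2) + pmf(3)                     # P(2 <= X < 4)
part_e = sum(pmf(x) for x in B if x > -10)   # P(X > -10): only mass on B counts

print(total, part_a, part_d, part_e)
```

In Part (E), the first and third terms of the decomposition contribute nothing because no probability mass lies outside $B$.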
Solution
Part (A):
We note that there are a total of 4 differently sized jumps: from 0 to 1, 1 to 2, 2 to 3, and 3 to 4.
Recall that the jump size indicates the probabilities.
Then, this can be translated using the following terminology:
$p_X(x) = P(X = x)$ = the size of the jump of $F_X$ at $x$ (for example, $p_X(0)$ is the size of the jump at $x = 0$)
So, in sum, we should have:
$$p_X(x) = \begin{cases} 0.064, & x = 0 \\ 0.288, & x = 1 \\ 0.432, & x = 2 \\ 0.216, & x = 3 \\ 0, & \text{otherwise} \end{cases}$$
Part (B):
Among all the PMFs that we discussed in class, only one can have the subset $B = \{0,1,2,3\}$ with each element x having unequal probabilities (i.e., I am not talking about $\theta$, but $p_X(x)$ when I say probabilities).
This looks like the probability mass function of a binomial distribution.
To see if $p_X$ indeed follows a binomial distribution, we bring up the PMF from lecture 8:
$$p_X(x) = P(X = x) = \binom{n}{x}\theta^x(1-\theta)^{n-x} = \frac{n!}{x!\,(n-x)!}\,\theta^x(1-\theta)^{n-x}$$
for $x = 0, 1, 2, \ldots, n$
In this case, 𝑛 = 3.
Now see if it makes sense by plugging in the values.
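$$p_X(0) = \frac{3!}{0!\,3!}(\theta)^0(1-\theta)^3 = (1-\theta)^3 = 0.064 = \frac{64}{1000} = \frac{4 \cdot 4 \cdot 4}{10 \cdot 10 \cdot 10} = \left(\frac{2}{5}\right)^3$$
$$\Rightarrow 1 - \theta = \frac{2}{5} \Rightarrow \theta = \frac{3}{5}.$$
To confirm, calculate $p_X(1) = \frac{3!}{1!\,2!}\left(\frac{3}{5}\right)^1\left(\frac{2}{5}\right)^2 = 3 \cdot \frac{3}{5} \cdot \frac{4}{25} = \frac{36}{125} = 0.288$.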
Therefore, we have,
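$$p_X(x) = P(X = x) = \binom{3}{x}\left(\frac{3}{5}\right)^x\left(\frac{2}{5}\right)^{3-x} = \frac{3!}{x!\,(3-x)!}\left(\frac{3}{5}\right)^x\left(\frac{2}{5}\right)^{3-x}$$
Or, simply,
$$X \sim Binomial\left(3, \frac{3}{5}\right).$$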
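As a quick check on the remaining values, the sketch below compares the binomial PMF with $n = 3$ and $\theta = 3/5$ against all four jump sizes from Part (A), using exact fractions. The helper name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

n, theta = 3, Fraction(3, 5)

def binomial_pmf(x):
    """Binomial PMF: C(n, x) * theta^x * (1 - theta)^(n - x)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

# Jump sizes read off in Part (A), written as exact fractions.
jumps = {
    0: Fraction(64, 1000),
    1: Fraction(288, 1000),
    2: Fraction(432, 1000),
    3: Fraction(216, 1000),
}

matches = all(binomial_pmf(x) == jumps[x] for x in jumps)
print("all four jumps match Binomial(3, 3/5):", matches)
```

Working with `Fraction` avoids floating-point rounding, so the equality test is exact rather than approximate.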
Q3 (Relationship between Binomial and Poisson; Difficult Questions)
Review Q2 and Q3 from the Assignment #3.
Please take the time to review Section 8.5 of Lecture Note #8.
Solutions:
First, how do you know if X is a continuous random variable?
In this case, I kind of hinted that it is by putting this problem under the section PDF, but how
would you be able to tell if you weren’t given this piece of information?
The only way you would be able to say that X is indeed a continuous RV is by evaluating
$$F_X(x) = \frac{(x+5)^2}{144}$$
at $x = -5$ and checking that there is no jump there.
Why?
Because a continuous random variable must have every point probability equal to 0: $P(X = x) = F_X(x) - F_X(x^-) = 0$, so $F_X$ cannot jump at any $x$.
Are there any jumps at the boundaries of $-5 \le x < 7$? Try plugging in $x = -5$ and $x = 7$: the formula gives $F_X(-5) = 0$ and $F_X(7) = 1$.
Seems like there aren't. So, it's safe to say that there's no discontinuity in $F_X(x)$.
Part (A):
Note that $f_X(x)$ can be found by just taking the derivative of $F_X(x)$ for $x \in (-5, 7)$ and for $x \in \{0, 7\}$.
But what about for $x = -5$?
The left and right derivatives should be equal to each other for $f_X(x)$ to be clearly defined at $x = -5$.
We have already observed that there's no discontinuity in $F_X(x)$ at $x = -5$, and both one-sided derivatives equal 0 there (assuming $F_X(x) = 0$ for $x < -5$), so $f_X(x)$ is defined as the following:
$$f_X(x) = \frac{d}{dx}[F_X(x)] = \begin{cases} \dfrac{x+5}{72}, & x \in (-5, 7) \\ 0, & \text{otherwise} \end{cases}$$
Part (B):
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-5}^{7} \frac{x(x+5)}{72}\,dx = \cdots = 3$$
Part (C):
You just need to compute,
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_{-5}^{7} \frac{x^2(x+5)}{72}\,dx = \cdots = 17$$
Part (D):
This is just testing whether you can compute the integral:
$$E(X^3) = \int_{-\infty}^{\infty} x^3 f_X(x)\,dx = \int_{-5}^{7} \frac{x^3(x+5)}{72}\,dx = \cdots = \frac{431}{5}$$
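The steps hidden behind the "⋯" in Parts (B) to (D) can be checked with exact arithmetic. The sketch below integrates $x^k (x+5)/72$ over $(-5, 7)$ using the polynomial antiderivative; the function name `moment` is an illustrative choice.

```python
from fractions import Fraction

def moment(k, lo=-5, hi=7):
    """E(X^k) = integral of x^k * (x + 5) / 72 over (lo, hi), via the antiderivative."""
    def antiderivative(x):
        x = Fraction(x)
        # Integral of x^(k+1) + 5 x^k is x^(k+2)/(k+2) + 5 x^(k+1)/(k+1).
        return (x ** (k + 2) / (k + 2) + 5 * x ** (k + 1) / (k + 1)) / 72
    return antiderivative(hi) - antiderivative(lo)

print("total probability:", moment(0))
print("E(X)   =", moment(1))
print("E(X^2) =", moment(2))
print("E(X^3) =", moment(3))
```

The case `moment(0)` doubles as a check that the PDF integrates to 1 over its support.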
Q2 (Uniform Distribution & Independence of Sub-intervals)
Please take the time to review Q3 of Assignment #3.
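$$\Phi(-x) = F_X(-x) = P(X \le -x) = 1 - P(X > -x) = 1 - P(X \ge -x) = 1 - \int_{-x}^{\infty} f_X(t)\,dt$$
The PDF of the standard normal is an even function, $f_X(t) = f_X(-t)$, so $\int_{-x}^{\infty} f_X(t)\,dt = \int_{-\infty}^{x} f_X(t)\,dt = \Phi(x)$, which implies that:
$$\Phi(-x) = 1 - \Phi(x)$$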
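The symmetry identity can be spot-checked numerically. This sketch writes the standard normal CDF in terms of the error function, $\Phi(x) = \tfrac{1}{2}\left(1 + \mathrm{erf}(x/\sqrt{2})\right)$, which is a standard identity rather than something derived in these notes; the test points are arbitrary.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the error function: 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Largest deviation from Phi(-x) = 1 - Phi(x) over a few arbitrary points.
worst = max(
    abs(std_normal_cdf(-x) - (1.0 - std_normal_cdf(x)))
    for x in [0.0, 0.5, 1.0, 1.96, 3.0]
)
print("largest deviation:", worst)
```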