Lecture 4: Random Variables and Distributions
• Random Variables
Two Types of Random Variables
Continuous
Let X be a continuous rv. Then the probability density function (pdf) of
X is a function f(x) such that for any two numbers a and b with a ≤ b:
$P(a \le X \le b) = \int_a^b f(x)\,dx$
Using CDFs to Compute Probabilities
Continuous rv: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$
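As a rough illustration of the cdf relation above, the sketch below computes P(a ≤ X ≤ b) in two ways: as F(b) − F(a), and by integrating the pdf numerically. The choice of a standard normal X, the interval [−1, 1], and the helper names `prob_between` and `integrate_pdf` are assumptions made only for this example.

```python
from statistics import NormalDist

# Assumption for illustration: X is standard normal.
X = NormalDist(mu=0.0, sigma=1.0)

def prob_between(dist, a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous rv with cdf F."""
    return dist.cdf(b) - dist.cdf(a)

def integrate_pdf(dist, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of the pdf over [a, b]."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (k + 0.5) * width) for k in range(steps)) * width

p_cdf = prob_between(X, -1.0, 1.0)   # via the cdf
p_int = integrate_pdf(X, -1.0, 1.0)  # via the area under the pdf
print(p_cdf, p_int)
```

Both routes should give essentially the same number, which is the point of the cdf: once F is known, no integration is needed.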
[Figure: pdf and cdf]
Continuous
The expected or mean value of a continuous rv X with pdf f(x) is:
$\mu_X = E[X] = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$
Variance of Random Variables
Discrete
Let X be a discrete rv with pmf f(x) and expected value µ. The
variance of X is:
$\sigma_X^2 = V[X] = \sum_{x \in D} (x - \mu)^2 \cdot f(x) = E[(X - \mu)^2]$
where D is the set of possible values of X.
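A small sketch of the discrete variance formula, assuming a fair six-sided die as the pmf (the die is a hypothetical example, not part of the lecture). It sums (x − µ)² weighted by the pmf and checks the result against the equivalent form E[X²] − µ².

```python
# Hypothetical pmf for illustration: a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

# Expected value: sum of x * f(x) over the possible values.
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mu)^2 * f(x) over the possible values.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Equivalent form: E[X^2] - mu^2.
variance_alt = sum(x * x * p for x, p in pmf.items()) - mean ** 2

print(mean, variance, variance_alt)
```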
Continuous
The variance of a continuous rv X with pdf f(x) and mean µ is:
$\sigma_X^2 = V[X] = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx = E[(X - \mu)^2]$
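The continuous mean and variance integrals can be approximated numerically. The sketch below assumes a hypothetical pdf f(x) = 2x on [0, 1] (zero elsewhere) and a simple midpoint rule; the helper name `integrate` is an assumption for this example.

```python
def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * width) for k in range(steps)) * width

def f(x):
    """Hypothetical pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x

total = integrate(f, 0.0, 1.0)                                   # should be close to 1
mean = integrate(lambda x: x * f(x), 0.0, 1.0)                   # E[X]
variance = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)  # V[X]

print(total, mean, variance)
```

For this pdf the exact values are E[X] = 2/3 and V[X] = 1/18, so the printed numbers give a quick check of the formulas.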
Example of Expectation and Variance
• Let L1, L2, …, Ln be a sequence of n nucleotides and define the rv Xi:
$X_i = \begin{cases} 1, & \text{if } L_i = A \\ 0, & \text{otherwise} \end{cases}$
• E[Xi] = 1 × pA + 0 × (1 − pA) = pA, where pA = P(Li = A)
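A quick simulation can make the indicator variable concrete. The sketch below assumes hypothetical nucleotide frequencies (pA = 0.3) and independent positions; it compares the sample mean of the indicators with pA, and the sample variance with pA(1 − pA), which follows from applying the variance definition to a 0/1 variable.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
p_A = 0.3        # hypothetical frequency of nucleotide A
n = 100_000      # sequence length

# Simulate a sequence with assumed frequencies for A, C, G, T.
sequence = random.choices("ACGT", weights=[p_A, 0.2, 0.2, 0.3], k=n)

# Indicator rv: 1 if the nucleotide is A, 0 otherwise.
indicators = [1 if base == "A" else 0 for base in sequence]

sample_mean = sum(indicators) / n
sample_var = sum((x - sample_mean) ** 2 for x in indicators) / n

print(sample_mean, p_A)
print(sample_var, p_A * (1 - p_A))
```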
1. Binomial Distribution
2. Hypergeometric Distribution
3. Poisson Distribution
4. Normal Distribution
Binomial Distribution
• Experiment consists of n trials
– e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
• Trials are identical and each can result in
one of the same two outcomes
– e.g., head or tail in each toss of a coin
– Generally called “success” and “failure”
– Probability of success is p, probability of failure is 1 – p
• Trials are independent
• Constant probability for each observation
– e.g., Probability of getting a tail is the same each time we
toss the coin
Binomial Distribution
pmf: $P\{X = x\} = \binom{n}{x} p^x (1-p)^{n-x}$
cdf: $P\{X \le x\} = \sum_{y=0}^{x} \binom{n}{y} p^y (1-p)^{n-y}$
E(X) = np
Var(X) = np(1 − p)
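The pmf and cdf above translate directly into code. This is a minimal sketch using only `math.comb`; the function names `binom_pmf` and `binom_cdf`, and the choice of n = 15 tosses of a fair coin (p = 0.5), are assumptions for illustration. It also checks numerically that the mean and variance match np and np(1 − p).

```python
from math import comb

def binom_pmf(x, n, p):
    """P{X = x} for a binomial rv with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P{X <= x}: sum of the pmf from 0 to x."""
    return sum(binom_pmf(y, n, p) for y in range(x + 1))

n, p = 15, 0.5  # assumed example: 15 tosses of a fair coin

# Mean and variance computed from the pmf, to compare with np and np(1-p).
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(binom_pmf(7, n, p), binom_cdf(7, n, p))
print(mean, n * p)
print(var, n * p * (1 - p))
```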
Binomial Distribution: Example 1
• A couple, who are both carriers for a recessive
disease, wish to have 5 children. They want to know
the probability that exactly four of the children will be healthy.
Each child is healthy with probability 0.75, independently of the others.
$P\{X = 4\} = \binom{5}{4}\, 0.75^4 \times 0.25^1 \approx 0.396$
[Figure: pmf p(x) for x = 0, 1, …, 5]
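The calculation in this example can be reproduced in a few lines. The sketch below tabulates the whole pmf for X = number of healthy children out of 5, with success probability 0.75 as used in the example.

```python
from math import comb

n, p = 5, 0.75  # 5 children, each healthy with probability 0.75

# pmf[x] = P{X = x} for x = 0, 1, ..., 5
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 4))
```

The entry for x = 4 is 5 × 0.75⁴ × 0.25, about 0.3955.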
Binomial Distribution: Example 2
• Wright-Fisher model: There are i copies of the A allele
in a population of size 2N in generation t. What is the
distribution of the number of A alleles in generation t
+ 1?
$p_{ij} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}, \quad j = 0, 1, \ldots, 2N$
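The Wright-Fisher transition probabilities are binomial probabilities with success probability i/2N. The sketch below computes one row of the transition matrix; the function name `wf_transition` and the values 2N = 20 and i = 5 are hypothetical choices for illustration.

```python
from math import comb

def wf_transition(i, j, two_n):
    """p_ij: probability of j copies of A next generation, given i copies now."""
    freq = i / two_n  # current frequency of the A allele
    return comb(two_n, j) * freq ** j * (1 - freq) ** (two_n - j)

two_n = 20  # hypothetical population size 2N
i = 5       # hypothetical current number of A alleles

# Distribution of the number of A alleles in generation t + 1.
row = [wf_transition(i, j, two_n) for j in range(two_n + 1)]

# Expected number of A alleles in the next generation.
expected_next = sum(j * p for j, p in enumerate(row))

print(sum(row), expected_next)
```

Because the row is a binomial distribution with mean 2N × (i/2N), the expected count in the next generation equals the current count i.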
Hypergeometric Distribution
20 white balls out of 100 balls
$P\{X = i \mid n, m, k\} = \frac{\binom{m}{i}\binom{n}{k-i}}{\binom{m+n}{k}}$, for i = 0, 1, 2, 3, …
where k = number of balls selected
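A minimal sketch of the hypergeometric pmf follows. The slide does not spell out m and n, so the code assumes m is the number of white balls, n the number of other balls, and X the number of white balls among the k selected; k = 10 and the function name `hypergeom_pmf` are also hypothetical.

```python
from math import comb

def hypergeom_pmf(i, m, n, k):
    """P{X = i | n, m, k}: i white balls among k drawn without replacement.

    Assumed meaning of the symbols: m white balls, n other balls.
    """
    return comb(m, i) * comb(n, k - i) / comb(m + n, k)

m, n = 20, 80  # 20 white balls out of 100 (assumed mapping to m and n)
k = 10         # hypothetical number of balls selected

# X can be at most min(m, k).
pmf = [hypergeom_pmf(i, m, n, k) for i in range(min(m, k) + 1)]

for i, prob in enumerate(pmf):
    print(i, round(prob, 4))
```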
[Figure: number of genes with annotation; number of genes of interest]
• For a Poisson rv X with parameter λ: E(X) = Var(X) = λ
Poisson RV: Example 1
$P\{X = i\} = \frac{d^i}{i!} e^{-d}$
$P\{X = 0\} = e^{-d}$
$P\{X \ge 1\} = 1 - e^{-d}$
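The complement rule used here, P{X ≥ 1} = 1 − P{X = 0}, is easy to check numerically. In the sketch below, d = 2.0 is a hypothetical value of the Poisson parameter and `poisson_pmf` is an assumed helper name.

```python
from math import exp, factorial

def poisson_pmf(i, d):
    """P{X = i} for a Poisson rv with parameter d."""
    return d ** i * exp(-d) / factorial(i)

d = 2.0  # hypothetical value of the Poisson parameter

p_zero = poisson_pmf(0, d)    # P{X = 0} = e^(-d)
p_at_least_one = 1 - p_zero   # P{X >= 1} = 1 - e^(-d)

print(p_zero, p_at_least_one)
```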
Poisson RV: Example 2
$= 6.17 \times 10^{-9}$
Poisson RV: Example 3
$P(X = 0) = \frac{10^0 e^{-10}}{0!} = 0.0000454$
$P(X = 1) = \frac{10^1 e^{-10}}{1!} = 0.000454$
$P(X = 2) = \frac{10^2 e^{-10}}{2!} = 0.00227$
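The three probabilities in this example, with Poisson parameter 10, can be reproduced as follows. The sketch also sums over a long truncated range (0 to 99, an assumption that ignores a negligible tail) to check that the mean and variance both come out at 10.

```python
from math import exp, factorial

lam = 10.0  # Poisson parameter used in the example

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson rv with parameter lam."""
    return lam ** i * exp(-lam) / factorial(i)

# The three probabilities from the example.
probs = {i: poisson_pmf(i, lam) for i in range(3)}
for i, prob in probs.items():
    print(i, prob)

# Mean and variance from the pmf, truncated at 99 (the tail beyond is negligible).
support = range(100)
mean = sum(i * poisson_pmf(i, lam) for i in support)
var = sum((i - mean) ** 2 * poisson_pmf(i, lam) for i in support)
print(mean, var)
```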
Normal Distribution
$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
• cdf of Z: $P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy$
$Z = \frac{X - \mu}{\sigma}$
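The standard normal pdf, its cdf, and the standardization Z = (X − µ)/σ can be sketched with the standard library alone. The cdf below uses the error function identity P(Z ≤ z) = ½(1 + erf(z/√2)); the function names and the values µ = 100, σ = 15, x = 130 are hypothetical. These helpers play a role similar to dnorm and pnorm in R.

```python
from math import erf, exp, pi, sqrt

def std_normal_pdf(z):
    """Standard normal density f(z; 0, 1)."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def std_normal_cdf(z):
    """P(Z <= z), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical example: X normal with mean 100 and standard deviation 15.
mu, sigma = 100.0, 15.0
z = standardize(130.0, mu, sigma)  # z = 2.0
p = std_normal_cdf(z)              # P(X <= 130) = P(Z <= 2)

print(z, std_normal_pdf(z), p)
```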
I Digress: Sampling Distributions
• Before data is collected, we regard observations as random
variables (X1,X2,…,Xn)
Behold The Power of the CLT
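• Let X1,X2,…,Xn be an iid random sample from a distribution with mean µ and
standard deviation σ. If n is sufficiently large:
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ (approximately)
where σ/√n is the standard deviation of the sample mean.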
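A small simulation can illustrate the CLT statement. The sketch below draws repeated samples of size n = 50 from an exponential distribution with rate 1 (mean 1, standard deviation 1), a deliberately non-normal choice that is an assumption of this example, and compares the spread of the sample means with σ/√n.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

n = 50               # sample size
reps = 5_000         # number of repeated samples
mu, sigma = 1.0, 1.0  # mean and sd of the exponential distribution with rate 1

# One sample mean per repetition.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(mean_of_means, mu)
print(sd_of_means, sigma / sqrt(n))
```

A histogram of `sample_means` would look close to a normal curve even though the individual observations are strongly skewed.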
Example
• If the mean and standard deviation of serum iron values from
healthy men are 120 and 15 mg per 100 ml, respectively, what is
the probability that a random sample of 50 normal men will yield a
mean between 115 and 125 mg per 100 ml?
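$P(115 \le \bar{X} \le 125) = P\left(\frac{115 - 120}{15/\sqrt{50}} \le z \le \frac{125 - 120}{15/\sqrt{50}}\right)$
$= P(-2.36 \le z \le 2.36)$
$= P(z \le 2.36) - P(z \le -2.36)$
$= 0.9909 - 0.0091$
$= 0.9818$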
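The serum iron calculation can be checked with `statistics.NormalDist`, treating the sample mean as normal with mean 120 and standard deviation 15/√50, as the CLT suggests. This is only a sketch of the same calculation; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120.0, 15.0, 50

# Approximate sampling distribution of the sample mean.
xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Standardized upper limit (the lower limit is its negative).
z_upper = (125.0 - mu) / (sigma / sqrt(n))

# P(115 <= sample mean <= 125)
prob = xbar.cdf(125.0) - xbar.cdf(115.0)

print(z_upper, prob)
```

Because z is not rounded to 2.36 here, the result may differ slightly in the later decimals from the table-based value.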
R
• Understand how to calculate probabilities from probability
distributions
– Normal: dnorm and pnorm