1) Common Univariate Summaries
$\Sigma_x$ = the variance-covariance matrix for x.
$|\Sigma_x|$ = the determinant of the matrix $\Sigma_x$.
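For reference, these two quantities appear in the usual multivariate normal density, $f(x) = (2\pi)^{-p/2} |\Sigma_x|^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu_x)^T \Sigma_x^{-1} (x - \mu_x)\}$, where p is the number of components and $\mu_x$ is the mean vector. The NumPy sketch below evaluates it. The function name `mvn_density` is a hypothetical helper and is not part of any library.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Multivariate normal density at x with mean mu and covariance sigma (sketch)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.size
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a solve rather than an explicit inverse
    quad = diff @ np.linalg.solve(sigma, diff)
    det = np.linalg.det(sigma)  # |Sigma_x|
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * det)

# Standard bivariate normal evaluated at its mean: 1 / (2 * pi)
print(mvn_density([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```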
14) Conditional Distributions
i) Slices of the joint density correspond to conditional distributions.
ii) The conditional distribution of any multivariate normal random vector given the value
of any linear combination of its elements is again multivariate normal.
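A common special case is conditioning on a subset of the components (each one a trivial linear combination). Partitioning x into $x_1$ (the rest) and $x_2$ (the given values), the conditional distribution has mean $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and covariance $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The following is a minimal NumPy sketch of that case; `conditional_mvn` and its argument names are hypothetical.

```python
import numpy as np

def conditional_mvn(mu, sigma, idx_given, x_given):
    """Mean and covariance of the remaining components given x[idx_given] = x_given."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    idx_given = np.asarray(idx_given)
    idx_rest = np.setdiff1d(np.arange(mu.size), idx_given)
    s11 = sigma[np.ix_(idx_rest, idx_rest)]
    s12 = sigma[np.ix_(idx_rest, idx_given)]
    s22 = sigma[np.ix_(idx_given, idx_given)]
    # K = Sigma_12 Sigma_22^{-1}, computed with a solve (Sigma_22 is symmetric)
    k = np.linalg.solve(s22, s12.T).T
    cond_mean = mu[idx_rest] + k @ (np.asarray(x_given, dtype=float) - mu[idx_given])
    cond_cov = s11 - k @ s12.T  # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return cond_mean, cond_cov

# Bivariate example: correlation 0.5, condition on the second component equal to 2
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
m, c = conditional_mvn(mu, sigma, [1], [2.0])
print(m, c)  # conditional mean 1.0, conditional variance 0.75
```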
15) Marginal Distributions
The marginal distribution of any subset of the components of a multivariate normal random vector is again multivariate normal.
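In practice, the marginal for a subset is obtained by keeping the matching entries of the mean vector and the matching rows and columns of the covariance matrix. The following is a minimal NumPy sketch, assuming a hypothetical helper `marginal_mvn` and made-up parameter values.

```python
import numpy as np

def marginal_mvn(mu, sigma, idx):
    """Mean and covariance of the marginal distribution of x[idx]."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    idx = np.asarray(idx)
    # Keep the selected mean entries and the matching rows/columns of Sigma
    return mu[idx], sigma[np.ix_(idx, idx)]

mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 9.0, 2.0],
                  [0.5, 2.0, 1.0]])
m, s = marginal_mvn(mu, sigma, [0, 2])  # marginal of components 1 and 3
print(m)
print(s)
```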
16) Transformations
If x is multivariate normal, then so is Mx + b.
Expectation E{Mx + b} = M E{x} + b.
Variance-covariance matrix $\Sigma_{Mx+b} = M \Sigma_x M^T$.
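A quick Monte Carlo check of both formulas is shown below. The values of $\mu_x$, $\Sigma_x$, M and b are arbitrary illustrative choices and do not come from these notes. The sketch simulates x, applies the transformation row by row, and compares the sample moments with $M\mu_x + b$ and $M \Sigma_x M^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative parameters (assumptions, not from the notes)
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
M = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([0.5, -1.0])

# Theoretical moments of Mx + b
mean_theory = M @ mu + b        # M E{x} + b
cov_theory = M @ sigma @ M.T    # M Sigma_x M^T

# Monte Carlo: each row of x is one draw, so row-wise Mx + b is x @ M.T + b
x = rng.multivariate_normal(mu, sigma, size=200_000)
y = x @ M.T + b
print(mean_theory, y.mean(axis=0))
print(cov_theory)
print(np.cov(y, rowvar=False))
```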
17) Central Limit Theorem: Intuitive Interpretation
Random variables whose values are largely determined by a large number of roughly
independent and comparably influential factors will likely be close to normally
distributed.
Common examples where the theorem is typically relevant:
i) Sums or averages of observations that are generated from a distribution that is
not severely skewed.
ii) Linear measurements on individuals sampled from a homogeneous collection of
biological organisms.
iii) Linear measurements on output from a carefully controlled industrial process.
Common examples where the theorem is typically not relevant and the
distribution tends to be positively skewed:
i) Sizes of items that can grow or shrink (like cities) where larger items have a
competitive advantage.
ii) Numbers of parasites, pathogens, etc., on hosts (e.g., numbers of mountain pine beetles caught in pheromone traps near Prince George, BC).
iii) Measurements of mass, volume, or area when the Central Limit Theorem applies
to linear measurements.
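A small simulation can illustrate the contrast. It assumes that each observation is driven by 10 independent Uniform(0.5, 1.5) factors, which is an arbitrary choice. Adding the factors gives a nearly symmetric distribution. Multiplying them, as with compounding growth, gives strong positive skew. Cubing a near-normal "length" to obtain a "volume" gives mild positive skew. The `skewness` helper is a plain moment estimator written only for this sketch.

```python
import numpy as np

def skewness(v):
    """Moment estimator of skewness: mean cubed z-score (about 0 when symmetric)."""
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(1)
# Each row: 10 independent, comparably influential factors (illustrative assumption)
factors = rng.uniform(0.5, 1.5, size=(100_000, 10))

additive = factors.sum(axis=1)         # factors add up: the CLT applies
multiplicative = factors.prod(axis=1)  # factors compound, as with competitive growth
volume = additive ** 3                 # cube of a near-normal "length"

print(round(skewness(additive), 2))        # near 0
print(round(skewness(multiplicative), 2))  # strongly positive
print(round(skewness(volume), 2))          # mildly positive
```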
Multivariate Central Limit Theorem: Intuitive Interpretation
Random vectors whose values are largely determined by a large number of roughly independent and comparably influential factors will likely be close to multivariate normally distributed.
Prototypical Examples of Normally Distributed Variables
i) Averages of values obtained by random sampling from a population distribution that
is not severely skewed.
ii) Measurement errors.
iii) Linear measurements (e.g., length or girth, but not area, volume, or weight) of body parts in homogeneous groups of biological organisms, or of parts produced by a well-controlled industrial process.
Prototypical Examples of Normally Distributed Vectors
i) Averages of multivariate sets of values obtained by random sampling from a
population for which the distribution of each component value is not severely skewed.
ii) Multivariate measurement errors.
iii) Multivariate sets of linear measurements (e.g., length or girth, but not area, volume, or weight) of body parts in homogeneous groups of biological organisms, or of parts produced by a well-controlled industrial process.
18) Typical Problems to Watch for
i) Component variables that tend to be skewed (like incomes, heights of mountains
within 200 km of Vancouver, and daily rainfall amounts in the fall semester at the top of
Burnaby Mountain).
ii) Component variables that are related nonlinearly, e.g., baby lengths and weights.
19) Distribution Shape May Vary for Different Components
Examples:
i) Samples of persons: IQ scores, height, income, net wealth, RRSP savings.
ii) Samples of viral strains: number of deleted nucleotides in a key segment, virulence, prevalence (proportion of infected individuals), intensity (average numbers in infected individuals), abundance (average numbers in entire host population).
20) Some Components May Not Even Be Quantitative
1. Examples: i) Samples of persons: sex, religion. ii) Samples of viral strains: names of particular deletions present.
2. Such variables are often dropped, e.g., in a principal component analysis.
3. They need special consideration in, e.g., cluster analysis.
21) Standardizing a Normal Random Vector
Univariate Case: $Z = (X - \mu)/\sigma$ or, more commonly, $Z_j = (X_j - \bar{X})/s$.
Multivariate Case: i) Option 1: standardize each component, $Z_i = (X_i - \mu_i)/\sigma_i$, or $Z_{ij} = (X_{ij} - \bar{X}_i)/s_i$.
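i) Option 1 does not make z standard multivariate normal unless the components of x are independent: each component gets variance 1, but the correlations remain. ii) To make z standard multivariate normal, we need something like $z = \Sigma_x^{-1/2}(x - \mu_x)$, the matrix analogue of dividing by $\sqrt{\sigma^2}$.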
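The point about Option 1 can be checked numerically. The sketch below assumes an arbitrary bivariate normal with correlation 0.9 and standardizes each component separately. The result has unit variances but keeps the 0.9 correlation, so its covariance matrix is the correlation matrix rather than I.

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative parameters: correlation 1.8 / sqrt(4 * 1) = 0.9
mu = np.array([10.0, 5.0])
sigma = np.array([[4.0, 1.8],
                  [1.8, 1.0]])
x = rng.multivariate_normal(mu, sigma, size=50_000)  # rows = observations j, columns = components i

# Option 1: Z_ij = (X_ij - Xbar_i) / s_i, column by column
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Unit variances on the diagonal, but the off-diagonal correlation survives
print(np.round(np.cov(z, rowvar=False), 2))
```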
Quantile-Quantile Plots
Step 1 i) Order the values. ii) Raw data: $X_1, X_2, \ldots, X_n$. iii) Ordered values: $X_{(1)} < \cdots < X_{(n)}$.
Step 2 i) Compare these ordered values, $X_{(1)} < \cdots < X_{(n)}$, to what you would "expect" to observe if the data were generated by a normal distribution. ii) A simple way to do this is to start with the sample median.
What might you expect the sample median to be close to in a random sample
from a normal distribution? The simplest guess is the median for the normal
distribution (which equals the mean).
What might you expect the lower quartile to be close to in a random sample
from a normal distribution? The simplest guess is the lower quartile for the normal
distribution.
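e.g., with n = 99, the median is the ordered value with k = (1 + n)/2 = (1 + 99)/2 = 50, and the lower quartile is the one with k = (1 + n)/4 = (1 + 99)/4 = 25. In general, $X_{(k)}$ is compared with the normal quantile at probability k/(n + 1).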
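The expected values can be computed with Python's standard-library `statistics.NormalDist` using the plotting position k/(n + 1). For n = 99, this gives probability 0.5 at k = 50 (the median) and 0.25 at k = 25 (the lower quartile). This sketch works on the standard normal scale. For normal data, plotting $X_{(k)}$ against these values should give points near a line whose intercept estimates the mean and whose slope estimates the standard deviation. R's qqnorm() uses slightly different plotting positions (see ppoints()). The helper name is hypothetical.

```python
from statistics import NormalDist

def expected_normal_quantiles(n):
    """Standard normal quantiles at probabilities k/(n + 1), k = 1, ..., n."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [std_normal.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]

q = expected_normal_quantiles(99)
print(q[49])  # k = 50, the median position: the normal median, 0
print(q[24])  # k = 25, the lower-quartile position: about -0.674
```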
Step 3 i) Plot the observed values vs. the expected values.
ii) If the data come from a normal distribution, the points should roughly follow a straight line.
Step 4 i) You can add such a line to help you assess the plot.
ii) The R function qqline(x) draws such a line (by default, through the points for the first and third quartiles).
iii) Sample applications in R are provided separately.
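A reference line in the spirit of qqline() passes through the (theoretical, sample) pairs of first and third quartiles. The sketch below computes the intercept and slope of that line, assuming NumPy's default quantile method, which matches R's default type 7. The helper name and the simulated N(20, 3²) sample are illustrative only.

```python
import numpy as np
from statistics import NormalDist

def qq_reference_line(x):
    """Intercept and slope of the line through the first- and third-quartile pairs."""
    x = np.asarray(x, dtype=float)
    q1_theory = NormalDist().inv_cdf(0.25)
    q3_theory = NormalDist().inv_cdf(0.75)
    q1_sample, q3_sample = np.quantile(x, [0.25, 0.75])
    slope = (q3_sample - q1_sample) / (q3_theory - q1_theory)
    intercept = q1_sample - slope * q1_theory
    return intercept, slope

rng = np.random.default_rng(3)
sample = rng.normal(loc=20.0, scale=3.0, size=10_000)
print(qq_reference_line(sample))  # roughly (20, 3) for these normal data
```

For normal data, the intercept and slope estimate the mean and standard deviation. Using quartiles instead of a least-squares fit keeps the line from being pulled by a few extreme points.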
Three-Step Solution for Assessing Normality
1. Standardize the data.
2. Measure the distance of each point from the centre (this is just the length of the standardized vector Z).
3. Plot these distances in a way that highlights potential outliers.
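One common way to carry out steps 2 and 3 is shown below. This is an assumption, since the notes do not name a specific plot. The sketch computes squared Mahalanobis distances $d_j^2 = (x_j - \bar{x})^T S^{-1} (x_j - \bar{x})$, which equal the squared lengths of the standardized vectors. It then compares their ordered values with chi-square quantiles on p degrees of freedom. The sketch uses p = 2, so the chi-square quantile has the closed form $-2\ln(1 - q)$. The data, including the planted outlier, are simulated.

```python
import numpy as np

def squared_distances(x):
    """Squared Mahalanobis distances of each row from the sample mean."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean(axis=0)
    s = np.cov(x, rowvar=False)  # sample covariance, divisor n - 1
    # Row-wise (x_j - xbar)^T S^{-1} (x_j - xbar), via a solve instead of an inverse
    return np.einsum("ij,ij->i", centred, np.linalg.solve(s, centred.T).T)

rng = np.random.default_rng(4)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], size=500)
x[0] = [6.0, -6.0]  # plant one obvious outlier

d2 = np.sort(squared_distances(x))
n = d2.size
# Chi-square(2) quantiles at k/(n + 1): closed form -2 * ln(1 - q)
chi2_q = -2.0 * np.log(1.0 - np.arange(1, n + 1) / (n + 1))
# Plotting d2 against chi2_q should give roughly a 45-degree line;
# points far above it at the top end flag potential outliers.
print(d2[-1], chi2_q[-1])
```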
Standardizing Multivariate Data:
Step 1 i) Subtract off the mean: $W = X - \mu_x$. Then $\mu_w = 0$, but $\Sigma_w = \Sigma_x$.
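Step 2 Find some orthogonal matrix U that makes the components of Y = UW independent. Then $\mu_y = 0$ and $\Sigma_y = U \Sigma_w U^T = D$, with D diagonal. (For a normal vector, uncorrelated components are independent, so a diagonal $\Sigma_y$ is enough; taking the rows of U to be the eigenvectors of $\Sigma_x$ gives one such U.)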
Step 3 Rescale Y to $Z = D^{-1/2} Y$. Then $\Sigma_z = \Sigma_{D^{-1/2}Y} = D^{-1/2} D D^{-1/2} = I$.
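The three steps can be sketched with NumPy, assuming that the sample mean and sample covariance stand in for $\mu_x$ and $\Sigma_x$ and that the covariance is nonsingular. `numpy.linalg.eigh` returns an orthogonal matrix V of eigenvectors with $S = V\,\mathrm{diag}(\lambda)\,V^T$, so taking $U = V^T$ gives $U S U^T = D$. The function name and parameter values are illustrative.

```python
import numpy as np

def standardize_multivariate(x):
    """Return Z = D^{-1/2} U (x - xbar) for each row, with U from an eigendecomposition."""
    x = np.asarray(x, dtype=float)
    # Step 1: subtract off the mean, W = X - xbar
    w = x - x.mean(axis=0)
    # Step 2: S = V diag(lam) V^T; U = V^T makes U S U^T = D diagonal
    lam, v = np.linalg.eigh(np.cov(w, rowvar=False))
    u = v.T
    y = w @ u.T  # each row is U w_j
    # Step 3: rescale by D^{-1/2} so every component has variance 1
    return y / np.sqrt(lam)

rng = np.random.default_rng(5)
x = rng.multivariate_normal([3.0, -2.0, 1.0],
                            [[4.0, 1.0, 0.5],
                             [1.0, 9.0, 2.0],
                             [0.5, 2.0, 1.0]], size=1_000)
z = standardize_multivariate(x)
print(np.round(np.cov(z, rowvar=False), 6))  # should be the 3 x 3 identity
```

The sample mean and covariance are used throughout, so the sample covariance of Z equals I up to floating-point rounding. The length of each row of Z is the distance from the centre used in the three-step normality check.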
Underlying Assumptions
1. Formal statistical inference: i) observations generated independently, ii) with the same multivariate normal distribution.
2. Exploratory analyses: no formal requirements, but watch out for influential outliers.