These notes cover methods for summarizing and visualizing univariate and bivariate data, including measures of centre and dispersion, dot plots, boxplots, and scatterplots. They also cover topics in multivariate analysis such as correlation, regression models, vectors, matrices, the normal distribution, transformations of multivariate normal distributions, and assessing normality. A three-step approach for assessing normality in multivariate data is described: standardize the data, compute the distance of each point from the centre, and plot these distances.


1) Common Univariate Summaries

I) Measures of the Centre i) Mean ii) Median iii) Mode


II) Measures of Dispersion i) Standard Deviation ii) Range iii) Interquartile Range
1.1) Common Univariate Displays i) Dot Plots ii) Boxplots
2) Bivariate Summaries
i) Usual univariate summaries ii) Plus correlation and regression equation
2.1) Bivariate Displays
i) Scatterplot
A scatterplot shows how two variables vary together; the strength and direction of their linear association are measured by their correlation.
A scatterplot may also include (a short R sketch follows below):
i) A regression line, smoothed curve, or...?
ii) A central point (point of averages or medians)
iii) Some indication of the dispersion?
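For concreteness, here is a minimal R sketch of such a display; the data values are invented purely for illustration.
x <- c(1.2, 2.3, 2.9, 4.1, 5.0, 6.2)
y <- c(2.0, 2.8, 3.7, 4.9, 5.1, 6.8)
plot(x, y)                                        # basic scatterplot
abline(lm(y ~ x), col = "blue")                   # fitted regression line
points(mean(x), mean(y), pch = 19, col = "red")   # the point of averages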
ii) Bivariate boxplot (used to check assumptions of bivariate normality)
An extension of the univariate boxplot, indicating i) A centre point (replacing the median) ii) An elliptical region containing roughly the middle half of the data (replacing the hinges or box) iii) An outer elliptical region containing almost all the data (replacing the fence or whiskers)
The boxplot is based on medians and quartiles; the bivariate version is based on similarly robust estimates.

3) The regression model: Y_i = α + β x_i + ε_i
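As a minimal illustration (the data below are simulated, not from the course), this model can be fitted in R with lm():
set.seed(1)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20)   # simulated data with alpha = 3, beta = 0.5
fit <- lm(y ~ x)
coef(fit)                      # estimates of alpha (intercept) and beta (slope)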


4) Vector
An ordered list of numbers, e.g. i) Heights ii) Team statistics
Can be used to generate a point in a graph, or an arrow from the origin to that point (the usual geometric interpretation of a vector).

5) The Dot Product
x · y = (x_1, x_2)^T · (y_1, y_2)^T = (r cos θ, r sin θ)^T · (s cos φ, s sin φ)^T = ||x|| ||y|| cos(θ - φ)
Hence, (x · y) / (||x|| ||y||) = the cosine of the angle between x and y.
x and y are orthogonal to each other iff x · y = 0.
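A small R check of these facts, using two invented vectors:
x <- c(3, 4)
y <- c(4, -3)
dotxy <- sum(x * y)                                   # the dot product; here 0
cosang <- dotxy / (sqrt(sum(x^2)) * sqrt(sum(y^2)))   # cosine of the angle between x and y
acos(cosang) * 180 / pi                               # 90 degrees, so x and y are orthogonal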

6) Orthogonal Matrices
i) Columns have unit length, and are orthogonal to each other.
ii) Corresponds to rotations and/or reflections of axes (or sets of vectors).
iii) U^T U = I. (Taking the transpose turns rows into columns. The entries of U^T U are therefore dot products of the columns of U with each other. These are all 0 except for the entries on the diagonal, which are 1.)
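For example, the property U^T U = I can be checked numerically in R for a rotation matrix (the angle is chosen arbitrarily):
theta <- pi / 6
U <- matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), nrow = 2)   # rotation of the axes by theta
round(t(U) %*% U, 12)                               # the 2 x 2 identity matrix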

7) Identity Matrix
i) Has no impact on a vector ii) The matrix equivalent of the number 1

8) Transpose of a Product: (AB)^T = B^T A^T

9) Inverse Matrix: M^(-1) M = M M^(-1) = I (though it may not always exist)
The inverse of an orthogonal matrix is its transpose.
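A quick R check with an invented invertible matrix, plus a check that the inverse of an orthogonal matrix is its transpose:
M <- matrix(c(2, 1, 1, 3), nrow = 2)
round(solve(M) %*% M, 12)       # the identity matrix
theta <- pi / 4
U <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
all.equal(solve(U), t(U))       # TRUE: inverse of an orthogonal matrix equals its transpose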

10) Eigenvalues and Eigenvectors
If M v = c v for some scalar c and some nonzero vector v, then:
i) c is called an eigenvalue of the matrix M, and
ii) v is called an eigenvector of the matrix M.
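In R, eigen() returns both; the matrix below is invented for illustration:
M <- matrix(c(2, 1, 1, 2), nrow = 2)
e <- eigen(M)
e$values                    # eigenvalues: 3 and 1
e$vectors                   # columns are the corresponding eigenvectors
M %*% e$vectors[, 1]        # equals e$values[1] * e$vectors[, 1]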

11) The Bivariate Standard Normal Probability Density Function
Each component is standard normal AND they are independent
f(z_1, z_2) = ( e^(-z_1^2/2) / sqrt(2π) ) ( e^(-z_2^2/2) / sqrt(2π) ) = ( 1/(2π) ) e^(-(z_1^2 + z_2^2)/2)
Multiplying the two 1/sqrt(2π) factors gives the value of the normalization constant, 1/(2π).
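A small numerical check of this factorization in R (the z-values are arbitrary):
z1 <- 0.5; z2 <- -1.2
f1 <- exp(-z1^2 / 2) / sqrt(2 * pi) * exp(-z2^2 / 2) / sqrt(2 * pi)   # the product form above
f2 <- (1 / (2 * pi)) * exp(-(z1^2 + z2^2) / 2)                        # using the constant 1/(2 pi)
f3 <- dnorm(z1) * dnorm(z2)                                           # product of standard normal densities
all.equal(f1, f2); all.equal(f1, f3)                                  # both TRUE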

12) The General Bivariate Normal Distribution: Formula
i) Depends on the means and standard deviations of the two variables and their correlation - and nothing else.
Formula:
f(x_1, x_2) = ( 1 / ( 2π σ_1 σ_2 sqrt(1 - ρ^2) ) ) exp( -( (x_1 - μ_1)^2/σ_1^2 - 2ρ(x_1 - μ_1)(x_2 - μ_2)/(σ_1 σ_2) + (x_2 - μ_2)^2/σ_2^2 ) / ( 2(1 - ρ^2) ) )
ii) Note that the distribution is more concentrated as either of the standard deviations becomes smaller, or as the correlation moves closer to ±1.
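A minimal R implementation of the formula as written above (the function name dbvnorm and the parameter values are ours, purely for illustration):
dbvnorm <- function(x1, x2, mu1, mu2, s1, s2, rho) {
  q <- ((x1 - mu1)^2 / s1^2 -
        2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2) +
        (x2 - mu2)^2 / s2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}
dbvnorm(1, 2, mu1 = 0, mu2 = 1, s1 = 1, s2 = 2, rho = 0.5)   # density at the point (1, 2)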

13) Matrix-Vector Formulation for Bivariate Normal Probability Density Function
f(x) = ( 1 / (2π) ) |Σ_x|^(-1/2) exp( -(1/2) (x - μ_x)^T Σ_x^(-1) (x - μ_x) ),
where μ_x = E(x); Σ_x = the variance-covariance matrix for x; |Σ_x| = the determinant of the matrix Σ_x.
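A matrix-vector version in R (the function name and test values are illustrative; with a diagonal Σ_x it agrees with a product of univariate normal densities):
dmvnorm2 <- function(x, mu, Sigma) {
  d <- x - mu
  as.numeric(exp(-0.5 * t(d) %*% solve(Sigma) %*% d) / (2 * pi * sqrt(det(Sigma))))
}
mu    <- c(0, 1)
Sigma <- diag(c(1, 4))                 # independent components (correlation 0)
dmvnorm2(c(1, 2), mu, Sigma)
dnorm(1, 0, 1) * dnorm(2, 1, 2)        # same value, since Sigma is diagonal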

14) Conditional Distributions
i) Slices correspond to conditional distributions.
ii) The conditional distribution of any multivariate normal random vector given the value
of any linear combination of its elements is again multivariate normal.
15) Marginal Distributions
The marginal distribution of any subset of a multivariate normal random vector is again multivariate normal.

16) Transformations
If x is multivariate normal, then so is Mx + b.
Expectation E{Mx + b} = M E{x} + b.
Variance-covariance matrix: Σ_{Mx+b} = M Σ_x M^T
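A simulation check in R, assuming the MASS package is available for mvrnorm() (all values below are illustrative):
library(MASS)
set.seed(2)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
x <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = Sigma)
M <- matrix(c(1, 0, 1, 1), nrow = 2)     # an arbitrary transformation matrix
b <- c(3, -1)
y <- t(M %*% t(x) + b)                   # each row of y is M x_i + b
colMeans(y)                              # close to M E{x} + b = (3, -1)
cov(y)                                   # close to M Sigma M^T, printed next
M %*% Sigma %*% t(M)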


17) Central Limit Theorem: Intuitive Interpretation
Random variables whose values are largely determined by a large number of roughly
independent and comparably influential factors will likely be close to normally
distributed.
Common examples where the theorem is typically relevant:
i) Sums or averages of observations that are generated from a distribution that is
not severely skewed.
ii) Linear measurements on individuals sampled from a homogeneous collection of
biological organisms.
iii) Linear measurements on output from a carefully controlled industrial process.
Common examples where the theorem is typically not relevant and the
distribution tends to be positively skewed:
i) Sizes of items that can grow or shrink (like cities) where larger items have a
competitive advantage.
ii) Numbers of parasites, pathogens, etc., on hosts (e.g., numbers of mountain pine beetles caught in pheromone traps near Prince George, BC).
iii) Measurements of mass, volume, or area when the Central Limit Theorem applies
to linear measurements.
Multivariate Central Limit Theorem: Intuitive Interpretation
Random vectors whose values are largely determined by a large number of roughly
independent and comparably influential factors will likely be close to normally
distributed.
Prototypical Examples of Normally Distributed Variables
i) Averages of values obtained by random sampling from a population distribution that
is not severely skewed.
ii) Measurement errors.
iii) Sizes of linear measurements (e.g., length or girth, but not area, volume, or weight) of body parts in homogeneous groups of biological organisms, or of parts produced by a well-controlled industrial process.
Prototypical Examples of Normally Distributed Vectors
i) Averages of multivariate sets of values obtained by random sampling from a
population for which the distribution of each component value is not severely skewed.
ii) Multivariate measurement errors.
iii) Sizes of multivariate sets of linear measurements (e.g., length or girth, but not area, volume, or weight) of body parts in homogeneous groups of biological organisms, or of parts produced by a well-controlled industrial process.
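A small R simulation of the univariate version of the theorem: averages of skewed (exponential) observations come out close to normally distributed (sample sizes are illustrative):
set.seed(3)
means <- replicate(2000, mean(rexp(30)))   # 2000 averages of 30 skewed values each
hist(means)                                # roughly bell-shaped
qqnorm(means); qqline(means)               # points fall close to a straight line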

18) Typical Problems to Watch for
i) Component variables that tend to be skewed (like incomes, heights of mountains
within 200 km of Vancouver, and daily rainfall amounts in the fall semester at the top of
Burnaby Mountain).
ii) Component variables that are related nonlinearly, e.g., baby lengths and weights.

19) Distribution Shape May Vary for Different Components
Examples:
i) Samples of persons: IQ scores, height, income, net wealth, RRSP savings.
ii) Samples of viral strains: Number of deleted nucleotides in a key segment, virulence, prevalence (proportion of infected individuals), intensity (average numbers in infected individuals), abundance (average numbers in the entire host population).

20) Some Components May Not Even Be Quantitative
1. Examples: i) samples of persons: Sex, Religion. ii) Samples of viral strains: Names of
particular deletions present.
2. Often deleted, e.g., in a principal component analysis.
3. Need special consideration in, e.g., cluster analysis.

21) Standardizing a Normal Random Vector
Univariate Case: Z = (X - μ)/σ or, more commonly, Z_j = (X_j - Xbar)/s
Multivariate Case, Option 1: standardize each component, Z_i = (X_i - μ_i)/σ_i or, in practice, Z_ij = (X_ij - xbar_j)/s_j
Standardizing a Normal Random Vector
i) Option 1 does not make z standard multivariate normal unless the components of x are independent. ii) To make z standard multivariate normal, use something like z = Σ_x^(-1/2) (x - μ_x). (See the R illustration below.)
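The R illustration below (simulated data, using MASS::mvrnorm and scale()) shows Option 1 in action: each component ends up with mean 0 and variance 1, but the correlation remains, so z is not standard multivariate normal.
library(MASS)
set.seed(4)
X <- mvrnorm(500, mu = c(5, 10), Sigma = matrix(c(1, 0.8, 0.8, 1), 2))
Z <- scale(X)             # Option 1: centre and scale each column separately
round(colMeans(Z), 10)    # both essentially 0
cov(Z)                    # variances 1, but a correlation near 0.8 remains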

Quantile-Quantile Plots
Step 1 i) Order the values ii) Raw data: X_1, X_2, ..., X_n iii) Ordered values: X_(1) < ... < X_(n)
Step 2 i) Compare these ordered values, X_(1) < ... < X_(n), to what you would "expect" to observe if the data were generated by a normal distribution. ii) A simple way to do this, starting with the sample median:
What might you expect the sample median to be close to in a random sample
from a normal distribution? The simplest guess is the median for the normal
distribution (which equals the mean).
What might you expect the lower quartile to be close to in a random sample
from a normal distribution? The simplest guess is the lower quartile for the normal
distribution.
e.g. n = 99: the median is the value with k = (1+n)/2 = (1+99)/2 = 50;
the lower quartile is the value with k = (1+n)/4 = (1+99)/4 = 25.
Step 3 i) Plot the observed values vs. the expected values.
ii) They should follow a straight line.
Step 4 i) You can add such a line to help you to assess the plot.
ii) The R function, qqline(x) draws such a line.
iii) Sample applications in R are provided separately.
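For reference, here is a minimal example of Steps 1-4 (the data are simulated purely for illustration):
set.seed(5)
x <- rnorm(99, mean = 10, sd = 2)
qqnorm(x)    # plots the ordered values against the expected normal quantiles
qqline(x)    # reference line; the points should fall close to it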




Three-Step Solution for Assessing Normality
1. Standardize the data.
2. Measure the distance of each point from the centre. (This is just the length of the standardized vector, Z.)
3. Plot these distances in a way that highlights potential outliers. (An R sketch follows below.)
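One common way to carry out these three steps in R is via squared Mahalanobis distances compared with chi-squared quantiles; this is a standard device, though not necessarily the exact recipe intended in the notes (data simulated with MASS::mvrnorm):
library(MASS)
set.seed(6)
X  <- mvrnorm(200, mu = c(0, 0), Sigma = matrix(c(1, 0.6, 0.6, 1), 2))
d2 <- mahalanobis(X, colMeans(X), cov(X))          # squared distances from the centre
qqplot(qchisq(ppoints(200), df = 2), d2,
       xlab = "Chi-squared(2) quantiles", ylab = "Squared distances")
abline(0, 1)                                       # points far above the line flag potential outliers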

Standardizing Multivariate Data:
Step 1 i) Subtract off the mean: W = X - μ_x. Then μ_w = 0, but Σ_w = Σ_x.
Step 2 Find some orthogonal matrix U that makes the components of Y = UW independent. Then μ_y = 0 and Σ_y = U Σ_w U^T = D, with D diagonal.
Step 3 Rescale Y to Z = D^(-1/2) Y. Then Σ_z = Σ_{D^(-1/2) Y} = D^(-1/2) D D^(-1/2) = I.
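A minimal R sketch of these three steps on simulated data (the sample mean and covariance stand in for μ_x and Σ_x):
library(MASS)
set.seed(7)
X <- mvrnorm(300, mu = c(2, -1), Sigma = matrix(c(2, 1, 1, 1), 2))
W <- sweep(X, 2, colMeans(X))          # Step 1: subtract off the mean
e <- eigen(cov(W))
U <- t(e$vectors)                      # Step 2: U %*% cov(W) %*% t(U) = D (diagonal)
Y <- W %*% t(U)                        # rows of Y are U w_i; cov(Y) is D
Z <- Y %*% diag(1 / sqrt(e$values))    # Step 3: rescale, Z = D^(-1/2) Y
round(cov(Z), 10)                      # approximately the 2 x 2 identity matrix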

Underlying Assumptions
1. Formal statistical inference: i) Observations generated independently, ii) with the same multivariate normal distribution.
2. Exploratory analyses: No formal requirements, but watch out for influential outliers.
