Stat Notes
Mode
Median
Mean
Mode
Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57,
57, 58, 58, 60, 60
The table below shows a simple frequency distribution of the retirement age data.
Age (years)   Frequency
54            3
55            1
56            1
57            2
58            2
60            2
The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
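As a rough illustration, the frequency distribution and the mode can be computed from the same data. This is a minimal sketch using Python's standard library; the variable names are chosen here for illustration only.

```python
from collections import Counter

# Retirement ages from the notes above.
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Frequency distribution: each distinct age mapped to how often it occurs.
frequency = Counter(ages)
for age in sorted(frequency):
    print(age, frequency[age])

# The mode is the value with the highest frequency.
mode_age, mode_count = frequency.most_common(1)[0]
print("mode:", mode_age, "occurs", mode_count, "times")
```

If several values tie for the highest frequency, the data has more than one mode, and this sketch would report only one of them.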
Median
The median is the middle value in a distribution when the values are arranged in ascending or
descending order.
Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which
is 57 years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the two middle
values. In the following distribution, the two middle values are 56 and 57, therefore the median equals 56.5
years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
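Both cases (odd and even number of observations) can be sketched in a few lines. The helper name median below is an illustrative choice, not something from the notes; the standard library's statistics.median is used only as a cross-check.

```python
import statistics

odd_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
even_ages = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]


def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(odd_ages))   # 57
print(median(even_ages))  # 56.5

# Cross-check against the standard library.
print(statistics.median(odd_ages), statistics.median(even_ages))
```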
Mean
The mean is the sum of the value of each observation in a dataset divided by the number of
observations.
Looking at the retirement age distribution again:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623)
and dividing by the number of observations (11), which gives 623 / 11 ≈ 56.6 years.
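The same arithmetic can be checked quickly in code. This is only a sketch of the calculation above, with illustrative variable names.

```python
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)             # sum of all observations
mean_age = total / len(ages)  # divide by the number of observations

print(total)                  # 623
print(round(mean_age, 1))     # 56.6
```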
Variance
Consider a sample of test scores (10, 8, 10, 8, 8, and 4), for which the mean, or mathematical
average, is 8.
Step 1 : Subtract the mean from each number in the sample.
10 - 8 = 2, 8 - 8 = 0, 10 - 8 = 2, 8 - 8 = 0, 8 - 8 = 0, and 4 - 8 = -4.
Step 2 : Square each of the differences from the subtractions you just did.
Step 3 : Add the squared numbers together. This figure is called the sum of squares.
Step 4 : Divide the sum of squares by (n-1), where n is how many numbers are in your sample. This
step gives the sample variance. Dividing by n-1 rather than n makes the sample variance an unbiased
estimator of the population variance.
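The steps above can be followed one by one for the test scores. This is a minimal sketch with illustrative variable names; statistics.variance, which also divides by n-1, is included only as a cross-check.

```python
import statistics

scores = [10, 8, 10, 8, 8, 4]
n = len(scores)
mean_score = sum(scores) / n                       # 8.0

# Step 1: subtract the mean from each score.
deviations = [x - mean_score for x in scores]      # [2, 0, 2, 0, 0, -4]

# Step 2: square each difference.
squared = [d ** 2 for d in deviations]             # [4, 0, 4, 0, 0, 16]

# Step 3: add the squares to get the sum of squares.
sum_of_squares = sum(squared)                      # 24.0

# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)         # 24 / 5 = 4.8

print(sum_of_squares, sample_variance)
print(statistics.variance(scores))                 # cross-check
```

Taking the square root of the sample variance would give the sample standard deviation.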
Coefficient of Variation
The coefficient of variation is the ratio of the standard deviation to the mean, often expressed
as a percentage.
21, 23, 19, 17, 12, 15, 15, 17, 17, 19, 23, 23, 21, 23, 25, 25, 21, 19, 19, 19
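Assuming the list above is the sample intended for this calculation (the notes do not say so explicitly), the coefficient of variation could be computed as sketched below. Using the sample standard deviation (dividing by n-1) is also an assumption; the population version would use statistics.pstdev instead.

```python
import statistics

values = [21, 23, 19, 17, 12, 15, 15, 17, 17, 19,
          23, 23, 21, 23, 25, 25, 21, 19, 19, 19]

mean_value = statistics.mean(values)
std_dev = statistics.stdev(values)  # sample standard deviation (assumption)

# Coefficient of variation: standard deviation relative to the mean.
cv = std_dev / mean_value

print(f"mean = {mean_value:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"coefficient of variation = {cv:.1%}")
```

Because it is a ratio, the coefficient of variation has no units, which makes it convenient for comparing the spread of datasets measured on different scales.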
Probability
Probability is the ratio of the number of favorable outcomes to the total number of possible
outcomes of an experiment.
For an experiment having n equally likely outcomes, the number of favorable outcomes can be denoted
by x. The formula to calculate the probability of an event is as follows.
P(event) = x / n
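As a small illustration, take a fair six-sided die and the event "roll an even number". This example is hypothetical and not from the notes, and it relies on every outcome being equally likely. The helper name probability is an illustrative choice.

```python
from fractions import Fraction


def probability(favorable, total):
    """Ratio of favorable outcomes to all equally likely outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total and total > 0")
    return Fraction(favorable, total)


# Hypothetical experiment: one roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
favorable = [o for o in outcomes if o % 2 == 0]  # even faces: 2, 4, 6

p_even = probability(len(favorable), len(outcomes))
print(p_even, float(p_even))  # 1/2 0.5
```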
Z Test
The z-test can be performed on one sample, two samples, or on proportions for hypothesis
testing.
It checks whether a sample mean differs from a hypothesized population mean, or whether the means
of two large samples differ, when the population variance is known.
A z-test can further be classified into left-tailed, right-tailed, and two-tailed hypothesis tests
depending on the direction of the alternative hypothesis.
For this purpose, the null hypothesis and the alternative hypothesis must be set up and the value
of the z test statistic must be calculated. The decision criterion is based on the z critical value.
One-Sample Z Test
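The steps to set up a one-sample z test based on the z test statistic are as follows:
State the null hypothesis H0: μ = μ0 and the alternative hypothesis H1.
Calculate the z test statistic, z = (x̄ - μ0) / (σ / √n), where x̄ is the sample mean, μ0 is the
hypothesized population mean, σ is the known population standard deviation, and n is the sample size.
Decision Criteria (right-tailed, H1: μ > μ0): If the z statistic > z critical value then reject the null hypothesis.
Decision Criteria (left-tailed, H1: μ < μ0): If the z statistic < z critical value (which is negative) then reject the null hypothesis.
Decision Criteria (two-tailed, H1: μ ≠ μ0): If the absolute value of the z statistic > z critical value then reject the null hypothesis.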
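A one-sample z test might be sketched as follows. The function name, its parameters, the significance level of 0.05, and the sample figures are all hypothetical, chosen only to illustrate the procedure. The critical value comes from the standard normal distribution via statistics.NormalDist (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="right"):
    """Return (z statistic, z critical value, reject H0?)."""
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1

    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)  # negative value
        reject = z < critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, critical, reject


# Hypothetical figures: sample mean 103, H0 mean 100, sigma 15, n = 100.
z, critical, reject = one_sample_z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"z = {z:.3f}, critical = {critical:.3f}, reject H0: {reject}")
```

With these made-up figures the z statistic is 2.0, which exceeds the right-tailed critical value of about 1.645, so the null hypothesis would be rejected at the 5% level.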
Two-Sample Z Test
The two-sample z test is set up in the same way as the one-sample test, but it is used to compare
the means of two samples. For example, the null hypothesis is given as H0: μ1 = μ2.
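A two-sample version might look like the sketch below. The notes do not give the formula, so this uses the standard two-sample z statistic, z = (x̄1 - x̄2) / √(σ1²/n1 + σ2²/n2), with known population standard deviations. The function name and all figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic for H0: mu1 = mu2 with known population sigmas."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical figures for two large samples.
z = two_sample_z(mean1=52, mean2=50, sigma1=4, sigma2=3, n1=64, n2=36)

# Two-tailed test at the 5% level (H1: mu1 != mu2).
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > critical

print(f"z = {z:.3f}, critical = {critical:.3f}, reject H0: {reject}")
```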
Correlation
Correlation tells us how strongly two variables are related.
Positive correlations – occur when both variables move in the same direction (e.g., as SAT scores
increase, so too do GPAs).
Negative correlations – occur when one variable increases as the other decreases (e.g., as age
increases, the number of speeding tickets decreases).
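One common way to put a number on this is Pearson's correlation coefficient r, which ranges from -1 to +1. The notes do not name a specific measure, so using Pearson's r is an assumption here, and the data below is invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and each variable's sum of squares.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(ss_x * ss_y)


# Invented data: SAT scores vs. GPA (expected to move together).
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.8, 3.0, 3.1, 3.5, 3.7]

# Invented data: driver age vs. speeding tickets (expected to move oppositely).
age = [20, 30, 40, 50, 60]
tickets = [5, 4, 2, 2, 1]

r_positive = pearson_r(sat, gpa)
r_negative = pearson_r(age, tickets)
print(f"SAT vs GPA: r = {r_positive:.2f}")
print(f"age vs tickets: r = {r_negative:.2f}")
```

A value of r near +1 indicates a strong positive correlation, a value near -1 a strong negative one, and a value near 0 little linear relationship.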