Stat Notes


Central Tendency and Data Distribution

 There are three main measures of central tendency:

Mode

Median

Mean

 Mode

The mode is the most commonly occurring value in a distribution.

Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57,
57, 58, 58, 60, 60

This table shows a simple frequency distribution of the retirement age data.

Age (years)    Frequency
54             3
55             1
56             1
57             2
58             2
60             2

The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
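
The count behind the mode can be sketched in a few lines of Python. This is only an illustration using the standard library's `collections.Counter`; the variable names are chosen here and are not part of the original notes.

```python
from collections import Counter

# Retirement ages of the 11 people from the example above.
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Count how often each age occurs.
frequency = Counter(ages)

# most_common(1) returns a one-item list holding the (value, count) pair
# with the highest count.
mode_value, mode_count = frequency.most_common(1)[0]
print(mode_value, mode_count)  # 54 3
```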

 Median

The median is the middle value in a distribution when the values are arranged in ascending or
descending order.

Looking at the retirement age distribution (which has 11 observations), the median is the middle value, which
is 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the distribution has an even number of observations, the median value is the mean of the two middle
values. In the following distribution, the two middle values are 56 and 57, therefore the median equals 56.5
years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
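
Both cases (odd and even number of observations) can be sketched as a small function. The name `median` and the variable names are illustrative choices, not something defined in these notes.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
even_sample = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(median(odd_sample))   # 57
print(median(even_sample))  # 56.5
```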

 Mean

The mean is the sum of the value of each observation in a dataset divided by the number of
observations.
Looking at the retirement age distribution again:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623)
and dividing by the number of observations (11) which equals 56.6 years.
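
The same arithmetic, written as a short Python sketch (variable names are illustrative):

```python
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

total = sum(ages)             # sum of all observations
mean_age = total / len(ages)  # divide by the number of observations

print(total, round(mean_age, 1))  # 623 56.6
```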

 Finding the Variance In Your Sample


The variance is a figure that represents how widely the data in your sample are spread around the
mean.
Step 1 : Subtract the mean from each of your numbers in your sample. This will give you a figure
of how much each data point differs from the mean.

For example, in a sample of test scores (10, 8, 10, 8, 8, and 4), the mean, or arithmetic
average, is 8. The differences from the mean are:
10 - 8 = 2, 8 - 8 = 0, 10 - 8 = 2, 8 - 8 = 0, 8 - 8 = 0, and 4 - 8 = -4.

Step 2 : Square each of the differences from Step 1. For the test scores this gives 4, 0, 4, 0, 0, and 16.

Step 3 : Add the squared numbers together. This figure is called the sum of squares. For the test scores,
4 + 0 + 4 + 0 + 0 + 16 = 24.

Step 4 : Divide the sum of squares by (n-1), where n is how many numbers are in your sample. The result is
the sample variance; for the test scores, 24 / 5 = 4.8. Dividing by n-1 rather than n makes the sample
variance an unbiased estimate of the population variance.
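
The four steps can be traced in Python with the test-score sample. This is a minimal sketch; the variable names are chosen here for illustration.

```python
scores = [10, 8, 10, 8, 8, 4]
n = len(scores)
mean_score = sum(scores) / n  # 8.0

# Step 1: subtract the mean from each value.
deviations = [x - mean_score for x in scores]

# Step 2: square each difference.
squared = [d ** 2 for d in deviations]

# Step 3: add the squares to get the sum of squares.
sum_of_squares = sum(squared)

# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)

print(sum_of_squares, sample_variance)  # 24.0 4.8
```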

 Calculating the Standard Deviation


Step 1 : Find your variance figure.
Step 2 : Find the square root of your variance.
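
As a sketch, the two steps applied to the test-score sample from the variance section look like this. It relies on the standard library's `statistics.variance`, which divides by n-1.

```python
import math
import statistics

scores = [10, 8, 10, 8, 8, 4]

# Step 1: sample variance (statistics.variance uses the n - 1 divisor).
variance = statistics.variance(scores)

# Step 2: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(std_dev, 3))  # 2.191
```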

 Coefficient of Variation
The coefficient of variation is the ratio of the standard deviation to the mean, usually expressed as a percentage.

The general steps to find the coefficient of variation are as follows:

 Step 1: Check for the sample set.


 Step 2: Calculate standard deviation and mean.
 Step 3: Put the values in the coefficient of variation formula: CV = (σ/μ) × 100, where μ ≠ 0.
Answer: Hence plant D has greater variability in individual wages.
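
A minimal sketch of the three steps follows. It assumes the sample standard deviation (n-1 divisor) stands in for σ and the sample mean for μ, and it reuses the test scores from the variance section as the sample set; `coefficient_of_variation` is a hypothetical helper name.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        # The formula requires a non-zero mean.
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


scores = [10, 8, 10, 8, 8, 4]
cv = coefficient_of_variation(scores)
print(round(cv, 2))  # 27.39
```

When two data sets are compared, the one with the larger CV has the greater relative variability.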

 Frequency distribution of ungrouped data:

Given below are marks obtained by 20 students in Math out of 25.

21, 23, 19, 17, 12, 15, 15, 17, 17, 19, 23, 23, 21, 23, 25, 25, 21, 19, 19, 19
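
One way to tabulate these marks is sketched below with `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

marks = [21, 23, 19, 17, 12, 15, 15, 17, 17, 19,
         23, 23, 21, 23, 25, 25, 21, 19, 19, 19]

# Count how many students obtained each mark.
frequency = Counter(marks)

# Print the frequency distribution in ascending order of marks.
for mark in sorted(frequency):
    print(mark, frequency[mark])
# 12 1
# 15 2
# 17 3
# 19 5
# 21 3
# 23 4
# 25 2
```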
 Probability
Probability is the ratio of the number of favorable outcomes to the total number of possible outcomes of an experiment.

For an experiment having n equally likely outcomes, the number of favorable outcomes can be denoted
by x. The formula to calculate the probability of an event is as follows.

Probability(Event) = Favorable Outcomes/Total Outcomes = x/n
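
As an illustration of the formula, the sketch below computes the probability of rolling an even number with a fair six-sided die. The die example and the helper name `probability` are assumptions added here; the formula only applies as written when all outcomes are equally likely.

```python
from fractions import Fraction


def probability(favorable, total):
    """Return x/n for x favorable outcomes out of n equally likely outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total and total > 0")
    return Fraction(favorable, total)


outcomes = [1, 2, 3, 4, 5, 6]                # n = 6
even = [o for o in outcomes if o % 2 == 0]   # x = 3

p_even = probability(len(even), len(outcomes))
print(p_even, float(p_even))  # 1/2 0.5
```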


 Z Test
 The z-test can be performed on one sample, two samples, or on proportions for hypothesis
testing.
 It checks if the means of two large samples are different or not when the population variance is
known.
 A z-test can further be classified into left-tailed, right-tailed, and two-tailed hypothesis tests
depending on the direction of the alternative hypothesis.
 For this purpose, the null hypothesis and the alternative hypothesis must be set up and the value
of the z test statistic must be calculated. The decision criterion is based on the z critical value.

 One-Sample Z Test

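The formula for the z test statistic is given as follows:

z = (x̄ - μ0) / (σ/√n)

where x̄ is the sample mean, μ0 is the hypothesized population mean, σ is the population standard
deviation, and n is the sample size.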

The procedure for setting up a one-sample z test based on the z test statistic is as follows:

Left Tailed Test:

Null Hypothesis: H0: μ = μ0

Alternate Hypothesis: H1: μ < μ0
Decision Criteria: If the z statistic < z critical value then reject the null hypothesis.

Right Tailed Test:

Null Hypothesis: H0: μ = μ0

Alternate Hypothesis: H1: μ > μ0

Decision Criteria: If the z statistic > z critical value then reject the null hypothesis.

Two Tailed Test:

Null Hypothesis: H0: μ = μ0

Alternate Hypothesis: H1: μ ≠ μ0

Decision Criteria: If the absolute value of the z statistic > z critical value then reject the null hypothesis.
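
The three decision rules can be sketched together in Python. This is an illustration under stated assumptions: it uses the standard one-sample statistic z = (x̄ - μ0) / (σ/√n), takes critical values from the standard library's `statistics.NormalDist`, compares |z| with the critical value in the two-tailed case, and defaults to a 0.05 significance level. The function name and the sample figures (mean 52, μ0 = 50, σ = 6, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z statistic, critical value, reject H0?) for a one-sample z test."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)          # negative value
        reject = z < critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, critical, reject


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
z, critical, reject = one_sample_z_test(52, 50, 6, 36, tail="two")
print(z, round(critical, 2), reject)  # 2.0 1.96 True
```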

 Two Sample Z Test

The two-sample z test can be set up in the same way as the one-sample test. However, this test is used to
compare the means of two samples. For example, the null hypothesis is given as H0: μ1 = μ2.
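
A minimal two-tailed sketch follows. The notes do not give the two-sample statistic, so the usual form z = (x̄1 - x̄2) / √(σ1²/n1 + σ2²/n2) for independent samples with known population standard deviations is assumed here. The function name and all figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, alpha=0.05):
    """Two-tailed test of H0: mu1 = mu2 with known population std deviations."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Hypothetical figures for two independent samples.
z, critical, reject = two_sample_z_test(75, 70, 10, 12, 50, 60)
print(round(z, 2), reject)  # 2.38 True
```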
 Correlation
Correlation tells us how strongly two variables are related.
Positive correlations occur when both variables move in the same direction (e.g., as SAT scores
increase, so too do GPAs).
Negative correlations occur when one variable increases as the other decreases (e.g., as age
increases, the number of speeding tickets decreases).
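
The direction of a correlation can be checked numerically. The sketch below uses the Pearson correlation coefficient, which is one common measure and is an assumption here, since the notes do not name a specific coefficient. The data are toy values picked so that the result is exactly +1 or -1; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # moves in the same direction as x
falls_with_x = [10, 8, 6, 4, 2]   # moves in the opposite direction

print(pearson_r(x, rises_with_x))  # 1.0
print(pearson_r(x, falls_with_x))  # -1.0
```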
