Parametric Tests


Koach Scholar
Content
● Parametric and Non-Parametric Tests
● Data Types
● t-Test
● ANOVA
Parametric & Non-Parametric Tests
Hypothesis Testing
● Ratio or Interval data: Parametric Tests
    ● One Sample: Z Test, t Test
    ● Two Sample
        ● Independent Sample: Z Test, t Test
        ● Paired Sample: t Test
    ● >2 Sample: ANOVA
● Ordinal or Nominal data: Non-Parametric Tests
    ● One Sample: Chi-Square
    ● Two/ More Sample
        ● Independent Sample: Chi-Square, Mann Whitney (2 groups)
        ● Paired Sample: Wilcoxon Signed-Rank
Data Types
Data Types - Summary
● Ratio: Difference between measurements and True zero exist
● Interval: Difference between measurements but no True zero
● Ordinal: Ordered Categorical (ranks, order or scaling)
● Nominal: Categorical (No Ordering or direction)
Data Types
Four levels of measurement in research and statistics:

Nominal Scale :
Naming scale, where variables are simply named/ labeled, with
no specific order. Categorical. Used to classify. Countable.
Eg - Gender, location, telephone numbers, names etc.

Ordinal Scale :
Has all its variables in a specific order, beyond just naming
them. In addition to nominal capabilities it can be ranked.
Eg - Ranking, order or rating scale.

Interval Scale :
● Scale in which the distances between consecutive numbers have meaning and the data are always numerical.
● ‘Interval’ indicates the ‘distance between two entities’, which is what the Interval scale helps to measure.
● Measures of central tendency on this scale are the mean, median and mode.
● The only drawback of this scale is that there is no pre-decided starting point or true zero value.

Eg : temperature (in °C or °F)

Ratio Scale :
Variable measurement scale that not only gives the order of the values but also makes the difference between them meaningful, and has a true zero. It assumes that the variable has a true zero point, that equal differences between values mean the same thing anywhere on the scale, and that the values have a specific order.

Because a true zero exists, a wide range of inferential and descriptive analysis techniques can be applied to the variables.

Eg : Height and Weight are the best examples.

In market research, a ratio scale is used to calculate market share, annual sales, the price of an upcoming product, the number of consumers, etc.
t-Test
Student’s t-Test
● Invented by the statistician William S. Gosset (1876-1937) while working in a quality control job at the Guinness brewery
● Since he developed it as part of his job, he kept his name anonymous and published the results under the pen name ‘Student’
Student’s t-Test - when used?
● The t-distribution is a family of distributions: every sample size (degrees of freedom) has a different curve, and the differences matter most for small samples
● Used in 2 scenarios -
    ● When the population Standard Deviation is unknown and the population is normal (irrespective of sample size) OR
    ● When the sample size < 30
Characteristics of t-distribution
● Symmetric, unimodal and a family of curves
● Flatter in the middle, with more area in the tails, than the Standard Normal Curve
● The t distribution approaches the Standard Normal Curve (z) as n becomes large
Assumptions of the t-Test
● The scale of measurement applied to the data is continuous or ordinal
● The data are collected from a representative, randomly selected portion of the total population.
● The sample is reasonably large and the data, when plotted, follow a normal, bell-shaped distribution curve.
● Homogeneity of Variance. Homogeneous, or equal, variance exists when the standard deviations of the samples are approximately equal.
Calculation of t-statistics
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

t = Student's t statistic
x̄ = Sample Mean
𝛍 = Population mean
s = Sample Standard Deviation
n = Sample size
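
The definitions above can be turned into a few lines of code. This is a minimal sketch in Python using only the standard library, not the Rcmdr workflow the slides use. The function name `one_sample_t` and the three-value sample are hypothetical and exist only to show the arithmetic.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical sample, for illustration only
t, df = one_sample_t([4.0, 5.0, 6.0], mu=4.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The statistic only says how many standard errors the sample mean lies from the hypothesised mean. Turning it into a p-value needs the t-distribution with n - 1 degrees of freedom, which statistical software or a t-table provides.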
Types of t-Test
● One Sample t-Test
    Definition: Compare the mean of one group against the population mean (which is a theoretical value).
    Example: A researcher wants to determine if the average eating time for a (standard size) burger differs from a set value i.e. 10 minutes.
● Independent Two Sample t-Test
    Definition: Used to compare the means of two different samples.
    Example: Compare the average height of the male employees to the average height of the female employees.
● Dependent / Paired Sample t-Test
    Definition: Used to compare related observations. We compare the means for one group at two different times or under two different conditions.
    Example: Comparing the productivity levels before and after the training programs in a company.
Example - 1 One Sample t-Test - Mobile screen size
A mobile manufacturing company has taken a sample of mobiles of the same model from the previous month’s data. They want to check whether the average screen size of the sample differs from the desired length of 10 cm.

Dataset to be used - screen_size-data_onesample_t-test

Sample values (cm):
9.941146    10.02583
9.980651    9.933906
9.924156    10.00131
10.09911    10.00514
9.971731    10.02588
H0 : The mean (𝛍) screen size is equal to 10 (H0 : 𝛍 = 10)
H1 : The mean screen size is not equal to 10 (H1 : 𝛍 ≠ 10)
In RStudio - Rcmdr approach
Output in RStudio
[Figure: two-tailed t-distribution with a rejection region of area 0.025 in each tail]

Since the p-value > 0.05, the test statistic does not lie in the rejection region. Therefore, we ‘Fail to Reject’ the null hypothesis at the 5% significance level (95% confidence level).
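
As a cross-check on this decision, the t statistic can be recomputed by hand. The sketch below is Python with the standard library only, not the Rcmdr steps used in the slides. It assumes the ten screen sizes printed in the example are the whole sample; if the named dataset has more rows, the numbers will differ from the RStudio output. The critical value 2.262 is the standard t-table entry for a two-tailed test at 0.05 with 9 degrees of freedom.

```python
import math
import statistics

# The ten screen sizes (cm) printed in the example; assumed to be the full sample
screen_sizes = [
    9.941146, 10.02583, 9.980651, 9.933906, 9.924156,
    10.00131, 10.09911, 10.00514, 9.971731, 10.02588,
]
target = 10.0  # hypothesised mean under H0

n = len(screen_sizes)
x_bar = statistics.mean(screen_sizes)
s = statistics.stdev(screen_sizes)
t_stat = (x_bar - target) / (s / math.sqrt(n))

# Two-tailed critical value for alpha = 0.05 and n - 1 = 9 degrees of freedom
t_critical = 2.262

# H0 is rejected only when |t| falls in one of the two tails
reject_h0 = abs(t_stat) > t_critical
print(f"mean = {x_bar:.4f}, s = {s:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

For these ten values |t| is well below the critical value, which agrees with the ‘Fail to Reject’ conclusion.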
Example - 2 Independent Sample t-Test
We want to evaluate whether the Monthly Mobile bills of Males and Females are different. Data is collected on Gender and Monthly Mobile bill. Conduct an Independent Sample t-test on the Bill amounts of Males and Females to test whether they differ.

Dataset to be used - Monhtly_Mobile_Bill_Independent_Sample_t-Test
H0 : The average monthly mobile bills for the males and females are not significantly different (H0 : 𝛍1 = 𝛍2)
H1 : The average monthly mobile bills for the males and females are significantly different (H1 : 𝛍1 ≠ 𝛍2)
Output in RStudio

Interpretation
Since p > 0.05, we ‘Fail to Reject’ the Null Hypothesis. Thus we conclude that there is no ‘Significant difference’ in the average mobile bills of Male and Female respondents.
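
For readers who want to see what the independent sample test computes, here is a minimal Python sketch of the pooled-variance (Student's) form, which matches the homogeneity of variance assumption listed earlier. It is an assumption that the slides used this form: R's `t.test` applies Welch's unequal-variance version unless `var.equal = TRUE` is set. The function name and the bill amounts are hypothetical and are not taken from the dataset named above.

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance: average of the two sample variances weighted by their df
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2


# Hypothetical monthly bills, NOT the values in the dataset
male_bills = [420.0, 510.0, 380.0, 460.0, 530.0]
female_bills = [450.0, 490.0, 400.0, 470.0, 500.0]

t_stat, df = independent_t(male_bills, female_bills)
print(f"t = {t_stat:.3f}, df = {df}")
```

A t value close to zero, as with these made-up numbers, corresponds to a large p-value and therefore to failing to reject H0.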
In RStudio - Rcmdr Approach
Example - 3 Paired Sample t-Test
A company wants to measure the efficacy of a technical training. It collected the scores of 25 randomly selected employees. The scores were given out of 10 in a technical skill.

The first set of scores was collected before the training happened and the second set was collected after the company arranged the technical training.

Dataset to be used - Paired_Sample_t-test
H0 : The training did not make any significant impact on the technical skills knowledge of the employees (H0 : 𝛍b = 𝛍a)
H1 : The training made a significant impact on the technical skills knowledge of the employees (H1 : 𝛍b ≠ 𝛍a)
Example - 3 Paired Sample t-Test - Testing Assumptions

Interpretation
Since p < 0.05, we Reject the Null Hypothesis. Thus we conclude that there is a ‘Significant difference’ in the scores before and after the training.
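
A paired test is a one-sample t-test on the per-employee differences, which the following Python sketch makes explicit. It uses the standard library only and is not the Rcmdr procedure from the slides. The function name and the six score pairs are hypothetical, not the 25 employees in Paired_Sample_t-test.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic computed on the differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)    # mean improvement per employee
    s_d = statistics.stdev(diffs)     # standard deviation of the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical scores out of 10, for illustration only
before = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
after = [6.0, 6.0, 8.0, 7.0, 5.0, 9.0]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Working on differences removes the variation between employees, which is why a paired design can detect a training effect that an independent sample test on the same numbers might miss.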
ANOVA
Analysis of Variance (ANOVA)
Suppose we want to compare 3 Sample Means to see if a difference exists among them.

Interpreting this statistically -
● Do all three means come from a common population?
● Is one mean so far from the other two that it is likely to come from a different population?
● Are all three so far apart that they are likely to come from different populations?
Why ANOVA?
● We have been comparing 2 populations till now
● What if we want to compare the means of more than 2 populations?
● What if we wish to compare populations each containing several levels or sub-groups?

We need ANalysis Of VAriance
Analysis of Variance (ANOVA)
We take all the data points and create a larger distribution from them.

We need to check where each sample mean lies relative to the overall dataset (in the background).
Analysis of Variance (ANOVA) - Variability BETWEEN the Means
We analyse the variability of each Sample Mean from the Mean of the combined dataset.
Analysis of Variance (ANOVA)
If one Sample Mean is far away from the others, we can conclude that it does not belong to the same overall population as the other two.

If all three Sample Means are far apart, they each belong to their own distinct population.
Analysis of Variance (ANOVA) - Variability Between
The Null Hypothesis is that these 3 means come from the same overall population.

Thus we want to test whether the means of the populations (𝛍) from which these samples come are different.
Analysis of Variance (ANOVA) - Variability Within
Variability WITHIN the samples: the spread of the observations inside each sample around that sample's own mean.

ANOVA: Analysis of Variance is the Variability Ratio
$$F\ \text{statistic} = \frac{\text{Variance BETWEEN the Means}}{\text{Variance WITHIN the samples}}$$

** the WITHIN variability is also called the Error Variance

Components of the Total Variance:
Variance BETWEEN + Variance WITHIN = Total Variance (the sums of squared deviations add up in this way)
In a Nutshell

If the Variability BETWEEN the means (distance from the overall mean) in the numerator is relatively large compared to the variance WITHIN the samples (internal spread) in the denominator, the ratio will be much larger than 1.

This indicates that the Samples most likely do NOT come from a common population, so we REJECT THE NULL (H0) and conclude that not all the means come from the same population.
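
The ratio described above can be computed directly from sums of squares. The Python sketch below uses the standard library only and assumes a one-way design with independent groups. The function name `one_way_anova_f` and the three small groups are hypothetical and unrelated to any dataset in the slides.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability BETWEEN: distance of each sample mean from the overall mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variability WITHIN: spread of each observation around its own sample mean
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical groups, for illustration only
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
c = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f(a, b, c)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

Each sum of squares is divided by its degrees of freedom before the ratio is taken, so F compares two variance estimates and not raw totals.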
Example of ANOVA
We are analysing the stress levels of individuals who were laid off. We have measured the stress levels in 3 situations - the normal situation, when the layoffs are announced and at the time of the layoff. We want to understand whether the stress levels in the 3 situations are significantly different.

Dataset used - ANOVA Test
Example of ANOVA - Setting the Hypothesis
H0 : The stress levels of the employees are not different in the different situations i.e. they come from the same population (H0: 𝛍1 = 𝛍2 = 𝛍3)
H1 : At least one of the sample means is different from the others.
Results of the ANOVA in R
F statistic = F(2,12) = 22.59
p Value = 0.0000854
p < 0.05

Result of ANOVA - Since the p Value is less than the Significance level i.e. 0.05, we REJECT the Null. Thus we conclude that at least one of the samples comes from a different population.
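
The reported p Value can be checked from the reported F statistic alone. When the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has the closed form p = (1 + 2F/df2)^(-df2/2). The Python sketch below applies it; this shortcut is a property of the F distribution, not the method R uses internally, and it does not hold for other numerator degrees of freedom.

```python
# Reported in the results: F(2, 12) = 22.59
f_stat = 22.59
df_between = 2
df_within = 12

# Closed-form upper-tail probability, valid only when df_between == 2
p_value = (1 + df_between * f_stat / df_within) ** (-df_within / 2)

print(f"p = {p_value:.7f}")
```

The result agrees with the reported p Value to the digits shown and is far below the 0.05 significance level.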
Rcmdr Approach
Appendix - z-Test
Z Test
● Statistical test to determine if two population means are different when the variances are known and the sample size is large.
● Used to test hypotheses in which the test statistic follows a normal distribution.
● Also, t-tests assume the population standard deviation is unknown, while z-tests assume it is known.
Z Distribution/ Normal Distribution/ Bell Curve
Z Test - When used?
● Samples are drawn at random
● Samples are independent of one another
● Population variance is known
● Sample size is large (n ≥ 30)
Assumptions of the z-Test
● The scale of measurement applied to the data is continuous or ordinal
● The data are collected from a representative, randomly selected portion of the total population.
● The sample is reasonably large and the data, when plotted, follow a normal, bell-shaped distribution curve.
● Homogeneity of Variance. Homogeneous, or equal, variance exists when the standard deviations of the samples are approximately equal.
Calculation of z-statistics
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

z = z statistic (standard score of the sample mean)
x̄ = Sample Mean
𝛍 = Population mean
𝛔 = Population Standard Deviation
n = Sample size
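
Because the z statistic follows the standard normal distribution, its p-value can be obtained without a table. The Python sketch below uses `statistics.NormalDist` from the standard library. The function name `one_sample_z` and the input values are hypothetical and chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """Return (z, two_sided_p) for a one-sample z-test with known sigma."""
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    # Two-sided p-value: area in both tails beyond |z|
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical inputs, for illustration only
z, p = one_sample_z(sample_mean=10.2, mu=10.0, sigma=0.5, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The only difference from the t statistic is the denominator: the known population standard deviation 𝛔 replaces the sample estimate s, so the reference distribution is the normal curve and not a t curve.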
Thank You!

Kabiir Nagpal/ 8368886140/ touchstone.in@gmail.com
