Hypothesis Testing
WHAT IS HYPOTHESIS TESTING?
• Hypothesis testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter or a population probability distribution. First, a
tentative assumption is made about the parameter or distribution. This assumption is called the
null hypothesis and is denoted by H0. An alternative hypothesis (denoted Ha), which is the
opposite of what is stated in the null hypothesis, is then defined. The hypothesis-testing
procedure involves using sample data to determine whether or not H0 can be rejected. If H0 is
rejected, the statistical conclusion is that the alternative hypothesis Ha is true.
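To make the procedure concrete, here is a minimal sketch in Python using SciPy. The scenario (testing whether a coin is fair from 100 hypothetical flips) and all numbers are assumptions for illustration only.

```python
from scipy import stats

# Hypothetical example: test H0 "the coin is fair (p = 0.5)"
# against Ha "the coin is biased (p != 0.5)".
heads, flips = 61, 100  # assumed sample data
result = stats.binomtest(heads, flips, p=0.5, alternative="two-sided")

print(f"p-value = {result.pvalue:.4f}")
# If the p-value falls below the chosen significance level
# (e.g. 0.05), H0 is rejected in favour of Ha.
```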
PARAMETRIC VS NON-PARAMETRIC TESTING
• In statistics, a parametric test is a hypothesis test that makes assumptions about the parameters of the population distribution and draws generalizations about the mean of the original population. The t-test, for example, is carried out using Student's t-statistic. It rests on the underlying hypothesis that the variable is normally distributed; the mean is known, or is considered to be known, and the population variance is estimated from the sample. The variables of concern are assumed to be measured on an interval scale.
• The non-parametric test requires no assumptions about the population distribution or its parameters. It is also a hypothesis test, but one that does not rest on an underlying distributional hypothesis, which is why it is also called a distribution-free test. Instead of the mean, the test is based on differences in the median. The test variables are measured at the nominal or ordinal level, and when the independent variables are non-metric, a non-parametric test is usually performed.
What are examples of parametric tests?
The t-test and z-test are examples of parametric tests.
What are examples of non-parametric tests?
The Kruskal-Wallis and Mann-Whitney tests are examples of non-parametric tests.
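The contrast is easy to see in code. Below is a sketch, assuming two hypothetical independent samples, that runs a parametric t-test and its common non-parametric counterpart, the Mann-Whitney U test, side by side with SciPy; the data are synthetic and for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two hypothetical independent samples (values are illustrative only).
group_a = rng.normal(loc=50, scale=10, size=30)
group_b = rng.normal(loc=55, scale=10, size=30)

# Parametric: independent two-sample t-test (assumes normality).
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric: Mann-Whitney U test (rank-based, distribution-free).
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test:       statistic={t_stat:.3f}, p={t_p:.4f}")
print(f"Mann-Whitney: statistic={u_stat:.3f}, p={u_p:.4f}")
```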
Definition of Z-Test
The z-test is a statistical test used to determine whether two population means differ when the variances are known and the sample size is large. It is based on the normal distribution.
The assumptions for Z-test are:
•All observations are independent.
•The size of the sample should be more than 30.
•The Z statistic follows the standard normal distribution, with mean 0 and variance 1.
The test statistic is defined by:
z = (X̄ − μ) / (σ / √n)
where X̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.
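As a sketch, the statistic and a two-sided p-value can be computed directly; the population parameters and sample values below are assumed for illustration.

```python
import math
from scipy import stats

# Hypothetical values: the z-test assumes the population
# parameters are known.
mu, sigma = 100, 15   # population mean and standard deviation (assumed known)
x_bar, n = 104, 50    # sample mean and sample size (n > 30)

z = (x_bar - mu) / (sigma / math.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the standard normal

print(f"z = {z:.3f}, p = {p_value:.4f}")
```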
• In statistics, a Type I error is a false positive conclusion, while a Type II error is a false
negative conclusion.
• Making a statistical decision always involves uncertainties, so the risks of making these errors
are unavoidable in hypothesis testing.
• The probability of making a Type I error is the significance level, or alpha (α), while the
probability of making a Type II error is beta (β). These risks can be minimized through careful
planning in your study design.
• Example: Type I vs Type II error
• You decide to get tested for COVID-19 based on mild symptoms. There are two errors that
could potentially occur:
• Type I error (false positive): the test result says you have coronavirus, but you actually don’t.
• Type II error (false negative): the test result says you don’t have coronavirus, but you
actually do.
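A small simulation makes the link between α and the Type I error rate concrete. The sketch below, under assumed settings (α = 0.05, two t-tested samples drawn from the same distribution so that H0 is true by construction), counts how often H0 is wrongly rejected.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, trials = 0.05, 10_000
false_positives = 0

# Both samples come from the SAME distribution, so H0 is true;
# every rejection is therefore a Type I error (false positive).
for _ in range(trials):
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(f"Observed Type I error rate ≈ {false_positives / trials:.3f}")
# This should come out close to alpha (0.05).
```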
P-VALUE, SIGNIFICANCE LEVEL, AND CONFIDENCE INTERVAL
• The p-value, significance level, and confidence interval are vital concepts in statistical hypothesis
testing. The p-value measures the strength of evidence against a null hypothesis. A small p-value
(typically less than 0.05) suggests strong evidence against the null hypothesis, indicating that the
observed data is unlikely if the null hypothesis is true. The significance level, often denoted as
alpha (α), is predetermined and represents the threshold for determining statistical significance.
It's the probability of rejecting the null hypothesis when it's actually true. Confidence intervals, on
the other hand, provide a range of values within which we are confident that the true population
parameter lies. Typically, a 95% confidence interval is used, implying that if the experiment were
repeated many times, 95% of the intervals would contain the true parameter. These statistical
tools collectively guide researchers in drawing conclusions from data analysis, helping to ensure
the reliability and validity of research findings.
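A short sketch of a 95% confidence interval for a mean, computed with SciPy from a hypothetical sample (all values are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=10, scale=2, size=40)  # hypothetical data

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1,
                                   loc=mean, scale=sem)

print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
# Across many repeated experiments, about 95% of such intervals
# would contain the true population mean.
```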
NORMALITY TESTS
• Normality tests, such as the Shapiro-Wilk and Kolmogorov-Smirnov tests, assess whether a
dataset follows a normal distribution, a fundamental assumption in many statistical analyses.
The Shapiro-Wilk test is sensitive to deviations from normality in smaller sample sizes, while
the Kolmogorov-Smirnov test is useful for larger datasets. These tests help researchers
determine if their data can be appropriately analyzed using methods that assume normality, like
parametric tests. If the data fails the normality test, alternative approaches may be necessary,
ensuring accurate and reliable conclusions from statistical analysis.
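Both tests are available in SciPy. The sketch below runs them on a hypothetical sample; note that fitting the normal parameters from the same data makes the Kolmogorov-Smirnov p-value only approximate (the Lilliefors variant corrects for this).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.normal(loc=0, scale=1, size=100)  # hypothetical sample

# Shapiro-Wilk: well suited to small-to-moderate samples.
sw_stat, sw_p = stats.shapiro(data)

# Kolmogorov-Smirnov against a normal distribution fitted to the data.
ks_stat, ks_p = stats.kstest(data, "norm",
                             args=(data.mean(), data.std(ddof=1)))

print(f"Shapiro-Wilk:       W={sw_stat:.3f}, p={sw_p:.4f}")
print(f"Kolmogorov-Smirnov: D={ks_stat:.3f}, p={ks_p:.4f}")
# A p-value above the significance level (e.g. 0.05) means the data
# show no significant departure from normality.
```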
CENTRAL LIMIT THEOREM
• The Central Limit Theorem (CLT) is a key statistical principle stating that regardless of the
population's distribution, the distribution of sample means tends to approximate a normal
distribution with a large enough sample size. This theorem enables reliable inference about
population parameters from sample data, even when the population distribution is unknown or
non-normal. It forms the foundation for many statistical analyses, allowing us to make accurate
conclusions and decisions based on sample data.
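The CLT can be demonstrated with a short simulation. This sketch draws samples from an exponential population, which is strongly skewed and clearly non-normal, and shows the skewness of the sample means shrinking toward zero (the normal value) as the sample size grows; all settings are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Population: exponential(1), i.e. strongly skewed and non-normal.
for n in (2, 30, 500):
    # 10,000 samples of size n; take the mean of each sample.
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:>3}: skewness of sample means = {stats.skew(means):+.3f}")
# The theoretical skewness of the mean of n exponentials is 2/sqrt(n),
# so it shrinks toward 0 as n grows -- the CLT at work.
```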