Module 3-4


Module 3

Hypothesis
Introduction to Hypothesis
In the context of research, a hypothesis is a testable statement or proposition
that serves as a starting point for investigation. It is a tentative assumption or
prediction about the relationship between two or more variables.
A hypothesis is an assumption that is made based on some evidence. This is
the initial point of any investigation that translates the research questions into
predictions. It includes components like variables, population and the relation
between the variables.
What is a Hypothesis?
It is a foundational concept in scientific research and inquiry. A hypothesis
serves as a starting point for investigation and is formulated based on
existing knowledge, observations, and theories.
Types of Hypothesis
• Simple Hypothesis: A simple hypothesis predicts a
relationship between two variables. For example,
"Increasing sunlight exposure leads to higher plant
growth."
• Complex Hypothesis: A complex hypothesis predicts
relationships among multiple variables. For example,
"The interaction between sleep quality, stress levels,
and dietary habits affects academic performance."
• Directional Hypothesis: This type of hypothesis
predicts the direction of the relationship between
variables. For example, "Higher levels of exercise
lead to decreased body weight."
• Non-Directional Hypothesis: Also known as a two-tailed
hypothesis, it predicts a relationship between variables without
specifying the direction. For example, "There is a relationship
between caffeine consumption and sleep quality."
• Associative Hypothesis: This hypothesis predicts a correlation
or association between variables without necessarily implying
a cause-and-effect relationship. For example, "There is an
association between smartphone use and eye strain."
• Causal Hypothesis: A causal hypothesis proposes a cause-and-
effect relationship between variables. For example, "Increased
sugar intake causes higher instances of dental cavities."
Two Major Types of Hypothesis
• Null Hypothesis (H0): This is a statement of no effect
or no relationship. It suggests that there is no
significant difference or connection between variables.
Researchers often aim to test the null hypothesis to
determine whether there is enough evidence to reject it
in favor of an alternative hypothesis.
• Alternative Hypothesis (H1 or Ha): Also known as the
research hypothesis, this is the statement that suggests
a specific effect or relationship between variables. It is
the hypothesis researchers are trying to support with
their data.
Source of Hypothesis
• The resemblance between the phenomenon.
• Observations from past studies, present-day experiences
and from the competitors.
• Scientific theories.
• General patterns that influence the thinking process of
people.
• Personal Experience
• Imagination & Thinking
• Previous Study
• Culture
Steps involved to Test Hypothesis
• State the Null Hypothesis and Alternate Hypothesis
• Set the criteria for the decision (Level of Significance):-
 It is the probability of rejecting the null hypothesis when it is true, also
called a Type I error
 It is set prior to conducting the hypothesis test
 It can be set at 5% or lower
 For example, a significance level of 5% indicates a 5% risk of concluding
that a difference exists when there is no actual difference
 A lower significance level indicates that stronger evidence is required
before rejecting the null hypothesis
 It is denoted by alpha.
• Collect the data (Select your sample)
• Decide which test is to be performed
• Compute the test statistics/ Value
• Find the critical value:-
 The critical value is the cutoff value that is compared with the test
value to make a decision about the null hypothesis
 It divides the graph into two regions: the rejection region and the
acceptance region
 If the test value falls into the rejection region, reject the null hypothesis
 It is derived from the level of significance of the test, i.e. it is the table
value for that level of significance
• Compare the critical value with the test statistic:-
 If the value of the test statistic is greater than the critical value, "Reject
the null hypothesis"
 If the value of the test statistic is less than the critical value, "Do not
reject the null hypothesis"
• Make a decision to either reject or not reject the null hypothesis
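As a sketch of the steps above, a one-sample T test can be worked through in plain Python. The sample values, the hypothesised mean, and the table-derived critical value below are illustrative assumptions, not data from this module:

```python
import math

# Step 1: H0: population mean = 2.0 (hypothetical packet weights in kg)
sample = [2.3, 2.5, 2.1, 2.7, 2.4]
mu_0 = 2.0
alpha = 0.05                    # Step 2: level of significance

# Steps 3-5: compute the test statistic (one-sample t, since n < 30 and
# the population standard deviation is unknown)
n = len(sample)
mean = sum(sample) / n
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
t_stat = (mean - mu_0) / (math.sqrt(variance) / math.sqrt(n))

# Step 6: critical value from the t table for df = 4, two-tailed alpha = 0.05
t_critical = 2.776

# Step 7: compare and decide
reject_h0 = abs(t_stat) > t_critical
print(t_stat, reject_h0)        # t = 4.0 > 2.776, so H0 is rejected
```

Here the computed t value (4.0) falls in the rejection region, so the null hypothesis is rejected at the 5% level.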
Errors in Hypothesis Testing
• H0 is true and we accept H0: Correct decision (no error), probability (1 - alpha)
• H0 is true and we reject H0: Type I error (alpha error), probability alpha
• H0 is false and we accept H0: Type II error (beta error), probability beta
• H0 is false and we reject H0: Correct decision (no error), probability (1 - beta)
Univariate and Bivariate Data Analysis:
Univariate data:
This type of data consists of only one variable. The analysis of
univariate data is thus the simplest form of analysis, since the
information deals with only one quantity that changes. It does not deal
with causes or relationships, and the main purpose of the analysis is to
describe the data and find patterns that exist within it. An example of
univariate data is height.
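A minimal sketch of univariate analysis, using made-up height values and Python's standard statistics module to describe the single variable:

```python
import statistics

# Hypothetical univariate data: heights (cm) of five students
heights = [160, 165, 170, 175, 180]

# Univariate analysis only describes the one variable and its pattern
mean_height = statistics.mean(heights)      # central tendency
median_height = statistics.median(heights)  # middle value
spread = statistics.stdev(heights)          # sample standard deviation
print(mean_height, median_height, round(spread, 2))
```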
Bivariate data:
This type of data involves two different variables. The analysis of this
type of data deals with causes and relationships, and the analysis is
done to find out the relationship between the two variables. An example
of bivariate data is temperature and ice cream sales in the summer
season.
What is Hypothesis testing?
The theory, methods, and practice of testing a
hypothesis by comparing it with the null
hypothesis. The null hypothesis is only rejected if
its probability falls below a predetermined
significance level, in which case the hypothesis
being tested is said to have that level of
significance.
Example: A teacher assumes that 60% of his
college's students come from lower-middle-class
families.
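The teacher's claim can be sketched as a one-sample z test for a proportion. The survey size and count below are invented for illustration; the deck does not supply real data:

```python
import math

# H0: p = 0.60 (the teacher's assumed proportion)
p0 = 0.60
n = 100            # hypothetical number of students surveyed
successes = 52     # hypothetical number from lower-middle-class families
p_hat = successes / n

# Test statistic for a one-sample proportion z test
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Two-tailed critical value at alpha = 0.05
reject_h0 = abs(z) > 1.96
print(round(z, 3), reject_h0)   # z is about -1.633, not enough evidence to reject H0
```

With this made-up sample, |z| stays below 1.96, so the teacher's assumption is not rejected at the 5% level.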
Hypothesis Testing
Statistical Tests:-
• Statistical Tests are conducted to test the hypothesis and to find the
inference about the population.
• For that samples are selected and various tests are performed on them to
find the inference about the population under study.
• These are of two types: Parametric Tests and Non Parametric Tests.
Parametric Tests:-
• Parametric tests are applied under circumstances where the population
is normally distributed or is assumed to be normally distributed.
• Parameters like mean, standard deviation, etc. are used.
• For example:- T – Test, Z – Test, F – Test, ANOVA.
• These are applied where the data is quantitative.
• These are applied where the scale of measurement is either an interval or
ratio scale.
Non – Parametric Tests:-
• Non Parametric tests are applied under the circumstances where the
population is not normally distributed or is not assumed to be normally
distributed.
• Where parametric tests cannot be applied, non-parametric tests come
into play.
• These tests are also called distribution-free tests.
• Parameters like mean, standard deviation, etc. are not used.
• For example, Chi – Square test, U – Test (Mann Whitney Test), Spearman's
Rank Correlation Test.
• These are applied where the data is qualitative
• These are applied where the scale of measurement is either an ordinal or a
nominal scale
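As a sketch of one of the non-parametric tests listed above, a chi-square goodness-of-fit test can be computed by hand. The observed counts and the table critical value are illustrative assumptions:

```python
# Chi-square goodness-of-fit test on nominal data:
# H0: 80 customers choose among 4 brands uniformly (hypothetical counts)
observed = [22, 18, 20, 20]
expected = [20, 20, 20, 20]   # uniform preference under H0

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value from the chi-square table: df = 3, alpha = 0.05
critical = 7.815
reject_h0 = chi_sq > critical
print(chi_sq, reject_h0)      # 0.4 < 7.815, do not reject H0
```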
Difference between Parametric and Non Parametric Test
• Parametric test: assumes the distribution to be normal. Non-parametric test:
does not assume the distribution to be normal.
• Parametric test: makes assumptions about the population. Non-parametric test:
does not make any assumptions about the population.
• Parametric test: parameters such as mean, standard deviation, etc. are used.
Non-parametric test: no such parameters are used.
• Parametric test: applied in case of quantitative data. Non-parametric test:
applied in case of qualitative data.
• Parametric test: scale of measurement is either interval or ratio.
Non-parametric test: scale of measurement is either ordinal or nominal.
• Parametric test: more powerful (possesses the ability to reject the null
hypothesis when it is false). Non-parametric test: less powerful than a
parametric test.
Types of Hypothesis Testing:
Parametric Test:-
 T Test
 Z Test
 F Test
 ANOVA
T Test
• It is a parametric test of hypothesis testing based on Students T
distribution.
• It was developed by William Sealy Gosset.
• It essentially tests the significance of the difference of the mean
values when the sample size is small (i.e. less than 30) and the
population standard deviation is not available.
• It assumes:
 Population distribution is normal
 Samples are random and independent
 Sample size is small
 Population standard deviation is not known.
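A minimal sketch of a two-sample T test under the assumptions above (small samples, unknown population standard deviation). The group values and the table critical value are hypothetical:

```python
import math

# Two hypothetical small samples (n < 30 each)
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b

# Pooled variance for the equal-variance two-sample t test
pooled = ((n_a - 1) * sample_var(group_a) +
          (n_b - 1) * sample_var(group_b)) / (n_a + n_b - 2)
t_stat = (mean_a - mean_b) / math.sqrt(pooled * (1 / n_a + 1 / n_b))

# t table: df = 8, two-tailed alpha = 0.05
t_critical = 2.306
print(round(t_stat, 2), abs(t_stat) > t_critical)   # -2.0, do not reject H0
```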
Z Test:-
 It is a parametric test of hypothesis testing.
 It is used to determine whether the means are different when the
population variance is known and the sample size is large (i.e. greater than
30)
• It assumes:
 Population distribution is normal and
 Samples are random and independent
 Sample size is large
 Population Standard deviation is known.
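The Z test under these assumptions reduces to one line of arithmetic. The sample mean, sample size, and known population standard deviation below are made-up numbers for illustration:

```python
import math

# Hypothetical large sample (n > 30) with known population std deviation
n = 100
sample_mean = 52.0
mu_0 = 50.0        # mean claimed by the null hypothesis
sigma = 10.0       # known population standard deviation

# Z statistic: (sample mean - hypothesised mean) / standard error
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Two-tailed critical value at alpha = 0.05
reject_h0 = abs(z) > 1.96
print(z, reject_h0)   # z = 2.0 > 1.96, reject H0
```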
F Test
 It is a parametric test of hypothesis testing based on Snedecor F
distribution.
 The F test is named after its test statistic, F, which was named in
honour of Sir Ronald Fisher.
 It is a test of the null hypothesis that two normal populations have the
same variance.
 An F test is regarded as a comparison of the equality of sample variances.
 The F statistic is simply a ratio of two variances.
 By changing the variance in the ratio, F test becomes a very flexible test. It
can then be used to:
• Test the overall significance for a regression model.
• To compare the fits of different models and
• To test the equality of means.
 It assumes
• Population distribution is normal and
• Samples are drawn randomly and independently.
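Since the F statistic is just a ratio of two variances, it can be sketched directly. The two groups below are invented data, and the convention of putting the larger variance in the numerator is assumed:

```python
# F statistic as a ratio of two sample variances (hypothetical data)
group_a = [2, 4, 6, 8, 10]
group_b = [1, 2, 3, 4, 5]

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

var_a, var_b = sample_var(group_a), sample_var(group_b)

# Convention: larger variance in the numerator, so F >= 1
f_stat = max(var_a, var_b) / min(var_a, var_b)
print(f_stat)   # 10.0 / 2.5 = 4.0
```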
ANOVA
• Also called Analysis of Variance, it is a parametric test of hypothesis
testing.
• It was developed by Ronald Fisher and is also referred to as Fisher's ANOVA.
• It is an extension of T Test and Z Test
• It is used to test the significance of the differences of the mean values
among more than two sample groups.
• It uses F Test to statistically test the equality of means and the relative
variances between them.
It assumes
 Population distribution is normal
 Samples are random and independent
 Homogeneity of sample variances.
• One way ANOVA and Two way ANOVA are its types.
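A minimal one-way ANOVA sketch, computing the F statistic by hand for three hypothetical groups; the F-table critical value for df = (2, 6) at alpha = 0.05 is taken as 5.14:

```python
# One-way ANOVA F statistic computed by hand (hypothetical data)
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# Between-group sum of squares (variation of group means around grand mean)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares (variation inside each group)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1                 # 2
df_within = len(all_values) - len(groups)    # 6
f_stat = (ss_between / df_between) / (ss_within / df_within)

# F table: df = (2, 6), alpha = 0.05 gives a critical value of 5.14
print(f_stat, f_stat > 5.14)   # 3.0, do not reject H0
```

Note that ANOVA uses the F test internally, as the bullet above says: the ratio of between-group to within-group variance.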
Limitations of Hypothesis Test :
• Assumptions
• Sample Size
• Type I and Type II Errors
• Choice of Test
• Multiple Testing
• Data Quality
• Publication Bias
• External Validity
Statistical analysis introduction :
Statistical analysis, or statistics, is the process
of collecting and analysing data to identify patterns and
trends, remove bias and inform decision-making. It's an
aspect of business intelligence that involves the
collection and scrutiny of business data and the
reporting of trends.
The results acquired from a research project are
meaningless raw data unless analysed with statistical
tools. Therefore, determining statistics in research is of
utmost necessity to justify research findings.
Importance of statistical analysis:
• Use the proper methods to collect the data
• Employ the correct analyses
• Effectively present the results
• Account for variation and differences
• Perform high-quality research analysis
• Draw meaningful interpretations
Bivariate analysis:
Bivariate analysis is a statistical method that
examines how two different variables are related. It
aims to determine whether there is a statistical link
between the two variables and, if so, how strong that
link is and in which direction it runs.
This study explores the relationship between two
variables, as well as the depth of this relationship, to
figure out whether there are any discrepancies between the two
variables and any causes of this difference. Examples
include percentage tables, scatter plots, etc.
Importance of bivariate analysis:
• Bivariate analysis helps identify cause-and-effect relationships
• Helps researchers make predictions
• Bivariate analysis helps identify trends and patterns
Multivariate analysis

Meaning :
Multivariate analysis is a statistical technique used to
analyze data that involves multiple variables simultaneously.
It allows researchers to understand the relationships between
several variables and how they interact with each other.

There are two main types of variables in multivariate analysis:
1. Dependent variables and
2. Independent variables.
1.Dependent Variables:
Dependent variables are the outcomes or responses that
researchers are interested in studying. They are the variables that are
measured or observed to assess the effects of the independent variables. In
other words, they depend on the independent variables for their values.
Dependent variables are often represented on the y-axis of a graph.
Example: Let's consider a study examining the relationship between
studying time and exam scores. In this case, the exam scores would be the
dependent variable, as they depend on the amount of time spent studying.
Researchers would collect data on both studying time and exam scores for
each participant, and then analyze the relationship between these variables
using multivariate techniques.
2. Independent Variables:
Independent variables are the factors that are
believed to influence or affect the dependent variables. These are the
variables that researchers manipulate or observe to see how they impact
the outcome of interest. Independent variables are often represented on
the x-axis of a graph.

Example: In the same study mentioned earlier, the amount of time
spent studying is the independent variable. Researchers would vary the
studying time across different groups of participants (e.g., some study
for 1 hour, others for 3 hours, etc.) and then assess how this
manipulation of the independent variable affects the exam scores
(dependent variable).
Statistical analysis through correlation
and regression:
Correlation:
Correlation is a statistical measure that expresses the
extent to which two variables are linearly related (meaning they
change together at a constant rate). It's a common tool for
describing simple relationships without making a statement about
cause and effect.
Types:
1. Spearman correlation coefficient
2. Karl Pearson correlation coefficient
Spearman's correlation coefficient:
• It is a measure of association between two sets
of ranks.
• Use Spearman rank correlation when you have
two ranked variables, and you want to see
whether the two variables covary; whether, as
one variable increases, the other variable tends
to increase or decrease.
Formula to calculate correlation: rs = 1 - (6 Σd²) / (n(n² - 1)), where d is
the difference between the two ranks of each observation and n is the number
of observations.
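The standard Spearman formula, rs = 1 - 6Σd²/(n(n² - 1)), can be sketched on made-up ranks of five students on two tests:

```python
# Spearman's rank correlation from rs = 1 - 6*sum(d^2) / (n*(n^2 - 1))
# Hypothetical ranks of 5 students on two tests
ranks_x = [1, 2, 3, 4, 5]
ranks_y = [2, 1, 4, 3, 5]

n = len(ranks_x)
# d is the difference between the two ranks of each observation
d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
rs = 1 - (6 * d_squared) / (n * (n ** 2 - 1))
print(rs)   # 0.8: as one ranking increases, the other tends to increase
```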
Karl Pearson correlation coefficient:
• It is used when degree of association between two metric-
scaled(interval or ratio) variables is to be examined.
• Karl Pearson's coefficient of correlation is an extensively used
mathematical method in which the numerical representation is
applied to measure the level of relation between linearly
related variables. The coefficient of correlation is expressed by
“r”.
Formula: r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² × Σ(y - ȳ)²)
• The coefficient ranges from "-1.0" to "+1.0". Correlation coefficients
whose magnitudes are between 0.8 and 1.0 indicate variables that can be
considered strongly correlated. A correlation of +1 indicates that the
variables are perfectly positively correlated: if one variable moves by
10%, the other variable will also move by 10% in the same direction. So it
gives both strength and direction.
• Correlation coefficients whose magnitudes are between 0.5 and 0.7
indicate variables that can be considered moderately correlated.
Correlation coefficients whose magnitudes are between 0.3 and 0.5
indicate variables that have a low correlation.
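Pearson's r can be computed directly from its formula. The two interval-scale series below are invented so that y is an exact linear function of x, which makes the expected result easy to check:

```python
import math

# Pearson's r from its formula (hypothetical interval-scale data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # y = 2x, so r should be exactly +1

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Numerator: sum of products of deviations from the means
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
# Denominator: product of the two square-rooted sums of squares
sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
r = cov / (sx * sy)
print(r)   # 1.0: perfect positive linear correlation
```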
Regression:
• A regression is a statistical technique that relates a dependent variable
to one or more independent (explanatory) variables. A regression
model is able to show whether changes observed in the dependent
variable are associated with changes in one or more of the explanatory
variables.
• Regression analysis is a reliable method of identifying which
variables have an impact on a topic of interest. The process of
performing a regression allows you to confidently determine which
factors matter most, which factors can be ignored, and how these
factors influence each other.
Types:
1. Simple regression analysis
2. Multiple regression analysis
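A sketch of simple regression analysis, fitting a least-squares line to made-up data relating hours studied to exam scores; the data are chosen to lie exactly on a line so the fitted coefficients are easy to verify:

```python
# Simple linear regression by least squares (hypothetical data):
# predict exam score (y) from hours studied (x)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # follows y = 2x + 1 exactly

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope: covariance of x and y over variance of x
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) /
         sum((a - mean_x) ** 2 for a in x))
# Intercept: the fitted line passes through the point of means
intercept = mean_y - slope * mean_x
print(slope, intercept)   # 2.0 and 1.0
```

Multiple regression extends the same idea to more than one independent variable.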
