Statistical Tests

Published on Explorable.com (https://explorable.
com)
Statistical Tests
Table of Contents
1 Statistical Hypothesis Testing
2 Relationship Between Variables
2.1 Linear Relationship
2.2 Non-Linear Relationship
3 Statistical Correlation
3.1 Pearson Product-Moment Correlation
3.2 Spearman Rank Correlation Coefficient
3.3 Partial Correlation Analysis
3.4 Correlation and Causation
4 Regression
4.1 Linear Regression Analysis
4.2 Multiple Regression Analysis
4.3 Correlation and Regression
5 Student’s T-Test
5.1 Independent One-Sample T-Test
5.2 Independent Two-Sample T-Test
5.3 Dependent T-Test for Paired Samples
5.4 Student’s T-Test (II)
6 ANOVA
6.1 One-Way ANOVA
6.2 Two-Way ANOVA
6.3 Factorial Anova
6.4 Repeated Measures ANOVA
7 Nonparametric Statistics
7.1 Cohen’s Kappa
7.2 Mann-Whitney U-Test
7.3 Wilcoxon Signed Rank Test
8 Other Ways to Analyse Data

8.1 Chi Square Test
1
8.2 Z-Test
8.3 F-Test
8.4 Factor Analysis
8.5 ROC Curve Analysis
8.6 Meta Analysis
2
Copyright Notice
Copyright © Explorable.com 2014. All rights reserved, including the right of reproduction in
whole or in part in any form. No parts of this book may be reproduced in any form without
written permission of the copyright owner.
Notice of Liability
The author(s) and publisher both used their best efforts in preparing this book and the
instructions contained herein. However, the author(s) and the publisher make no warranties of
any kind, either expressed or implied, with regards to the information contained in this book,
and especially disclaim, without limitation, any implied warranties of merchantability and
fitness for any particular purpose.
In no event shall the author(s) or the publisher be responsible or liable for any loss of profits or
other commercial or personal damages, including but not limited to special incidental,
consequential, or any other damages, in connection with or arising out of furnishing,
performance or use of this book.
Trademarks
Throughout this book, trademarks may be used. Rather than put a trademark symbol in every
occurrence of a trademarked name, we state that we are using the names in an editorial
fashion only and to the benefit of the trademark owner with no intention of infringement of the
trademarks. Thus, copyrights on individual photographic, trademarks and clip art images
reproduced in this book are retained by the respective owner.
Information
Published by Explorable.com.
Cover design by Explorable / Gilli.me.
3
1 Statistical Hypothesis Testing
Statistical hypothesis testing is used to determine whether an experiment conducted

provides enough evidence to reject a proposition.
It is also used to remove the chance process in an experiment and establish its validity and
relationship with the event under consideration.
For example, suppose you want to study the effect of smoking on the
occurrence of lung cancer cases. If you take a small group, it may
happen that there appears no correlation at all, and you find that there
are many smokers with healthy lungs and many non-smokers with lung
cancer.
However, it can just happen that this is by chance, and in the overall population this isn't true.
In order to remove this element of chance and increase the reliability of our hypothesis, we
use statistical hypothesis testing.
In this, you will first assume a hypothesis that smoking and lung cancer are unrelated. This is
called the 'null hypothesis', which is central to any statistical hypothesis testing.
You should therefore first choose a distribution for the experimental group. Normal distribution
is one of the most common distributions encountered in nature, but it can be different in
different special cases.
The Critical Value

There should then be limits set on the critical value, beyond which you can assume that the
experiment proves that the null hypothesis is false and therefore using statistical hypothesis
testing, the experiment shows there is enough evidence to reject the null hypothesis. This is
generally set at 5% or 1% chance probability.
This means that if the experiment suggests that the probability of a chance event in the
experiment is less than this critical value, then the null hypothesis can be rejected.
4
If the null hypothesis is rejected, then we need to look for an alternative hypothesis that is in
line with the experimental observations.
There is also the gray area in between, like at the 15-20% level, in which it is hard to say
whether the null hypothesis can be rejected. In such cases, we can say that there is reason
enough to doubt the validity of the null hypothesis but there isn't enough evidence to suggest
that we reject the null hypothesis altogether.
A result in the gray area often leads to more exploration before concluding anything.
Accepting a Hypothesis
The other thing with statistical hypothesis testing is that there can only be an experiment
performed that doubts the validity of the null hypothesis, but there can be no experiment that
can somehow demonstrate that the null hypothesis is actually valid. This because of the
falsifiability-principle in the scientific method.
Therefore it is a tricky situation for someone who wants to show the independence of the two
events, like smoking and lung cancer in our previous example.
This problem can be overcome using a confidence interval and then arguing that the
experimental data reveals that the first event has a negligible (as much as the confidence
interval) effect, if at all, on the second event.
In the figure below, we can see that one can argue the independence is within 0.05 times the
standard deviation.
5
How to cite this article:
Siddharth Kalla (Nov 15, 2009). Statistical Hypothesis Testing. Retrieved from
Explorable.com: https://explorable.com/statistical-hypothesis-testing
6
2 Relationship Between Variables
It is very important to understand relationship between variables to draw the right

conclusion from a statistical analysis. The relationship between variables determines
how the right conclusions are reached. Without an understanding of this, you can fall
into many pitfalls that accompany statistical analysis and infer wrong results from your
data.
There are several different kinds of relationships between variables. Before drawing a
conclusion, you should first understand how one variable changes with the other. This means
you need to establish how the variables are related - is the relationship linear or quadratic or
inverse or logarithmic or something else?
Suppose you measure a volume of a gas in a cylinder and measure its pressure. Now you
start compressing the gas by pushing a piston all while maintaining the gas at the room
temperature. The volume of gas decreases while the pressure increases. You note down
different values on a graph paper.
If you take enough measurements, you can see a shape of a parabola defined by
xy=constant. This is because gases follow Boyle's law that says when temperature is
constant, PV = constant. Here, by taking data you are relating the pressure of the gas with its
volume. Similarly, many relationships are linear in nature.
Relationships in Physical and Social Sciences

Relationships between variables need to be studied and analyzed before drawing conclusions
based on it. In natural science and engineering, this is usually more straightforward as you
can keep all parameters except one constant and study how this one parameter affects the
result under study.
However, in social sciences, things get much more complicated because parameters may or
may not be directly related. There could be a number of indirect consequences and deducing
cause and effect can be challenging.
7
Only when the change in one variable actually causes the change in another parameter is
there a causal relationship. Otherwise, it is simply a correlation. Correlation doesn't imply
causation. There are ample examples and various types of fallacies in use.
A famous example to prove the point: Increased ice-cream sales shows a strong correlation to
deaths by drowning. It would obviously be wrong to conclude that consuming ice-creams
causes drowning. The explanation is that more ice-cream gets sold in the summer, when
more people go to the beach and other water bodies and therefore increased deaths by
drowning.
Positive and Negative Correlation

Correlation between variables can be positive or negative. Positive correlation implies an
increase of one quantity causes an increase in the other whereas in negative correlation, an
increase in one variable will cause a decrease in the other.
It is important to understand the relationship between variables to draw the right conclusions.
Even the best scientists can get this wrong and there are several instances of how studies get
correlation and causation mixed up.
Siddharth Kalla (Jul 26, 2011). Relationship Between Variables. Retrieved from
Explorable.com: https://explorable.com/relationship-between-variables
8
2.1 Linear Relationship
A linear relationship is one where increasing or decreasing one variable n times will
cause a corresponding increase or decrease of n times in the other variable too. In
simpler words, if you double one variable, the other will double as well.
Some Examples of Linear Relationships

First, let us understand linear relationships. These relationships between variables are such
that when one quantity doubles, the other doubles too.
For example:
For a given material, if the volume of the material is doubled, its weight will also double.
This is a linear relationship. If the volume is increased 10 times, the weight will also
increase by the same factor.
If you take the perimeter of a square and its side, they are linearly related. If you take a
square that has sides twice as large, the perimeter will also become twice larger.
The cost of objects is usually linear. If a notebook costs $1, then ten notebooks will cost
$10.
The force of gravity between the earth and an object is linear in nature. If the mass of
the object doubles, the force of gravity acting on it will also be double.
As can be seen from the above examples, a number of very important physical phenomena
can be described by a linear relationship.
Apart from these physical processes, there are many correlations between variables that can
be approximated by a linear relationship. This greatly simplifies a problem at hand because a
linear relationship is much simpler to study and analyze than a non-linear one.
Constant of Proportionality
The constant of proportionality is an important concept that emerges from a linear
relationship. By using this constant, we can formulate the actual formula that describes one
variable in terms of the other.
9
For example, in our first example, the constant of proportionality between mass and volume is
called density. Thus we can mathematically write:
Mass = density x volume
The constant of proportionality, the density, is defined from the above equation - it is the mass
per unit volume of the material.
If you plot these variables on a graph paper, the slope of the straight line is the constant of
proportionality.
In this example, if you plot mass on the y-axis and volume on the x-axis, you will find that the
slope of the line thus formed gives the density.
Linear relationships are not limited to physical phenomena but are frequently encountered in
all kinds of scientific research and methodologies. An understanding of linear relationships is
essential to understand these relationships between variables.
Siddharth Kalla (Jan 10, 2011). Linear Relationship. Retrieved from Explorable.com:
https://explorable.com/linear-relationship
10
2.2 Non-Linear Relationship
Non-linear relationship is fundamental to most physical and statistical phenomena and

their study is important to fully understand the world around.
Linear relationships are the easiest to understand and study and a number of very important
physical phenomena are linear. However, it doesn't cover the whole ambit of our
mathematical techniques and non-linear relationships are fundamental to a number of most
important and intriguing physical and social phenomena around.
Examples of Non-Linear Relationships

As their name suggest, non-linear relationships are not linear, which means by doubling one
variable, the other variable will not double.
There are an endless variety of non-linear relationships that one can encounter. However,
most of them can still fit into other categories, like polynomial, logarithmic, etc.
Examples:
The side of a square and its area are not linear. In fact, this is a quadratic relationship. If
you double the side of a square, its area will increase 4 times.
While charging a capacitor, the amount of charge and time are non-linearly dependent.
Thus the capacitor is not twice as charged after 2 seconds as it was after 1 second. This
is an exponential relationship.
Studying Non-Linear Relationships

Even though non-linear relationships are much more complicated than linear ones, they can
be studied in their own right. If you are studying these, you should first see if they fit any
standard shapes like parabolas or exponential curves. These are commonly occurring
relationships between variables.
For example, the pressure and volume of nitrogen during an isentropic expansion are related
as PV1.4 which is highly non-linear but fits neatly into this equation.
Next, a number of non-linear relationships are monotonic in nature. This means they do not
oscillate and steadily increase or decrease. This is good to study because they behave
11
qualitatively like linear relationships for a number of cases.
Approximations
A linear relationship is the simplest to understand and therefore can serve as the first
approximation of a non-linear relationship. The limits of validity need to be well noted. In fact,
a number of phenomena were thought to be linear but later scientists realized that this was
only true as an approximation.
Consider special theory of relativity that redefined our perceptions of space and time. It gives
the full non-linear relationship between variables. They can very well be approximated to be
linear in Newtonian mechanics as a first approximation at lower speeds. If you consider
momentum, in Newtonian mechanics it is linearly dependent on velocity. If you double the
velocity, the momentum will double. However, at speeds approaching those of light, this
becomes a highly non-linear relationship.
Some of the greatest scientific challenges need the study of non-linear relationships. The
study of turbulence, which is one of the greatest unsolved problems in science and
engineering, needs the study of a non-linear differential equation.
Siddharth Kalla (Feb 17, 2011). Non-Linear Relationship. Retrieved from Explorable.com:
https://explorable.com/non-linear-relationship
12
3 Statistical Correlation
Statistical correlation is a statistical technique which tells us if two variables are

related.
For example, consider the variables family income and family expenditure. It is well known
that income and expenditure increase or decrease together. Thus they are related in the
sense that change in any one variable is accompanied by change in the other variable.
Again price and demand of a commodity are related variables; when price increases demand
will tend to decreases and vice versa.
If the change in one variable is accompanied by a change in the other, then the variables are
said to be correlated. We can therefore say that family income and family expenditure, price
and demand are correlated.
Relationship Between Variables

Correlation can tell you something about the relationship between variables. It is used to
understand:
1. whether the relationship is positive or negative

2. the strength of relationship.
Correlation is a powerful tool that provides these vital pieces of information.
In the case of family income and family expenditure, it is easy to see that they both rise or fall
together in the same direction. This is called positive correlation.
In case of price and demand, change occurs in the opposite direction so that increase in one
is accompanied by decrease in the other. This is called negative correlation.
Coefficient of Correlation
Statistical correlation is measured by what is called coefficient of correlation (r). Its numerical
value ranges from +1.0 to -1.0. It gives us an indication of the strength of relationship.
In general, r > 0 indicates positive relationship, r < 0 indicates negative relationship while r = 0
13
indicates no relationship (or that the variables are independent and not related). Here r = +1.0
describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation.
Closer the coefficients are to +1.0 and -1.0, greater is the strength of the relationship between
the variables.
As a rule of thumb, the following guidelines on strength of relationship are often useful (though
many experts would somewhat disagree on the choice of boundaries).
14
Value of r Strength of relationship
-1.0 to -0.5 or 1.0 to 0.5 Strong
-0.5 to -0.3 or 0.3 to 0.5 Moderate
-0.3 to -0.1 or 0.1 to 0.3 Weak
-0.1 to 0.1 None or very weak
Correlation is only appropriate for examining the relationship between meaningful quantifiable
data (e.g. air pressure, temperature) rather than categorical data such as gender, favorite
color etc.
Disadvantages
While 'r' (correlation coefficient) is a powerful tool, it has to be handled with care.
1. The most used correlation coefficients only measure linear relationship. It is therefore
perfectly possible that while there is strong non linear relationship between the variables
, r is close to 0 or even 0. In such a case, a scatter diagram can roughly indicate the
existence or otherwise of a non linear relationship.
2. One has to be careful in interpreting the value of 'r'. For example, one could compute 'r'
between the size of shoe and intelligence of individuals, heights and income.
Irrespective of the value of 'r', it makes no sense and is hence termed chance or non-
sense correlation.
3. 'r' should not be used to say anything about cause and effect relationship. Put
differently, by examining the value of 'r', we could conclude that variables X and Y are
related. However the same value of 'r' does not tell us if X influences Y or the other way
round. Statistical correlation should not be the primary tool used to study causation,
because of the problem with third variables.
Explorable.com (May 2, 2009). Statistical Correlation. Retrieved from Explorable.com:

https://explorable.com/statistical-correlation
15
3.1 Pearson Product-Moment Correlation
Pearson Product-Moment Correlation is one of the measures of correlation which

quantifies the strength as well as direction of such relationship. It is usually denoted by
Greek letter ρ.
In the study of relationships, two variables are said to be correlated if change in one variable
is accompanied by change in the other - either in the same or reverse direction.
Conditions
This coefficient is used if two conditions are satisfied
1. the variables are in the interval or ratio scale of measurement

2. a linear relationship between them is suspected
Positive and Negative Correlation

The coefficient (ρ) is computed as the ratio of covariance between the variables to the product
of their standard deviations. This formulation is advantageous.
First, it tells us the direction of relationship. Once the coefficient is computed, ρ > 0 will indicate
positive relationship, ρ < 0 will indicate negative relationship while ρ = 0 indicates non existence
of any relationship.
Second, it ensures (mathematically) that the numerical value of ρ range from -1.0 to +1.0. This
enables us to get an idea of the strength of relationship - or rather the strength of linear
relationship between the variables. Closer the coefficients are to +1.0 or -1.0, greater is the
strength of the linear relationship.
As a rule of thumb, the following guidelines are often useful (though many experts could
somewhat disagree on the choice of boundaries).
Range of Ρ
16
Value of ρ Strength of relationship
-1.0 to -0.5 or 1.0 to 0.5 Strong
-0.5 to -0.3 or 0.3 to 0.5 Moderate
-0.3 to -0.1 or 0.1 to 0.3 Weak
-0.1 to 0.1 None or very weak
Properties of Ρ
This measure of correlation has interesting properties, some of which are enunciated below:
1. It is independent of the units of measurement. It is in fact unit free. For example, ρ

between highest day temperature (in Centigrade) and rainfall per day (in mm) is not
expressed either in terms of centigrade or mm.
2. It is symmetric. This means that ρ between X and Y is exactly the same as ρ between Y
and X.
3. Pearson's correlation coefficient is independent of change in origin and scale. Thus ρ
between temperature (in Centigrade) and rainfall (in mm) would numerically be equal to ρ
between temperature (in Fahrenheit) and rainfall (in cm).
4. If the variables are independent of each other, then one would obtain ρ = 0. However, the
converse is not true. In other words ρ = 0 does not imply that the variables are
independent - it only indicates the non existence of a non-linear relationship.
Caveats and Warnings

While ρ is a powerful tool, it is a much abused one and hence has to be handled carefully.
1. People often tend to forget or gloss over the fact that ρ is a measure of linear relationship.
Consequently a small value of ρ is often interpreted to mean non existence of relationship
when actually it only indicates non existence of a linear relationship or at best a very
weak linear relationship.
Under such circumstances it is possible that a non linear relationship exists.
A scatter diagram can reveal the same and one is well advised to observe the same
before firmly concluding non existence of a relationship. If the scatter diagram points to
a non linear relationship, an appropriate transformation can often attain linearity in which
case ρ can be recomputed.
2. One has to be careful in interpreting the value of ρ.
17
For example, one could compute ρ between size of a shoe and intelligence of individuals,
heights and income. Irrespective of the value of ρ, such a correlation makes no sense
and is hence termed chance or non-sense correlation.
3. ρ should not be used to say anything aboutcause and effect relationship. Put differently,
by examining the value of ρ, we could conclude that variables X and Y are related.
However the same value of ρ does not tell us if X influences Y or the other way round - a
fact that is of grave import in regression analysis.
Explorable.com (Oct 8, 2009). Pearson Product-Moment Correlation. Retrieved from

Explorable.com: https://explorable.com/pearson-product-moment-correlation
18
3.2 Spearman Rank Correlation Coefficient
Spearman Rank Correlation Coefficient is a non-parametric measure of correlation,

using ranks to calculate the correlation.
Spearman Rank
Correlation Coefficient
uses ranks to calculate
correlation.
Whenever we are interested to know if two variables are related to each other, we use a
statistical technique known as correlation. If the change in one variable brings about a change
in the other variable, they are said to be correlated.
A well known measure of correlation is the Pearson product moment correlation coefficient
which can be calculated if the data is in interval/ ratio scale.
It is also known as the

"spearman rho" or
"spearman r correlation".
The Spearman Rank Correlation Coefficient is its analogue when the data is in terms of ranks.
One can therefore also call it correlation coefficient between the ranks. The correlation
coefficient is sometimes denoted by rs.
Example
As an example, let us consider a musical (solo vocal) talent contest where 10 competitors are
evaluated by two judges, A and B. Usually judges award numerical scores for each contestant
after his/her performance.
A product moment correlation coefficient of scores by the two judges hardly makes sense
here as we are not interested in examining the existence or otherwise of a linear relationship
between the scores.
19
What makes more sense is correlation between ranks of contestants as judged by the two
judges. Spearman Rank Correlation Coefficient can indicate if judges agree to each other's
views as far as talent of the contestants are concerned (though they might award different
numerical scores) - in other words if the judges are unanimous.
Interpretation of Numerical Values

The numerical value of the correlation coefficient, rs, ranges between -1 and +1. The
correlation coefficient is the number indicating the how the scores are relating.
rs = correlation coefficient
In general,
rs > 0 implies positive agreement among ranks

rs < 0 implies negative agreement (or agreement in the reverse direction)
rs = 0 implies no agreement
Closer rs is to 1, better is the agreement while rs closer to -1 indicates strong agreement in

the reverse direction.
Assigning Ranks
In order to compute Spearman Rank Correlation Coefficient, it is necessary that the data be
ranked. There are a few issues here.
Suppose that scores of the judges (out of 10 were as follows):
20
Contestant No. 1 2 3 4 5 6 7 8 9 10
Score by Judge A 5 9 3 8 6 7 4 8 4 6
Score by Judge B 7 8 6 7 8 5 10 6 5 8
Ranks are assigned separately for the two judges either starting from the highest or from the
lowest score. Here, the highest score given by Judge A is 9.
If we begin from the highest score, we assign rank 1 to contestant 2 corresponding to the
score of 9.
The second highest score is 8 but two competitors have been awarded the score of 8. In this
case both the competitors are assigned a common rank which is the arithmetic mean of ranks
2 and 3. In this way, scores of Judge A can be converted into ranks.
Similarly, ranks are assigned to the scores awarded by Judge B and then difference between
ranks for each contestant are used to evaluate rs. For the above example, ranks are as
follows.
21
Contestant No. 1 2 3 4 5 6 7 8 9 10
Ranks of scores by Judge A 7 1 10 2.5 5.5 4 8.5 2.5 8.5 5.5
Ranks of scores by Judge B 5.5 3 7.5 5.5 3 9.5 1 7.5 9.5 3
Spearman Rank
Correlation Coefficient is a
non-parametric measure of
correlation.
Spearman Rank Correlation Coefficient tries to assess the relationship between ranks without
making any assumptions about the nature of their relationship.
Hence it is a non-parametric measure - a feature which has contributed to its popularity and
wide spread use.
Advantages and Caveats

Other measures of correlation are parametric in the sense of being based on possible
relationship of a parameterized form, such as a linear relationship.
Another advantage with this measure is that it is much easier to use since it does not matter
which way we rank the data, ascending or descending. We may assign rank 1 to the smallest
value or the largest value, provided we do the same thing for both sets of data.
The only requirement is that data should be ranked or at least converted into ranks.
Explorable.com (Mar 20, 2009). Spearman Rank Correlation Coefficient. Retrieved from
Explorable.com: https://explorable.com/spearman-rank-correlation-coefficient
22
3.3 Partial Correlation Analysis
Partial correlation analysis involves studying the linear relationship between two
variables after excluding the effect of one or more independent factors.
Simple correlation does not prove to be an all-encompassing technique especially under the
above circumstances. In order to get a correct picture of the relationship between two
variables, we should first eliminate the influence of other variables.
For example, study of partial correlation between price and demand would involve studying
the relationship between price and demand excluding the effect of money supply, exports, etc.
What Correlation does not Provide

Generally, a large number of factors simultaneously influence all social and natural
phenomena. Correlation and regression studies aim at studying the effects of a large number
of factors on one another.
In simple correlation, we measure the strength of the linear relationship between two
variables, without taking into consideration the fact that both these variables may be
influenced by a third variable.
For example, when we study the correlation between price (dependent variable) and demand (
independent variable), we completely ignore the effect of other factors like money supply,
import and exports etc. which definitely have a bearing on the price.
23
Range
The correlation co-efficient between two variables X1 and X2, studied partially after eliminating
the influence of the third variable X3 from both of them, is the partial correlation co-efficient r
12.3.
Simple correlation between two variables is called the zero order co-efficient since in simple
correlation, no factor is held constant. The partial correlation studied between two variables by
keeping the third variable constant is called a first order co-efficient, as one variable is kept
constant. Similarly, we can define a second order co-efficient and so on. The partial
correlation co-efficient varies between -1 and +1. Its calculation is based on the simple
correlation co-efficient.
The partial correlation analysis assumes great significance in cases where the phenomena
under consideration have multiple factors influencing them, especially in physical and
experimental sciences, where it is possible to control the variables and the effect of each
variable can be studied separately. This technique is of great use in various experimental
designs where various interrelated phenomena are to be studied.
Limitations
However, this technique suffers from some limitations some of which are stated below.
The calculation of the partial correlation co-efficient is based on the simple correlation co-
efficient. However, simple correlation coefficient assumes linear relationship. Generally
this assumption is not valid especially in social sciences, as linear relationship rarely
exists in such phenomena.
As the order of the partial correlation co-efficient goes up, its reliability goes down.
Its calculation is somewhat cumbersome - often difficult to the mathematically uninitiated
(though software's have made life a lot easier).
Multiple Correlation
Another technique used to overcome the drawbacks of simple correlation is multiple
regression analysis.
Here, we study the effects of all the independent variables simultaneously on a dependent
variable. For example, the correlation co-efficient between the yield of paddy (X1) and the
other variables, viz. type of seedlings (X2), manure (X3), rainfall (X4), humidity (X5) is the
multiple correlation co-efficient R1.2345 . This co-efficient takes value between 0 and +1.
24
The limitations of multiple correlation are similar to those of partial correlation. If multiple and
partial correlation are studied together, a very useful analysis of the relationship between the
different variables is possible.
Explorable.com (Oct 9, 2010). Partial Correlation Analysis. Retrieved from Explorable.com:

https://explorable.com/partial-correlation-analysis
25
3.4 Correlation and Causation
Correlation and causation, closely related to confounding variables, is the incorrect

assumption that because something correlates, there is a causal relationship.
Causality is the area of statistics that is most commonly misused, and misinterpreted, by non-
specialists. Media sources, politicians and lobby groups often leap upon a perceived
correlation, and use it to 'prove' their own beliefs. They fail to understand that, just because
results show a correlation, there is no proof of an underlying causality.
Many people assume that because a poll, or a statistic, contains many numbers, it must be
scientific, and therefore correct.
Patterns of Causality in the Mind

Unfortunately, the human mind is built to try and subconsciously establish links between many
contrasting pieces of information. The brain often tries to construct patterns from randomness,
so jumps to conclusions, and assumes that a relationship exists.
Overcoming this tendency is part of academic training of students and academics in most
fields, from physics to the arts. The ability to evaluate data objectively, is absolutely crucial to
academic success.
The Sensationalism of the Media

The best way to look at the misuse of correlation and causation is by looking at an example:
A survey, as reported in a British newspaper, involved questioning a group of teenagers about

their behavior, and establishing whether their parents smoked. The newspaper reported, as
fact, that children whose parents smoked were more likely to exhibit delinquent behavior.
The results seemed to show a correlation between the two variables, so the paper printed the
headline; "Parental smoking causes children to misbehave." The Professor leading the
investigation stated that cigarette packets should carry warnings about social issues alongside
the prominent health warnings.
(Source http://www.criticalthinking.org.uk/smokingparents/)
26
However, there are a number of problems with this assumption. The first is that correlations
can often work in reverse. For example, it is perfectly possible that the parents smoked
because of the stress of looking after delinquent children.
Another cause may be that social class causes the correlation; the lower classes are usually
more likely to smoke and are more likely to have delinquent children. Therefore, parental
smoking and delinquency are both symptoms of the problem of poverty and may well have no
direct link between them.
Emotive Bias Influences Causality

This example highlights another reason behind correlation and causation errors, because the
Professor was strongly anti-smoking. He was hoping to find a link that would support his own
agenda. This is not to say that his results were useless, because they showed that there is a
root cause behind the problems of delinquency and the likelihood of smoking. This, however,
is not the same as a cause and effect relationship, and he allowed his emotions to cloud his
judgment. Smoking is a very emotive subject, but academics must remain aloof and unbiased
if internal validity is to remain intact.
The Cost of Disregarding Correlation and Causation

The principle of incorrectly linking correlation and causation is closely linked to post-hoc
reasoning, where incorrect assumptions generate an incorrect link between two effects.
27
The principle of correlation and causation is very important for anybody working as a scientist
or researcher. It is also a useful principle for non-scientists, especially those studying politics,
media and marketing. Understanding causality promotes a greater understanding, and honest
evaluation of the alleged facts given by pollsters.
Imagine an expensive advertising campaign, based around intense market research, where
misunderstanding a correlation could cost a lot of money in advertising, production costs, and
damage to the company's reputation.
Bibliography
Coon, D. & Mitterer, J.O. (2009). Psychology: A Journey (4th Ed.). Belmont, CA: Cengage
Learning
Kassin, S.M., Fein, S., Markus, H.R. (2011). Social Psychology, Belmont, CA: Wadsworth
Cengage Learning
Kornblum, W. (2003). Sociology in a Changing World (6th Ed.). Belmont, CA: Wadsworth
Cengage Learning
Smoking Parents Cause Teenage Delinquency, (2006, August 21). Criticalthinking.org.uk.

Retrieved Feb. 26. 2008 from: http://www.criticalthinking.org.uk/smokingparents/
Martyn Shuttleworth (Feb 26, 2008). Correlation and Causation. Retrieved from
Explorable.com: https://explorable.com/correlation-and-causation
28
4 Regression
4.1 Linear Regression Analysis
Linear regression analysis is a powerful technique used for predicting the unknown
value of a variable from the known value of another variable.
More precisely, if X and Y are two related variables, then linear regression analysis helps us
to predict the value of Y for a given value of X or vice verse.
For example age of a human being and maturity are related variables.
Then linear regression analyses can predict level of maturity given age
of a human being.
Dependent and Independent Variables

By linear regression, we mean models with just one independent and one dependent variable.
The variable whose value is to be predicted is known as the dependent variable and the one
whose known value is used for prediction is known as the independent variable.
Two Lines of Regression

There are two lines of regression- that of Y on X and X on Y. The line of regression of Y on X
is given by Y = a + bX where a and b are unknown constants known as intercept and slope of
the equation. This is used to predict the unknown value of variable Y when value of variable X
is known.
Y = a + bX
On the other hand, the line of regression of X on Y is given by X = c + dY which is used to

predict the unknown value of variable X using the known value of variable Y. Often, only one
29
of these lines make sense.
Exactly which of these will be appropriate for the analysis in hand will depend on labeling of
dependent and independent variable in the problem to be analyzed.
Choice of Line of Regression
For example, consider two variables crop yield (Y) and rainfall (X). Here
construction of regression line of Y on X would make sense and would
be able to demonstrate the dependence of crop yield on rainfall. We
would then be able to estimate crop yield given rainfall.
Careless use of linear regression analysis could mean construction of regression line of X on
Y which would demonstrate the laughable scenario that rainfall is dependent on crop yield;
this would suggest that if you grow really big crops you will be guaranteed a heavy rainfall.
Regression Coefficient
The coefficient of X in the line of regression of Y on X is called the regression coefficient of Y
on X. It represents change in the value of dependent variable (Y) corresponding to unit
change in the value of independent variable (X).
For instance if the regression coefficient of Y on X is 0.53 units, it would indicate that Y will
increase by 0.53 if X increased by 1 unit. A similar interpretation can be given for the
regression coefficient of X on Y.
Once a line of regression has been constructed, one can check how good it is (in terms of
predictive ability) by examining the coefficient of determination (R2). R2 always lies between 0
and 1. All software provides it whenever regression procedure is run.
R2 - coefficient of determination
The closer R2 is to 1, the better is the model and its prediction. A related question is whether
the independent variable significantly influences the dependent variable. Statistically, it is
equivalent to testing the null hypothesis that the regression coefficient is zero. This can be
done using t-test.
Assumption of Linearity
30
Linear regression does not test whether data is linear. It finds the slope and the intercept
assuming that the relationship between the independent and dependent variable can be best
explained by a straight line.
One can construct the scatter plot to confirm this assumption. If the scatter plot reveals
non linear relationship, often a suitable transformation can be used to attain linearity.
Explorable.com (Mar 7, 2009). Linear Regression Analysis. Retrieved from Explorable.com:

https://explorable.com/linear-regression-analysis
31
4.2 Multiple Regression Analysis
Multiple regression analysis is a powerful technique used for predicting the unknown
value of a variable from the known value of two or more variables- also called the
predictors.
More precisely, multiple regression analysis helps us to predict the value of Y for given values
of X1, X2, …, Xk.
For example the yield of rice per acre depends upon quality of seed,
fertility of soil, fertilizer used, temperature, rainfall. If one is interested to
study the joint affect of all these variables on rice yield, one can use this
technique.
An additional advantage of this technique is it also enables us to study
the individual influence of these variables on yield.
Dependent and Independent Variables

By multiple regression, we mean models with just one dependent and two or more
independent (exploratory) variables. The variable whose value is to be predicted is known as
the dependent variable and the ones whose known values are used for prediction are known
independent (exploratory) variables.
The Multiple Regression Model

In general, the multiple regression equation of Y on X1, X2, …, Xk is given by:
Y = b0 + b1 X1 + b2 X2 + …………………… + bk Xk
Interpreting Regression Coefficients

Here b0 is the intercept and b1, b2, b3, …, bk are analogous to the slope in linear regression
equation and are also called regression coefficients. They can be interpreted the same way as
slope. Thus if bi = 2.5, it would indicates that Y will increase by 2.5 units if Xi increased by 1
32
unit.
The appropriateness of the multiple regression model as a whole can be tested by the F-test
in the ANOVA table. A significant F indicates a linear relationship between Y and at least one
of the X's.
How Good Is the Regression?

Once a multiple regression equation has been constructed, one can check how good it is (in
terms of predictive ability) by examining the coefficient of determination (R2). R2 always lies
between 0 and 1.
R2 - coefficient of determination
All software provides it whenever regression procedure is run. The closer R2 is to 1, the better
is the model and its prediction.
A related question is whether the independent variables individually influence the dependent
variable significantly. Statistically, it is equivalent to testing the null hypothesis that the
relevant regression coefficient is zero.
This can be done using t-test. If the t-test of a regression coefficient is significant, it indicates
that the variable is in question influences Y significantly while controlling for other independent
explanatory variables.
Assumptions
Multiple regression technique does not test whether data are linear. On the contrary, it
proceeds by assuming that the relationship between the Y and each of Xi's is linear. Hence as
a rule, it is prudent to always look at the scatter plots of (Y, Xi), i= 1, 2,…,k. If any plot
suggests non linearity, one may use a suitable transformation to attain linearity.
Another important assumption is non existence of multicollinearity- the independent variables

are not related among themselves. At a very basic level, this can be tested by computing the
correlation coefficient between each pair of independent variables.
Other assumptions include those of homoscedasticity and normality.
33
Multiple regression analysis is used when one is interested in predicting a continuous
dependent variable from a number of independent variables. If dependent variable is
dichotomous, then logistic regression should be used.
Explorable.com (Jun 18, 2009). Multiple Regression Analysis. Retrieved from

Explorable.com: https://explorable.com/multiple-regression-analysis
34
4.3 Correlation and Regression
Correlation and linear regression are the most commonly used techniques for
investigating the relationship between two quantitative variables.
The goal of a correlation analysis is to see whether two measurement variables co vary, and
to quantify the strength of the relationship between the variables, whereas regression
expresses the relationship in the form of an equation.
For example, in students taking a Maths and English test, we could use correlation to
determine whether students who are good at Maths tend to be good at English as well, and
regression to determine whether the marks in English can be predicted for given marks in
Maths.
What a Scatter Diagram Tells Us

The starting point is to draw a scatter of points on a graph, with one variable on the X-axis and
the other variable on the Y-axis, to get a feel of the relationship (if any) between the variables
as suggested by the data. The closer the points are to a straight line, the stronger the
linear relationship between two variables.
Why Use Correlation?

We can use the correlation coefficient, such as the Pearson Product Moment Correlation
Coefficient, to test if there is a linear relationship between the variables. To quantify the
strength of the relationship, we can calculate the correlation coefficient (r). Its numerical value
ranges from +1.0 to -1.0. r > 0 indicates positive linear relationship, r < 0 indicates negative
linear relationship while r = 0 indicates no linear relationship.
A Caveat
It must, however, be considered that there may be a third variable related to both of the
variables being investigated, which is responsible for the apparent correlation. Correlation
does not imply causation. Also, a nonlinear relationship may exist between two variables that
would be inadequately described, or possibly even undetected, by the correlation coefficient.
35
Why Use Regression
In regression analysis, the problem of interest is the nature of the relationship itself between
the dependent variable (response) and the (explanatory) independent variable.
The analysis consists of choosing and fitting an appropriate model, done by the method of
least squares, with a view to exploiting the relationship between the variables to help estimate
the expected response for a given value of the independent variable. For example, if we are
interested in the effect of age on height, then by fitting a regression line, we can predict the
height for a given age.
Assumptions
Some underlying assumptions governing the uses of correlation and regression are as follows.
The observations are assumed to be independent. For correlation, both variables should be
random variables, but for regression only the dependent variable Y must be random. In
carrying out hypothesis tests, the response variable should follow Normal distribution and the
variability of Y should be the same for each value of the predictor variable. A scatter diagram
of the data provides an initial check of the assumptions for regression.
Uses of Correlation and Regression

There are three main uses for correlation and regression.
One is to test hypotheses about cause-and-effect relationships. In this case, the

experimenter determines the values of the X-variable and sees whether variation in X
causes variation in Y. For example, giving people different amounts of a drug and
measuring their blood pressure.
36
The second main use for correlation and regression is to see whether two variables are
associated, without necessarily inferring a cause-and-effect relationship. In this case,
neither variable is determined by the experimenter; both are naturally variable. If an
association is found, the inference is that variation in X may cause variation in Y, or
variation in Y may cause variation in X, or variation in some other factor may affect both
X and Y.
The third common use of linear regression is estimating the value of one variable
corresponding to a particular value of the other variable.
Explorable.com (Jan 18, 2010). Correlation and Regression. Retrieved from Explorable.com:
https://explorable.com/correlation-and-regression
37
5 Student’s T-Test
The student's t-test is a statistical method that is used to see if two sets of data differ
significantly.
The method assumes that the results follow the normal distribution (also called student's t-
distribution) if the null hypothesis is true. This null hypothesis will usually stipulate that there is
no significant difference between the means of the two data sets.
It is best used to try and determine whether there is a difference between two independent
sample groups. For the test to be applicable, the sample groups must be completely
independent, and it is best used when the sample size is too small to use more advanced
methods.
Before using this type of test it is essential to plot the sample data from the two samples and
make sure that it has a reasonably normal distribution, or the student's t test will not be
suitable. It is also desirable to randomly assign samples to the groups, wherever possible.
Example
You might be trying to determine if there is a significant difference in test scores between two
groups of children taught by different methods.
The null hypothesis might state that there is no significant difference in the mean test scores
of the two sample groups and that any difference down to chance.
The student's t test can then be used to try and disprove the null hypothesis.
Restrictions
The two sample groups being tested must have a reasonably normal distribution. If the
distribution is skewed, then the student's t test is likely to throw up misleading results. The
distribution should have only one main peak (= mode) near the mean of the group.
If the data does not adhere to the above parameters, then either a large data sample is
needed or, preferably, a more complex form of data analysis should be used.
38
Results
The student's t test can let you know if there is a significant difference in the means of the two
sample groups and disprove the null hypothesis. Like all statistical tests, it cannot prove
anything, as there is always a chance of experimental error occurring. But the test can support
a hypothesis.
However, it is still useful for measuring small sample populations and determining if there is a
significant difference between the groups.
Martyn Shuttleworth (Feb 19, 2008). Student’s T-Test. Retrieved from Explorable.com:
https://explorable.com/students-t-test
39
5.1 Independent One-Sample T-Test
An independent one-sample t-test is used to test whether the average of a sample differ
significantly from a population mean, a specified value μ0.
When you compare each sample to a "known truth", you would use the (independent) one-
sample t-test. If you are comparing two samples not strictly related to each other, the
independent two-sample t-test is used.
Any single sample statistical test that uses t-distribution can be called a 'one-sample t-test'.
This test is used when we have a random sample and we want to test if it is significantly
different from a population mean.
Hypothesis to Be Tested
Generally speaking, this test involves testing the null hypothesis H0: μ = μ0 against the
alternative hypothesis, H1: μ ≠ μ0 where μ is the population mean and μ0 is a specific value of the
population mean that we would like to test for acceptance.
An example may clarify the calculation and hypothesis testing of the independent one-sample
t-test better.
An Example
Suppose that the teacher of a school claims that an average student of his school studies 8
hours per day during weekends and we desire to test the truth of this claim.
The statistical methodology for this purpose requires that we begin by first specifying the
hypothesis to be tested.
In this case, the null hypothesis would be H0: μ = 8, which essentially states that mean hours of
study per day is no different from 8 hours. And the alternative hypothesis is, H1: μ ≠ 8, which
is negation of the teacher's claim.
Collecting Samples
In the next step, we take a sample of say 10 students of the school and collect data on how
40
long they study during weekends.
These 10 different study hours is our data.
Suppose that the sample mean turns out to be 6.5 hours.
We cannot infer anything directly from this mean - as to whether the claim is to be accepted or
rejected as it could very well have happened that by sheer luck (even though the sample was
drawn randomly). Students included in the sample may have been those who studied fewer
than 8 hours.
On the other hand, it could also be the case that the claim was indeed inappropriate.
To draw a scientifically valid conclusion, we can perform an independent one-sample t-test

which helps us to either accept or reject the null hypothesis.
If the null hypothesis is rejected, it means that the sample came from a population with mean
study hours significantly different from 8 hours.
On the other hand if the null hypothesis is accepted, it means that there is no evidence to
suggest that average study hours were significantly different from 8 hours - thereby
establishing evidence of the claim.
Assumptions
This test is one of the most popular small sample test widely used in all disciplines - medicine,
behavioral science, physical science etc. However, this test can be used only if the
background assumptions are satisfied.
The population from which the sample has been drawn should be normal - appropriate
statistical methods exist for testing this assumption (For example the Kolmogorov
Smirnov non parametric test). It has however been shown that minor departures from
normality do not affect this test - this is indeed an advantage.
The population standard deviation is not known.
Sample observations should be random.
A Small Sample Test

This test is a small sample test. It is difficult to draw the clearest line of demarcation between
large and small samples. Statisticians have generally agreed that a sample may be
considered small if its size is < 30 (less than 30).
The test used for dealing with problems relating the large samples are different from the one
41
used for small samples. We often use z-test for large samples.
Explorable.com (Jan 18, 2009). Independent One-Sample T-Test. Retrieved from

Explorable.com: https://explorable.com/independent-one-sample-t-test
42
5.2 Independent Two-Sample T-Test
The independent two-sample t-test is used to test whether population means are
significantly different from each other, using the means from randomly drawn samples.
Any statistical test that uses two samples drawn independently of each other and using t-
distribution, can be called a 'two-sample t-test'.
Hypothesis Testing
Generally speaking, this test involves testing the null hypothesis H0: μ(x) = μ(y) against the
alternative research hypothesis, H1: μ(x) ≠ μ(y) where μ(x) and μ(y) are respectively the
population mean of the two populations from which the two samples have been drawn.
Hypothesis testing is frequently used for the scientific method.
An Example
Suppose that a school has two buildings - one for girls and the other for boys. Suppose that
the principal want to know if the pupils of the two buildings are working equally hard, in the
sense that they put in equal number of hours in studies on the average.
Statistically speaking, the principal is interested in testing whether the average number of
hours studied by boys is significantly different from the average for girls.
Steps
1. To calculate, we begin by specifying the hypothesis to be tested.
In this case, the null hypothesis would be H0: μ(boys) = μ(girls), which essentially states
that mean study hours for boys and girls are no different.
The alternative research hypothesis is H1: μ(boys) ≠ μ(girls).
2. In the second step, we take a sample of say 10 students from the boy's building and 15
from girl's building and collect data on how long they study daily. These 10 and 15
different study hours are our two samples.
43
It is not difficult to see that the two samples have been drawn independent of each
other - an essential requirement of the independent two-sample t-test.
Suppose that the sample mean turns out to be 7.25 hours for boys and 8.5 for girls. We
cannot infer anything directly from these sample means - specifically as to whether boys
and girls were equally hard working as it could very well have happened by sheer luck
(even though the samples were drawn randomly) that boys included in the boy's sample
were those who studied fewer hours.
On the other hand, it could also be the case that girls were indeed working harder than
boys.
3. The third step would involve performing the independent two-sample t-test which helps
us to either accept or reject the null hypothesis.
If the null hypothesis is rejected, it means that two buildings were significantly different in
terms of number of hours of hard work.
On the other hand if the null hypothesis is accepted, one can conclude that there is no
evidence to suggest that the two buildings differed significantly and that boys and girls
can be said to be at par.
Assumptions
Along with the independent single sample t-test, this test is one of the most widely tests.
However, this test can be used only if the background assumptions are satisfied.
The populations from which the samples have been drawn should be normal -
appropriate statistical methods exist for testing this assumption (For example, the
Kolmogorov Smirnov non-parametric test). One needs to note that the normality
assumption has to be tested individually and separately for the two samples. It has
however been shown that minor departures from normality do not affect this test - this is
indeed an advantage.
The standard deviation of the populations should be equal i.e. σX2 = σY2 = σ2, where σ2 is
unknown. This assumption can be tested by the F-test.
Samples have to be randomly drawn independent of each other. There is however no
requirement that the two samples should be of equal size - often times they would be
unequal though the odd case of equal size cannot be ruled out.
44
Explorable.com (Oct 12, 2009). Independent Two-Sample T-Test. Retrieved from
Explorable.com: https://explorable.com/independent-two-sample-t-test
45
5.3 Dependent T-Test for Paired Samples
The dependent t-test for paired samples is used when the samples are paired. This
implies that each individual observation of one sample has a unique corresponding
member in the other sample.
one sample has been tested twice (repeated measures)
or,
two samples have been "matched" or "paired", in some way. (matched subjects design)
The emphasis being on pairing of observations, it is obvious that the samples are dependent -
hence the name.
Any statistical test involving paired samples and using t-distribution can be called 't-test for
paired samples'.
An Example
Let us illustrate the meaning of a paired sample. Suppose that we are required to examine if a
newly developed intervention program for disadvantaged students has an impact. For this
purpose, we need to obtain scores from a sample of n such students in a standardized test
before administering the program.
After the program is over, the same test needs to be administered to the same group of
students and scores obtained again.
There are two samples: 1) the sample of prior intervention scores (pretest) and, 2) the post
intervention scores (posttest). The samples are related in the sense that each pretest has a
corresponding posttest as both were obtained from the same student.
If the score of each student (ith) before and after the program is xi and yi respectively, then
the pair (xi, yi) corresponds to the same subject (student in this case).
46
This is what is meant by paired sample. It is very important that two scores for each individual
student be correctly identified and labeled as the differences di =│ xi - yi │are used to determine
the test statistic and consequently the p-value.
Steps
1. With the above framework, the null hypothesis would be H0: there is no significant
difference between pre and post intervention scores, which essentially states that the
intervention program was not effective. The alternative hypothesis is H1: there is
significant difference between pre and post intervention scores.
2. Once the hypotheses have been framed, the second step involves taking the sample of
pre and post intervention scores and determining the sum, ∑│ xi - yi │. Logically speaking,
a small sum could indicate truth of the null hypothesis.
However nothing concrete can be interpreted from it - specifically as to whether

intervention program did have an impact as it could very well have happened by sheer
luck (even though the students were drawn randomly) that for this sample of students,
the scores did not change much.
On the other hand, it could also be the case that the program was indeed useful.
3. The third step involves performing the dependent t-test for paired samples which helps
us to either accept or reject the null hypothesis. If the null hypothesis is rejected, one
can infer that the program was useful.
On the other hand if the null hypothesis is accepted, one can conclude that there is no
evidence to suggest the program did have an impact.
Assumptions
This test has a few background assumptions which need to be satisfied.
1. The sample of differences (di's) should be normal - an assumption that can be tested -
for instance by the Kolmogorov Smirnov non-parametric test.
It has however been shown that minor departures from normality do not affect this test -
this is indeed an advantage.
2. The samples should be dependent and it should be possible to identify specific pairs.
3. An obvious requirement is that the two samples should be of equal size.
47
For Small Samples
This test is a small sample test. It is difficult to draw the clearest line of demarcation between
large and small sample.
Statisticians have generally agreed that a sample may be considered small if its size is < 30
(below 30).
Explorable.com (Feb 26, 2009). Dependent T-Test for Paired Samples. Retrieved from
Explorable.com: https://explorable.com/dependent-t-test-for-paired-samples
48
5.4 Student’s T-Test (II)
Any statistical test that uses t distribution can be called a t-test, or the "student's t-
test". It is basically used when the sample size is small i.e. n<30.
For example, if a person wants to test the hypothesis that mean height
of student's of a college is not different from 150 cms, he can take a
sample of size say 20 from the college. From the mean height of these
students, he can test the hypothesis. The test to be used for this
purpose is t-test.
Student's T-test for Different Purposes

There are different types of t-test each for different purpose. Some of the popular types are
outlined below:
1. Student's t-test for single mean is used to test a hypothesis on a specific value of the
population mean. Statistically speaking, we test the null hypothesis H0: μ =0μagainst the
alternative hypothesis H1: μ >< μ0 where μ is the population mean and
0 isμa specific value
of the population that we would like to test for acceptance.
The example on heights of students explained above requires this test. In that example,
μ0 = 150.
2. The t-test for difference of means is used to test the hypothesis that two populations
have the same mean.
For example suppose one is interested to test if there is any significant difference
between the mean height of male and female students in a particular college. In such a
situation, t-test for difference of means can be applied. One would have to take two
independent samples from the college- one from males and the other from females in
order to perform this test.
An additional assumption of this test is that the variance of the two populations is equal.
3. A paired t-test is usually used when the two samples are dependent- this happens
when each individual observation of one sample has a unique relationship with a
49
particular member of the other sample.
For example we may wish to test if a newly developed intervention program for
disadvantaged students is useful. For this, we need to obtain scores from say 22
students in a standardized test before administering the program. After the program is
over, the same test needs to be administered again on the same group of 22 students
and scores obtained.
The two samples- the sample of prior intervention scores and the sample of post
intervention scores are related as each student has two scores. The samples are
therefore dependent. The paired t-test can is applicable in such scenarios.
4. A t-test for correlation coefficient is used for testing an observed sample correlation
coefficient (r).
For example suppose a random sample of 27 pairs of observation from a normal

population gave a correlation coefficient 0.2. Notice that this is the sample correlation
coefficient obtained from a sample of observations. One may be interested to know
whether the variables are correlated in the population. In this case we can use t-test for
correlation coefficient
5. A t-test for testing significance of regression coefficient is used to test the

significance of regression coefficients in linear and multiple regression setup.
Assumptions
Irrespective of the type of t-test used, two assumptions have to be met.
1. the populations from which the samples are drawn are normal.
2. the population standard deviation is not known.
Student's t-test is a small sample test. It is difficult to drawn a clearest line of demarcation
between large and small sample. Statisticians have generally agreed that a sample may be
considered small if its size is < 30.
The test used for dealing with problems relating the large samples are different from the one
used for small samples. We often use z-test for large samples.
Explorable.com (Jul 20, 2009). Student’s T-Test (II). Retrieved from Explorable.com:
https://explorable.com/students-t-test-2
50
6 ANOVA
Analysis Of Variance
The Analysis Of Variance, popularly known as the ANOVA, can be used in cases where
there are more than two groups.
When we have only two samples we can use the t-test to compare the means of the samples
but it might become unreliable in case of more than two samples. If we only compare two
means, then the t-test (independent samples) will give the same results as the ANOVA.
It is used to compare the means of more than two samples. This can be understood better
with the help of an example.
One Way Anova
EXAMPLE: Suppose we want to test the effect of five different

exercises. For this, we recruit 20 men and assign one type of exercise to
4 men (5 groups). Their weights are recorded after a few weeks.
We may find out whether the effect of these exercises on them is
significantly different or not and this may be done by comparing the
weights of the 5 groups of 4 men each.
The example above is a case of one-way balanced ANOVA.
It has been termed as one-way as there is only one category whose effect has been studied
and balanced as the same number of men has been assigned on each exercise. Thus the
basic idea is to test whether the samples are all alike or not.
Why Not Multiple T-Tests?

As mentioned above, the t-test can only be used to test differences between two means.
When there are more than two means, it is possible to compare each mean with each other
mean using many t-tests.
But conducting such multiple t-tests can lead to severe complications and in such
circumstances we use ANOVA. Thus, this technique is used whenever an alternative
51
procedure is needed for testing hypotheses concerning means when there are several
populations.
One Way and Two Way Anova

Now some questions may arise as to what are the means we are talking about and why
variances are analyzed in order to derive conclusions about means. The whole procedure can
be made clear with the help of an experiment.
Let us study the effect of fertilizers on yield of wheat. We apply five

fertilizers, each of different quality, on five plots of land each of wheat.
The yield from each plot of land is recorded and the difference in yield
among the plots is observed. Here, fertilizer is a factor and the different
qualities of fertilizers are called levels.
This is a case of one-way or one-factor ANOVA since there is only one factor, fertilizer. We
may also be interested to study the effect of fertility of the plots of land. In such a case we
would have two factors, fertilizer and fertility. This would be a case of two-way or two-factor
ANOVA. Similarly, a third factor may be incorporated to have a case of three-way or three-
factor ANOVA.
Chance Cause and Assignable Cause

In the above experiment the yields obtained from the plots may be different and we may be
tempted to conclude that the differences exist due to the differences in quality of the fertilizers.
But this difference may also be the result of certain other factors which are attributed to
chance and which are beyond human control. This factor is termed as “error”. Thus, the
differences or variations that exist within a plot of land may be attributed to error.
Thus, estimates of the amount of variation due to assignable causes (or variance between the
samples) as well as due to chance causes (or variance within the samples) are obtained
separately and compared using an F-test and conclusions are drawn using the value of F.
Assumptions
There are four basic assumptions used in ANOVA.
the expected values of the errors are zero

the variances of all errors are equal to each other
52
the errors are independent
they are normally distributed
Explorable.com (Jun 6, 2009). ANOVA. Retrieved from Explorable.com:

https://explorable.com/anova
53
6.1 One-Way ANOVA
A One-Way ANOVA (Analysis of Variance) is a statistical technique by which we can

test if three or more means are equal. It tests if the value of a single variable differs
significantly among three or more levels of a factor.
We can say we have a framework for one-way ANOVA when we have a single factor with
three or more levels and multiple observations at each level.
In this kind of layout, we can calculate the mean of the observations within each level of our
factor.
The concepts of factor, levels and multiple observations at each level can be best understood
by an example.
Factor and Levels - An Example

Let us suppose that the Human Resources Department of a company desires to know if
occupational stress varies according to age.
The variable of interest is therefore occupational stress as measured by a scale.
The factor being studied is age. There is just one factor (age) and hence a situation
appropriate for one-way ANOVA.
Further suppose that the employees have been classified into three groups (levels):
less than 40
40 to 55
above 55
These three groups are the levels of factor age - there are three levels here. With this design,
we shall have multiple observations in the form of scores on Occupational Stress from a
number of employees belonging to the three levels of factor age. We are interested to know
whether all the levels i.e. age groups have equal stress on the average.
Non-significance of the test statistic (F-statistic) associated with this technique would imply
that age has no effect on stress experienced by employees in their respective occupations.
54
On the other hand, significance would imply that stress afflicts different age groups differently.
Hypothesis Testing
Formally, the null hypothesis to be tested is of the form:
H0: All the age groups have equal stress on the average or 1μ = 2μ = μ
3,
where 1μ, 2μ, 3μ are mean stress scores for the three age groups.
The alternative hypothesis is:
H1: The mean stress of at least one age group is significantly different.
One-way Anova and T-Test

The one-way ANOVA is an extension of the independent two-sample t-test.
In the above example, if we considered only two age groups, say below 40 and above 40,
then the independent samples t-test would have been enough although application of ANOVA
would have also produced the same result.
In the example considered above, there were three age groups and hence it was necessary to
use one-way ANOVA.
Often the interest is on acceptance or rejection of the null hypothesis. If it is rejected, this
technique will not identify the level which is significantly different. One has to perform t-tests
for this purpose.
This implies that if there exists difference between the means, we would have to carry out 3C2
independent t-tests in order to locate the level which is significantly different. It would be kC2
number of t-tests in the general one-way ANOVA design with k levels.
Advantages
One of the principle advantages of this technique is that the number of observations need not
be the same in each group.
Additionally, layout of the design and statistical analysis is simple.
55
Assumptions
For the validity of the results, some assumptions have been checked to hold before the
technique is applied. These are:
Each level of the factor is applied to a sample. The population from which the sample
was obtained must be normally distributed.
The samples must be independent.
The variances of the population must be equal.
Replication and Randomization

In general, ANOVA experiments need to satisfy three principles - replication, randomization
and local control.
Out of these three, only replication and randomization have to be satisfied while designing
and implementing any one-way ANOVA experiment.
Replication refers to the application of each individual level of the factor to multiple subjects.
In the above example, in order to apply the principle of replication, we had obtained
occupational stress scores from more than one employee in each level (age group).
Randomization refers to the random allocation of the experimental units. In our example,
employees were selected randomly for each of the age groups.
Explorable.com (Mar 21, 2009). One-Way ANOVA. Retrieved from Explorable.com:

https://explorable.com/one-way-anova
56
6.2 Two-Way ANOVA
A Two-Way ANOVA is useful when we desire to compare the effect of multiple levels of
two factors and we have multiple observations at each level.
One-Way ANOVA compares three or more levels of one factor. But some experiments
involve two factors each with multiple levels in which case it is appropriate to use Two-Way
ANOVA.
Let us discuss the concepts of factors, levels and observation through an example.
Factors and Levels - An Example
A Two-Way ANOVA is a
design with two factors.
Let us suppose that the Human Resources Department of a company desires to know if
occupational stress varies according to age and gender.
The variable of interest is therefore occupational stress as measured by a scale.
There are two factors being studied - age and gender.
Further suppose that the employees have been classified into three groups or levels:
age less than 40,

40 to 55
above 55
In addition employees have been labeled into gender classification (levels):
male
female
In this design, factor age has three levels and gender two. In all, there are 3 x 2 = 6 groups or
cells. With this layout, we obtain scores on occupational stress from employee(s) belonging to
57
the six cells.
Testing for Interaction

There are two versions of the Two-Way ANOVA.
The basic version has one observation in each cell - one occupational stress score from one
employee in each of the six cells.
The second version has more than one observation per cell but the number of observations in
each cell must be equal. The advantage of the second version is it also helps us to test if
there is any interaction between the two factors.
For instance, in the example above, we may be interested to know if there is any interaction
between age and gender.
This helps us to know if age and gender are independent of each other - they are independent
if the effect of age on stress remains the same irrespective of whether we take gender into
consideration.
Hypothesis Testing
In the basic version there are two null hypotheses to be tested.
H01: All the age groups have equal stress on the average
H02: Both the gender groups have equal stress on the average.
In the second version, a third hypothesis is also tested:
H03: The two factors are independent or that interaction effect is not present.
The computational aspect involves computing F-statistic for each hypothesis.
Assumption
The assumptions in both versions remain the same - normality, independence and equality of
variance.
Advantages
An important advantage of this design is it is more efficient than its one-way counterpart.
58
There are two assignable sources of variation - age and gender in our example - and
this helps to reduce error variation thereby making this design more efficient.
Unlike One-Way ANOVA, it enables us to test the effect of two factors at the same time.
One can also test for independence of the factors provided there are more than one
observation in each cell. The only restriction is that the number of observations in each
cell has to be equal (there is no such restriction in case of one-way ANOVA).
Replication, Randomization and Local Control

An Two-Way ANOVA satisfies all three principles of design of experiments namely replication,
randomization and local control.
Principles of replication and randomization need to be satisfied in a manner similar to One-

Way ANOVA.
The principle of local control means to make the observations as homogeneous as possible
so that error due to one or more assignable causes may be removed from the experimental
error.
In our example if we divided the employees only according to their age, then we would have
ignored the effect of gender on stress which would then accumulate with the experimental
error.
But we divided them not only according to age but also according to gender which would help
in reducing the error - this is application of the principle of local control for reducing error
variation and making the design more efficient.
Explorable.com (Oct 19, 2009). Two-Way ANOVA. Retrieved from Explorable.com:

https://explorable.com/two-way-anova
59
6.3 Factorial Anova
Experiments where the effects of more than one factor are considered together are
called 'factorial experiments' and may sometimes be analyzed with the use of factorial
ANOVA.
For instance, the academic achievement of a student depends on study habits of the student
as well as home environment. We may have two simple experiments, one to study the effect
of study habits and another for home environment.
Independence of Factors
But these experiments will not give us any information about the dependence or
independence of the two factors, namely study habit and home environment.
In such cases, we resort to Factorial ANOVA which not only helps us to study the effect of two
or more factors but also gives information about their dependence or independence in the
same experiment. There are many types of factorial designs like 22, 23, 32 etc. The simplest
of them all is the 22 or 2 x 2 experiment.
An Example
In these experiments, the factors are applied at different levels. In a 2 x 2 factorial design,
there are 2 factors each being applied in two levels.
Let us illustrate this with the help of an example. Suppose that a new drug has been
developed to control hypertension.
We want to test the effect of quantity of the drug taken and the effect of gender. Here, the
quantity of the drug is the first factor and gender is the second factor (or vice versa).
Suppose that we consider two quantities, say 100 mg and 250 mg of the drug (1 / 2). These
two quantities are the two levels of the first factor.
Similarly, the two levels of the second factor are male and female (A / B).
60
Thus we have two factors each being applied at two levels. In other words, we have a 2 x 2
factorial design.
Here we have 4 different treatment groups, one for each combination of levels of factors - by
convention, the groups are denoted by A1, A2, B1, B2. These groups mean the following.
A1 : 100mg of the drug applied on male patients

A2 : 250mg of the drug applied on male patients
B1 : 100mg of the drug applied on female patients
B2 : 250mg of the drug applied on female patients.
Here, the quantity of the drug and gender are the independent variables whereas reduction of
hypertension after one month is the dependent variable.
Main Effects and Interaction

A main effect is an outcome that can show consistent difference between levels of a factor.
In our example, there are two main effects - quantity and gender.
Factorial ANOVA also enables us to examine the interaction effect between the factors. An
interaction effect is said to exist when differences on one factor depend on the level of other
factor.
However, it is important to remember that interaction is between factors and not levels. We
know that there is no interaction between the factors when we can talk about the effect of one
factor without mentioning the other factor.
Hypothesis Testing
In the above example, there are three hypotheses to be tested. These are:
H01: Main effect 'quantity' is not significant

H02: Main effect 'gender' is not significant
H03: Interaction effect is not present.
For main effect gender, the null hypothesis means that there is no significant difference in
reduction of hypertension in males and females.
61
The null hypothesis for the main effect quantity means that there is no significant difference in
reduction of hypertension whether the patients are given 100 mg or 250 mg of the drug.
For the interaction effect, the null hypothesis means that the two main effects gender and
quantity are independent. The computational aspect involves computing F-statistic for each
hypothesis.
Advantages
Factorial design has several important features.
Factorial designs are the ultimate designs of choice whenever we are interested in
examining treatment variations.
Factorial designs are efficient. Instead of conducting a series of independent studies, we
are effectively able to combine these studies into one.
Factorial designs are the only effective way to examine interaction effects.
The assumptions remain the same as with other designs - normality, independence and
equality of variance.
Explorable.com (Nov 22, 2009). Factorial Anova. Retrieved from Explorable.com:

https://explorable.com/factorial-anova
62
6.4 Repeated Measures ANOVA
Repeated Measures ANOVA is a technique used to test the equality of means.
It is used when all the members of a random sample are tested under a number of conditions.
Here, we have different measurements for each of the sample as each sample is exposed to
different conditions.
However, it is used when all the members of a random sample are tested under a number of
conditions. Here, we have different measurements for each of the sample as each sample is
exposed to different conditions.
In other words, the measurement of the dependent variable is repeated. It is not possible to
use the standard ANOVA in such a case as such data violates the assumption of
independence of data and as such it will not be able to model the correlation between the
repeated measures.
Not Multivariate Design

However, it must be noted that a repeated measures design is very much different from a
multivariate design.
For both, samples are measured on several occasions, or trials, but in the repeated measures
design, each trial represents the measurement of the same characteristic under a different
condition.
For example, repeated measures ANOVA can be used to compare the number of oranges
produced by an orange grove in years one, two and three. The measurement is the number of
oranges and the condition that changes is the year.
But in a multivariate design, each trial represents the measurement of a different

characteristic.
Thus, to compare the number, weight and price of oranges repeated measures ANOVA
cannot be used. The three measurements are number, weight, and price, and these do not
represent different conditions, but different qualities.
63
Why Use Repeated Measures Design?
Repeated measures design is used for several reasons:
By collecting data from the same participants under repeated conditions the individual
differences can be eliminated or reduced as a source of between group differences.
Also, the sample size is not divided between conditions or groups and thus inferential
testing becomes more powerful.
This design also proves to be economical when sample members are difficult to recruit
because each member is measured under all conditions.
Assumption
This design is based on the assumption of Sphericity, which means that the variance of the
population difference scores for any two conditions should be the same as the variance of the
population difference scores for any other two conditions.
But this condition is only relevant to the one-way repeated measures ANOVA and in other
cases this assumption is commonly violated.
Hypothesis
The null hypothesis to be tested here is:
H0: There are no differences between population means.
Some differences will occur in the sample. It is desired to draw conclusions about the
population from which it was taken, not about the sample. The F-ratios are used for the
analysis of variance and conclusions are drawn accordingly.
Within-Subject Design
The repeated measures design is also known as a within-subject design.
The data presented in this design includes a measure repeated over time, a measure
repeated across more than one condition or several related and comparable measures.
64
Possible Designs for Repeated Measures
One-way repeated measures
Two-way repeated measures
Two-way mixed split-plot design (SPANOVA)
Explorable.com (May 25, 2009). Repeated Measures ANOVA. Retrieved from

Explorable.com: https://explorable.com/repeated-measures-anova
65
7 Nonparametric Statistics
Nonparametric statistics are those data that do not assume a prior distribution. When
an experiment is performed or data collected for some purpose, it is usually assumed
that it fits some given probability distribution, typically the normal distribution. This is
the basis on which the data is interpreted. When these assumptions are not made, it
becomes nonparametric statistics.
There are several advantages of using nonparametric statistics. As can be expected, since
there are fewer assumptions that are made about the sample being studied, nonparametric
statistics are usually wider in scope as compared to parametric statistics that actually assume
a distribution. This is mainly the case when we do not know a lot about the sample we are
studying and making a priori assumptions about data distributions might not give us accurate
results and interpretations. This directly translates into an increase in robustness.
However, there are also some disadvantages of nonparametric statistics. The main
disadvantage is that the degree of confidence is usually lower for these types of studies. This
means for the same sample under consideration, the results obtained from nonparametric
statistics have a lower degree of confidence than if the results were obtained using parametric
statistics. Of course, this is assuming that the study is such that it is valid to assume a
distribution for the sample.
There are many experimental scenarios in which we can assume a normal distribution. For
example if an experiment looks at the correlation between a healthy morning breakfast and
IQ, the experimenter can assume beforehand that the IQs of the sample size follow a normal
distribution within the sample, assuming the sample is chosen randomly from thepopulation.
On the other hand, if this assumption is not made, then the experimenter is following
nonparametric statistics methods.
66
However, there could be another experiment that measures the resistance of the human body
to a strain of bacteria. In such a case, it is not possible to determine if the data will be normally
distributed. It might happen that all people are resistant to the strain of bacteria under study or
perhaps no one is. Again, there could be other considerations as well. It could be that people
of a particular ethnicity are born with that resistance while none of the others are. In such
cases, it is not right to assume a normal distribution of data. These are the situations in which
nonparametric statistics should be used. There are many tests that tell us whether the data
can be assumed to be normally distributed or not.
Explorable.com (Apr 26, 2010). Nonparametric Statistics. Retrieved from Explorable.com:

https://explorable.com/nonparametric-statistics
67
7.1 Cohen’s Kappa
Cohen's Kappa is an index that measures interrater agreement for categorical

(qualitative) items.
The items are indicators of the extent to which two raters who are examining the same set of
categorical data, agree while assigning the data to categories, for example, classifying a
tumor as 'malignant' or 'benign'.
Comparison between the level of agreement between two sets of dichotomous scores or
ratings (an alternative between two choices, e.g. accept or reject) assigned by two raters to
certain qualitative variables can be easily accomplished with the help of simple percentages,
i.e. taking the ratio of the number of ratings for which both the raters agree to the total number
of ratings. But despite the simplicity involved in its calculation, percentages can be misleading
and does not reflect the true picture since it does not take into account the scores that the
raters assign due to chance.
Using percentages can result in two raters appearing to be highly reliable and completely in
agreement, even if they have assigned their scores completely randomly and they actually do
not agree at all. Cohen's Kappa overcomes this issue as it takes into account agreement
occurring by chance.
How to Compute Cohen's Kappa

The formula for Cohen's Kappa is:
68
Pr(a) - Pr(e)
К = 1 - Pr(e)
Where:
Pr(a) = Observed percentage of agreement,
Pr(e) = Expected percentage of agreement.
The observed percentage of agreement implies the proportion of ratings where the raters
agree, and the expected percentage is the proportion of agreements that are expected to
occur by chance as a result of the raters scoring in a random manner. Hence Kappa is the
proportion of agreements that is actually observed between raters, after adjusting for the
proportion of agreements that take place by chance.
Let us consider the following 2×2 contingency table, which depicts the probabilities of two
raters classifying objects into two categories.
69
Rater 1 Total
Rater 2 Category 1 2
1 P11 P12 P10
2 P21 P22 P20
Total P01 P02 1
Then
Pr(a) = P01 + P10
Pr(e) = P02 + P20
Interpretation
The value of К ranges between -1 and +1, similar to Karl Pearson's co-efficient ofcorrelation 'r'.
In fact, Kappa and r assume similar values if they are calculated for the same set of
dichotomous ratings for two raters.
A value of kappa equal to +1 implies perfect agreement between the two raters, while that of -
1 implies perfect disagreement. If kappa assumes the value 0, then this implies that there is
no relationship between the ratings of the two raters, and any agreement or disagreement is
due to chance alone. A kappa value of 0.70 is generally considered to be satisfactory.
However, the desired reliability level varies depending on the purpose for which kappa is
being calculated.
Caveats
Kappa is very easy to calculate given the software's available for the purpose and is
appropriate for testing whether agreement exceeds chance levels. However, some questions
arise regarding the proportion of chance, or expected agreement, which is the proportion of
times the raters would agree by chance alone. This term is relevant only in case the raters are
independent, but the clear absence of independence calls its relevance into question.
Also, kappa requires two raters to use the same rating categories. But it cannot be used in
case we are interested to test the consistency of ratings for raters that use different
categories, e.g. if one uses the scale 1 to 5, and the other 1 to 10.
Explorable.com (Mar 15, 2010). Cohen’s Kappa. Retrieved from Explorable.com:

https://explorable.com/cohens-kappa
70
7.2 Mann-Whitney U-Test
Mann-Whitney-Wilcoxon (MWW)
Wilcoxon Rank-Sum Test
Non-parametric tests are basically used in order to overcome the underlying

assumption of normality in parametric tests. Quite general assumptions regarding the
population are used in these tests.
A case in point is the Mann-Whitney U-test (Also known as the Mann-Whitney-Wilcoxon

(MWW) or Wilcoxon Rank-Sum Test). Unlike its parametric counterpart, the t-test for two
samples, this test does not assume that the difference between the samples is normally
distributed, or that the variances of the two populations are equal. Thus when the validity of
the assumptions of t-test are questionable, the Mann-Whitney U-Test comes into play and
hence has wider applicability.
The Method
The Mann-Whitney U-test is used to test whether two independent samples of observations
are drawn from the same or identical distributions. An advantage with this test is that the two
samples under consideration may not necessarily have the same number of observations.
This test is based on the idea that the particular pattern exhibited when 'm' number of X
random variables and 'n' number of Y random variables are arranged together in increasing
order of magnitude provides information about the relationship between their parent
populations.
The Mann-Whitney test criterion is based on the magnitude of the Y's in relation to the X's, i.e.
the position of Y's in the combined ordered sequence. A sample pattern of arrangement
where most of the Y's are greater than most of the X's or vice versa would be evidence
against random mixing. This would tend to discredit the null hypothesis of identical distribution.
Assumptions
The test has two important assumptions. First the two samples under consideration are
random, and are independent of each other, as are the observations within each sample.
Second the observations are numeric or ordinal (arranged in ranks).
71
How to Calculate the Mann-Whitney U
In order to calculate the U statistics, the combined set of data is first arranged in ascending
order with tied scores receiving a rank equal to the average position of those scores in the
ordered sequence.
Let T denote the sum of ranks for the first sample. The Mann-Whitney test statistic is then
calculated using U = n1 n2 + {n1 (n1 + 1)/2} - T , where n1 and n2 are the sizes of the first
and second samples respectively.
An Example
An example can clarify better. Consider the following samples.
72
Sample A
Observation 25 25 19 21 22 19 15
Rank 15.5 15.5 9.5 13 14 9.5 3.5
73
Sample B
Observation 18 14 13 15 17 19 18 20 19
Rank 6.5 2 1 3.5 5 9.5 6.5 12 9.5
Here, T = 80.5, n1 = 7, n2 = 9.Hence, U = (7 * 9) + [{7 * (7+1)}/2] - 80.5 = 10.5.
We next compare the value of calculated U with the value given in the Tables of Critical
Values for the Mann-Whitney U-test, where the critical values are provided for given n1 and
n2 , and accordingly accept or reject the null hypothesis. Even though the distribution of U is
known, the normal distribution provides a good approximation in case of large samples.
Hypothesis On Equality of Medians

Often this statistic is used to compare a hypothesis regarding equality of medians. The logic is
simple - since the U statistic tests if two samples are drawn from identical populations,
equality of median follow.
As a Counterpart of T-Test
The Mann-Whitney U test is truly the non parametric counterpart of the two sample t-test. To
see this, one needs to recall that the t-test tests for equality of means when the underlying
assumptions of normality and equality of variance are satisfied. Thus the t-test tests if the two
samples have been drawn from identical normal population. The Mann-Whitney U test is its
generalization.
Explorable.com (Apr 27, 2009). Mann-Whitney U-Test. Retrieved from Explorable.com:

https://explorable.com/mann-whitney-u-test
74
7.3 Wilcoxon Signed Rank Test
The Wilcoxon Signed Rank Test is a non-parametric statistical test for testing
hypothesis on median.
The test has two versions: "single sample" and "paired samples / two samples".
Single Sample
The first version is the analogue of independent one sample t-test in the non parametric
context. It uses a single sample and is recommended for use whenever we desire to test a
hypothesis about population median.
m0 = the specific value of population median
The null hypothesis here is of the form H0 : m = m0 , where m0 is the specific value of
population median that we wish to test against the alternative hypothesis H1 : m ≠ m0 .
For example, let us suppose that the manager of a boutique claims that median income his
clients is $24,000/- per annum. To test if this is tenable, the analyst will obtain the yearly
income of a sample of his clients and test the null hypothesis H0 : m0 = 24,000.
Paired Samples
The second version of the test uses paired samples and is the non parametric analogue of
dependent t-test for paired samples.
This test uses two samples but it is necessary that they should be paired. Paired samples
imply that each individual observation of one sample has a unique corresponding member in
the other sample.
An Example - Paired Samples
For example, suppose that we have a sample of weights of n obese adults before they are
subjected to a change of diet.
75
After a lapse of six months, we would like to test whether there has been any significant loss
in weight as a result of change in diet. One could be tempted to straightaway use the
dependent t-test for paired samples here.
However that test has certain assumption notable among them being normality. If this
normality assumption is not satisfied, one would have to go for the non parametric Wilcoxon
Signed Rank Test.
The null hypothesis then would be that there has been no significant reduction in median
weight after six months against the alternative that medians before and after significantly differ.
Normality Assumptions is not Required

Most of the standard statistical techniques can be used provided certain standard
assumptions such as independence, normality etc. are satisfied.
Often these techniques cannot be used if the normality assumption is not satisfied. Among
others, the t-test requires this assumption and it is not advisable to use it if this assumption is
violated.
Advantages
The advantage with Wilcoxon Signed Rank Test is that it neither depends on the form of the
parent distribution nor on its parameters. It does not require any assumptions about the shape
of the distribution.
For this reason, this test is often used as an alternative to t test's whenever the population
cannot be assumed to be normally distributed. Even if the normality assumption holds, it has
been shown that the efficiency of this test compared to t-test is almost 95%.
Let us illustrate how signed ranks are created in one-sample case by considering the example
explained above. Assume that a sample of yearly incomes of 10 customers was collected.
The null hypothesis to be tested is H0 : m = 24,000.
We first calculate the deviations of the given observations from 24,000 and then rank them in
order of magnitude. This has been done in the following table:
76
Income Deviation Signed Ranks
23,928 -72 -1
24,500 500 5.5
23,880 -120 -2
24,675 675 7
21,965 -2035 -10
22,900 -1100 -9
23,500 -500 -5,5
24,450 450 4
22,998 -1002 -8
23,689 -311 -3
The deviations are ranked in increasing order of absolute magnitude and then the ranks are
given the signs of the corresponding deviations.
In the above table the difference 500 occurs twice. In such a case, we assign a common rank
which is the arithmetic mean of their respective ranks. Hence 500 was assigned the rank
which is the arithmetic mean of 5 and 6.
In a two sample case, the ranks are assigned in a similar way. The only difference is that in a
two sample case we first find out the differences between the corresponding observations of
the samples and then rank them in increasing order of magnitude.
The ranks are then given the sign of the corresponding differences.
Explorable.com (Mar 15, 2009). Wilcoxon Signed Rank Test. Retrieved from Explorable.com:
https://explorable.com/wilcoxon-signed-rank-test
77
8 Other Ways to Analyse Data
8.1 Chi Square Test
Any statistical test that uses the chi square distribution can be called chi square test. It
is applicable both for large and small samples-depending on the context.
For example suppose a person wants to test the hypothesis that

success rate in a particular English test is similar for indigenous and
immigrant students.
If we take random sample of say size 80 students and measure both indigenous/immigrant as
well as success/failure status of each of the student, the chi square test can be applied to test
the hypothesis.
There are different types of chi square test each for different purpose. Some of the popular
types are outlined below.
Tests for Different Purposes

1. Chi square test for testing goodness of fit is used to decide whether there is any
difference between the observed (experimental) value and the expected (theoretical)
value.
For example given a sample, we may like to test if it has been drawn from a normal
population. This can be tested using chi square goodness of fit procedure.
2. Chi square test for independence of two attributes. Suppose N observations are
considered and classified according two characteristics say A and B. We may be
interested to test whether the two characteristics are independent. In such a case, we
can use Chi square test for independence of two attributes.
78
The example considered above testing for independence of success in the English test
vis a vis immigrant status is a case fit for analysis using this test.
3. Chi square test for single variance is used to test a hypothesis on a specific value of the
population variance. Statistically speaking, we test the null hypothesis H0: σ = σ0 against
the research hypothesis H1: σ # σ0 where σ is the population mean and σ0 is a specific value
of the population variance that we would like to test for acceptance.
In other words, this test enables us to test if the given sample has been drawn from a
population with specific variance σ0. This is a small sample test to be used only if sample
size is less than 30 in general.
Assumptions
The Chi square test for single variance has an assumption that the population from which the
sample has been is normal. This normality assumption need not hold for chi square goodness
of fit test and test for independence of attributes.
However while implementing these two tests, one has to ensure that expected frequency in
any cell is not less than 5. If it is so, then it has to be pooled with the preceding or succeeding
cell so that expected frequency of the pooled cell is at least 5.
Non Parametric and Distribution Free

It has to be noted that the Chi square goodness of fit test and test for independence of
attributes depend only on the set of observed and expected frequencies and degrees of
freedom. These two tests do not need any assumption regarding distribution of the parent
population from which the samples are taken.
Since these tests do not involve any population parameters or characteristics, they are also
termed as non parametric or distribution free tests. An additional important fact on these two
tests is they are sample size independent and can be used for any sample size as long as the
assumption on minimum expected cell frequency is met.
Explorable.com (Sep 24, 2009). Chi Square Test. Retrieved from Explorable.com:
https://explorable.com/chi-square-test
79
8.2 Z-Test
Z-test is a statistical test where normal distribution is applied and is basically used for
dealing with problems relating to large samples when n ≥ 30.
n = sample size
For example suppose a person wants to test if both tea & coffee are equally popular in a
particular town. Then he can take a sample of size say 500 from the town out of which
suppose 280 are tea drinkers. To test the hypothesis, he can use Z-test.
Z-Test's for Different Purposes

There are different types of Z-test each for different purpose. Some of the popular types are
outlined below:
1. z test for single proportion is used to test a hypothesis on a specific value of the
population proportion.
Statistically speaking, we test the null hypothesis H0: p = p0 against the alternative
hypothesis H1: p >< p0 where p is the population proportion and p0 is a specific value of
the population proportion we would like to test for acceptance.
The example on tea drinkers explained above requires this test. In that example, p0 =
0.5. Notice that in this particular example, proportion refers to the proportion of tea
drinkers.
2. z test for difference of proportions is used to test the hypothesis that two populations
have the same proportion.
For example suppose one is interested to test if there is any significant difference in the
habit of tea drinking between male and female citizens of a town. In such a situation, Z-
test for difference of proportions can be applied.
One would have to obtain two independent samples from the town- one from males and
the other from females and determine the proportion of tea drinkers in each sample in
80
order to perform this test.
3. z -test for single mean is used to test a hypothesis on a specific value of the population
mean.
Statistically speaking, we test the null hypothesis H0: μ =0μagainst the alternative
hypothesis H1: μ ><0μwhere μ is the population mean and0μis a specific value of the
population that we would like to test for acceptance.
Unlike the t-test for single mean, this test is used if n ≥ 30 and population standard
deviation is known.
4. z test for single variance is used to test a hypothesis on a specific value of the
population variance.
Statistically speaking, we test the null hypothesis H0: σ =0σagainst H1: σ ><0σwhere σ is the
population mean and σ0 is a specific value of the population variance that we would like
to test for acceptance.
In other words, this test enables us to test if the given sample has been drawn from a
population with specific variance σ0. Unlike the chi square test for single variance, this
test is used if n ≥ 30.
5. Z-test for testing equality of variance is used to test the hypothesis of equality of two
population variances when the sample size of each sample is 30 or larger.
Assumption
Irrespective of the type of Z-test used it is assumed that the populations from which the
samples are drawn are normal.
Explorable.com (Aug 13, 2009). Z-Test. Retrieved from Explorable.com:

https://explorable.com/z-test
81
8.3 F-Test
Any statistical test that uses F-distribution can be called a F-test. It is used when the
sample size is small i.e. n < 30.
For example suppose one is interested to test if there is any significant

difference between the mean height of male and female students in a
particular college. In such a situation, t-test for difference of means can
be applied.
However one assumption of t-test is that the variance of the two
populations is equal- here two populations are the population of heights
of male and female students. Unless this assumption is true, the t-test
for difference of means cannot be carried out.
The F-test can be used to test the hypothesis that the population variances are equal.
F-test's for Different Purposes

There are different types of t-tests each for different purpose. Some of the popular types are
outlined below.
1. F-test for testing equality of variance is used to test the hypothesis of equality of two
population variances. The example considered above requires the application of this test.
2. F-test for testing equality of several means. Test for equality of several means is carried
out by the technique named ANOVA.
For example suppose that the efficacy of a drug is sought to be tested at three levels
say 100mg, 250mg and 500mg. A test is conducted among fifteen human subjects taken
at random- with five subjects being administered each level of the drug.
To test if there are significant differences among the three levels of the drug in terms of
efficacy, the ANOVA technique has to be applied. The test used for this purpose is the F-
test.
3. F-test for testing significance of regression is used to test the significance of the
regression model. The appropriateness of the multiple regression model as a whole can
be tested by this test. A significant F indicates a linear relationship between Y and at
82
least one of the X's.
Assumptions
Irrespective of the type of F-test used, one assumption has to be met. The populations from
which the samples are drawn have to be normal. In the case of F-test for equality of variance,
a second assumption has to be satisfied in that the larger of the sample variances has to be
placed in the numerator of the test statistic.
Like t-test, F-test is also a small sample test and may be considered for use if sample size is <
30.
Deciding
In attempting to reach decisions, we always begin by specifying the null hypothesis against a
complementary hypothesis called alternative hypothesis. The calculated value of the F-test
with its associated p-value is used to infer whether one has to accept or reject a null
hypothesis.
All software's provide these p-values. If the associated p-value is small i.e. (<0.05) we say that
the test is significant at 5% and one may reject the null hypothesis and accept the alternative
one.
On the other hand if associated p-value of the test is >0.05, one may accept the null
hypothesis and reject the alternative. Evidence against the null hypothesis will be considered
very strong if p-value is less than 0.01. In that case, we say that the test is significant at 1%.
Explorable.com (Jul 5, 2010). F-Test. Retrieved from Explorable.com:

https://explorable.com/f-test
83
8.4 Factor Analysis
Factor analysis is a statistical approach that can be used to analyze large number of
interrelated variables and to categorize these variables using their common aspects.
The approach involves finding a way of representing correlated variables together to form a
new smaller set of derived variables with minimum loss of information. So, it is a type of a
data reduction tool and it removes redundancy or duplication from a set of correlated variables.
Also, factors are formed that are relatively independent of one another. But since it require the
data to be correlated, so all assumptions that apply to correlation are relevant here.
Main Types
There are two main types of factor analysis. The two main types are:
Principal component analysis - this method provides a unique solution so that the
original data can be reconstructed from the results. Thus, this method not only provides
a solution but also works the other way round, i.e., provides data from the solution. The
solution generated includes as many factors as there are variables.
Common factor analysis - this technique uses an estimate of common difference or
variance among the original variables to generate the solution. Due to this, the number
of factors will always be less than the number of original factors. So, factor analysis
actually refers to common factor analysis.
Main Uses
The main uses of factor analysis can be summarized as given below. It helps us in:
Identification of underlying factors- the aspects common to many variables can be

identified and the variables can be clustered into homogeneous sets. Thus, new sets of
variables can be created. This allows us to gain insight to categories.
Screening of variables- it helps us to identify groupings so that we can select one
variable to represent many.
Example
84
Let us consider an example to understand the use of factor analysis.
Suppose we want to know whether certain aspects such as “task skills”

and “communication skills” attribute to the quality of “leadership” or
not. We prepare a questionnaire with 20 items, 10 of them pertaining to
task elements and 10 to communication elements.
Before using the questionnaire on the sample we use it on a small
group of people, who are like those in the survey. When we analyze the
data we try to see if there are really two factors and if those factors
represent the aspects of task and communication skills.
In this way, factors can be found to represent variables with similar aspects.
Explorable.com (Jul 7, 2009). Factor Analysis. Retrieved from Explorable.com:

https://explorable.com/factor-analysis
85
8.5 ROC Curve Analysis
In almost all fields of human activity, there is often a need to discriminate between
good and bad, presence and absence. Various tests have been designed to meet this
objective. The ROC curve technique has been designed to attain two objectives in this
regard.
ROC - Receiver Operating

Characteristic
Various tests have been designed to meet this objective. The ROC curve technique has been
designed to attain two objectives in this regard.
First, it can be used to calibrate (in some sense) a test so that it is able to perform the
discrimination activity well. Second, it can be used to choose between tests and specify best
among them.
What is ROC?
If one applies to a bank for credit, it is most likely that the bank will calculate a credit score out
of the applicant’s background. A higher score could indicate a good customer with minimal
chance of default. The banker could refuse credit if the score is low. Often a credit score cut-
off is used below which the application is rejected.
It is not difficult to see that there is always an element of risk here – risk of committing two
types of errors. A good prospective customer (one who would not default) could be refused
credit by the bank and a bad one could be approved credit. Clearly the banker would like the
cut-off be fixed in a manner that chances of both the errors are minimized if not entirely
eliminated.
The ROC Curve was

invented during the WW2
to help radars detect weak
signals from aircrafts
86
While complete elimination is impossible, the ROC curve analysis is a technique which
contributes to this endeavour. A related problem is the question of choosing between methods
of identifying good/bad customers should there be a choice. The ROC curve analysis
technique can be of use even here.
The Plot
In order to draw the ROC curve, the concepts of ‘Sensitivity’ and ‘Specificity’ are used – the
curve actually is the plot of sensitivity (in the y axis) against 1- specificity (in the x axis) for
different values of the cut-off.
To understand these concepts, assume that we select a sample of z customers of the bank by
retrospective sampling method. Further suppose that m and n of these are good and bad
(defaulting) customers respectively (m+n=z). Next, we use the credit scale on these
customers and calculate their credit scores. Then we use the cut-off and label customers
good or bad according to whether the credit score is above or below the cut-off. Out of m
good customers, the test classified x of them as good while the remaining m-x were classified
as bad.
In the parlance of ROC curve, x is termed as TP (for true positive meaning that the credit
scale was able to identify these customers as good correctly) while m-x in termed as FN (for
false negative). Further suppose that out of n bad customers, the test classified y of them as
bad while the remaining n-y were classified as good. In ROC parlance, y is termed as TN (for
true negative) while n-y in termed as FP (for false positive).
Actual vs Credit Scale
87
Actual status of costumers
Good Bad
As predicted by credit scale Good x n-y
Bad m-x y
Total m n
Sensitivity and Specificity

The probability that among the good customers the test will identify a customer as good is
known as ‘Sensitivity’ of the test for that cut-off (given by x/m). On the other hand, among the
bad customers, the probability that the test will identify a customer as bad is known as
‘Specificity’ (given by y/n) of the test again for the same cut-off.
Shape of a Good Curve

To draw the curve, the sensitivity and specificity are determined for a range of cut-offs. Then
sensitivity and 1-specificity are plotted. A good test is one in which the curve is closer to the
upper left corner.
So far so good but how does one determine the optimal cut-off? The banker would like to
determine that cut-off for which sensitivity is high and 1-specificity is low – ideally 100%
sensitivity with 100% specificity. That is easier said than done as the best of the curves is not
a vertical line but one which rises steeply initially and then slowly. The highest point on the
curve has 100% sensitivity and 0% specificity. In other words as one of sensitivity or
specificity increases, the other decreases and vice versa. The problem of determining the
ideal cut-off is to choose one depending upon the extent of sensitivity and specificity that the
decision maker is comfortable with. Having thus fixed a cut-off, the banker can then use it for
evaluating fresh credit applications.
The fact that the best of the tests has a curve which rises steeply initially is used to choose
between tests. A test can be called the best if its corresponding ROC curve is higher than
others.
Explorable.com (Jun 25, 2010). ROC Curve Analysis. Retrieved from Explorable.com:
https://explorable.com/roc-curve-analysis
88
8.6 Meta Analysis
Meta analysis is a statistical technique developed by social scientists, who are very
limited in the type of experiments they can perform.
Social scientists have great difficulty in designing and implementing true experiments, so
meta-analysis gives them a quantitative tool to analyze statistically data drawn from a number
of studies, performed over a period of time.
Medicine and psychology increasingly use this method, as a way of avoiding time-consuming
and intricate studies, largely repeating the work of previous research.
What is Meta-Analysis?
Social studies often use very small sample sizes, so any statistics used generally give results
containing large margins of error.
This can be a major problem when interpreting and drawing conclusions, because it can mask
any underlying trends or correlations. Such conclusions are only tenuous, at best, and leave
the research open for criticism.
Meta-analysis is the process of drawing from a larger body of research, and using powerful
statistical analyzes on the conglomerated data.
This gives a much larger sample population and is more likely to generate meaningful and
usable data.
The Advantages of Meta-Analysis

Meta-analysis is an excellent way of reducing the complexity and breadth of research,
allowing funds to be diverted elsewhere. For rare medical conditions, it allows researchers to
collect data from further afield than would be possible for one research group.
89
As the method becomes more common, database programs have made the process much
easier, with professionals working in parallel able to enter their results and access the data.
This allows constant quality assessments and also reducing the chances of unnecessary
repeat research, as papers can often take many months to be published, and the computer
records ensure that any researcher is aware of the latest directions and results.
The field of meta study is also a lot more rigorous than the traditional literature review, which
often relies heavily upon the individual interpretation of the researcher.
When used with the databases, a meta study allows a much wider net to be cast than by the
traditional literature review, and is excellent for highlighting correlations and links between
studies that may not be readily apparent as well as ensuring that the compiler does not
subconsciously infer correlations that do not exist.
The Disadvantages of Meta-Analysis

There are a number of disadvantages to meta-analysis, of which a researcher must be aware
before relying upon the data and generated statistics.
The main problem is that there is the potential for publication bias and skewed data.
Research generating results not refuting a hypothesis may tend to remain unpublished, or
risks not being entered into the database. If the meta study is restricted to the research with
positive results, then the validity is compromised.
The researcher compiling the data must make sure that all research is quantitative, rather
than qualitative, and that the data is comparable across the various research programs,
allowing a genuine statistical analysis.
It is important to pre-select the studies, ensuring that all of the research used is of a sufficient
quality to be used.
One erroneous or poorly conducted study can place the results of the entire meta-analysis at
risk. On the other hand, setting almost unattainable criteria and criteria for inclusion can leave
the meta study with too small a sample size to be statistically relevant.
Striking a balance can be a little tricky, but the whole field is in a state of constant
development, incorporating protocols similar to the scientific method used for normal
quantitative research.
Finding the data is rapidly becoming the real key, with skilled meta-analysts developing a skill-
set of library based skills, finding information buried in government reports and conference
90
data, developing the knack of assessing the quality of sources quickly and effectively.
Conclusions and the Future

Meta-analysis is here to stay, as an invaluable tool for research, and is rapidly gaining
momentum as a stand-alone discipline, with practitioners straddling the divide between
statisticians and librarians.
The conveniences, as long as the disadvantages are taken into account, are too apparent to
ignore, and a meta study can reduce the need for long, expensive and potentially intrusive
repeated research studies.
Martyn Shuttleworth (Jun 21, 2009). Meta Analysis. Retrieved from Explorable.com:
https://explorable.com/meta-analysis
Thanks for reading!

Explorable.com Team
Explorable.com - Copyright © 2008-2015 All Rights Reserved.
91

Statistical Tests

Uploaded by

Copyright:

Available Formats

Statistical Tests

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Statistical Tests

Uploaded by

Copyright:

Available Formats

Published on Explorable.com (https://explorable.

8 Other Ways to Analyse Data

Cover design by Explorable / Gilli.me.

Statistical hypothesis testing is used to determine whether an experiment conducted

The Critical Value

It is very important to understand relationship between variables to draw the right

Relationships in Physical and Social Sciences

Positive and Negative Correlation

How to cite this article:

Some Examples of Linear Relationships

Mass = density x volume

How to cite this article:

Non-linear relationship is fundamental to most physical and statistical phenomena and

Examples of Non-Linear Relationships

Studying Non-Linear Relationships

How to cite this article:

Statistical correlation is a statistical technique which tells us if two variables are

Relationship Between Variables

1. whether the relationship is positive or negative

Correlation is a powerful tool that provides these vital pieces of information.

How to cite this article:

Explorable.com (May 2, 2009). Statistical Correlation. Retrieved from Explorable.com:

Pearson Product-Moment Correlation is one of the measures of correlation which

1. the variables are in the interval or ratio scale of measurement

Positive and Negative Correlation

1. It is independent of the units of measurement. It is in fact unit free. For example, ρ

Caveats and Warnings

Under such circumstances it is possible that a non linear relationship exists.

2. One has to be careful in interpreting the value of ρ.

How to cite this article:

Explorable.com (Oct 8, 2009). Pearson Product-Moment Correlation. Retrieved from

Spearman Rank Correlation Coefficient is a non-parametric measure of correlation,

It is also known as the

Interpretation of Numerical Values

rs > 0 implies positive agreement among ranks

Closer rs is to 1, better is the agreement while rs closer to -1 indicates strong agreement in

Suppose that scores of the judges (out of 10 were as follows):

Advantages and Caveats

How to cite this article:

What Correlation does not Provide

How to cite this article:

Explorable.com (Oct 9, 2010). Partial Correlation Analysis. Retrieved from Explorable.com:

Correlation and causation, closely related to confounding variables, is the incorrect

Patterns of Causality in the Mind

The Sensationalism of the Media

A survey, as reported in a British newspaper, involved questioning a group of teenagers about

Emotive Bias Influences Causality

The Cost of Disregarding Correlation and Causation

Smoking Parents Cause Teenage Delinquency, (2006, August 21). Criticalthinking.org.uk.

How to cite this article:

4.1 Linear Regression Analysis

Dependent and Independent Variables

Two Lines of Regression

On the other hand, the line of regression of X on Y is given by X = c + dY which is used to

Choice of Line of Regression

How to cite this article:

Explorable.com (Mar 7, 2009). Linear Regression Analysis. Retrieved from Explorable.com:

Dependent and Independent Variables

The Multiple Regression Model