Chapter 4 Correlational Analysis
CORRELATIONAL ANALYSIS
Topic Outline:
1. Introduction
2. Hypothesis Testing for Correlation
3. Pearson Product-Moment Correlation (Pearson r)
4. Spearman Rank Correlation (Spearman rho, ρ)
5. Gamma Correlation (G)
6. Point-Biserial Correlation (rpb)
7. Lambda Correlation (λ)
8. Chi-Square (χ²) Tests
Learning Outcomes:
At the end of the unit, the students must have:
1. discussed the conditions imposed by each measure of
relationship/associations;
2. computed and interpreted each measure of relationship;
3. performed hypothesis testing involving each of the measures of relationship;
and
4. differentiated multiple from partial correlation.
Prepared by:
Prof. Jeanne Valerie Agbayani-Agpaoa
STAT 201: Statistical Methods I
Dr. Virgilio Julius P. Manzano, Jr.
Engr. Lawrence John C. Tagata
CHAPTER IV: CORRELATIONAL ANALYSIS
TOPIC 1: INTRODUCTION
A measure of correlation or relationship is used to find the amount and degree of relationship, or the absence of relationship, between two sets of values, characteristics, or variables. This relationship is expressed by a factor called the coefficient of correlation. It is an abstract number, the ratio of the two values, series of values, or variables being compared, and it can also be expressed as a percent.
Correlation is a measure of the degree of relationship between paired data. Much statistical research aims to establish the relationship between paired variables so that the researcher can predict one variable in terms of the other. For example, high grades in Science and English tend to go with high grades in Mathematics. In other instances the relationship may be weak or absent altogether; for example, the volume of candy sales tends to be unrelated to the crime rate in a particular place. It must be remembered that correlation does not establish cause and effect; it merely measures the strength of the relationship between paired data.
Simple correlation is amenable to either ungrouped or grouped data, for nominal, ordinal, or interval
scales of data. Usually, however, rank correlation is aptly applied to ordinal data when the number of items or
cases is rather small (less than 30).
The term correlation refers to the association which occurs between two or more statistical series of values. The coefficient of correlation shows the extent to which two variables are related, and to what extent variations in one set of data go with variations in the other. It is a single number that tells us how closely two sets of values are related. It can vary from +1.00, which means perfect positive correlation, through 0, which means no correlation at all, to −1.00, which means perfect negative correlation.
Perfect positive correlation refers to a direct relationship between two sets of data: any increase in the values of the first set is accompanied by a corresponding increase in the second set. When correlation is negative, an inverse behaviour of the data is observed: an increase in the values of the first set is accompanied by a decrease in the second set, or vice versa. When the two sets of data change with little or no connection to each other, there is little or no correlation at all.
The coefficient of correlation does not directly give anything like a percentage of relationship. It cannot be concluded that a correlation of 0.50 indicates twice the relationship indicated by a correlation of 0.25. A coefficient of correlation is an index number, not a measurement on an interval scale. Moreover, we cannot compute a coefficient of correlation from just two measurements on one person alone.
Anybody who wants to interpret the result of the coefficient of correlation should be guided by the
following reminders:
1. The relationship of two variables does not necessarily mean that one is the cause or the effect of the other
variable. It does not imply cause-effect relationship.
2. When the computed r is high, it does not necessarily mean that one factor is strongly dependent on the other. Height and intelligence, for example, may correlate in a sample, yet interpreting one as depending on the other makes no sense at all. On the other hand, when the computed r is small, it does not necessarily mean that one factor has no dependence on the other. This may apply to IQ and grades in school: a low grade may simply mean that a student did not make good use of his study time.
3. If there is reason to believe that the two variables are related and the computed r is high, the two variables really are associated. On the other hand, if the computed correlation is low (though the variables are theoretically related), other factors might be responsible for the small association.
4. Lastly, the correlation coefficient simply informs us that when two variables change together, the relationship between them may be strong or weak.
Example: Consider the values of x and y on the descriptive problem, “What is the relationship
between the NSAT percentile rank and the scholastic rating of BS Physics students in
selected universities and colleges in a certain region?”
Student 1 2 3 4 5 6 7 8 9 10
NSAT Percentile Rank, x 60 73 61 70 75 79 65 67 77 80
Scholastic Rating, y 78 87 80 86 87 90 85 84 89 90
Student   NSAT Percentile Rank, x   Scholastic Rating, y   x²   y²   xy
1 60 78 3,600 6,084 4,680
2 73 87 5,329 7,569 6,351
3 61 80 3,721 6,400 4,880
4 70 86 4,900 7,396 6,020
5 75 87 5,625 7,569 6,525
6 79 90 6,241 8,100 7,110
7 65 85 4,225 7,225 5,525
8 67 84 4,489 7,056 5,628
9 77 89 5,929 7,921 6,853
10 80 90 6,400 8,100 7,200
Totals 707 856 50,459 73,420 60,772
Interpretation:
The rxy value obtained is 0.9596, which denotes a very high positive relationship. This means the higher the NSAT percentile rank, the higher the scholastic rating of the BS Physics students.
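The value above comes from the Pearson product-moment formula, r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}. As an illustrative sketch (not part of the original computation), a few lines of Python reproduce it from the column totals:

```python
import math

# NSAT percentile rank (x) and scholastic rating (y) for the 10 students
x = [60, 73, 61, 70, 75, 79, 65, 67, 77, 80]
y = [78, 87, 80, 86, 87, 90, 85, 84, 89, 90]
n = len(x)

sum_x, sum_y = sum(x), sum(y)               # 707, 856
sum_x2 = sum(v * v for v in x)              # 50,459
sum_y2 = sum(v * v for v in y)              # 73,420
sum_xy = sum(a * b for a, b in zip(x, y))   # 60,772

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # 0.9596
```

The script simply evaluates the raw-score formula, so its totals can be checked term by term against the table.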
Example: Consider the specific problem: “What is the rank relationship between capital and profit of light bulbs?”
Item   Capital, x   Profit, y   Rx   Ry   D = |Rx − Ry|   D²
1 20,000 5,000 6 7 1 1
2 50,000 15,000 3 3.5 0.5 0.25
3 10,000 3,000 9 9.5 0.5 0.25
4 100,000 30,000 2 2 0 0
5 15,000 4,000 7 8 1 1
6 25,000 9,000 5 5 0 0
7 11,000 6,000 8 6 2 4
8 150,000 70,000 1 1 0 0
9 5,000 3,000 10 9.5 0.5 0.25
10 40,000 15,000 4 3.5 0.5 0.25
TOTAL 7.0
rs = 1 − (6 ΣD²)/(n³ − n) = 1 − (6 × 7)/(10³ − 10) = 0.9576
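As an illustrative sketch, the rank computation can be reproduced in Python; the small helper below assigns average ranks to tied values, exactly as done in the Rx and Ry columns:

```python
def avg_ranks_desc(values):
    """Rank from highest (rank 1) to lowest, giving tied values their average rank."""
    srt = sorted(values, reverse=True)
    return [srt.index(v) + (srt.count(v) + 1) / 2 for v in values]

capital = [20000, 50000, 10000, 100000, 15000, 25000, 11000, 150000, 5000, 40000]
profit  = [5000, 15000, 3000, 30000, 4000, 9000, 6000, 70000, 3000, 15000]

rx = avg_ranks_desc(capital)
ry = avg_ranks_desc(profit)
n = len(capital)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # 7.0
rs = 1 - 6 * sum_d2 / (n ** 3 - n)
print(round(rs, 4))  # 0.9576
```

Note the two tied profit values (15,000 and 3,000) each receive the average of the ranks they would otherwise occupy (3.5 and 9.5).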
Example: Compute for the gamma for the data shown below
Socio-Economic Educational Status
Status Upper Middle Lower
Upper 24 19 5
Middle 12 54 29
Lower 9 26 25
Solution:
Step 1. Arrange the ordering for one of the two characteristics from the highest to the lowest (or vice versa) from top to bottom through the rows, and for the other characteristic from the highest to the lowest (or vice versa) from left to right through the columns.
Step 2. Compute Ns by multiplying the frequency in every cell by the sum of the frequencies in all of the other cells that lie both below and to the right of that cell, and then summing up the products obtained.
Ns = 24*(54 + 29 + 26 + 25) + 19*(29 + 25) + 12*(26 + 25) + 54*(25)
Ns = 6,204
Step 3. To solve for Ni, partially reverse the process described in Step 2: multiply the frequency of every cell by the sum of the frequencies in all of the cells that lie both below and to the left of that cell, and then sum up the products obtained.
Ni = 19*(12 + 9) + 5*(12 + 54 + 9 + 26) + (54*9) + 29*(9 + 26)
Ni = 2,405
Step 4. Compute the gamma coefficient: G = (Ns − Ni)/(Ns + Ni) = (6,204 − 2,405)/(6,204 + 2,405) = 0.4413, a positive association between socio-economic and educational status.
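The counts above, together with the standard Goodman–Kruskal formula G = (Ns − Ni)/(Ns + Ni), can be verified with a short Python sketch (illustrative only); the nested loops count the below-right (concordant) and below-left (discordant) products generically:

```python
# Socio-economic status (rows) vs educational status (columns), both ordered high to low
table = [
    [24, 19, 5],   # Upper
    [12, 54, 29],  # Middle
    [9, 26, 25],   # Lower
]
R, C = len(table), len(table[0])

Ns = Ni = 0
for i in range(R):
    for j in range(C):
        for k in range(i + 1, R):          # only rows below the current cell
            for l in range(C):
                if l > j:                  # below and to the right: concordant
                    Ns += table[i][j] * table[k][l]
                elif l < j:                # below and to the left: discordant
                    Ni += table[i][j] * table[k][l]

G = (Ns - Ni) / (Ns + Ni)
print(Ns, Ni, round(G, 4))  # 6204 2405 0.4413
```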
Example:
A researcher wishes to determine if a significant relationship exists between the sex of a worker and how long the worker has been performing an electronics assembly task. The independent variable is the answer to the question “What is your sex, male or female?” (dichotomous). The dependent variable is the answer to the question “How many years have you been performing the task?” (ratio).
Respondent 1 2 3 4 5 6 7 8 9 10
Sex M M M M F F M F F F
Number of years 10 11 6 11 4 3 12 2 2 1
Males Females
10 4
11 3
6 2
11 2
12 1
Mean 10.0 2.4
Standard deviation of all ten observations (sample): 4.37
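The point-biserial coefficient itself is not worked out above; one common form of the formula is rpb = [(M1 − M0)/σx]·√(pq), where M1 and M0 are the two group means, σx is the population standard deviation of all n scores, and p and q are the proportions of cases in each group. The sketch below uses that assumed form; with it, rpb equals the Pearson r computed after coding male = 1, female = 0:

```python
import math

# 1 = male, 0 = female; years spent performing the task
sex   = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
years = [10, 11, 6, 11, 4, 3, 12, 2, 2, 1]
n = len(years)

m1 = sum(y for s, y in zip(sex, years) if s == 1) / sex.count(1)  # 10.0
m0 = sum(y for s, y in zip(sex, years) if s == 0) / sex.count(0)  # 2.4
p = sex.count(1) / n
q = 1 - p

mean = sum(years) / n
sd_pop = math.sqrt(sum((y - mean) ** 2 for y in years) / n)  # population SD

r_pb = (m1 - m0) / sd_pop * math.sqrt(p * q)
print(round(r_pb, 4))  # 0.9173
```

The very high positive value indicates that, in this sample, male workers have been performing the task far longer than female workers.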
Formula:
λc = (ΣFbi − Mbc) / (N − Mbc)
where:
ΣFbi = the sum, over all of the rows, of the biggest cell frequency in the ith row
Mbc = the biggest of the column totals
N = the number of observations
However, if your dependent variable is regarded as the row variable, the formula to be used is:
λr = (ΣFbj − Mbr) / (N − Mbr)
where:
ΣFbj = the sum, over all of the columns, of the biggest cell frequency in the jth column
Mbr = the biggest of the row totals
N = the number of observations
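As a hypothetical illustration of λc (reusing, purely for demonstration, the socio-economic vs. educational status table from the gamma example):

```python
# Rows: socio-economic status; columns: educational status
table = [
    [24, 19, 5],
    [12, 54, 29],
    [9, 26, 25],
]

N = sum(sum(row) for row in table)                # 203 observations
sum_Fbi = sum(max(row) for row in table)          # 24 + 54 + 26 = 104
col_totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
Mbc = max(col_totals)                             # biggest column total = 99

lam_c = (sum_Fbi - Mbc) / (N - Mbc)
print(round(lam_c, 4))  # 0.0481
```

The small value here says that knowing a case's row barely improves prediction of its column category over always guessing the modal column.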
The chi-square distribution was introduced into statistical testing by Karl Pearson to determine whether or not discrepancies between observed and theoretical counts were significant. The test used to find out how well an observed frequency distribution conforms to, or fits, some theoretical frequency distribution is referred to as a “goodness of fit” test.
Also, chi-square distribution can be used to test the normality of any distribution. Testing a hypothesis
made about several population proportions are sometimes considered. In this section, a discussion for testing the
normality with the use of chi-square is being emphasized.
On the other hand, tables of rows and columns are often called contingency tables. This topic is equally important: it helps us determine whether the two classification variables are independent. The critical value of chi-square varies with the number of degrees of freedom, and one of the assumptions that apply to a contingency table is that every category has an expected frequency of at least 5.
USES OF CHI-SQUARE
1. Chi-square is used in descriptive research if the researcher wants to determine the significant difference
between the observed and the expected or theoretical frequencies from independent variables.
2. It is used to test the goodness of fit where a theoretical distribution is fitted to some data, i.e., the fitting
of a normal curve.
3. It is used to test the hypothesis that the variances of a normal population are equal to a given value.
4. It is also used for the construction of confidence interval for variances.
5. It is used to compare two uncorrelated and correlated proportions.
Using the degrees of freedom, we can use the table of chi-square values to compare against our obtained χ² value. If the computed χ² is equal to or greater than the tabular value, at the required degrees of freedom and the chosen probability level, the chi-square value is significant and the null hypothesis set earlier is rejected.
Example 1: Suppose we want to test the claim that the rate of fatal accidents differs with the width of the road.
χ² = Σ (O − E)² / E
where
𝜒2 = chi-square
O = observed frequency
E = expected frequency
Observed frequency 95 90 83 73
Expected frequency 85.25 85.25 85.25 85.25
Ho: The rate of fatal accidents does not differ with the width of the road.
Ha: The rate of fatal accidents differs with the width of the road.
The computed χ² = (9.75² + 4.75² + 2.25² + 12.25²)/85.25 = 3.1994. The tabular value of χ² at the 0.05 level of significance with degrees of freedom df = k − 1 = 4 − 1 = 3 is 7.815.
Since the computed value is less than the critical value of χ², the null hypothesis is not rejected.
Thus, we can say that the rate of fatal accidents does not differ with the width of the road.
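The arithmetic can be checked with a short Python sketch (illustrative):

```python
# Observed fatal accidents at the four road widths; expected assumes an even split
observed = [95, 90, 83, 73]
expected = [sum(observed) / 4] * 4        # 341 / 4 = 85.25 per category

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 4))  # 3.1994, below the critical value 7.815 (df = 3, alpha = 0.05)
```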
Example 2: Students from MMSU claim that among the four most popular flavors of ice cream, students
have these preference rates: 58% prefer Double Dutch, 25% prefer Rocky Road, 12% prefer
chocolate mocha and 5% prefer vanilla. A random sample of 300 students was chosen. Test the
claim that the percentages given by the students are correct. Use 0.01 significance level.
Flavor Number of Students
Double Dutch 123
Rocky Road 72
Chocolate Mocha 55
Vanilla 50
Solution:
Ho: The claim of the students is correct, that is, P1 = 0.58, P2 = 0.25, P3 = 0.12, and P4 = 0.05.
Ha: At least one of the proportions is not equal to the value claimed.
The expected frequencies are 300(0.58) = 174, 300(0.25) = 75, 300(0.12) = 36, and 300(0.05) = 15, giving a computed χ² = 106.76. The tabular value at the 0.01 level with df = 4 − 1 = 3 is 11.345, so the null hypothesis is rejected.
Thus, we can say that at least one of the proportions is not equal to the value claimed.
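As an illustrative check of the computation:

```python
n = 300
observed = [123, 72, 55, 50]           # Double Dutch, Rocky Road, Chocolate Mocha, Vanilla
claimed = [0.58, 0.25, 0.12, 0.05]     # proportions claimed by the students
expected = [n * p for p in claimed]    # 174, 75, 36, 15

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # 106.76, far above 11.345 (df = 3, alpha = 0.01), so Ho is rejected
```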
TESTING NORMALITY WITH CHI-SQUARE
The chi-square goodness-of-fit test can also be used to test whether a sample came from a normal population, using the following steps:
Step 1: Use the mean and the standard deviation of the sample to estimate the mean and the standard deviation of the population, if these are not known or assumed.
Step 2: Group the sample data into class intervals or categories.
Step 3: Calculate the z-values for the class boundaries.
Step 4: Determine the area under the standard normal curve between z-values to obtain the hypothesized proportion of the sample in each class.
Step 5: Multiply each proportion by the total number of observations to obtain the expected frequency FE.
Step 6: Compute for χ².
Remarks:
1. The hypothesis being tested is that the sample came from a population that has a normal distribution.
2. The degrees of freedom for the chi-square test is 𝑘 − 1 − 𝑚, where k is the number of classes and m is
the number of population parameters estimated. If the sample mean and the standard deviation have been
used to estimate the population mean and the standard deviation, then 𝑚 = 2; thus, the
𝑑𝑒𝑔𝑟𝑒𝑒𝑠 𝑜𝑓 𝑓𝑟𝑒𝑒𝑑𝑜𝑚 (𝑑𝑓) = 𝑘 − 3.
CONTINGENCY TABLES
In contingency tables, we intend to test that the row variable is independent of the column variable.
Computation for expected frequency for the contingency table is different from the one in the goodness of fit. The
expected frequency E can be computed with the use of this formula:
E = (Row Total × Column Total) / Grand Total
Teenagers and Young adults have their own style of studying. Some prefer to study with music; others
do not. A group of psychologists conducted a study to determine the particular age of the students who like
studying with music. At the 0.01 level of significance, test the claim that style of studying is independent of the
listed age groups. The table below summarizes the information.
Age Groups
Study Habit
9-12 13-16 17-20 21-24
With Music 89 75 63 52
Without Music 28 20 34 39
Contingency Table:
Age Groups Row
Study Habit
9-12 13-16 17-20 21-24 Totals
With Music 89 (81.61) 75 (66.26) 63 (67.66) 52 (63.47) 279
[0.67] [1.15] [0.32] [2.07]
Without Music 28 (35.39) 20 (28.74) 34 (29.34) 39 (27.53) 121
[1.54] [2.66] [0.74] [4.78]
Column Totals 117 95 97 91 400
Interpretation:
At the 0.01 significance level, the computed χ² = 13.9373 exceeds the critical value of 11.345 (df = 3), so the obtained value lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result implies that the type of study habit is related to age.
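The expected frequencies shown in parentheses and the χ² statistic follow directly from E = (Row Total × Column Total)/Grand Total; a generic Python sketch (illustrative) reproduces the value:

```python
def chi_square_contingency(table):
    """Return the chi-square statistic for a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand  # E = row total * column total / grand total
            chi2 += (obs - exp) ** 2 / exp
    return chi2

# Study habit (rows) by age group (columns)
music = [
    [89, 75, 63, 52],   # with music
    [28, 20, 34, 39],   # without music
]
print(round(chi_square_contingency(music), 4))  # 13.9373
```

The same function reproduces the computed values in the one-way classification examples that follow: 9.3701 for the divorce-question table and 8.2492 for the mentors table.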
ONE-WAY CLASSIFICATION
Chi-square in one-way classification is applicable when the researcher is interested in determining the number of subjects, objects, or responses that fall in various categories.
Example:
The subjects are 30 women and 30 men, a total of 60 subjects in all. When asked “Can divorce be applied in the Philippines?”, of the 30 women, 9 answered yes; 12, no; and 9, undecided; of the 30 men, 15 answered yes; 2, no; and 13, undecided. Test for a significant difference in their responses.
Sex
Responses Row Totals
Women Men
Yes 9 (12.00) [0.75] 15 (12.00) [0.75] 24
No 12 (7.00) [3.57] 2 (7.00) [3.57] 14
Undecided 9 (11.00) [0.36] 13 (11.00) [0.36] 22
Column Totals 30 30 60
Interpretation:
At the 0.05 significance level, the computed χ² = 9.3701 exceeds the critical value of 5.991 (df = 2), so the obtained value lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result implies that the response to the survey question is related to sex.
Example:
The frequencies shown in the table below are observed frequencies. The specific question is “Is there a significant
difference in the job performance of mentors who failed and mentors who passed the teacher’s licensure
examination?” Of the 100 subjects, 20 failed but with satisfactory job performance; 40 passed with satisfactory
job performance; 25 failed with unsatisfactory job performance; and 15 passed with unsatisfactory job
performance. Test the significant difference existing in the foregoing data.
Ho: There is no significant difference in the job performance of mentors who failed and mentors who
passed the teacher’s licensure examination.
Ha: There is a significant difference in the job performance of mentors who failed and mentors who
passed the teacher’s licensure examination.
Teachers Licensure Examination
Job Performance
Failed Passed Total
Satisfactory 20 (27.00) 40 (33.00) 60
[1.81] [1.48]
Unsatisfactory 25 (18.00) 15 (22.00) 40
[2.72] [2.23]
Total 45 55 100
Interpretation:
At the 0.05 significance level, the computed χ² = 8.2492 exceeds the critical value of 3.841 (df = 1), so the obtained value lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result implies that there is a significant difference in the job performance of mentors who failed and mentors who passed the teacher’s licensure examination.
ASSESSMENT
Login to mVLE portal to access the assessment for Chapter IV.
REFERENCES:
• D.C. Montgomery and G.C. Runger, Applied Statistics and Probability for Engineers, 5th Edition, John
Wiley & Sons, Inc., 2011.
• R.E. Walpole, R.H. Myers, S.L. Myers and K. Ye, Probability and Statistics for Engineers and
Scientists, 9th Edition, Pearson International Edition, 2012.
• Zulueta, F. M. and Nestor Edilberto B. Costales, Jr. (2005). Methods of Research: Thesis Writing and
Applied Statistics. Mandaluyong City: National Bookstore, Inc.