module 5 bivariate analysis
• The total in the Red column is 26, which means there are 26 red cards
in the deck. Of these, 2 are Aces and 24 are non-Aces. There are 52
cards in a deck. Use the values in the table to calculate some
conditional probabilities.
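A minimal sketch of the conditional-probability calculation, using only the Red-column counts quoted above (26 red cards, 2 red Aces, 24 red non-Aces, 52 cards in total). The variable names are illustrative and not part of the original material.

```python
from fractions import Fraction

# Counts stated in the text for the Red column of the table
total_cards = 52
red_cards = 26
red_aces = 2
red_non_aces = 24

# P(Ace | Red): restrict attention to the Red column
p_ace_given_red = Fraction(red_aces, red_cards)

# Same result via the definition P(A | B) = P(A and B) / P(B)
p_ace_and_red = Fraction(red_aces, total_cards)
p_red = Fraction(red_cards, total_cards)
assert p_ace_given_red == p_ace_and_red / p_red

p_non_ace_given_red = Fraction(red_non_aces, red_cards)

print(p_ace_given_red)      # 1/13
print(p_non_ace_given_red)  # 12/13
```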
A contingency table is a type of table that summarizes the relationship
between two categorical variables.
To create a contingency table in Python, we can use
the pandas.crosstab() function.
Syntax:
pandas.crosstab(index, columns)
• index: name of variable to display in the rows of the contingency
table
• columns: name of variable to display in the columns of the
contingency table
• Step 1: Create the Data
Let’s create a dataset that shows information for 20 different product orders,
including the type of product purchased (TV, computer, or radio) along with the
country (A, B, or C) that the product was purchased in:
#create data
import pandas as pd
df = pd.DataFrame({'Order': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
'Product': ['TV', 'TV', 'Comp', 'TV', 'TV', 'Comp',
'Comp', 'Comp', 'TV', 'Radio', 'TV', 'Radio', 'Radio',
'Radio', 'Comp', 'Comp', 'TV', 'TV', 'Radio', 'TV'],
'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B',
'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']})
#view data
df
• Step 2: Create the Contingency Table
To create a contingency table that counts how many of each product were
ordered in each country:
#create contingency table
pd.crosstab(index=df['Country'], columns=df['Product'])
• Expected frequencies: The expected frequency counts are computed separately for each level of one
categorical variable at each level of the other categorical variable. Compute r × c expected frequencies,
according to the following formula.
E_{r,c} = (n_r × n_c) / n
• where E_{r,c} is the expected frequency count for level r of Variable A and level c of Variable
B, n_r is the total number of sample observations at level r of Variable A, n_c is the total
number of sample observations at level c of Variable B, and n is the total sample size.
• Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the following
equation.
Χ² = Σ [ (O_{r,c} - E_{r,c})² / E_{r,c} ]
• where O_{r,c} is the observed frequency count at level r of Variable A and level c of Variable B, and
E_{r,c} is the expected frequency count at level r of Variable A and level c of Variable B.
• P-value. The P-value is the probability of observing a sample statistic as extreme as the test
statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to
assess the probability associated with the test statistic. Use the degrees of freedom computed
above.
(4) Interpret results:
• If the sample findings are unlikely, given the null hypothesis, reject the null hypothesis.
• Typically, this involves comparing the P-value to the significance level, and rejecting the null
hypothesis when the P-value is less than the significance level.
• A public opinion poll surveyed a simple random sample of 1000
voters. Respondents were classified by gender (male or female) and by
voting preference (Republican, Democrat, or Independent). Results are
shown in the contingency table below.
                    Voting Preferences
          Republican   Democratic   Independent
Male      200          150          50
Female    250          300          50
Is there a gender gap? Do the men's voting preferences differ significantly from the women's
preferences? Use a 0.05 level of significance.
Find the row total and column total
                        Voting Preferences
               Rep    Dem    Ind    Row Total
Male           200    150    50     400
Female         250    300    50     600
Column Total   450    450    100    1000
• State the hypotheses. The first step is to state the null
hypothesis and an alternative hypothesis.
Ho: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent.
Formulate an analysis plan. For this analysis, the significance level is
0.05. Using sample data, we will conduct a chi-square test for
independence.
Analyze sample data. Applying the chi-square test for independence to
sample data, we compute the degrees of freedom, the expected
frequency counts, and the chi-square test statistic. Based on the
chi-square statistic and the degrees of freedom, we determine
the P-value.
• DF = (r - 1) × (c - 1) = (2 - 1) × (3 - 1) = 2
• E_{r,c} = (n_r × n_c) / n
E_{1,1} = (400 × 450) / 1000 = 180000/1000 = 180
E_{1,2} = (400 × 450) / 1000 = 180000/1000 = 180
E_{1,3} = (400 × 100) / 1000 = 40000/1000 = 40
E_{2,1} = (600 × 450) / 1000 = 270000/1000 = 270
E_{2,2} = (600 × 450) / 1000 = 270000/1000 = 270
E_{2,3} = (600 × 100) / 1000 = 60000/1000 = 60
• Χ² = Σ [ (O_{r,c} - E_{r,c})² / E_{r,c} ]
Χ² = (200 - 180)²/180 + (150 - 180)²/180 + (50 - 40)²/40 + (250 - 270)²/270 + (300 - 270)²/270 + (50 - 60)²/60
Χ² = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60
Χ² = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2
• where DF is the degrees of freedom, r is the number of levels of gender, c is the number of levels of the voting
preference, n_r is the total number of observations from level r of gender, n_c is the total number of observations
from level c of voting preference, n is the number of observations in the sample, E_{r,c} is the expected frequency
count when gender is level r and voting preference is level c, and O_{r,c} is the observed frequency count when
gender is level r and voting preference is level c.
• The P-value is the probability that a chi-square statistic having 2
degrees of freedom is more extreme than 16.2.
• We use the Chi-Square Distribution Calculator to find P(Χ2 > 16.2) =
0.0003.
• Interpret results. Since the P-value (0.0003) is less than the
significance level (0.05), we reject the null hypothesis. Thus,
we conclude that there is a relationship between gender and voting
preference.
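The voting example can be checked with a short script. This is a sketch using only the Python standard library; the function name chi_square_independence is illustrative. The P-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail chi-square probability equals exp(-Χ²/2); for other degrees of freedom a chi-square table or a statistics library would be needed.

```python
import math

# Observed counts from the voting example
# (rows: Male, Female; columns: Republican, Democrat, Independent)
observed = [[200, 150, 50],
            [250, 300, 50]]

def chi_square_independence(table):
    """Return (chi2, dof, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E[r][c] = (row total r * column total c) / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected

chi2, dof, expected = chi_square_independence(observed)

# Closed form valid only for 2 degrees of freedom (assumption stated above)
p_value = math.exp(-chi2 / 2) if dof == 2 else None

print(round(chi2, 2), dof, round(p_value, 4))  # 16.2 2 0.0003
```

Because the P-value is below 0.05, the script reaches the same decision as the hand calculation.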
Example 2
• The results of a random sample of children with pain from
musculoskeletal injuries treated with acetaminophen, ibuprofen, or
codeine are shown in the table. At α = 0.10, is there enough evidence
to conclude that the treatment and result are independent?
Acetaminophen(c,1) Ibuprofen (c. 2) Codeine (c. 3)
• The interpretation for the phi coefficient is similar to the Pearson Correlation Coefficient. The range is from
-1 to 1, where:
• 0 is no relationship.
• 1 is a perfect positive relationship: most of your data falls along the diagonal cells.
• -1 is a perfect negative relationship: most of your data is not on the diagonal.
A Φ value of 0 indicates no relationship.
Again, the measure ranges between 0 and 1 with higher values meaning a stronger association.
Example: Find the Phi Coefficient for the following data.
Coding: Marital Status 1 = Single, 2 = Married; Gender 1 = Male, 2 = Female.
Subjects   Marital Status   Gender
A          2                1
B          1                1
C          1                1
D          2                2
E          2                2
F          1                1
G          2                2
H          1                2
I          2                2
J          1                1
K          1                1
L          1                2
Cross-tabulating the twelve subjects gives:
                            Gender
                            Male   Female
Marital Status   Married    1      4
                 Single     5      2
Φ = (2 - 20)/sqrt((5)(7)(6)(6))
  = -18/sqrt(1260) = -18/35.496 = -0.507
A negative Phi coefficient indicates that most of the data are in the off-diagonal cells.
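The phi calculation for the marital status example can be sketched in Python. The formula used here, Φ = (ad - bc)/sqrt((a+b)(c+d)(a+c)(b+d)) for a 2 × 2 table with first row a, b and second row c, d, is the standard one and matches the numbers in the worked example; the function name is illustrative.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Married: 1 male, 4 female; Single: 5 male, 2 female
phi = phi_coefficient(1, 4, 5, 2)
print(round(phi, 3))  # -0.507
```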
The table below shows the ‘first time’ driving test results of a sample of 200 individuals classified by
gender and success or failure in the examination. We wish to explore the association between the two
variables, the null hypothesis being that there is no relationship between gender and success/failure in
driving test results. Consider 1% significant level.
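Gender    SUCCESS   FAILURE
Male      70        28
Female    50        52
When each of the variables is dichotomous, that is, can only take two values (male/female or pass/fail),
the phi coefficient (φ) is an appropriate measure of association. For a 2 × 2 table with cells a, b in the
first row and c, d in the second row, phi is given as:
φ = (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d))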
• Worked example (29 workers classified by gender and by whether they feel pain while performing a
task; the data table is not reproduced here):
Φ = (32 - 66)/sqrt((10)(19)(15)(14))
  = -34/sqrt(39900) = -34/199.749 = -0.170
• To interpret the -0.17, we need to convert it to a chi-square value.
• To do this, multiply N by the squared phi coefficient: Χ² = N × Φ².
• Here, Χ² = 29 × (-0.17)² = 0.84.
• Compare the obtained score to the critical score, 3.84 (chi-square with 1 degree of freedom at the 0.05 level).
• If the obtained score is greater than the critical value, reject the null
hypothesis and accept the alternative hypothesis.
• Since 0.84 is less than 3.84, we fail to reject the null hypothesis.
• Conclusions:
• There is no significant relationship between the gender of the workers and
whether they feel pain while they perform the task.
• Males and females report pain (or no pain) at approximately equal
frequencies.
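The same phi-to-chi-square procedure can be sketched for the driving test table (Male 70 success / 28 failure, Female 50 / 52). The critical value 6.635 (chi-square, 1 degree of freedom, 1% level) is taken from a standard table and is an assumption here, since the original material does not quote it; the function name is illustrative.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Driving test results: rows Male, Female; columns Success, Failure
a, b, c, d = 70, 28, 50, 52
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi2 = n * phi ** 2  # convert phi to a chi-square value

# Assumed critical value: chi-square, 1 degree of freedom, alpha = 0.01
critical_value = 6.635
reject_null = chi2 > critical_value

print(round(phi, 3), round(chi2, 2), reject_null)  # 0.229 10.46 True
```

Under these assumptions the obtained value exceeds the critical value, so the sketch suggests rejecting the null hypothesis of no relationship between gender and test outcome at the 1% level.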
Scatter plot and its interpretations
• The most useful graph for displaying the relationship between two quantitative variables is a
scatterplot.
Interpreting Scatterplots:
• As in any graph of data, look for the overall pattern and for striking departures from that pattern.
• The overall pattern of a scatterplot can be described by the direction, form, and strength of the
relationship.
• An important kind of departure is an outlier, an individual value that falls outside the overall pattern of
the relationship.
Interpreting Scatterplots: Direction
• One important component of a scatterplot is the direction of the relationship between the two variables.
• Each observation (or point) in a scatterplot has two coordinates; the first corresponds to the first piece of
data in the pair (that’s the X coordinate; the amount that you go left or right). The second coordinate
corresponds to the second piece of data in the pair (that’s the Y-coordinate; the amount that you go up or
down). The point representing that observation is placed at the intersection of the two coordinates.
If the data show an uphill pattern as you move from left to right,
this indicates a positive relationship between X and Y. As the
X-values increase (move right), the Y-values tend to increase
(move up).
If the data show a downhill pattern as you move from left to right,
this indicates a negative relationship between X and Y. As the
X-values increase (move right) the Y-values tend to decrease (move
down).
If the data don’t seem to resemble any kind of pattern (even a vague one),
then no relationship between X and Y is apparent.
Interpreting Scatterplots: Form
• Another important component to a scatterplot is the form of the relationship between the two variables.
[Figure: example scatterplots showing a linear relationship and a curvilinear relationship]
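• The strongest linear relationship occurs when the points fall exactly on a straight line. The slope of that line does not
measure strength: a slope of 1 (a 45 degree line when both axes share the same scale) only means that when one variable
increases by one, the other variable also increases by the same amount.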
• The strength of the relationship between two variables is a crucial piece of information. Relying on the
interpretation of a scatterplot is too subjective. More precise evidence is needed, and this evidence is
obtained by computing a coefficient that measures the strength of the relationship under investigation.
Correlation Coefficient
• A correlation is a statistical measure of the relationship between two variables.
• The correlation coefficient is a measure of the association between two variables; equivalently, it is a statistical
measure of the strength of the relationship between the relative movements of two variables.
• The values range between -1.0 and 1.0.
• A calculated number greater than 1.0 or less than -1.0 means that there was an error in the correlation measurement.
• A correlation of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a perfect positive correlation.
• A correlation of 0.0 shows no linear relationship between the movement of the two variables.
• The absolute value of the correlation coefficient gives us the relationship strength. The larger the number, the stronger the
relationship.
• Pearson correlation coefficient or Pearson’s correlation coefficient or Pearson’s r is defined in statistics as the measurement
of the strength of the relationship between two variables and their association with each other.
Pearson correlation coefficient formula
r = (nΣxy - ΣxΣy) / √([nΣx² - (Σx)²][nΣy² - (Σy)²])
Example:
Test1(x)   15   16   12   10   8
Test2(y)   18   11   10   20   17
Here n = 5, Σx = 61, Σy = 76, Σxy = 902, Σx² = 789 and Σy² = 1234.
r = (5×902 - 61×76) / √([5×789 - (61)²][5×1234 - (76)²])
= (4510 - 4636) / √([3945 - 3721][6170 - 5776])
= -126 / √([224][394])
= -126 / √88256
= -126 / 297.079
= -0.424
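The computation for the two test scores can be reproduced with a short function. This is a sketch of the computational (sums-based) form of Pearson's r, matching the arithmetic in the worked example; the function and variable names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson's r from raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

test1 = [15, 16, 12, 10, 8]
test2 = [18, 11, 10, 20, 17]

r = pearson_r(test1, test2)
print(round(r, 3))  # -0.424
```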
John is an investor. His portfolio primarily tracks the performance of the S&P 500 and John wants to add the
stock of Apple Inc. Before adding Apple to his portfolio, he wants to assess the correlation between the stock
and the S&P 500 to ensure that adding the stock won’t increase the systematic risk of his portfolio. To find the
correlation coefficient, John gathers the following prices for the last five years.
Determine the correlation between the prices of the S&P 500 Index and Apple Inc.
Solution:
Advantages and disadvantages of Pearson Correlation Coefficient
Advantages:
• It helps in knowing how strong the relationship between the two variables is. Not only the presence or the absence of
the correlation between the two variables is indicated using the Pearson Correlation Coefficient but it also determines
the exact extent to which those variables are correlated.
• Using this method, one can ascertain the direction of correlation i.e. whether the correlation between two variables is
negative or positive.
Disadvantages:
• The Pearson correlation coefficient r cannot distinguish between dependent and independent variables, because the
coefficient is symmetric: swapping the two variables gives the same value. For example, a high correlation between
stress and blood pressure is equally consistent with "stress raises blood pressure" and "blood pressure raises stress";
the coefficient alone does not say which direction, if either, is causal. The researcher should therefore rely on
knowledge of the data, not on r, to decide which variable depends on which.
• The method gives no information about the slope of the line; it only describes the strength and direction of the
linear relationship between the two variables.
• The Pearson correlation coefficient may be misinterpreted, especially in the case of homogeneous data.
• Compared with other methods of calculation, this method takes more time to arrive at the results.
Regression coefficient
• Regression is a method or an algorithm in Machine Learning that models a target value based on independent
predictors. It is essentially a statistical tool used in finding out the relationship between a dependent variable and
independent variable.
• Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and
one or more independent variables.
• In regression analysis, one variable is considered as dependent and other(s) as independent. Thus, it measures the
degree of dependence of one variable on the other(s).
• Regression coefficient is a statistical measure of the average functional relationship between two or more variables.
Note:
1. Regression coefficient is denoted by b.
2. It is expressed in terms of original unit of data.
3. Between two variables (say x and y), two values of regression coefficient can be obtained. One will be obtained when
we consider x as independent and y as dependent and the other when we consider y as independent and x as dependent.
The regression coefficient of y on x is represented as byx and that of x on y as bxy.
4. Both regression coefficients must have the same sign. If byx is positive, bxy will also be positive and vice versa.
5. Calculating the regression coefficient from un-replicated data requires three estimates, viz., (1) the sums of all
observations on the x and y variables (∑x, ∑y), (2) their sums of squares (∑x² and ∑y²) and (3) the sum of products of
all observations on the x and y variables (∑xy).
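• Regression coefficient of X on Y: bxy = (nΣxy - ΣxΣy) / (nΣy² - (Σy)²)
• Regression equation of X on Y: X - X̄ = bxy (Y - Ȳ)
• Regression coefficient of Y on X: byx = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
• Regression equation of Y on X: Y - Ȳ = byx (X - X̄)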
• Calculate the regression coefficient and obtain the lines of regression for
the following data.
X      Y      X²      Y²      XY
1      9      1       81      9
2      8      4       64      18
3      10     9       100     30
4      12     16      144     48
5      11     25      121     55
6      13     36      169     78
7      14     49      196     98
ΣX=28  ΣY=77  ΣX²=140  ΣY²=875  ΣXY=334
Means: X̄ = 28/7 = 4 and Ȳ = 77/7 = 11.
Regression coefficient of X on Y:
bxy = (7(334) - (28)(77)) / (7(875) - (77)²)
= (2338 - 2156)/(6125 - 5929)
= 182/196
bxy = 0.929
X - 4 = 0.929(Y - 11)
X - 4 = 0.929Y - 10.219
X = 0.929Y - 6.219
The regression equation of X on Y is:
X = 0.929Y - 6.219
Regression coefficient of Y on X:
byx = (7(334) - (28)(77)) / (7(140) - (28)²)
= (2338 - 2156)/(980 - 784)
= 182/196
byx = 0.929
Y - 11 = 0.929(X - 4)
Y = 0.929X - 3.716 + 11
Y = 0.929X + 7.284
The regression equation of Y on X is:
Y = 0.929X + 7.284
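Both regression lines for this data set can be computed with a short function. This is a sketch built on the sums-based formulas used in the worked example; the function name and the returned intercept names (a_yx, a_xy) are illustrative.

```python
def regression_lines(x, y):
    """Return (b_yx, a_yx, b_xy, a_xy) for Y = b_yx*X + a_yx and X = b_xy*Y + a_xy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    shared = n * sum_xy - sum_x * sum_y          # numerator shared by both coefficients
    b_yx = shared / (n * sum_x2 - sum_x ** 2)    # regression coefficient of Y on X
    b_xy = shared / (n * sum_y2 - sum_y ** 2)    # regression coefficient of X on Y
    mean_x, mean_y = sum_x / n, sum_y / n
    a_yx = mean_y - b_yx * mean_x                # each line passes through the means
    a_xy = mean_x - b_xy * mean_y
    return b_yx, a_yx, b_xy, a_xy

x = [1, 2, 3, 4, 5, 6, 7]
y = [9, 8, 10, 12, 11, 13, 14]

b_yx, a_yx, b_xy, a_xy = regression_lines(x, y)
print(round(b_yx, 3), round(a_yx, 3))  # 0.929 7.286
print(round(b_xy, 3), round(a_xy, 3))  # 0.929 -6.214
```

The intercepts differ slightly from the hand calculation (7.284 and -6.219) because the hand calculation rounds the coefficient to 0.929 before multiplying.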
• Obtain regression equation of Y on X and estimate Y when X=55 from
the following.
X 40 50 38 60 65 50 35
Y 38 60 55 70 60 48 30
X        Y        X²         Y²          XY
40       38       1600       1444        1520
50       60       2500       3600        3000
38       55       1444       3025        2090
60       70       3600       4900        4200
65       60       4225       3600        3900
50       48       2500       2304        2400
35       30       1225       900         1050
ΣX=338   ΣY=361   ΣX²=17094  ΣY²=19773   ΣXY=18160
Regression coefficient of Y on X:
byx = (7(18160) - (338)(361)) / (7(17094) - (338)²)
= 5102/5414
= 0.9423
With X̄ = 338/7 = 48.29 and Ȳ = 361/7 = 51.57:
Y - 51.57 = 0.942(X - 48.29)
Y = 0.942X - 45.49 + 51.57
Y = 0.942X + 6.08
The regression equation of Y on X is Y = 0.942X + 6.08
Estimation of Y when X = 55:
Y = 0.942(55) + 6.08 = 57.89
• The values of x and their corresponding values of y are shown in the
table below:
X 0 1 2 3 4
Y 2 3 5 4 6
The formula to use when there are tied ranks (repetition of ranks) is:
r_s = 1 - 6[Σd² + Σ m(m² - 1)/12] / (n(n² - 1))
When there is a repetition of ranks, a correction factor m(m² - 1)/12 is added to Σd² in the
Spearman’s rank correlation coefficient formula, where m is the number of times a rank is
repeated. This correction factor is added for every repetition of rank in both variables.
• Interpretation
• Spearman’s rank correlation coefficient is a statistical measure of the strength of a monotonic
(increasing/decreasing) relationship between paired data. Its interpretation is similar to that of
Pearson’s: the closer the coefficient is to ±1, the stronger the monotonic relationship.
• The scores for nine students in ML and Big Data are as follows:
ML: 35, 23, 47, 17, 10, 43, 9, 6, 28
Big Data: 30, 33, 45, 23, 8, 49, 12, 4, 31
Compute the student’s ranks in the two subjects and compute the Spearman rank correlation.
Sorting the data in descending order:
ML    Rank    Big Data    Rank
47    1       49          1
43    2       45          2
35    3       33          3
28    4       31          4
23    5       30          5
17    6       23          6
10    7       12          7
9     8       8           8
6     9       4           9
Assigning the ranks to the original data and finding d:
ML    Rank    Big Data    Rank    d     d²
35    3       30          5       -2    4
23    5       33          3       2     4
47    1       45          2       -1    1
17    6       23          6       0     0
10    7       8           8       -1    1
43    2       49          1       1     1
9     8       12          7       1     1
6     9       4           9       0     0
28    4       31          4       0     0
Σd² = 12
r_s = 1 - 6Σd² / (n(n² - 1))
= 1 - (6×12)/(9(81 - 1))
= 1 - 72/720
= 1 - 0.1 = 0.9
This indicates a strong positive relationship between the ranks individuals obtained in the ML and Big Data
exam. That is, the higher you ranked in ML, the higher you ranked in Big Data also, and vice versa.
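The ranking and the Spearman calculation for the ML and Big Data scores can be sketched as follows. The helper assumes all values are distinct (no tied ranks), as in this example; the function names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 = largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied data."""
    n = len(x)
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

ml = [35, 23, 47, 17, 10, 43, 9, 6, 28]
big_data = [30, 33, 45, 23, 8, 49, 12, 4, 31]

rs = spearman_no_ties(ml, big_data)
print(round(rs, 3))  # 0.9
```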
Spearman Rank Correlation: with Tied Ranks
• Tied ranks occur when two or more items in a column have the same value. Each tied data point is assigned the
mean of the ranks the tied items would otherwise occupy:
Expenditure on advertisement   10   15   14   25   14   14   20   22
Profit                         6    25   12   18   25   40   10   7
Sort the data in descending order:
Exp. Advt (x)   Rank(x)   Profit (y)   Rank(y)
25              1         40           1
22              2         25           2.5
20              3         25           2.5
15              4         18           4
14              6         12           5
14              6         10           6
14              6         7            7
10              8         6            8
Assigning the ranks to the original data and finding d:
Exp. Advt (x)   Profit (y)   Rank(x)   Rank(y)   d      d²
10              6            8         8         0      0
15              25           4         2.5       1.5    2.25
14              12           6         5         1      1
25              18           1         4         -3     9
14              25           6         2.5       3.5    12.25
14              40           6         1         5      25
20              10           3         6         -3     9
22              7            2         7         -5     25
Σd² = 83.5
N = 8
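The formula to use when there are tied ranks (repetition of ranks) is:
r_s = 1 - 6[Σd² + Σ m(m² - 1)/12] / (N(N² - 1))
Here rank 6 is repeated three times in the ranks of x and rank 2.5 is repeated twice in the ranks of y, so the
correction factor is 3(3² - 1)/12 + 2(2² - 1)/12 = 2 + 0.5 = 2.5.
Substituting, r_s = 1 - 6(83.5 + 2.5)/(8(64 - 1)) = 1 - 516/504 = -0.024.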
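The tied-rank procedure for the advertisement and profit data can be sketched in Python. The code follows the correction-factor formula described in this section (library routines may handle ties differently and give a slightly different value); all function names are illustrative.

```python
from collections import Counter

def average_ranks_descending(values):
    """Rank 1 = largest; tied values share the mean of the positions they occupy."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    n = len(x)
    rank_x, rank_y = average_ranks_descending(x), average_ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(x) + tie_correction(y)
    rs = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))
    return sum_d2, correction, rs

expenditure = [10, 15, 14, 25, 14, 14, 20, 22]
profit = [6, 25, 12, 18, 25, 40, 10, 7]

sum_d2, correction, rs = spearman_with_ties(expenditure, profit)
print(sum_d2, correction, round(rs, 4))  # 83.5 2.5 -0.0238
```

Under this formula the coefficient is close to zero, which would indicate almost no monotonic relationship between advertising expenditure and profit in this sample.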
X 10 20 30 30 40 45 50
Y 15 20 25 30 40 40 40
• Quotations of index numbers of equity share prices of a certain joint
stock company and the prices of preference shares are given below.
Equity shares       97.5   99.4   98.6   96.2   95.1   98.4   97.1
Preference shares   75.1   75.9   77.1   78.2   79     74.6   76.2
Using the method of rank correlation determine the relationship between equity shares and
preference shares prices.
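Years               2013   2014   2015   2016   2017   2008   2007
Equity shares       97.5   99.4   98.6   96.2   95.1   98.4   97.1
Preference shares   75.1   75.9   77.1   78.2   79     74.6   76.2
Rank (equity)       4      1      2      6      7      3      5
Rank (preference)   6      5      3      2      1      7      4
d²                  4      16     1      16     36     16     1
Σd² = 90, n = 7
r_s = 1 - [6(90)/(7(49 - 1))]
= 1 - [540/336]
= 1 - 1.607
= -0.607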
• Compute the rank correlation coefficient for the following data of the
marks obtained by 8 students in the Python and Time Series.
Marks in Python        15   20   28   12   40   60   20   80
Marks in Time Series   40   30   50   30   20   10   30   60
Kendall’s Tau Coefficients
• Kendall rank correlation coefficient (often called Kendall’s τ or tau) is a non-parametric test which
measures the strength of the relationship between two variables.
• This correlation procedure was developed by Kendall (1938). Kendall’s tau is based on an analysis
of two sets of ranks, X and Y.
• This test is also used when the Pearson correlation cannot be used because one or more of its
assumptions are violated. It is also an alternative to Spearman’s rho when the sample size is
small.
• Like Spearman’s rho, the range of tau is from – 1.00 to + 1.00. Though there are some similarities
in the properties of tau and rs , the logic employed by tau is entirely different than that of rho.
• The interpretation is based on the sign and the value. Higher value indicates stronger relationship.
Positive value indicates positive relationship and negative value indicates negative relationship.
• Tau is based on the numbers of concordant and discordant pairs between two sets of ranks.
• Concordant pairs (C): for each row, the number of observed ranks below it that are larger than that row's rank.
• Discordant pairs (D): for each row, the number of observed ranks below it that are smaller than that row's rank.
Student   Rx   Ry   C   D
A         1    1    3   0
B         2    3    1   1
C         3    4    0   1
D         4    2
The Kendall τ coefficient is defined as:
τ = (C - D) / (n(n - 1)/2)
where C and D are the total numbers of concordant and discordant pairs and n is the number of ranked items.
• Two interviewers ranked 12 candidates (A through L) for a position. The results from most preferred to least
preferred are:
• Interviewer 1: ABCDEFGHIJKL.
• Interviewer 2: ABDCFEHGJILK.
• Calculate the Kendall Tau correlation.
Step 1: Make a table of rankings. The first column, “Candidate”, is optional and for reference only. The rankings
for Interviewer 1 should be in ascending order.
Candidate   Interviewer1   Interviewer2   C    D
A           1              1              11   0
B           2              2              10   0
C           3              4              8    1
D           4              3              8    0
E           5              6              6    1
F           6              5              6    0
G           7              8              4    1
H           8              7              4    0
I           9              10             2    1
J           10             9              2    0
K           11             12             0    1
L           12             11
Total                                     61   5
τ = (61 - 5)/((12(12 - 1))/2)
= 56/66
= 0.848
The Tau coefficient is .85, suggesting a strong relationship between the rankings.
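The counting of concordant and discordant pairs can be sketched in Python. The function assumes there are no tied ranks, as in the interviewer example, and uses the same τ = (C - D)/(n(n - 1)/2) calculation; the names are illustrative.

```python
def kendall_tau(rank_x, rank_y):
    """Return (C, D, tau) with tau = (C - D) / (n(n - 1)/2); assumes no tied ranks."""
    # Order the second ranking by the first, as in Step 1
    ordered_y = [ry for _, ry in sorted(zip(rank_x, rank_y))]
    n = len(ordered_y)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if ordered_y[j] > ordered_y[i]:
                concordant += 1   # a larger rank appears below row i
            elif ordered_y[j] < ordered_y[i]:
                discordant += 1   # a smaller rank appears below row i
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    return concordant, discordant, tau

# Interviewer 1: ABCDEFGHIJKL, Interviewer 2: ABDCFEHGJILK (ranks listed for candidates A..L)
interviewer1 = list(range(1, 13))
interviewer2 = [1, 2, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]

c, d, tau = kendall_tau(interviewer1, interviewer2)
print(c, d, round(tau, 3))  # 61 5 0.848
```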
• Calculate the Kendall Tau correlation for the following data.
Subjects   Rank X (Rx)   Rank Y (Ry)
A          1             1
B          4             2
C          3             4
D          2             3
Kendall Tau can be calculated as τ = (C - D) / (n(n - 1)/2), after sorting the subjects by Rx and counting the
concordant (C) and discordant (D) pairs in Ry.
• Consider ranking the grades and IQ levels for a group of students is shown. Calculate the Kendall's correlation
coefficient.
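Original data:
Students   Grades   IQ
A          1        1
B          2        4
E          5        2
C          3        3
D          4        5
Sorted by grade rank, with the concordant (C) and discordant (D) counts:
Students   Grades   IQ   C   D
A          1        1    4   0
B          2        4    1   2
C          3        3    1   1
D          4        5    0   1
E          5        2
Total                    6   4
Τ = (6 - 4)/((5(5 - 1))/2) = (6 - 4)/(20/2)
= 2/10
= 0.2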