Statistics Tutorial


ENGINEERING MATHEMATICS-III (Comp. Engg. and IT Group)
STATISTICS, CORRELATION AND REGRESSION

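Ex. 12: The first four moments of a distribution about the working mean 30.2 are 0.255, 6.222, 30.211 and 400.25. Calculate the first four moments about the mean. Also evaluate $\beta_1$, $\beta_2$ and comment upon the skewness and kurtosis of the distribution. (Dec. 2005, 2006, May 2010, Nov. 2019)

Sol.: The first four moments about the arbitrary origin A = 30.2 are
$\mu_1' = 0.255,\quad \mu_2' = 6.222,\quad \mu_3' = 30.211,\quad \mu_4' = 400.25$

$\mu_1' = \frac{1}{N}\sum f_i (x_i - 30.2) = \bar{x} - 30.2 = 0.255$, hence $\bar{x} = 30.455$.

$\mu_2 = \mu_2' - (\mu_1')^2 = 6.222 - (0.255)^2 = 6.15698$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 30.211 - 3(6.222)(0.255) + 2(0.255)^3 = 30.211 - 4.75983 + 0.03316275 = 25.48433$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 400.25 - 4(30.211)(0.255) + 6(6.222)(0.255)^2 - 3(0.255)^4 = 400.25 - 30.81522 + 2.42751 - 0.01268 = 371.8496$

$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(25.48433)^2}{(6.15698)^3} = 2.78255$

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{371.8496}{(6.15698)^2} = 9.8092$

$\gamma_1 = \sqrt{\beta_1} = \sqrt{2.78255} = 1.6681$,
which indicates considerable positive skewness of the distribution.

$\gamma_2 = \beta_2 - 3 = 9.8092 - 3 = 6.8092$,
which shows that the distribution is leptokurtic.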
Ex. 13: The first four moments of a distribution about the value 5 are 2, 20, 40 and 50. From the given information obtain the first four central moments, mean, standard deviation and coefficients of skewness and kurtosis. (Dec. 2007; May 2015, 2019)

Sol.: Here A = 5, $\mu_1' = 2$, $\mu_2' = 20$, $\mu_3' = 40$ and $\mu_4' = 50$.

On the basis of the given information we can calculate the central moments, mean, standard deviation and coefficients of skewness and kurtosis.
The first moment about zero gives the mean of the distribution.

Mean $\bar{x} = A + \mu_1' = 5 + 2 = 7$

Now we calculate the central moments.

$\mu_2 = \mu_2' - (\mu_1')^2 = 20 - (2)^2 = 16$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 40 - 3(2)(20) + 2(2)^3 = 40 - 120 + 16 = -64$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 50 - 4(2)(40) + 6(2)^2(20) - 3(2)^4 = 50 - 320 + 480 - 48 = 162$

The second central moment gives the value of the variance.
Variance $= \mu_2 = 16$
Standard deviation $= \sqrt{\mu_2} = \sqrt{16} = 4$

Coefficient of skewness is given by
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-64)^2}{(16)^3} = 1$
Since $\mu_3$ is negative, the distribution is negatively skewed.

Coefficient of kurtosis is given by
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{162}{(16)^2} = 0.63$
Since the value of $\beta_2$ is less than 3, the distribution is platykurtic.
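The conversion from moments about an arbitrary origin to central moments used in Ex. 13 can be checked with a short Python sketch. This is an illustration only, not part of the original solution; the helper names `central_from_raw` and `shape_coefficients` are hypothetical.

```python
import math

def central_from_raw(m1, m2, m3, m4):
    """Central moments (mu2, mu3, mu4) from moments m1..m4 about an arbitrary origin."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

def shape_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2) = (mu3^2 / mu2^3, mu4 / mu2^2)."""
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# Data of Ex. 13: moments about the origin A = 5 are 2, 20, 40 and 50.
A = 5
mu2, mu3, mu4 = central_from_raw(2, 20, 40, 50)
beta1, beta2 = shape_coefficients(mu2, mu3, mu4)
mean = A + 2                 # mean = A + first moment about A
std_dev = math.sqrt(mu2)     # standard deviation = sqrt(variance)
print(mean, mu2, mu3, mu4, std_dev, beta1, round(beta2, 2))
```

The sign of `mu3` (not of `beta1`, which is always non-negative) is what decides the direction of skewness.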
Ex. 14: The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Comment on the skewness and kurtosis of the distribution. (May 208

Sol.: Testing of skewness: $\mu_1 = 0$, $\mu_2 = 2.5$, $\mu_3 = 0.7$ and $\mu_4 = 18.75$.

Coefficient of skewness is given by
$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0.7)^2}{(2.5)^3} = 0.0314$

Since $\mu_3$ is positive, the distribution is slightly positively skewed.

Testing of kurtosis: Coefficient of kurtosis is given by
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{18.75}{(2.5)^2} = 3$
Since $\beta_2$ is exactly three, the distribution is mesokurtic.
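The comments on skewness and kurtosis made in Ex. 13 and Ex. 14 follow a fixed rule, which can be sketched as below. The function name `classify_shape` and the tolerance `tol` (used to decide when $\beta_2$ counts as "exactly 3") are assumptions made for this illustration.

```python
def classify_shape(mu2, mu3, mu4, tol=1e-9):
    """Classify a distribution from its central moments mu2, mu3, mu4."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    # Direction of skewness comes from the sign of the third central moment.
    if abs(mu3) <= tol:
        skewness = "symmetric"
    elif mu3 > 0:
        skewness = "positively skewed"
    else:
        skewness = "negatively skewed"
    # Kurtosis is judged by comparing beta2 with 3.
    if abs(beta2 - 3) <= tol:
        kurtosis = "mesokurtic"
    elif beta2 > 3:
        kurtosis = "leptokurtic"
    else:
        kurtosis = "platykurtic"
    return beta1, beta2, skewness, kurtosis

ex14 = classify_shape(2.5, 0.7, 18.75)   # central moments of Ex. 14
ex13 = classify_shape(16, -64, 162)      # central moments found in Ex. 13
print(ex14)
print(ex13)
```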

EXERCISE 5.1
1. Find the Arithmetic Mean, Median and Standard deviation for the following frequency distribution.

x :   5    9   12   15   20   24   30   35   42   49
f :   3    6    8    8    9   10    8    7    6    2

Ans. $\bar{x} = 22.9851$, M = 20, $\sigma = 11.35$

2. Age distribution of 150 life insurance policy-holders is as follows:

Age as on Nearest Birthday    Number
15-19.5                       10
20-24.5                       20
25-29.5                       14
30-34.5                       30
35-39.5                       32
40-44.5                       14
45-49.5                       15
50-54.5                       10
55-59.5                        5

Calculate mean deviation from median age.

Ans. M.D. = 8.42

n = 8 (total number of points)

Substituting in (1) and (2) of (5.6.2), after replacing a by m and b by c, we get

140m + 28c = 140
28m + 8c = 16

or
5m + c = 5 ... (1)
7m + 2c = 4 ... (2)

Solving (1) and (2) we get m = 2, c = -5.
Hence the equation of the straight line is
y = 2x - 5
Ex. 2: Fit a parabola of the form $y = ax^2 + bx + c$ to the following data using least square criteria.

x :   1    2    3    4    5    6    7
y :  -5   -2    5   16   31   50   73

Sol.: Let X = x - 4.

x     y     X     X^2   X^3   X^4    Xy    X^2 y
1    -5    -3     9    -27    81     15    -45
2    -2    -2     4     -8    16      4     -8
3     5    -1     1     -1     1     -5      5
4    16     0     0      0     0      0      0
5    31     1     1      1     1     31     31
6    50     2     4      8    16    100    200
7    73     3     9     27    81    219    657

$\sum y = 168$, $\sum X = 0$, $\sum X^2 = 28$, $\sum X^3 = 0$, $\sum X^4 = 196$, $\sum Xy = 364$, $\sum X^2 y = 840$

n = 7.
Substituting in equations (5), (6) and (7) of (5.6.3):
28a + 0b + 7c = 168 ... (1)
0a + 28b + 0c = 364 ... (2)
196a + 0b + 28c = 840 ... (3)

(1), (2), (3) can be written as
4a + 0b + c = 24 ... (4)
0a + b + 0c = 13 ... (5)
7a + 0b + c = 30 ... (6)

From (5), b = 13, and from (4) and (6) we get
a = 2, c = 16

Equation of the parabola in terms of the variable X is
$y = 2X^2 + 13X + 16$
Putting X = x - 4,
$y = 2(x-4)^2 + 13(x-4) + 16$
$y = 2x^2 - 3x - 4$

is the required fit for the data.
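The normal equations of Ex. 2 can be set up and solved mechanically. The sketch below is one possible way to do it, using exact fractions and a small Gauss-Jordan elimination; `fit_parabola` and `to_original` are hypothetical helper names, and the data are taken to be x = 1, ..., 7 with the y-values that reproduce the totals quoted in the solution ($\sum y = 168$, $\sum Xy = 364$, $\sum X^2 y = 840$).

```python
from fractions import Fraction

def fit_parabola(xs, ys, shift=0):
    """Least-squares fit of y = a*X^2 + b*X + c, where X = x - shift."""
    X = [Fraction(x) - shift for x in xs]
    Y = [Fraction(y) for y in ys]
    s = [sum(v ** k for v in X) for k in range(5)]                  # s[k] = sum X^k, s[0] = n
    t = [sum(v ** k * w for v, w in zip(X, Y)) for k in range(3)]   # t[k] = sum X^k * y
    # Normal equations, one row per equation: [coef of a, coef of b, coef of c, rhs]
    rows = [
        [s[2], s[1], s[0], t[0]],
        [s[3], s[2], s[1], t[1]],
        [s[4], s[3], s[2], t[2]],
    ]
    # Gauss-Jordan elimination with a simple non-zero pivot search.
    for i in range(3):
        p = next(r for r in range(i, 3) if rows[r][i] != 0)
        rows[i], rows[p] = rows[p], rows[i]
        pivot = rows[i][i]
        rows[i] = [v / pivot for v in rows[i]]
        for r in range(3):
            if r != i:
                factor = rows[r][i]
                rows[r] = [vr - factor * vi for vr, vi in zip(rows[r], rows[i])]
    return tuple(row[3] for row in rows)   # (a, b, c)

def to_original(a, b, c, shift):
    """Rewrite a*(x-shift)^2 + b*(x-shift) + c as a polynomial in x."""
    return a, b - 2 * a * shift, c - b * shift + a * shift ** 2

xs = [1, 2, 3, 4, 5, 6, 7]
ys = [-5, -2, 5, 16, 31, 50, 73]
a, b, c = fit_parabola(xs, ys, shift=4)
print(a, b, c)                   # coefficients in terms of X = x - 4
print(to_original(a, b, c, 4))   # coefficients in terms of x
```

Shifting the origin to the middle x-value makes the odd power sums vanish, which is why the hand calculation decouples into two small systems.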


Ex. 3: A simply supported beam carries a concentrated load P (kg) at its middle point. Corresponding to various values of P, the maximum deflection y (cm) is tabulated as:

P :  100    120    140    160    180    200
y :  0.90   1.10   1.20   1.40   1.60   1.70

Find a law of the form y = aP + b by using least square criteria.

Sol.: Preparing the table with X = P - 140:

P      y      X = P - 140    X^2     Xy
100    0.90   -40            1600    -36
120    1.10   -20             400    -22
140    1.20     0               0      0
160    1.40    20             400     28
180    1.60    40            1600     64
200    1.70    60            3600    102

$\sum y = 7.9$, $\sum X = 60$, $\sum X^2 = 7600$, $\sum Xy = 136$
n = 6 (number of points)

From (1) and (2) of (5.6.2):
60a + 6b = 7.9 ... (1)
7600a + 60b = 136 ... (2)

Solving (1) and (2) we get
a = 0.008143, b = 1.2352
y = 0.008143 X + 1.2352
but X = P - 140, so
y = 0.008143 (P - 140) + 1.2352
y = 0.008143 P + 0.09518

is the required result.
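As a cross-check of Ex. 3, the straight-line fit can be computed directly in terms of P, without shifting the origin. The sketch below uses the closed-form solution of the two normal equations; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    b = (sy - a * sx) / n                            # intercept
    return a, b

loads = [100, 120, 140, 160, 180, 200]               # P in kg
deflections = [0.90, 1.10, 1.20, 1.40, 1.60, 1.70]   # y in cm
a, b = fit_line(loads, deflections)

# Compare the fitted law with the tabulated deflections.
for P, y in zip(loads, deflections):
    print(P, y, round(a * P + b, 3))
```

Comparing fitted and tabulated values is a quick sanity check on the intercept: a fitted deflection near 0.9 cm at P = 100 is only possible if b is small.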


Ex. 4: Values of x and y are tabulated as under:

x :  1     1.5    2.0    2.5
y :  25    56.2   100    156

Find the law of the form $x = ay^n$ to satisfy the given data.

Sol.: Taking logarithms, we get
$\log x = \log a + n \log y$
which can be written as
X = nY + c
where X = log x, Y = log y and c = log a.

x      y      X = log x   Y = log y   Y^2      XY
1.0    25     0.0         1.3979      1.9541   0
1.5    56.2   0.1761      1.7497      3.0615   0.3081
2.0    100    0.301       2.0         4.0      0.602
2.5    156    0.3979      2.1931      4.8097   0.8726
Total         0.875       7.3407      13.8253  1.7827

Substituting in (1) and (2) of (5.6.1), where x is replaced by Y, y by X, a by n and b by log a = c, and where n in (1) of (5.6.1) equals 4 (number of points):

7.3407n + 4c = 0.875 ... (1)
13.8253n + 7.3407c = 1.7827 ... (2)

Solving (1) and (2) we get
n = 0.5, c = log a = -0.6988375, a = 0.2

Hence the required law of the form $x = ay^n$ is $x = 0.2\,y^{0.5}$.
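The log-linearisation used in Ex. 4 can be sketched as follows. The function name `fit_power_law` is hypothetical; the code fits log10 x against log10 y by ordinary least squares, which is the same procedure as the hand calculation but without rounding the logarithms to four places.

```python
import math

def fit_power_law(xs, ys):
    """Fit x = a * y**n via least squares on log10 x = log10 a + n * log10 y."""
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    m = len(X)
    sX, sY = sum(X), sum(Y)
    sYY = sum(v * v for v in Y)
    sXY = sum(p * q for p, q in zip(X, Y))
    n = (m * sXY - sY * sX) / (m * sYY - sY ** 2)   # exponent
    c = (sX - n * sY) / m                            # c = log10 a
    return 10 ** c, n

x_values = [1.0, 1.5, 2.0, 2.5]
y_values = [25, 56.2, 100, 156]
a, n = fit_power_law(x_values, y_values)
print(round(a, 3), round(n, 3))
```

Because the fit minimises squared errors in log x rather than in x, it is a convenient approximation and not the same as a direct nonlinear fit.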



Sol.: Let X: quantity exported, Y: quantity imported. Preparing the table as follows, simple calculations can be made.

x      y      x^2     y^2     xy
10     12     100     144     120
11     14     121     196     154
14     15     196     225     210
14     16     196     256     224
20     21     400     441     420
22     26     484     676     572
16     21     256     441     336
12     15     144     225     180
15     16     225     256     240
13     14     169     196     182
Total  147    170     2291    3056    2638

Here, n = 10, hence $\bar{x} = \frac{147}{10} = 14.7$ and $\bar{y} = \frac{170}{10} = 17$.

$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{(\sum x^2 - n\bar{x}^2)(\sum y^2 - n\bar{y}^2)}} = \frac{2638 - 10 \times 14.7 \times 17}{\sqrt{(2291 - 10 \times 14.7^2)(3056 - 10 \times 17^2)}} = \frac{139}{\sqrt{130.1 \times 166}} = 0.9458$
Ex. 2: Calculate the correlation coefficient for the following weights (in kg) of husband (x) and wife (y). (Dec. 2012)

x :  65   66   67   67   68   69   70   72
y :  55   58   72   55   66   71   70   50

Sol.:

x      y      x^2     y^2     xy
65     55     4225    3025    3575
66     58     4356    3364    3828
67     72     4489    5184    4824
67     55     4489    3025    3685
68     66     4624    4356    4488
69     71     4761    5041    4899
70     70     4900    4900    4900
72     50     5184    2500    3600
Total  544    497     37028   31395   33799

$\bar{x} = \frac{544}{8} = 68$, $\bar{y} = \frac{497}{8} = 62.125$

Correlation coefficient between x and y is given by

$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\frac{1}{n}\sum xy - \bar{x}\bar{y}}{\sqrt{\left(\frac{1}{n}\sum x^2 - \bar{x}^2\right)\left(\frac{1}{n}\sum y^2 - \bar{y}^2\right)}}$

$= \frac{\frac{1}{8}(33799) - 68(62.125)}{\sqrt{\left(\frac{37028}{8} - 68^2\right)\left(\frac{31395}{8} - (62.125)^2\right)}}$

$= \frac{4224.875 - 4224.5}{\sqrt{(4628.5 - 4624)(3924.375 - 3859.52)}}$

$= \frac{0.375}{\sqrt{4.5 \times 64.859}} = \frac{0.375}{17.084}$

r(x, y) = 0.022
Ex. 3: From a group of 10 students, marks obtained by each in papers of Mathematics and Applied Mechanics are given as:

x Marks in Maths      :  23   28   42   17   26   35   29   37   16   46
y Marks in App. Mech. :  25   22   38   21   27   39   24   32   18   44

Calculate Karl Pearson's coefficient of correlation.

Sol.: The data is tabulated as:

x      y      u = x - 35   v = y - 39   u^2    v^2    uv
16     18     -19          -21          361    441    399
17     21     -18          -18          324    324    324
23     25     -12          -14          144    196    168
26     27      -9          -12           81    144    108
28     22      -7          -17           49    289    119
29     24      -6          -15           36    225     90
35     39       0            0            0      0      0
37     32       2           -7            4     49    -14
42     38       7           -1           49      1     -7
46     44      11            5          121     25     55
Total          -51         -100         1169   1694   1242

$\bar{u} = \frac{-51}{10} = -5.1$, $\bar{u}^2 = 26.01$; $\bar{v} = \frac{-100}{10} = -10$, $\bar{v}^2 = 100$

$\mathrm{cov}(u, v) = \frac{1}{n}\sum u_i v_i - \bar{u}\bar{v} = \frac{1}{10}(1242) - 51 = 73.2$

$\sigma_u^2 = \frac{1}{n}\sum u^2 - \bar{u}^2 = \frac{1169}{10} - 26.01 = 90.89$, $\sigma_u = \sqrt{90.89} = 9.534$

$\sigma_v^2 = \frac{1}{n}\sum v^2 - \bar{v}^2 = \frac{1694}{10} - 100 = 69.4$, $\sigma_v = \sqrt{69.4} = 8.33$

$r(x, y) = r(u, v) = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v} = \frac{73.2}{9.534 \times 8.33} = 0.9217$
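Ex. 3 relies on the fact that the correlation coefficient does not change when a constant is subtracted from each variable. The sketch below checks this numerically; `pearson_r` is a hypothetical helper written for this illustration, and the (x, y) pairs are the ones read from the table in the solution.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient: cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    var_x = sum(x * x for x in xs) / n - mx ** 2
    var_y = sum(y * y for y in ys) / n - my ** 2
    return cov / math.sqrt(var_x * var_y)

maths = [23, 28, 42, 17, 26, 35, 29, 37, 16, 46]
mech = [25, 22, 38, 21, 27, 39, 24, 32, 18, 44]

# Change of origin used in the solution: u = x - 35, v = y - 39.
u = [x - 35 for x in maths]
v = [y - 39 for y in mech]

r_xy = pearson_r(maths, mech)
r_uv = pearson_r(u, v)
print(round(r_xy, 4), round(r_uv, 4))   # the two values agree
```

The shifted variables keep the hand arithmetic small; on a computer the shift mainly helps numerical stability when the raw values are large.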

Ex. 4: Compute correlation coefficient between supply and price of commodity using following data.

Supply :  152   158   169   182   160   166   182
Price  :  198   178   167   152   180   170   162

Sol.: Let x = supply, u = x - 150; y = price, v = y - 160.

x      y      u     v     u^2     v^2     uv
152    198     2    38       4    1444     76
158    178     8    18      64     324    144
169    167    19     7     361      49    133
182    152    32    -8    1024      64   -256
160    180    10    20     100     400    200
166    170    16    10     256     100    160
182    162    32     2    1024       4     64
Total        119    87    2833    2385    521

Here n = 7, $\sum u = 119$, $\sum v = 87$, $\sum u^2 = 2833$, $\sum v^2 = 2385$, $\sum uv = 521$

$\bar{u} = 17$, $\bar{v} = 12.4286$

$r = \frac{\sum uv - n\bar{u}\bar{v}}{\sqrt{(\sum u^2 - n\bar{u}^2)(\sum v^2 - n\bar{v}^2)}} = \frac{521 - 7 \times 17 \times 12.4286}{\sqrt{(2833 - 7 \times 17^2)(2385 - 7 \times 12.4286^2)}} = \frac{-958}{\sqrt{810 \times 1303.7142}} = \frac{-958}{1027.6227} = -0.9322$

Interpretation: There is high negative correlation between supply and price.


Ex. 5: Obtain correlation coefficient between population density (per square mile) and death rate (per thousand persons) from data related to 5 cities. (Dec. 2010, 2017; May 2010)

Population Density :  200   500   400   700   800
Death Rate         :   12    18    16    21    10

Sol.: Let x = population density and y = death rate.
Let u = x - a and v = y - b, with a = 500 and b = 15.

x      y      u = x - 500   v = y - 15   u^2      v^2    uv
200    12     -300          -3           90000     9      900
500    18        0           3               0     9        0
400    16     -100           1           10000     1     -100
700    21      200           6           40000    36     1200
800    10      300          -5           90000    25    -1500
Total          100           2          230000    80      500

Here, n = 5, $\sum u = 100$, $\sum v = 2$, $\sum u^2 = 230000$, $\sum v^2 = 80$, $\sum uv = 500$

$\bar{u} = \frac{100}{5} = 20$, $\bar{v} = \frac{2}{5} = 0.4$

$r(u, v) = \frac{\sum uv - n\bar{u}\bar{v}}{\sqrt{\sum u^2 - n\bar{u}^2}\,\sqrt{\sum v^2 - n\bar{v}^2}} = \frac{500 - 5(20)(0.4)}{\sqrt{230000 - 5(20)^2}\,\sqrt{80 - 5(0.4)^2}} = \frac{460}{\sqrt{228000}\,\sqrt{79.2}} = \frac{460}{4249.42} = 0.1082$
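Ex. 6: Calculate the coefficient of correlation for the following distribution. (Dec. 2006

[Bivariate frequency table not recoverable from the source.]

Sol.: Tabulating the data with frequencies f and deviations u, v of x, y from assumed origins, the column totals are

$N = \sum f = 82$, $\sum fu = 44$, $\sum fv = 100$, $\sum fu^2 = 4758$, $\sum fv^2 = 4444$, $\sum fuv = 4391$

$\bar{u} = \frac{44}{82} = 0.5366$, $\bar{u}^2 = 0.288$

$\bar{v} = \frac{100}{82} = 1.2195$, $\bar{v}^2 = 1.4872$

$\mathrm{cov}(u, v) = \frac{1}{N}\sum f u_i v_i - \bar{u}\bar{v} = 53.5488 - 0.6544 = 52.89$

$\sigma_u^2 = \frac{4758}{82} - 0.288 = 57.7364$, $\sigma_u = 7.598$

$\sigma_v^2 = \frac{4444}{82} - 1.4872 = 52.708$, $\sigma_v = 7.26$

$r(u, v) = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v} = \frac{52.89}{7.598 \times 7.26} = 0.9588$

Coefficient of correlation r(x, y) = 0.9588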
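Ex. 7: Find correlation coefficient between X and Y given that n = 25, $\sum x = 75$, $\sum y = 100$, $\sum x^2 = 250$, $\sum y^2 = 500$, $\sum xy = 325$.

Sol.: Here $\bar{x} = \frac{75}{25} = 3$, $\bar{y} = \frac{100}{25} = 4$.

$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{(\sum x^2 - n\bar{x}^2)(\sum y^2 - n\bar{y}^2)}} = \frac{325 - 25 \times 3 \times 4}{\sqrt{(250 - 25 \times 9)(500 - 25 \times 16)}} = \frac{25}{\sqrt{25 \times 100}} = \frac{25}{50} = 0.5$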
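When only the totals are given, as in Ex. 7, the same formula can be evaluated directly from the sums. A minimal sketch, with `r_from_sums` as a hypothetical helper name:

```python
import math

def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Correlation coefficient from n, sum x, sum y, sum x^2, sum y^2, sum xy."""
    x_bar, y_bar = sx / n, sy / n
    numerator = sxy - n * x_bar * y_bar
    denominator = math.sqrt((sxx - n * x_bar ** 2) * (syy - n * y_bar ** 2))
    return numerator / denominator

# Totals of Ex. 7
r = r_from_sums(n=25, sx=75, sy=100, sxx=250, syy=500, sxy=325)
print(r)
```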
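$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ or $x - \bar{x} = b_{xy}(y - \bar{y})$ ... (11)

The coefficient $b_{yx}$ involved in equation (10) is known as the regression coefficient of y on x, and the coefficient $b_{xy}$ involved in equation (11) is known as the regression coefficient of x on y.

Remark 1: For obtaining (10) and (11) we have to calculate r = r(x, y), the correlation coefficient, which can also be determined using the change of origin and scale property. Thus,
$r = r(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v} = r(u, v)$

If $u = \frac{x - a}{h}$, $v = \frac{y - b}{k}$, then $\sigma_x = h\sigma_u$, $\sigma_y = k\sigma_v$
and $\sigma_x^2 = h^2\sigma_u^2$ and $\sigma_y^2 = k^2\sigma_v^2$
and $\bar{x} = a + h\bar{u}$ and $\bar{y} = b + k\bar{v}$.
In particular, if u = x - a, v = y - b, then h = k = 1 and $\sigma_x = \sigma_u$ and $\sigma_y = \sigma_v$
and $\bar{x} = a + \bar{u}$, $\bar{y} = b + \bar{v}$.
These results help us to determine (10) and (11).

Remark 2: Correlation coefficient and regression coefficients have the same algebraic signs. If r > 0, then $b_{yx} > 0$ and $b_{xy} > 0$. If r < 0, then $b_{yx} < 0$ and $b_{xy} < 0$.

Remark 3: Since $b_{yx} \times b_{xy} = r^2$, the correlation coefficient is $r = \pm\sqrt{b_{yx}\, b_{xy}}$, i.e. the geometric mean of the regression coefficients. Choose the positive square root if the regression coefficients are positive, otherwise the negative one.

Remark 4: The acute angle $\theta$ between the regression lines is given by
$\tan\theta = \left|\frac{1 - r^2}{r}\right| \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$

Remark 5: The point of intersection of the two regression lines is $(\bar{x}, \bar{y})$.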
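The relations in the remarks can be verified numerically. The sketch below uses the standard identities $b_{yx} = r\sigma_y/\sigma_x$, $b_{xy} = r\sigma_x/\sigma_y$ and $\tan\theta = \left|\frac{1-r^2}{r}\right|\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}$; the function names and the sample values r = 0.6, $\sigma_x = 3$, $\sigma_y = 4$ are chosen for illustration only.

```python
import math

def regression_coefficients(r, sigma_x, sigma_y):
    """Return (byx, bxy) = (r*sigma_y/sigma_x, r*sigma_x/sigma_y)."""
    return r * sigma_y / sigma_x, r * sigma_x / sigma_y

def r_from_coefficients(byx, bxy):
    """r = +/- sqrt(byx * bxy), taking the common sign of the coefficients."""
    if byx * bxy < 0:
        raise ValueError("regression coefficients must have the same sign")
    root = math.sqrt(byx * bxy)
    return root if byx > 0 else -root

def tan_acute_angle(r, sigma_x, sigma_y):
    """Tangent of the acute angle between the two regression lines (r != 0)."""
    return abs((1 - r ** 2) / r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)

r, sigma_x, sigma_y = 0.6, 3.0, 4.0          # illustrative values
byx, bxy = regression_coefficients(r, sigma_x, sigma_y)
r_back = r_from_coefficients(byx, bxy)
tan_theta = tan_acute_angle(r, sigma_x, sigma_y)

# Cross-check: in the xy-plane the lines have slopes byx and 1/bxy.
m1, m2 = byx, 1 / bxy
tan_from_slopes = abs((m2 - m1) / (1 + m1 * m2))
print(byx, bxy, r_back, tan_theta, tan_from_slopes)
```

The slope-based cross-check works because the line of x on y, written as y against x, has slope $1/b_{xy}$.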
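ILLUSTRATIONS
Ex. 1: Obtain regression lines for the following data: (Dec. 2012, 2016)

x :  6    2   10    4    8
y :  9   11    5    8    7

Sol.: To find regression lines we require to calculate the regression coefficients $b_{xy}$ and $b_{yx}$. These coefficients depend upon $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$ and $\sum xy$. So we prepare the following table and simplify the calculations.

x      y      x^2    y^2    xy
6      9      36     81     54
2      11     4      121    22
10     5      100    25     50
4      8      16     64     32
8      7      64     49     56

$\sum x = 30$, $\sum y = 40$, $\sum x^2 = 220$, $\sum y^2 = 340$, $\sum xy = 214$
No. of observations = n = 5

$\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6$ and $\bar{y} = \frac{\sum y}{n} = \frac{40}{5} = 8$

$\sigma_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{220}{5} - (6)^2 = 44 - 36 = 8$

$\sigma_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{340}{5} - (8)^2 = 68 - 64 = 4$

$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{214}{5} - 6 \times 8 = 42.8 - 48 = -5.2$

$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{-5.2}{8} = -0.65$

$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{-5.2}{4} = -1.3$

Regression line of Y on X is
$y - \bar{y} = b_{yx}(x - \bar{x})$
y - 8 = -0.65 (x - 6)
y = -0.65x + 3.9 + 8
y = -0.65x + 11.9

Regression line of X on Y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
x - 6 = -1.3 (y - 8)
x - 6 = -1.3y + 10.4
x = -1.3y + 10.4 + 6
x = -1.3y + 16.4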
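The steps of Ex. 1 (means, variances, covariance, then the two regression coefficients) can be sketched as a single function. `regression_lines` is a hypothetical helper name; it returns each line in slope-intercept form.

```python
def regression_lines(xs, ys):
    """Regression coefficients and intercepts of both regression lines."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    byx = cov / var_x     # regression coefficient of y on x
    bxy = cov / var_y     # regression coefficient of x on y
    return {
        "byx": byx,
        "y_intercept": y_bar - byx * x_bar,   # y = byx * x + y_intercept
        "bxy": bxy,
        "x_intercept": x_bar - bxy * y_bar,   # x = bxy * y + x_intercept
    }

lines = regression_lines([6, 2, 10, 4, 8], [9, 11, 5, 8, 7])
print({name: round(value, 4) for name, value in lines.items()})
```

Both lines pass through the point of means (6, 8), which is a quick check on the intercepts.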
Ex. 2: Obtain regression lines for the following data:

x :  2    3    5    7    9   10   12   15
y :  2    …    8   10   12   14   15   16

Estimate (i) Y when X = 6 and (ii) X when Y = 20.
Ex. 3: Find the lines of regression for the following data

x :  10   14   19   26   30   34   39
y :  12   16   18   26   29   35   38

and estimate y for x = 14.5 and x for y = 29.5. (May 2

Sol.: Tabulating the data as:

x      y      u = x - 26   v = y - 26   u^2    v^2    uv
10     12     -16          -14          256    196    224
14     16     -12          -10          144    100    120
19     18      -7           -8           49     64     56
26     26       0            0            0      0      0
30     29       4            3           16      9     12
34     35       8            9           64     81     72
39     38      13           12          169    144    156
Total         -10           -8          698    594    640

Here n = 7, $\bar{u} = \frac{-10}{7} = -1.429$, $\bar{v} = \frac{-8}{7} = -1.143$
$\bar{u}^2 = 2.042$, $\bar{v}^2 = 1.306$

$\mathrm{cov}(u, v) = \frac{1}{n}\sum uv - \bar{u}\bar{v} = \frac{1}{7}(640) - (-1.429)(-1.143) = 89.795$

$\sigma_u^2 = \frac{1}{n}\sum u^2 - \bar{u}^2 = \frac{1}{7}(698) - 2.042 = 97.672$, $\sigma_u = 9.883$

$\sigma_v^2 = \frac{1}{n}\sum v^2 - \bar{v}^2 = \frac{1}{7}(594) - 1.306 = 83.551$, $\sigma_v = 9.14$

$r = r(x, y) = r(u, v) = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v} = \frac{89.795}{9.883 \times 9.14} = \frac{89.795}{90.33062} = 0.9941$

$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.9941 \times \frac{9.14}{9.883} = 0.9194$

$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.9941 \times \frac{9.883}{9.14} = 1.0749$

$\bar{x} = a + \bar{u} = 26 - 1.429 = 24.571$
$\bar{y} = b + \bar{v} = 26 - 1.143 = 24.857$

Regression line of y on x is given by equation (10):
y - 24.857 = 0.9194 (x - 24.571) ... (I)

Regression line of x on y is given by equation (11):
x - 24.571 = 1.0749 (y - 24.857) ... (II)

To estimate y for x = 14.5, put x = 14.5 in (I):
y = 24.857 + 0.9194 (14.5 - 24.571) = 15.5977

Estimate of x for y = 29.5 is obtained from (II):
x = 24.571 + 1.0749 (29.5 - 24.857) = 29.56176
Ex. 4: The table below gives the respective heights x and y of a sample of 10 fathers and their sons.

(i) Find regression line of y on x.
(ii) Find regression line of x on y.
(iii) Estimate son's height if father's height is 65 inches.
(iv) Estimate father's height if son's height is 60 inches.
(v) Compute correlation coefficient between x and y.
(vi) Find the angle between the regression lines.

Height of Father x (inches) :  65   63   67   64   68   62   70   66   68   67
Height of Son y (inches)    :  68   66   68   65   69   66   68   65   71   67

Sol.: Let u = x - 62, v = y - 65. We prepare the table to simplify the computations.

x      y      u     v     u^2    v^2    uv
65     68     3     3      9      9      9
63     66     1     1      1      1      1
67     68     5     3     25      9     15
64     65     2     0      4      0      0
68     69     6     4     36     16     24
62     66     0     1      0      1      0
70     68     8     3     64      9     24
66     65     4     0     16      0      0
68     71     6     6     36     36     36
67     67     5     2     25      4     10
Total        40    23    216     85    119

n = number of pairs = 10

$\bar{u} = \frac{40}{10} = 4$, $\sigma_u^2 = \frac{216}{10} - 4^2 = 5.6$

$\bar{v} = \frac{23}{10} = 2.3$, $\sigma_v^2 = \frac{85}{10} - (2.3)^2 = 3.21$

$\mathrm{Cov}(u, v) = \frac{119}{10} - 4 \times 2.3 = 2.7$

$b_{xy} = b_{uv} = \frac{2.7}{3.21} = 0.8411$, and $b_{yx} = b_{vu} = \frac{2.7}{5.6} = 0.4821$

$\bar{x} = \bar{u} + 62 = 66$, $\bar{y} = \bar{v} + 65 = 67.3$

(i) Regression line of y on x is $Y - \bar{Y} = b_{yx}(X - \bar{X})$
Y - 67.3 = 0.4821 (X - 66)
Y = 0.4821X + 35.4814

$\sigma_u^2 = \frac{1}{n}\sum u^2 - \bar{u}^2 = \frac{180}{10} - (2)^2 = 18 - 4 = 14$

$\sigma_v^2 = \frac{1}{n}\sum v^2 - \bar{v}^2 = \frac{488}{10} - (3)^2 = 48.8 - 9 = 39.8$

$\sigma_u = 3.742$ and $\sigma_v = 6.309$
Standard deviation is invariant to the change of origin.
$\sigma_x = 3.742$ and $\sigma_y = 6.309$
$\sigma_x^2 = 14$ and $\sigma_y^2 = 39.8$

$\mathrm{Cov}(u, v) = \frac{1}{n}\sum uv - \bar{u}\bar{v} = -3.3 - 6$

Cov(u, v) = -9.3

Covariance is invariant to the change of origin.
Cov(x, y) = Cov(u, v) = -9.3

We have to find the regression equation of y on x. It is given by
$y - \bar{y} = b_{yx}(x - \bar{x})$

$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{-9.3}{14} = -0.664$

Regression equation becomes
y - 38 = -0.664 (x - 32)
y = -0.664x + 21.248 + 38
y = -0.664x + 59.248

Now, we have to estimate marks in Statistics if marks in Economics are 30, i.e. we have to find the value of y when x = 30.

Substituting x = 30 in the above equation, we get
y = -0.664 × 30 + 59.248
y = 39.328

Marks in Statistics are 39.328, i.e. approximately 39.
Ex.: Determine regression line for price, given the supply, hence estimate price when supply is 180 units, from the following information: x = supply, y = price, n = 7, $\sum(x - 150) = 119$, $\sum(y - 160) = 84$, $\sum(x - 150)^2 = 2835$, $\sum(y - 160)^2 = 2387$, $\sum(x - 150)(y - 160) = 525$. Also, find correlation coefficient between price and supply. (Dec. 2018)

Sol.: Let u = x - 150, v = y - 160.

$\bar{u} = \frac{119}{7} = 17$, $\bar{v} = \frac{84}{7} = 12$

$\sigma_u^2 = \frac{1}{7}(2835) - (17)^2 = 405 - 289 = 116$

$\sigma_v^2 = \frac{1}{7}(2387) - (12)^2 = 341 - 144 = 197$

$\mathrm{cov}(x, y) = \mathrm{cov}(u, v) = \frac{1}{7}(525) - 17 \times 12 = -129$

$\bar{x} = 150 + \bar{u} = 167$, $\bar{y} = 160 + \bar{v} = 172$

$b_{yx} = b_{vu} = \frac{-129}{116} = -1.1121$

Equation of regression line of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$
y - 172 = (-1.1121)(x - 167)

Correlation coefficient r is obtained as
$r = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v} = \frac{-129}{\sqrt{116 \times 197}} = -0.8534$
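The price example works entirely from sums of deviations about assumed origins. The sketch below reproduces those steps and also evaluates the fitted line at a supply of 180 units, which the problem statement asks for. `regression_from_deviation_sums` is a hypothetical helper name introduced for this illustration.

```python
import math

def regression_from_deviation_sums(n, su, sv, suu, svv, suv, a, b):
    """Means, byx and r from sums of u = x - a and v = y - b."""
    u_bar, v_bar = su / n, sv / n
    var_u = suu / n - u_bar ** 2
    var_v = svv / n - v_bar ** 2
    cov = suv / n - u_bar * v_bar
    byx = cov / var_u                       # unchanged by a shift of origin
    r = cov / math.sqrt(var_u * var_v)
    return a + u_bar, b + v_bar, byx, r

x_bar, y_bar, byx, r = regression_from_deviation_sums(
    n=7, su=119, sv=84, suu=2835, svv=2387, suv=525, a=150, b=160
)
# Estimated price on the regression line of y on x at supply x = 180.
price_at_180 = y_bar + byx * (180 - x_bar)
print(x_bar, y_bar, round(byx, 4), round(r, 4), round(price_at_180, 2))
```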

Substituting values of $\bar{x}$ and $\bar{y}$, we get

9(2) + (-3) = 18 - 3 = 15
4(2) + (-3) = 8 - 3 = 5

Thus, the regression lines are
9x + y = 15 and 4x + y = 5

Let 9x + y = 15 be the regression line of x on y, so it can be written as
$x = \frac{15}{9} - \frac{1}{9}y$
$b_{xy} = -\frac{1}{9} = -0.11$

Let 4x + y = 5 be the regression line of y on x. So it can be written as y = 5 - 4x.
$b_{yx} = -4$

Correlation coefficient between x and y is given as
$r = \pm\sqrt{b_{yx}\, b_{xy}} = \pm\sqrt{(-4) \times (-0.11)} = \pm\sqrt{0.44} = \pm 0.663$
Since both the regression coefficients are negative, we take r = -0.663.
Ex. 9: The regression equations are 8x - 10y + 66 = 0 and 40x - 18y = 214. The value of variance of x is 9. Find:

(1) The mean of x and y values,
(2) The correlation between x and y, and
(3) The standard deviation of y. (Nov. 2015, May 2019)

Sol.: (1) Since both the regression lines pass through the point $(\bar{x}, \bar{y})$, we have
$8\bar{x} - 10\bar{y} + 66 = 0$ and $40\bar{x} - 18\bar{y} = 214$

Solving these two equations, we get
$\bar{x} = 13$ and $\bar{y} = 17$

(2) Let 8x - 10y + 66 = 0 be the line of regression of y on x and 40x - 18y = 214 be the line of regression of x on y.
These equations can be written in the form
$y = \frac{8}{10}x + \frac{66}{10}$ and $x = \frac{18}{40}y + \frac{214}{40}$

y = 0.8x + 6.6 and x = 0.45y + 5.35

$b_{yx}$ = Regression coefficient of y on x = 0.8
and $b_{xy}$ = Regression coefficient of x on y = 0.45

Correlation coefficient between x and y is given by
$r = \pm\sqrt{b_{xy} \times b_{yx}} = \pm\sqrt{0.45 \times 0.8} = \pm 0.6$

But since both the regression coefficients are positive, we take r = +0.6.

(3) Variance of x = 9, i.e. $\sigma_x^2 = 9$
$\therefore \sigma_x = 3$

We have
$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$
$0.8 = 0.6 \times \frac{\sigma_y}{3}$

$\sigma_y = 4$
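The three parts of Ex. 9 can be traced in a few lines of Python. This is a sketch under the same assumption the solution makes, namely that 8x - 10y + 66 = 0 is the line of y on x and 40x - 18y = 214 is the line of x on y; the helper name `intersection` is hypothetical.

```python
import math
from fractions import Fraction

def intersection(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)

# (1) Both regression lines pass through (x_bar, y_bar).
#     8x - 10y = -66 and 40x - 18y = 214
x_bar, y_bar = intersection(8, -10, -66, 40, -18, 214)

# (2) y = (8/10)x + 6.6 gives byx; x = (18/40)y + 5.35 gives bxy.
byx = Fraction(8, 10)
bxy = Fraction(18, 40)
assert byx * bxy <= 1          # otherwise the two roles must be swapped
r = math.sqrt(byx * bxy)       # positive root, since both coefficients are positive

# (3) byx = r * sigma_y / sigma_x with sigma_x = sqrt(9) = 3.
sigma_x = math.sqrt(9)
sigma_y = float(byx) * sigma_x / r
print(x_bar, y_bar, r, sigma_y)
```

The check `byx * bxy <= 1` reflects the fact that $r^2$ cannot exceed 1; if the product were larger, the assignment of the two equations to "y on x" and "x on y" would have to be reversed.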
