Aashish Yadav Stats Final Practical
(STATISTICS)
:AASHISH YADAV
:B.Sc (I)
PRACTICAL 1
42,74,40,60,82,115,41,61,75,83,63,53,110,76,84,50,67,65,78,77,56,95,68,69,104,
80,79,79,54,73,59,81,100,66,49,77,90,84,76,42,64,69,70,80,72,50,79,52,103,96,
51,86,78,94,71
1) Draw the histogram and frequency polygon of the above data. From the
histogram, obtain the approximate value of the mode.
2) For the above weights, prepare a cumulative frequency table and draw the less
than ogive. Hence obtain the approximate value of the median.
Objective: To draw a histogram and frequency polygon for the given data, to find
the approximate value of the mode from the histogram, and to draw the less than
and more than ogives to find the approximate value of the median.
Theory:
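Histogram: A histogram is a graph that shows the frequency of numerical data
using adjacent rectangles, one for each class interval. It is used to summarize
discrete or continuous data that are measured on an interval scale. The class
with the tallest rectangle is the modal class.
Less Than Ogive: It shows the cumulative frequency below each upper class
boundary. The curve starts at zero at the lowest boundary and rises to the total
frequency.
More Than Ogive: It shows the cumulative frequency at or above each lower class
boundary. The curve starts at the total frequency at the lowest boundary and
descends to zero. Both ogives are useful for analyzing the cumulative
distribution of a dataset and understanding how many observations fall below or
above certain values. The two curves intersect at the median.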
PROCEDURE:
x<-c(42,74,40,82,115,41,61,75,83,63,53,110,76,84,50,67,65,78,77,
56,95,68,69,104,80,79,79,54,73,59,81,100,66,49,77,90,84,76,42,64,69,70,80,72,
50,79,52,103,96,51,86,78,94,71)
#HISTOGRAM (ylim is set above the tallest bar so that no bar is clipped)
hist<-hist(x,xlab="weights",xlim=c(35,130),ylim=c(0,20))
#POLYGON
x.axis=c(min(hist$breaks),hist$mids,max(hist$breaks))
y.axis=c(0,hist$counts,0)
lines(x.axis,y.axis,type='l')
#CUMULATIVE FREQUENCY TABLE
bins=seq(40,120,by=10);bins
xcut=cut(x,bins,right=FALSE);xcut
xf=table(xcut);xf
z=cbind(xf);z
xcumfre<-cumsum(z);xcumfre
cumfre<-cbind(xcumfre);cumfre
#LESS THAN OGIVE
cumgp<-c(0,cumsum(z));cumgp
plot(bins,cumgp,main="ogive curve",xlab="weights",ylab="frequency")
lines(bins,cumgp,col="blue")
par(new=TRUE)
#MORE THAN OGIVE
bins2<-rev(bins);bins2
y<-rev(z);y
cumgp2=c(0,cumsum(y));cumgp2
plot(bins2,cumgp2,main="ogive curve",xlab="weights",ylab="frequency")
lines(bins2,cumgp2,col="red")
#vertical line through the intersection of the two ogives
abline(v=73,col="red")
CALCULATION:-
> x<-c(42,74,40,82,115,41,61,75,83,63,53,110,76,84,50,67,
+ 65,78,77,56,95,68,69,104,80,79,79,54,73,59,81,100,66,49,77,90,84,76,42,64,
+ 69,70,80,72,50,79,52,103,96,51,86,78,94,71)
> #HISTOGRAM
> hist<-hist(x,xlab="weights",xlim=c(35,130),ylim=c(0,20))
> #POLYGON
> x.axis=c(min(hist$breaks),hist$mids,max(hist$breaks))
> y.axis=c(0,hist$counts,0)
> lines(x.axis,y.axis,type='l')
>
> bins=seq(40,120,by=10);bins
[1] 40 50 60 70 80 90 100 110 120
> xcut=cut(x,bins,right=FALSE);xcut
[1] [40,50) [70,80) [40,50) [80,90) [110,120) [40,50) [60,70)
[8] [70,80) [80,90) [60,70) [50,60) [110,120) [70,80) [80,90)
[15] [50,60) [60,70) [60,70) [70,80) [70,80) [50,60) [90,100)
[22] [60,70) [60,70) [100,110) [80,90) [70,80) [70,80) [50,60)
[29] [70,80) [50,60) [80,90) [100,110) [60,70) [40,50) [70,80)
[36] [90,100) [80,90) [70,80) [40,50) [60,70) [60,70) [70,80)
[43] [80,90) [70,80) [50,60) [70,80) [50,60) [100,110) [90,100)
[50] [50,60) [80,90) [70,80) [90,100) [70,80)
8 Levels: [40,50) [50,60) [60,70) [70,80) [80,90) [90,100) ... [110,120)
> xf=table(xcut);xf
xcut
  [40,50)   [50,60)   [60,70)   [70,80)   [80,90)  [90,100) [100,110) [110,120)
        5         8         9        15         8         4         3         2
> z=cbind(xf);z
xf
[40,50) 5
[50,60) 8
[60,70) 9
[70,80) 15
[80,90) 8
[90,100) 4
[100,110) 3
[110,120) 2
>
> xcumfre<-cumsum(z);xcumfre
[1] 5 13 22 37 45 49 52 54
> cumfre<-cbind(xcumfre);cumfre
xcumfre
[1,] 5
[2,] 13
[3,] 22
[4,] 37
[5,] 45
[6,] 49
[7,] 52
[8,] 54
> cumgp<-c(0,cumsum(z));cumgp
[1] 0 5 13 22 37 45 49 52 54
> plot(bins,cumgp,main="ogive curve",xlab="weights",ylab="frequency")
> lines(bins,cumgp,col="blue")
> par(new=TRUE)
>
> #MORE THAN OGIVE
> bins2<-rev(bins);bins2
Result:-
i) The first graph shows the histogram of the given data with the frequency
polygon.
ii) The second graph shows the less than and more than ogives of the cumulative
frequencies of the data.
Graph: histogram with frequency polygon; ogive curve (x-axis: weights, y-axis: frequency)
Interpretation:-
i) The tallest bar of the histogram marks the modal class, and the mode of the
given data is approximately 75.
ii) The intersection of the less than and more than ogives gives the median
of the data; here the median is approximately 75.
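The graphical readings above can be cross-checked with the usual interpolation formulas for grouped data. The sketch below is not part of the original practical; it assumes the class frequencies printed in the Calculation section (classes [40,50) to [110,120)), and the names lower, mode_est and median_est are chosen only for illustration.

```r
# Grouped frequency table taken from the Calculation section
lower <- seq(40, 110, by = 10)    # lower class boundaries
f <- c(5, 8, 9, 15, 8, 4, 3, 2)   # class frequencies
h <- 10                           # class width

# Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h, using the modal class and its neighbours
k <- which.max(f)
mode_est <- lower[k] + (f[k] - f[k - 1]) / (2 * f[k] - f[k - 1] - f[k + 1]) * h

# Median = l + (N/2 - cf) / f * h, where cf is the cumulative frequency
# of the classes before the median class
N <- sum(f)
cf <- cumsum(f)
m <- which(cf >= N / 2)[1]
median_est <- lower[m] + (N / 2 - (cf[m] - f[m])) / f[m] * h

mode_est     # about 74.6
median_est   # about 73.3
```

Both values fall in the 70-80 class, close to the values read from the graphs.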
PRACTICAL 2
Scores Frequency
50-60 1
60-70 0
70-80 0
80-90 1
90-100 1
100-110 2
110-120 1
120-130 0
130-140 4
140-150 4
150-160 2
160-170 5
170-180 10
180-190 11
190-200 4
200-210 1
210-220 1
220-230 2
Compute the first four moments about the mean of the distribution.
Obtain moment coefficients of skewness and kurtosis and comment on the nature
of the distribution.
Objective:
To obtain the first four moments about the mean of the given distribution, and
from them the variance and the moment coefficients of skewness and kurtosis.
These give an idea about the variation and nature of the distribution.
Theory:
Moments:-
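The r-th moment about a point c is
$\mu_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - c)^r$
Taking c as the mean $\bar{x}$ gives the first four moments about the mean:
• $\mu_1 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})$, which is always zero
• $\mu_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, the variance
• $\mu_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3$, used to measure skewness
• $\mu_4 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4$, used to measure kurtosis
• Kurtosis: measures the tailedness or sharpness of the distribution.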
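The moment definitions above can be applied to the grouped data of this practical as in the following sketch. It is an illustration rather than the original procedure: it assumes class mid-points (55, 65, ..., 225) represent each class, weights each term by the class frequency, and uses the helper name central_moment, which is hypothetical.

```r
# Class mid-points and frequencies from the table in the problem (50-60, ..., 220-230)
mid <- seq(55, 225, by = 10)
f <- c(1, 0, 0, 1, 1, 2, 1, 0, 4, 4, 2, 5, 10, 11, 4, 1, 1, 2)
N <- sum(f)

xbar <- sum(f * mid) / N   # mean of the grouped data

# r-th moment about the mean for a frequency distribution
central_moment <- function(r) sum(f * (mid - xbar)^r) / N
mu1 <- central_moment(1)
mu2 <- central_moment(2)   # variance
mu3 <- central_moment(3)
mu4 <- central_moment(4)

# Moment coefficients: gamma1 = mu3 / mu2^1.5 (skewness), beta2 = mu4 / mu2^2 (kurtosis)
gamma1 <- mu3 / mu2^1.5
beta2 <- mu4 / mu2^2

c(mean = xbar, mu1 = mu1, mu2 = mu2, mu3 = mu3, mu4 = mu4,
  gamma1 = gamma1, beta2 = beta2)
```

The mean and mu2 from this sketch agree with the values 165 and 1176 reported in the Result. A negative gamma1 indicates a longer left tail, and beta2 is compared with 3, the value for a normal distribution.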
Procedure:-
# Calculate variance
Calculation:-
In probability theory and statistics, kurtosis (from Greek: κυρτός, kyrtos or
kurtos, meaning "curved, arching") is a measure of the "tailedness" of the
probability distribution of a real-valued random variable.
> # Given data
> # Display the results
Mean: 165
Variance: 1176
Result:-
Mean- 165
Variance-1176
PRACTICAL 3
Rank by X:  1  6  5 10  3  2  4  9  7  8
Rank by Y:  3  5  8  4  7 10  2  1  6  9
Rank by Z:  6  4  9  8  1  2  3 10  5  7
Using correlation method, discuss which pair of judges has the nearest approach
to common likings in music.
Objective:-
To find the rank correlation coefficient to know about the relationship between
the two ranked variables.
Theory:-
Rank correlation: It is a method of finding the degree of association between two
variables. The rank correlation coefficient is calculated from the ranks of the
observations, not their numerical values. This method is useful when the
data are not available in numerical form but the information is sufficient to
rank the data.
$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
r: rank correlation coefficient
d: difference between the two ranks of the same item
n: number of ranked items
Procedure:-
x<-c(1,6,5,10,3,2,4,9,7,8);x
y<-c(3,5,8,4,7,10,2,1,6,9);y
z<-c(6,4,9,8,1,2,3,10,5,7);z
n<-length(x);n
m<-length(y);m
o<-length(z);o
d1<-x-y;d1
d2<-y-z;d2
d3<-x-z;d3
rho1<-1-((6*sum(d1^2))/(n*((n^2)-1)));rho1
rho2<-1-((6*sum(d2^2))/(m*((m^2)-1)));rho2
rho3<-1-((6*sum(d3^2))/(o*((o^2)-1)));rho3
Calculation:-
> x<-c(1,6,5,10,3,2,4,9,7,8);x
[1] 1 6 5 10 3 2 4 9 7 8
> y<-c(3,5,8,4,7,10,2,1,6,9);y
[1] 3 5 8 4 7 10 2 1 6 9
> z<-c(6,4,9,8,1,2,3,10,5,7);z
[1] 6 4 9 8 1 2 3 10 5 7
> n<-length(x);n
[1] 10
> m<-length(y);m
[1] 10
> o<-length(z);o
[1] 10
> d1<-x-y;d1
[1] -2 1 -3 6 -4 -8 2 8 1 -1
> d2<-y-z;d2
[1] -3 1 -1 -4 6 8 -1 -9 1 2
> d3<-x-z;d3
[1] -5 2 -4 2 2 0 1 -1 2 1
> rho1<-1-((6*sum(d1^2))/(n*((n^2)-1)));rho1
[1] -0.2121212
> rho2<-1-((6*sum(d2^2))/(m*((m^2)-1)));rho2
[1] -0.2969697
> rho3<-1-((6*sum(d3^2))/(o*((o^2)-1)));rho3
[1] 0.6363636
Result:-
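From the above calculation we get the following rank correlation coefficients:-
R1 (X and Y): -0.2121212
R2 (Y and Z): -0.2969697
R3 (X and Z): 0.6363636
Since R3 is the largest, judges X and Z have the nearest approach to common
likings in music.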
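As a cross-check on the hand formula, R's built-in cor() can compute Spearman's coefficient directly. This is a minimal sketch, not part of the original procedure; because there are no tied ranks here, it should reproduce the three values above. The vector name rho and its element names are chosen only for illustration.

```r
x <- c(1, 6, 5, 10, 3, 2, 4, 9, 7, 8)    # ranks by judge X
y <- c(3, 5, 8, 4, 7, 10, 2, 1, 6, 9)    # ranks by judge Y
z <- c(6, 4, 9, 8, 1, 2, 3, 10, 5, 7)    # ranks by judge Z

# With no ties, Spearman's coefficient equals 1 - 6*sum(d^2)/(n*(n^2 - 1))
rho <- c(XY = cor(x, y, method = "spearman"),
         YZ = cor(y, z, method = "spearman"),
         XZ = cor(x, z, method = "spearman"))
rho

# Pair of judges with the highest coefficient
names(which.max(rho))
```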
Practical 4
Problem:-
Objective:-
To find the rank correlation coefficient to know about the relationship between
the two ranked variables.
Theory:-
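$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
r: rank correlation coefficient
d: difference between the two ranks of the same item
When m items share the same value, each is given the average of the tied ranks
and a correction factor $\frac{m(m^2 - 1)}{12}$ is added to $\sum d^2$ for every
group of ties:
$r = 1 - \frac{6\left(\sum d^2 + \sum \frac{m(m^2 - 1)}{12}\right)}{n(n^2 - 1)}$
Procedure:-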
X<-c(68,64,75,50,64,80,75,40,55,64);X
Y<-c(62,58,68,45,81,60,68,48,50,70);Y
RankX<-c(4,6,2.5,9,6,1,2.5,10,8,6);RankX
RankY<-c(5,7,3.5,10,1,6,3.5,9,8,2);RankY
d<-RankX-RankY;d
n<-length(RankX);n
CX<-((2*(2^2-1))/12) + ((3*(3^2-1))/12);CX
CY<-((2*(2^2-1))/12);CY
rho<-1-(6*(sum(d^2)+CX+CY))/(n*(n^2-1));rho
Calculation:-
> X<-c(68,64,75,50,64,80,75,40,55,64);X
[1] 68 64 75 50 64 80 75 40 55 64
> Y<-c(62,58,68,45,81,60,68,48,50,70);Y
[1] 62 58 68 45 81 60 68 48 50 70
> RankX<-c(4,6,2.5,9,6,1,2.5,10,8,6);RankX
[1] 4.0 6.0 2.5 9.0 6.0 1.0 2.5 10.0 8.0 6.0
> RankY<-c(5,7,3.5,10,1,6,3.5,9,8,2);RankY
[1] 5.0 7.0 3.5 10.0 1.0 6.0 3.5 9.0 8.0 2.0
> d<-RankX-RankY;d
[1] -1 -1 -1 -1 5 -5 -1 1 0 4
> n<-length(RankX);n
[1] 10
> CX<-((2*(2^2-1))/12) + ((3*(3^2-1))/12);CX
[1] 2.5
> CY<-((2*(2^2-1))/12);CY
[1] 0.5
> rho<-1-(6*(sum(d^2)+CX+CY))/(n*(n^2-1));rho
[1] 0.5454545
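The ranks and tie corrections entered by hand above can also be derived from the raw scores. The sketch below is an illustration under that assumption: it relies on rank(), whose default gives tied values the average of their ranks, and on a helper named tie_correction, which is hypothetical and not part of the original procedure.

```r
X <- c(68, 64, 75, 50, 64, 80, 75, 40, 55, 64)
Y <- c(62, 58, 68, 45, 81, 60, 68, 48, 50, 70)

# rank() ranks in ascending order, so negate to give rank 1 to the largest score
RankX <- rank(-X)
RankY <- rank(-Y)

# Correction term: sum of m*(m^2 - 1)/12 over every group of m equal scores
tie_correction <- function(v) {
  m <- table(v)                 # size of each group of equal values
  sum(m * (m^2 - 1) / 12)       # groups of size 1 contribute zero
}
CX <- tie_correction(X)
CY <- tie_correction(Y)

d <- RankX - RankY
n <- length(X)
rho <- 1 - 6 * (sum(d^2) + CX + CY) / (n * (n^2 - 1))
rho
```

This gives the same RankX, RankY, CX, CY and rho as the manual calculation.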
Result:-
Rank correlation coefficient (rho) = 0.5454545
Interpretation:-
There is a moderate positive correlation between the ranks of X and Y.
PRACTICAL 5
Problem:- Generate a random data set and draw a scatter plot, a bar plot, a box
plot and a pie chart.
Objective:- To represent the data using different types of plots for use in
further analysis.
Theory:-
Scatter Plot: Displays individual data points on a two-dimensional graph to show
the relationship between two variables. Each point represents a data pair, with
one variable on the x-axis and the other on the y-axis. It is useful for
visualizing correlations, clusters, or patterns in data.
Bar Chart: Represents categorical data with rectangular bars. The categories are
on the x-axis, and the height of each bar represents the frequency or value of
its category. It is useful for comparing discrete categories and showing
relative sizes.
Box Plot: Displays the distribution of a dataset and highlights key statistical
measures such as the median, quartiles, and outliers. A rectangular box spans the
interquartile range with a line at the median, and whiskers extend to the
smallest and largest values that are not outliers. It is useful for identifying
central tendency and spread, and for detecting outliers in data.
Pie Chart: Represents parts of a whole, illustrating the contribution of each
category to the total. It is a circular chart divided into slices, where each
slice represents a proportion of the whole. It is useful for showing the
percentage distribution of categories in a dataset.
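The four plot types described above can be drawn with base R graphics. The following is a minimal sketch assuming a small randomly generated data set; the seed, the range 5 to 30 and the plot titles are arbitrary choices made for illustration and do not come from the original practical.

```r
set.seed(1)                                  # assumed seed, only for reproducibility
x <- 1:5                                     # five categories
y <- round(runif(5, min = 5, max = 30), 1)   # random values in an arbitrary range

plot(x, y, xlab = "x", ylab = "y", main = "Scatter plot")   # scatter plot
boxplot(y, main = "Box plot")                               # box plot of the values
barplot(y, names.arg = x, main = "Bar plot")                # one bar per category
pie(y, labels = x, main = "Pie chart")                      # share of each category
```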
Procedure:-
x<-c(1,2,3,4,5)
y<-c(25.1,17.8,12.5,11.8,5.5)
#scatter plot
plot(x,y)
#box plot
boxplot(y)
#bar plot
barplot(y,names.arg=x)
#pie chart
pie(y,labels=x)
Calculation:-
> x<-c(1,2,3,4,5)
> y<-c(25.1,17.8,12.5,11.8,5.5)
PRACTICAL 6
X Y
1 5
2 7
3 11
4 4
5 8
Theory:-
Linear Model: A linear model specifies a linear relationship between a dependent
variable and n independent variables.
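For the data of this practical, a linear model with one independent variable can be fitted by least squares with lm(). This is a sketch for illustration only; the original procedure draws just the scatter plot, and the object name fit is an assumption.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(5, 7, 11, 4, 8)

fit <- lm(y ~ x)            # least-squares fit of y = a + b*x
coef(fit)                   # intercept a and slope b

plot(x, y, xlab = "X-axis", ylab = "Y-axis")
abline(fit, col = "blue")   # fitted line drawn over the scatter plot
```

For these five points the fitted line is y = 6.1 + 0.3x.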
Procedure:
#Given data
x<-c(1,2,3,4,5)
y<-c(5,7,11,4,8)
#Scatter Plot
plot(x,y,xlab="X-axis",ylab="Y-axis")
Calculation:
> x<-c(1,2,3,4,5)
> y<-c(5,7,11,4,8)
>
> plot(x,y,xlab="X-axis",ylab="Y-axis")
Result:
PRACTICAL 7
Problem:-
Fit the binomial distribution in the following data and calculate the
x 0 1 2 3 4 5 6 7 8
f 71 112 117 157 27 11 3 1 1
Objective:-
To fit the binomial distribution in the given data and interpret results based on
the distribution.
Theory-
Binomial Probability Distribution
The binomial distribution gives the probability of x successes in n independent
trials, each with the same probability of success p. It is widely used in
statistics and also has applications in finance and other fields. Banks may use
it to estimate the likelihood of a particular borrower defaulting, how much
money to lend, and the amount to keep in reserve. It is also used in the
insurance industry to determine policy pricing and assess risk.
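The fitting steps can be sketched in R as follows. The practical itself was worked in a spreadsheet, so this is only an illustration; it assumes the number of trials n equals the largest observed x (8) and estimates p from mean = n*p.

```r
x <- 0:8
f <- c(71, 112, 117, 157, 27, 11, 3, 1, 1)

N <- sum(f)                  # total frequency
mean_x <- sum(f * x) / N     # mean of the observed distribution
n <- max(x)                  # number of trials, assumed equal to the largest x
p <- mean_x / n              # estimate of p from mean = n * p

prob <- dbinom(x, size = n, prob = p)   # P(X = x) under the fitted binomial
expected <- N * prob                    # expected frequencies
data.frame(x, f, prob = round(prob, 6), expected = round(expected, 2))
```

Under these assumptions the expected frequencies for x = 0 and x = 5 come out at about 48.35 and 12.15.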
x   f    f*x   P(x)         Expected frequency   Rounded
0   71   0     0.09669452   48.3472587           48
5   11   55    0.02429187   12.1459362           12
6   3    18    0.00411913   2.05956367           2
7   1    7     0.00039913   0.19956363           0
8   1    8     1.692E-05    0.00845991           0
Calculation:-
MEAN=SUM(F*X)/SUM(F)
PROBABILITY=MEAN/n
mean 2.026
n 8
P(estimate) 0.25325
Result:-
Hence, we can say the mean of the given data is 2.026, n is 8 and
p(estimate) is 0.25325.
PRACTICAL 8
Problem:- Fit the Poisson distribution to the following data and calculate the
expected frequencies in Excel
x f
0 142
1 156
2 69
3 37
4 5
5 1
Objective:-
To check whether the given distribution follows the Poisson distribution or not.
Theory:-
Poisson probability distribution: It is a discrete probability distribution that
gives the probability of the number of times an event occurs in a specified
period. For the Poisson distribution the variable can take only whole number
values (0, 1, 2, 3, 4, 5, 6, etc.), with
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
where $\lambda$ is the mean number of occurrences.
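A Poisson fit to the data of this practical can be sketched in R as below. The practical asks for Excel, so this is only an illustrative equivalent; it assumes the parameter lambda is estimated by the mean of the observed distribution.

```r
x <- 0:5
f <- c(142, 156, 69, 37, 5, 1)

N <- sum(f)                   # total frequency
lambda <- sum(f * x) / N      # lambda estimated by the sample mean

prob <- dpois(x, lambda)      # P(X = x) = exp(-lambda) * lambda^x / factorial(x)
expected <- N * prob          # expected frequencies
data.frame(x, f, expected = round(expected, 2))
```

Comparing the f and expected columns shows how closely the observed frequencies follow the fitted Poisson distribution.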
Procedure:-
x   f    f*x   P(x)       Expected frequency   Rounded
0   71   0     0.096695   48.34726             48
5   11   55    0.024292   12.14594             12
6   3    18    0.004119   2.059564             2
7   1    7     0.000399   0.199564             0
8   1    8     1.69E-05   0.00846              0
Calculation:-
Cases          Symbol   Value
No. of cases   n        8
Mean           np       2.026