Aashish Yadav Stats Final Practical


PRACTICAL FILE

(STATISTICS)

:AASHISH YADAV
:B.Sc (I)
PRACTICAL 1

Problem: The following numbers give the weights of 55 students of a class.
Prepare a suitable frequency table:

42,74,40,60,82,115,41,61,75,83,63,53,110,76,84,50,67,65,78,77,56,95,68,69,104,
80,79,79,54,73,59,81,100,66,49,77,90,84,76,42,64,69,70,80,72,50,79,52,103,96,
51,86,78,94,71

1) Draw the histogram and frequency polygon of the above data. From the
histogram, obtain the approximate value of the mode.

2) For the above weights, prepare a cumulative frequency table and draw the less
than ogive. Hence obtain the approximate value of the median.

Objective: To find the approximate value of the mode, to draw the less than and
more than ogives for the given frequencies, and to prepare a histogram for the
same data.

Theory:
Histogram: A histogram is a graph that shows the frequency of numerical data
using rectangles. It is used to summarize discrete or continuous data that are
measured on an interval scale.

Ogive: The curve obtained by plotting cumulative frequencies is called a
cumulative frequency curve or ogive.

There are two types of ogive:

- Less than type
- More than type

Less Than Ogive: It represents the cumulative frequency below each data point.
The curve starts from the left and rises with each data point.

More Than Ogive: It shows the cumulative frequency above each data point. The
curve starts from the right and descends with each data point. Both ogives are
useful for analyzing the cumulative distribution of a dataset and understanding
how many observations fall below or above certain values.
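
Reading the median off the less than ogive amounts to finding the weight at which the cumulative frequency reaches half of the total. The sketch below is only an illustration of that idea, assuming the class boundaries and frequencies of the frequency table in this practical; the names freq, cum_less, half_n and median_est are introduced here and are not part of the original procedure.

```r
# Class boundaries and class frequencies, as in the frequency table of this practical
bins <- seq(40, 120, by = 10)
freq <- c(5, 8, 9, 15, 8, 4, 3, 2)

# Less than cumulative frequencies at each boundary (0 at the lowest boundary)
cum_less <- c(0, cumsum(freq))

# The median is the weight at which the less than ogive reaches N/2
half_n <- sum(freq) / 2

# Linear interpolation along the ogive: the axes are swapped so that a weight
# is read off for a given cumulative frequency
median_est <- approx(x = cum_less, y = bins, xout = half_n)$y
median_est
```

The interpolated value falls inside the 70-80 class, which is where the two ogives cross on the graph.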

PROCEDURE:
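# Weights entered for the analysis (54 values: the 60 listed in the problem
# statement is not included, which matches the totals in the Calculation)
x<-c(42,74,40,82,115,41,61,75,83,63,53,110,76,84,50,67,65,78,77,
56,95,68,69,104,80,79,79,54,73,59,81,100,66,49,77,90,84,76,42,64,69,70,80,72,
50,79,52,103,96,51,86,78,94,71)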

#HISTOGRAM

hist<-hist(x,xlab="weights",xlim=c(35,130),ylim=c(0,10))

#POLYGON

x.axis=c(min(hist$breaks),hist$mids,max(hist$breaks))

y.axis=c(0,hist$counts,0)

lines(x.axis,y.axis,type='l')

bins=seq(40,120,by=10);bins

xcut=cut(x,bins,right=FALSE);xcut

xf=table(xcut);xf
z=cbind(xf);z

xcumfre<-cumsum(z);xcumfre

cumfre<-cbind(xcumfre);cumfre

cumgp<-c(0,cumsum(z));cumgp

plot(bins,cumgp,main="ogive curve",xlab="weights",ylab="frequency")

lines(bins,cumgp,col="blue")

par(new=TRUE)

#MORE THAN OGIVE

bins2<-rev(bins);bins2

y<-rev(z);y

cumgp2=c(0,cumsum(y));cumgp2

plot(bins2,cumgp2,main="ogive curve",xlab="weights",ylab="frequency")

lines(bins2,cumgp2,col="red")

abline(v=73,col="red")

CALCULATION:-
> x<-c(42,74,40,82,115,41,61,75,83,63,53,110,76,84,50,67,
+ 65,78,77,56,95,68,69,104,80,79,79,54,73,59,81,100,66,49,77,90,84,76,42,
+ 64,69,70,80,72,50,79,52,103,96,51,86,78,94,71)
> #HISTOGRAM
> hist<-hist(x,xlab="weights",xlim=c(35,130),ylim=c(0,10))
> #POLYGON
>x.axis=c(min(hist$breaks),hist$mids,max(hist$breaks))
>y.axis=c(0,hist$counts,0)
> lines(x.axis,y.axis,type='l')
>
> bins=seq(40,120,by=10);bins
[1] 40 50 60 70 80 90 100 110 120
> xcut=cut(x,bins,right=FALSE);xcut
[1] [40,50) [70,80) [40,50) [80,90) [110,120) [40,50) [60,70)
[8] [70,80) [80,90) [60,70) [50,60) [110,120) [70,80) [80,90)
[15] [50,60) [60,70) [60,70) [70,80) [70,80) [50,60) [90,100)
[22] [60,70) [60,70) [100,110) [80,90) [70,80) [70,80) [50,60)
[29] [70,80) [50,60) [80,90) [100,110) [60,70) [40,50) [70,80)
[36] [90,100) [80,90) [70,80) [40,50) [60,70) [60,70) [70,80)
[43] [80,90) [70,80) [50,60) [70,80) [50,60) [100,110) [90,100)
[50] [50,60) [80,90) [70,80) [90,100) [70,80)
8 Levels: [40,50) [50,60) [60,70) [70,80) [80,90) [90,100) ... [110,120)
> xf=table(xcut);xf
xcut
[40,50) [50,60) [60,70) [70,80) [80,90) [90,100) [100,110) [110,120)
5 8 9 15 8 4 3 2
> z=cbind(xf);z
xf
[40,50) 5
[50,60) 8
[60,70) 9
[70,80) 15
[80,90) 8
[90,100) 4
[100,110) 3
[110,120) 2
>
> xcumfre<-cumsum(z);xcumfre
[1] 5 13 22 37 45 49 52 54
> cumfre<-cbind(xcumfre);cumfre
xcumfre
[1,] 5
[2,] 13
[3,] 22
[4,] 37
[5,] 45
[6,] 49
[7,] 52
[8,] 54
> cumgp<-c(0,cumsum(z));cumgp
[1] 0 5 13 22 37 45 49 52 54
>plot(bins,cumgp,main="ogive curve",xlab="weights",ylab="frequency")
> lines(bins,cumgp,col="blue")
> par(new=TRUE)
>
> #MORE THAN OGIVE
> bins2<-rev(bins);bins2

[1] 120 110 100 90 80 70 60 50 40


> y<-rev(z);y
[1] 2 3 4 8 15 9 8 5
> cumgp2=c(0,cumsum(y));cumgp2
[1] 0 2 5 9 17 32 41 49 54
>plot(bins2,cumgp2,main="ogive curve",xlab="weights",ylab="frequency")
> lines(bins2,cumgp2,col="red")
> abline(v=73,col="red")

Result:-

i) The first graph shows the histogram of the given data with the frequency
polygon
ii) The second graph shows the Ogive of the cumulative frequency of the
data.
Graph:

[Figure: histogram of the weights with frequency polygon]
[Figure: ogive curve; x-axis: weights (40 to 120), y-axis: frequency (0 to 50)]

Interpretation:-

i) The histogram shows that the mode of the given data is approximately 75.

ii) The intersection of the less than and more than ogives gives the median
of the data; here the median is approximately 75.
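
The graphical reading of the mode from the histogram (joining the top corners of the tallest bar to the corners of its neighbours) corresponds to the usual grouped-data formula l + (f1 - f0) / (2*f1 - f0 - f2) * h. The following is a minimal sketch of that formula, assuming the class frequencies of this practical and assuming that the modal class is neither the first nor the last class; the variable names are illustrative.

```r
# Class boundaries and class frequencies, as in the frequency table of this practical
bins <- seq(40, 120, by = 10)
freq <- c(5, 8, 9, 15, 8, 4, 3, 2)

k  <- which.max(freq)         # position of the modal class
l  <- bins[k]                 # lower boundary of the modal class
h  <- bins[k + 1] - bins[k]   # class width
f1 <- freq[k]                 # frequency of the modal class
f0 <- freq[k - 1]             # frequency of the class before it
f2 <- freq[k + 1]             # frequency of the class after it

# Grouped-data estimate of the mode
mode_est <- l + (f1 - f0) / (2 * f1 - f0 - f2) * h
mode_est
```

The estimate lies close to the value of about 75 read from the histogram.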
PRACTICAL 2

Problem: For the frequency distribution of scores in mathematics of 50 candidates
selected at random from among those appearing at a certain examination.

Scores Frequency
50-60 1
60-70 0
70-80 0
80-90 1
90-100 1
100-110 2
110-120 1
120-130 0
130-140 4
140-150 4
150-160 2
160-170 5
170-180 10
180-190 11
190-200 4
200-210 1
210-220 1
220-230 2
Compute the first four moments about the mean of the distribution.

Obtain moment coefficients of skewness and kurtosis and comment on the nature
of the distribution.

Objective:
To obtain first four moments of the given distribution that includes mean,
variance, skewness and kurtosis. It gives an idea about the variation and nature of
distribution.

Theory:
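Moments:-

The r-th moment of n observations about a point c is

$\mu_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - c)^r$

First Moment (Mean):

• $\mu'_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$ (moment about the origin, c = 0)

• Represents the center or average of the data. The first moment about the
mean, $\mu_1 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})$, is always zero.

Second Central Moment (Variance):

• $\mu_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

• Measures the spread or dispersion of the data.

Third Central Moment (Skewness):

• $\mu_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3$

• Indicates the asymmetry of the probability distribution.

Fourth Central Moment (Kurtosis):

• $\mu_4 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4$

• Measures the tailedness or sharpness of the distribution.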

These moments give insight into the characteristics of a dataset and help to
understand its properties. Higher moments are less commonly used in practice.
Skewness and kurtosis, in particular, are often examined to understand the shape
of a distribution.

Skewness:- Skewness is a measure of the asymmetry of a distribution. A
distribution is asymmetrical when its left and right sides are not mirror images. A
distribution can have right (or positive), left (or negative), or zero skewness.

Kurtosis:- Kurtosis is a measure of the tailedness of a distribution. Tailedness is
how often outliers occur. Excess kurtosis is the tailedness of a distribution relative
to a normal distribution. Distributions with medium kurtosis (medium tails) are
mesokurtic, distributions with low kurtosis (thin tails) are platykurtic, and
distributions with high kurtosis (heavy tails) are leptokurtic.
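
For a grouped frequency distribution such as the one in this practical, the same definitions are applied to the class midpoints weighted by their frequencies. The restatement below is a sketch under that assumption; the symbols k (number of classes), f_i (class frequency) and N (total frequency) are notation introduced here. The numerical values in the last two lines are those reported in the Calculation of this practical.

```latex
N = \sum_{i=1}^{k} f_i, \qquad
\bar{x} = \frac{1}{N}\sum_{i=1}^{k} f_i x_i, \qquad
\mu_r = \frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^r

\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{-41160}{1176^{3/2}} \approx -1.0206

\gamma_2 = \frac{\mu_4}{\mu_2^{2}} - 3 = \frac{5745600}{1176^{2}} - 3 \approx 1.1545
```

Here gamma_1 is the moment coefficient of skewness and gamma_2 is the excess kurtosis, so a normal distribution has gamma_1 = 0 and gamma_2 = 0.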

Procedure:-

# Given data: class midpoints and frequencies

midpoint <- c(55, 65, 75, 85, 95, 105, 115, 125, 135, 145, 155, 165, 175, 185,
195, 205, 215, 225)

frequency <- c(1, 0, 0, 1, 1, 2, 1, 0, 4, 4, 2, 5, 10, 11, 4, 1, 1, 2)

# Calculate mean

mean_value <- sum(midpoint * frequency) / sum(frequency)

# Calculate variance

variance <- sum(((midpoint - mean_value)^2) * frequency) / sum(frequency)

# Calculate third central moment

mu3 <- sum(((midpoint - mean_value)^3) * frequency) / sum(frequency)

# Calculate fourth central moment

mu4 <- sum(((midpoint - mean_value)^4) * frequency) / sum(frequency)

# Calculate moment coefficient of skewness

gamma1 <- mu3 / (variance^(3/2))

# Calculate moment coefficient of kurtosis (excess kurtosis)

gamma2 <- (mu4 / (variance^2)) - 3

# Display the results

cat("Mean:", mean_value, "\n")

cat("Variance:", variance, "\n")

cat("Third Central Moment (mu3):", mu3, "\n")

cat("Fourth Central Moment (mu4):", mu4, "\n")

cat("Moment of Skewness (gamma1):", gamma1, "\n")

cat("Moment of Kurtosis (gamma2):", gamma2, "\n")

Calculation:-
> # Given data


> midpoint <- c(55, 65, 75, 85, 95, 105, 115, 125, 135, 145, 155, 165, 175, 185,
195, 205, 215, 225)

> frequency <- c(1, 0, 0, 1, 1, 2, 1, 0, 4, 4, 2, 5, 10, 11, 4, 1, 1, 2)

>

> # Calculate mean

> mean_value <- sum(midpoint * frequency) / sum(frequency)

>

> # Calculate variance

> variance <- sum(((midpoint - mean_value)^2) * frequency) / sum(frequency)

>

> # Calculate third central moment

> mu3 <- sum(((midpoint - mean_value)^3) * frequency) / sum(frequency)

>

> # Calculate fourth central moment

> mu4 <- sum(((midpoint - mean_value)^4) * frequency) / sum(frequency)

>

> # Calculate moment of skewness

> gamma1 <- mu3 / (variance^(3/2))

>

> # Calculate moment of kurtosis

> gamma2 <- (mu4 / (variance^2)) - 3

>
> # Display the results

>cat("Mean:", mean_value, "\n")

Mean: 165

>cat("Variance:", variance, "\n")

Variance: 1176

>cat("Third Central Moment (mu3):", mu3, "\n")

Third Central Moment (mu3): -41160

>cat("Fourth Central Moment (mu4):", mu4, "\n")

Fourth Central Moment (mu4): 5745600

>cat("Moment of Skewness (gamma1):", gamma1, "\n")

Moment of Skewness (gamma1): -1.020621

>cat("Moment of Kurtosis (gamma2):", gamma2, "\n")

Moment of Kurtosis (gamma2): 1.154519

Result:-
Mean- 165

Variance-1176

3rd central moment- -41160

4th central moment – 5745600

Moment of skewness- -1.020621

Moment of kurtosis- 1.154519


Interpretation:-
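i) The moment coefficient of skewness (gamma1 = -1.020621) is negative, so the
given distribution is negatively skewed: the longer tail lies towards the
lower scores.
ii) The moment coefficient of kurtosis (gamma2 = 1.154519) is greater than
zero, so the distribution is leptokurtic, that is, more peaked and
heavier-tailed than the normal distribution.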
Practical 3
Problem:-
Ten competitors in a musical test were ranked by three judges X,Y,Z in the
following order.

Rank by X:   1   6   5  10   3   2   4   9   7   8
Rank by Y:   3   5   8   4   7  10   2   1   6   9
Rank by Z:   6   4   9   8   1   2   3  10   5   7

Using correlation method, discuss which pair of judges has the nearest approach
to common likings in music.

Objective:-
To find the rank correlation coefficient to know about the relationship between
the two ranked variables.

Theory:-
Rank correlation: It is a method of finding the degree of association between two
variables. The rank correlation coefficient is calculated from the ranks of the
observations, not from their numerical values. This method is useful when the
data are not available in numerical form but the information is sufficient to rank
the data.

$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$

r: rank correlation coefficient
d: difference between the two ranks of an observation
n: number of observations

Procedure:-
x<-c(1,6,5,10,3,2,4,9,7,8);x

y<-c(3,5,8,4,7,10,2,1,6,9);y

z<-c(6,4,9,8,1,2,3,10,5,7);z

n<-length(x);n

m<-length(y);m

o<-length(z);o

d1<-x-y;d1

d2<-y-z;d2

d3<-x-z;d3

rho1<-1-((6*sum(d1^2))/(n*((n^2)-1)));rho1

rho2<-1-((6*sum(d2^2))/(m*((m^2)-1)));rho2

rho3<-1-((6*sum(d3^2))/(o*((o^2)-1)));rho3

Calculation:-

> x<-c(1,6,5,10,3,2,4,9,7,8);x

[1] 1 6 5 10 3 2 4 9 7 8

> y<-c(3,5,8,4,7,10,2,1,6,9);y

[1] 3 5 8 4 7 10 2 1 6 9
> z<-c(6,4,9,8,1,2,3,10,5,7);z

[1] 6 4 9 8 1 2 3 10 5 7

> n<-length(x);n

[1] 10

> m<-length(y);m

[1] 10

> o<-length(z);o

[1] 10

> d1<-x-y;d1

[1] -2 1 -3 6 -4 -8 2 8 1 -1

> d2<-y-z;d2

[1] -3 1 -1 -4 6 8 -1 -9 1 2

> d3<-x-z;d3

[1] -5 2 -4 2 2 0 1 -1 2 1

> rho1<-1-((6*sum(d1^2))/(n*((n^2)-1)));rho1

[1] -0.2121212

> rho2<-1-((6*sum(d2^2))/(m*((m^2)-1)));rho2

[1] -0.2969697

> rho3<-1-((6*sum(d3^2))/(o*((o^2)-1)));rho3

[1] 0.6363636

Result:-
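From the above calculation we get the following rank correlation coefficients:-

X and Y (rho1): -0.2121212

Y and Z (rho2): -0.2969697

X and Z (rho3): 0.6363636

Since rho3 is the largest of the three, judges X and Z have the nearest approach
to common likings in music.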
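
As a cross-check, base R's cor() with method = "spearman" can compute all three pairwise coefficients at once. This is only a sketch, assuming the ranks given in the problem; because there are no tied ranks here, the built-in result should agree with the formula 1 - 6*sum(d^2)/(n*(n^2-1)). The names ranks and rho_matrix are introduced for illustration.

```r
# Ranks awarded by the three judges, as given in the problem
x <- c(1, 6, 5, 10, 3, 2, 4, 9, 7, 8)
y <- c(3, 5, 8, 4, 7, 10, 2, 1, 6, 9)
z <- c(6, 4, 9, 8, 1, 2, 3, 10, 5, 7)

# One column per judge
ranks <- cbind(X = x, Y = y, Z = z)

# Pairwise Spearman rank correlation coefficients
rho_matrix <- cor(ranks, method = "spearman")
round(rho_matrix, 4)

# Pair of judges with the largest off-diagonal coefficient
off_diag <- rho_matrix
diag(off_diag) <- NA
which(off_diag == max(off_diag, na.rm = TRUE), arr.ind = TRUE)
```

The largest off-diagonal entry identifies the pair of judges whose rankings agree most closely.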
Practical 4
Problem:-

Obtain the rank correlation coefficient of the following data.

X    Y    Rank X (x)   Rank Y (y)
68   62   4            5
64   58   6            7
75   68   2.5          3.5
50   45   9            10
64   81   6            1
80   60   1            6
75   68   2.5          3.5
40   48   10           9
55   50   8            8
64   70   6            2

Objective:-

To find the rank correlation coefficient to know about the relationship between
the two ranked variables.

Theory:-

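Rank correlation: It is a method of finding the degree of association between two
variables. The rank correlation coefficient is calculated from the ranks of the
observations, not from their numerical values. This method is useful when the
data are not available in numerical form but the information is sufficient to rank
the data.

When some values are repeated, the tied values receive the average of the ranks
they occupy, and a correction term is added for each group of m tied values:

$r = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$

r: rank correlation coefficient

d: difference between the two ranks of an observation

m: number of times a tied value is repeated

n: number of observations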


Procedure:-

X<-c(68,64,75,50,64,80,75,40,55,64);X

Y<-c(62,58,68,45,81,60,68,48,50,70);Y

RankX<-c(4,6,2.5,9,6,1,2.5,10,8,6);RankX

RankY<-c(5,7,3.5,10,1,6,3.5,9,8,2);RankY

d<-RankX-RankY;d

n<-length(RankX);n

CX<-((2*(2^2-1))/12) + ((3*(3^2-1))/12);CX

CY<-((2*(2^2-1))/12);CY

rho<-1-(6*(sum(d^2)+CX+CY))/(n*(n^2-1));rho

Calculation:-

> X<-c(68,64,75,50,64,80,75,40,55,64);X

[1] 68 64 75 50 64 80 75 40 55 64

> Y<-c(62,58,68,45,81,60,68,48,50,70);Y

[1] 62 58 68 45 81 60 68 48 50 70

>RankX<-c(4,6,2.5,9,6,1,2.5,10,8,6);RankX

[1] 4.0 6.0 2.5 9.0 6.0 1.0 2.5 10.0 8.0 6.0

>RankY<-c(5,7,3.5,10,1,6,3.5,9,8,2);RankY

[1] 5.0 7.0 3.5 10.0 1.0 6.0 3.5 9.0 8.0 2.0

> d<-RankX-RankY;d
[1] -1 -1 -1 -1 5 -5 -1 1 0 4

> n<-length(RankX);n

[1] 10

> CX<-((2*(2^2-1))/12) + ((3*(3^2-1))/12);CX

[1] 2.5

> CY<-((2*(2^2-1))/12);CY

[1] 0.5

> rho<-1-(6*(sum(d^2)+CX+CY))/(n*(n^2-1));rho

[1] 0.5454545

Result:-

Rank correlation coefficient (rho) = 0.5454545

Interpretation:-

The rank correlation between X and Y is 0.5454545, hence the correlation is positive.

The value of Y tends to increase with an increase in X.
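
The ranks and correction terms typed in by hand above can also be derived from the raw values. The sketch below assumes base R's rank(), whose default treatment of ties is to assign average ranks; the helper tie_correction is a hypothetical name introduced here, not part of the original procedure.

```r
# Raw values from the problem
X <- c(68, 64, 75, 50, 64, 80, 75, 40, 55, 64)
Y <- c(62, 58, 68, 45, 81, 60, 68, 48, 50, 70)

# Rank in descending order (largest value gets rank 1); ties get average ranks
RankX <- rank(-X)
RankY <- rank(-Y)

# Correction term: sum of m * (m^2 - 1) / 12 over groups of m equal values
# (groups of size 1 contribute zero)
tie_correction <- function(v) {
  m <- as.numeric(table(v))
  sum(m * (m^2 - 1) / 12)
}

d   <- RankX - RankY
n   <- length(X)
CX  <- tie_correction(X)
CY  <- tie_correction(Y)
rho <- 1 - 6 * (sum(d^2) + CX + CY) / (n * (n^2 - 1))
rho
```

Note that cor(X, Y, method = "spearman") computes the Pearson correlation of the ranks, which in the presence of ties need not coincide exactly with the correction-factor formula used in this practical.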


PRACTICAL 5

Problem:- Generate a random data set and draw a scatter plot, a bar plot, a box
plot and a pie chart.

Objective:- To represent the data using different types of plots for use in
further analysis.

Theory:-
Scatter Plot: Displays individual data points on a two-dimensional graph to show
the relationship between two variables. Each point represents a data pair, with
one variable on the x-axis and the other on the y-axis. It is useful for
visualizing correlations, clusters, or patterns in data.

Bar Chart: Represents categorical data with rectangular bars, where the length of
each bar corresponds to the frequency or value of the category. Categories are on
the x-axis, and the height of the bars represents the frequency or value
of each category. It is useful for comparing discrete categories and showing
relative sizes.

Box Plot: Displays the distribution of a dataset and highlights key statistical
measures such as the median, quartiles, and outliers. A rectangular box spans
the interquartile range, with a line at the median, and whiskers extend to the
smallest and largest values that lie within a fixed range of the box; points
beyond the whiskers are plotted as outliers. It is useful for identifying
central tendency and spread, and for detecting outliers in data.

Pie Chart: Represents parts of a whole, illustrating the contribution of each
category to the total. A circular chart is divided into slices, where each slice
represents a proportion of the whole. It is useful for showing the percentage
distribution of categories in a dataset.
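
The quantities that a box plot draws can be inspected numerically. The following is a minimal sketch, assuming the y values used in the procedure of this practical and base R's boxplot.stats(), which uses whiskers of at most 1.5 times the box length by default; the name stats is illustrative.

```r
# Values plotted in this practical
y <- c(25.1, 17.8, 12.5, 11.8, 5.5)

# Statistics behind the box plot
stats <- boxplot.stats(y)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Points lying beyond the whiskers (plotted individually as outliers)
stats$out
```

For these five values no point lies beyond the whiskers, so the whiskers reach the minimum and maximum of the data.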

Procedure:-
x<-c(1,2,3,4,5)

y<-c(25.1,17.8,12.5,11.8,5.5)

#scatter plot

plot(x,y,xlab="Country",ylab="Million of Carats",main="Scatter Plot",col="blue")

#box plot

boxplot(x,y,xlab="Country",ylab="Million of Carats",main="Box plot",col="blue")

#bar plot (bar heights are y, bars are labelled by x)

barplot(y,names.arg=x,xlab="Country",ylab="Million of Carats",main="Bar plot",col="blue")

#pie chart (slice sizes are y, slices are labelled by x)

pie(y,labels=x,main="Pie Chart")

Calculation:-
> x<-c(1,2,3,4,5)

> y<-c(25.1,17.8,12.5,11.8,5.5)

> #scatter plot


> plot(x,y,xlab="Country",ylab="Million of Carats",main="Scatter Plot",col="blue")

> x<-c(1,2,3,4,5)

> y<-c(25.1,17.8,12.5,11.8,5.5)

> #box plot

> boxplot(x,y,xlab="Country",ylab="Million of Carats",main="Box plot",col="blue")

> x<-c(1,2,3,4,5)

> y<-c(25.1,17.8,12.5,11.8,5.5)

> #bar plot

> barplot(y,names.arg=x,xlab="Country",ylab="Million of Carats",main="Bar plot",col="blue")

> x<-c(1,2,3,4,5)

> y<-c(25.1,17.8,12.5,11.8,5.5)

> #pie chart

> pie(y,labels=x,main="Pie Chart")


Result:
Interpretation:-
i) The given plots and charts give us an easy representation of data which
is also suitable for further mathematical operations.
PRACTICAL 6
Problem:- Generate a random data set, fit a suitable curve to the given data set,
and fit a linear model and a second-degree polynomial.

Objective:- Fitting a suitable curve, a linear model and a second-degree
polynomial to the given data helps us determine the relationship between the given
set of variables.

X Y
1 5
2 7
3 11
4 4
5 8

Theory:-
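Linear Model: A linear model specifies a linear relationship between a dependent
variable and n independent variables. With one independent variable it has the
form y = a + bx. A second-degree polynomial model adds a squared term,
y = a + bx + cx^2.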

Procedure:
#Given data

x<-c(1,2,3,4,5)

y<-c(5,7,11,4,8)

#Scatter Plot
plot(x,y,xlab="X-axis",ylab="Y-axis")

Calculation:
> x<-c(1,2,3,4,5)

> y<-c(5,7,11,4,8)

>

> #Scatter Plot

> plot(x,y,xlab="X-axis",ylab="Y-axis")
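
The fitting step named in the problem can be sketched with base R's lm(). This is an illustration under the assumption that ordinary least squares is the intended method; the names fit_lin, fit_quad and x_grid are introduced here and do not appear in the original procedure.

```r
# Given data
x <- c(1, 2, 3, 4, 5)
y <- c(5, 7, 11, 4, 8)

# Linear model: y = a + b*x
fit_lin <- lm(y ~ x)

# Second-degree polynomial: y = a + b*x + c*x^2
fit_quad <- lm(y ~ x + I(x^2))

# Estimated coefficients of both models
coef(fit_lin)
coef(fit_quad)

# Scatter plot with both fitted curves
plot(x, y, xlab = "X-axis", ylab = "Y-axis")
abline(fit_lin, col = "blue")
x_grid <- seq(min(x), max(x), length.out = 100)
lines(x_grid, predict(fit_quad, newdata = data.frame(x = x_grid)), col = "red")
```

The blue line is the fitted straight line and the red curve is the fitted parabola; summary(fit_lin) and summary(fit_quad) can be used to compare how well each describes the data.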

Result:
PRACTICAL 7
Problem:-
Fit a binomial distribution to the following data and calculate the
expected frequencies in Excel.

x 0 1 2 3 4 5 6 7 8
f 71 112 117 157 27 11 3 1 1

Objective:-
To fit the binomial distribution in the given data and interpret results based on
the distribution.

Theory-
Binomial Probability Distribution

Binomial distribution - In a binomial distribution the probabilities of interest are
those of obtaining a certain number of successes, r, in n independent trials, each
having only two possible outcomes and the same probability, p, of success.

How is the binomial distribution used?

This distribution is used in statistics and also has applications in finance and
other fields. Banks may use it to estimate the likelihood of a particular borrower
defaulting, how much money to lend, and the amount to keep in reserve. It is also
used in the insurance industry to determine policy pricing and to assess risk.

x    f(x)   f(x)*x   P(x)         E.F.          Roundup
0    71     0        0.09669452   48.3472587    48
1    112    112      0.26234093   131.170467    131
2    117    234      0.31139263   155.696314    156
3    157    471      0.21120906   105.60453     106
4    27     108      0.08953581   44.7679065    45
5    11     55       0.02429187   12.1459362    12
6    3      18       0.00411913   2.05956367    2
7    1      7        0.00039913   0.19956363    0
8    1      8        1.692E-05    0.00845991    0

Calculation:-
MEAN=SUM(F*X)/SUM(F)

PROBABILITY=MEAN/n

mean 2.026

n 8

P(estimate) 0.25325

- The f(x)*x column is calculated by multiplying each value of x (the random
variable) by its frequency f(x).
- The probability column P(x) is calculated using the binomial formula
available in Excel.
- The expected frequency (E.F.) is calculated by multiplying each probability
by the total frequency.

Result:-
Hence, we can say the mean of the given data is 2.026, n is 8 and
p(estimate) is 0.25325.

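
The spreadsheet steps of this practical can be reproduced in R as a cross-check. The sketch below assumes that the number of trials n is the largest observed value of x (8) and that p is estimated as mean/n, which is consistent with the tabulated P(x) values; prob and expected are illustrative names.

```r
# Observed data
x <- 0:8
f <- c(71, 112, 117, 157, 27, 11, 3, 1, 1)

N      <- sum(f)            # total frequency
mean_x <- sum(f * x) / N    # mean of the observed distribution

# Assumption: number of trials equals the largest value of x
n <- max(x)
p <- mean_x / n             # estimate of the success probability

# Binomial probabilities and expected frequencies
prob     <- dbinom(x, size = n, prob = p)
expected <- N * prob

data.frame(x, observed = f, prob = round(prob, 8),
           expected = round(expected, 4), rounded = round(expected))
```

Comparing the observed and expected columns shows where the fit is poor, for example at x = 3.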
PRACTICAL 8
Problem:- Fit a Poisson distribution to the following data and calculate the
expected frequencies in Excel.

x f

0 142

1 156

2 69

3 37

4 5

5 1

Objective:-
To check whether the given data follow a Poisson distribution or not.

Theory:-
Poisson probability distribution: It is a discrete probability distribution that tells us
about the number of times an event is likely to occur over a specified period. For the
Poisson distribution the variable can take only whole number values (0, 1, 2, 3, 4, 5, 6
etc.).

In addition, the Poisson distribution can be obtained as an approximation of the
binomial distribution when the number of trials n of the latter distribution is
large, the success probability p is small, and np is a finite number.

Properties of Poisson distribution:-

- The events are independent.
- The average number of occurrences in a given period of time is constant.
- No two events can occur at the same time.
- The Poisson distribution is the limiting form of the binomial distribution
when the number of trials n is indefinitely large.

Procedure:-

x       f(x)   f(x)*x   p(x)       expected frequency   rounded
0       71     0        0.096695   48.34726             48
1       112    112      0.262341   131.1705             131
2       117    234      0.311393   155.6963             156
3       157    471      0.211209   105.6045             106
4       27     108      0.089536   44.76791             45
5       11     55       0.024292   12.14594             12
6       3      18       0.004119   2.059564             2
7       1      7        0.000399   0.199564             0
8       1      8        1.69E-05   0.00846              0
Total   500    1013     1          500                  500

Calculation:-
Cases                    Symbol   Value
No. of cases             n        8
Mean                     np       2.026
Probability of success   p        0.25325
Probability of failure   q        0.74675
Interpretation:-
Hence, we can say that the mean of the given data is 1.04878 and n is 5.

The expected frequencies after fitting the Poisson distribution total
409.6888176, which is close to the observed total of 410.
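
The Poisson fit for the data in the problem statement of this practical can be sketched in R. The sketch assumes that the parameter lambda is estimated by the sample mean, which is the usual choice; lambda, prob and expected are illustrative names.

```r
# Observed data from the problem statement
x <- 0:5
f <- c(142, 156, 69, 37, 5, 1)

N      <- sum(f)            # total frequency
lambda <- sum(f * x) / N    # sample mean, used as the Poisson parameter

# Poisson probabilities and expected frequencies for x = 0, ..., 5
prob     <- dpois(x, lambda)
expected <- N * prob

data.frame(x, observed = f, prob = round(prob, 6),
           expected = round(expected, 4), rounded = round(expected))

# Total of the expected frequencies (slightly below N because x > 5 is excluded)
sum(expected)
```

The expected frequencies can then be compared with the observed ones class by class to judge how well the Poisson distribution describes the data.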
