
BRM Notes (Unit IV)


Ashutosh Kumar Jha

Assistant Professor
Medi-Caps University, Indore
Unit-IV Syllabus
Statistical Data Analysis:
Parametric and Non-Parametric Tests
Descriptive Statistics
Comparing Means using t-test - One Sample, Independent Samples and Paired Samples
One-way Analysis of Variance (ANOVA)
Chi-square Test

Parametric and Non-Parametric Tests

Parametric Test:-
A parametric test assumes that the sample data come from a population
whose probability distribution is described by a fixed set of parameters.
Example:- t-Test, Z-Test etc.

Non-Parametric Test:- A non-parametric test is a hypothesis test that is
not based on underlying distributional assumptions, i.e. it does not
require the population's distribution to be described by specific parameters.

Example:- Chi-Square Test, Kruskal-Wallis Test

Difference between Parametric and Non-Parametric Tests

Properties               | Parametric     | Non-Parametric
Assumption               | Yes            | No
Central Tendency Value   | Mean           | Median
Correlation              | Pearson        | Spearman
Probability Distribution | Normal         | Arbitrary
Population Knowledge     | Required       | Not Required
Applicability            | Variable       | Variable & Attribute
Example                  | t-Test, Z-Test | Chi-Square Test, Kruskal-Wallis Test
Descriptive Statistics & Central Tendency

Descriptive statistics
Descriptive statistics describe the features of a data set by generating
summaries of the data samples.
They are often presented as a summary that explains the contents of the data.
For example, a population census may include descriptive statistics
regarding the ratio of men to women in a specific city.

Central tendency
Central tendency is a statistical measure that identifies a single value
as representative of an entire distribution or data set.
Tools of Descriptive Statistics

Mean:- The value obtained by dividing the sum of the values of all items
in a series by the total number of items.

Median:- The middle value of an ordered data set; it divides the data set
into two equal parts.

Mode:- The value that appears most often in a data set.
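The three measures above can be checked with a short Python sketch. It uses only the standard `statistics` module and, purely for illustration, reuses the twelve daily tea-cup sales figures from the one-sample t-test question in these notes.

```python
import statistics

# Daily tea-cup sales (the 12 observations from the t-test question in these notes)
sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]

mean_sales = statistics.mean(sales)      # sum of the values / number of values
median_sales = statistics.median(sales)  # middle of the sorted data (average of the two middle values here)
modes = statistics.multimode(sales)      # every value that ties for the highest frequency

print("Mean:", mean_sales)
print("Median:", median_sales)
print("Mode(s):", modes)
```

This data set has two modes, because 570 and 580 each occur twice; that is why `multimode` is used rather than `mode`.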
t-Test
A t-test is an inferential statistical test used to determine whether
there is a significant difference between two means, which may or may
not be related.

The t-test is based on the t-distribution, which is a bell-shaped curve
like the normal distribution, but has heavier tails.

As the sample size increases, the degrees of freedom also increase, and
the t-distribution becomes similar to the normal distribution.

Photo: William Sealy Gosset (who introduced the t-distribution under the pen name "Student")
Degree of Freedom
It is the number of values in the final calculation of a statistic that
are free to vary.
Formula of Degree of Freedom = n-1
Where, n = Number of Observations

Types of t-Test
● One Sample t-Test
● Independent Two Sample t-Test
● Paired (Two Sample) t-Test
● Welch's Test or Unequal Variance t-Test

Figure: t-Distribution Curve
Assumptions in One Sample t-Test
● The sample is collected from the population through the simple random sampling method.
● Each observation (sampling unit) is independent of the others.
● The population standard deviation 'σ' is unknown.
● The population from which the sample is drawn follows a normal distribution.
Assumptions in Two Sample t-Test
● Independence: The observations in one sample are independent of the observations in the other sample.
● Normality: Both samples are approximately normally distributed.
● Homogeneity of Variances: Both samples have approximately the same variance.
● Random Sampling: Both samples were obtained using a random sampling method.
Question
Raju Restaurant near the Railway Station has an average sale of 500 tea
cups per day. Because of the development of a bus stand nearby, its sales
are expected to increase. After the bus stand opened, the daily sales for
the first 12 days were as follows:
550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526
On the basis of the above sample information, can one conclude that
Raju Restaurant's sales have increased?
Apply 5% Level of Significance.
Solution
H0 = There is no increase in sales of tea cups
H1 = There is an increase in sales of tea cups
Given
Population Mean, µ = 500 tea cups per day
Sample Size, n = 12

To find
Sample Mean, x̄
Standard Deviation of Sample, S
SN | Xi  | (Xi - x̄) | (Xi - x̄)²
1  | 550 | 2   | 4
2  | 570 | 22  | 484
3  | 490 | -58 | 3364
4  | 615 | 67  | 4489
5  | 505 | -43 | 1849
6  | 580 | 32  | 1024
7  | 570 | 22  | 484
8  | 460 | -88 | 7744
9  | 600 | 52  | 2704
10 | 580 | 32  | 1024
11 | 530 | -18 | 324
12 | 526 | -22 | 484
ƩXi = 6576, so x̄ = 6576/12 = 548; Ʃ(Xi - x̄)² = 23978
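Applying the Formula of t-Test

$S = \sqrt{\dfrac{\sum (X_i - \bar{x})^2}{n-1}} = \sqrt{\dfrac{23978}{11}} \approx 46.68$

$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} = \dfrac{548 - 500}{46.68/\sqrt{12}} \approx \dfrac{48}{13.49} = 3.558$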
Using the table of the t-distribution (one-tailed test, 5% level of
significance, n-1 = 11 degrees of freedom):

Critical Value = 1.796
Calculated Value = 3.558
Calculated Value > Critical Value

Conclusion:- The Null Hypothesis is rejected.
Therefore, there is an increase
in sales of tea cups.
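The one-sample calculation above can be reproduced with a small Python sketch. It uses only the standard library; the critical value 1.796 is hard-coded from the t-table as quoted in these notes rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Daily sales for the first 12 days after the bus stand opened
sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu0 = 500                      # hypothesised population mean (tea cups per day)

n = len(sales)
x_bar = statistics.mean(sales)
s = statistics.stdev(sales)    # sample standard deviation (divides by n - 1)

# t = (sample mean - hypothesised mean) / standard error of the mean
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

t_critical = 1.796             # one-tailed, 5% level, 11 d.f. (table value from the notes)
reject_h0 = t_stat > t_critical

print(f"mean = {x_bar}, S = {s:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Without intermediate rounding the statistic comes out near 3.56, essentially the hand-calculated value, and the decision to reject H0 is the same.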
Independent Two Sample t-Test
Formula of Two Sample t-Test with Equal Variance

Sample 1: Mean = X1, Number of Observations = n1, Standard Deviation = S1
Sample 2: Mean = X2, Number of Observations = n2, Standard Deviation = S2
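The formula image for this slide did not survive extraction. For reference, the standard textbook form of the equal-variance (pooled) two-sample t-statistic, written with the symbols defined above, is sketched below. It is assumed, not confirmed, that this is the form the slide showed; the pooled standard deviation $S_p$ is a symbol introduced here for illustration.

```latex
S_p^2 = \frac{(n_1 - 1)\,S_1^2 + (n_2 - 1)\,S_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\text{d.f.} = n_1 + n_2 - 2
```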
Question
Ajay grows tomatoes in two separate fields. He is
curious to know whether the size of the tomato plants
differs between the two fields. He takes a random
sample from each field. Here is the summary of the
results: the means of Sample 1 and Sample 2 are 14 cm
and 11 cm respectively.
The standard deviations of Sample 1 and Sample 2 are
1.91 cm and 2.45 cm respectively. The numbers of plants
in Sample 1 and Sample 2 are 7 and 5 respectively.
Solution
Hypothesis Formulation
H0 = The size of the tomato plants does not differ between the two fields
H1 = The size of the tomato plants differs between the two fields

Given,
● Mean of First Sample, X1 bar = 14 cm
● Mean of Second Sample, X2 bar = 11 cm
● Standard Deviation of First Sample, S1 = 1.91 cm
● Standard Deviation of Second Sample, S2 = 2.45 cm
● Number of Units in First Sample, n1 = 7
● Number of Units in Second Sample, n2 = 5
Calculation of t-Statistic
t Calculated = 2.419

Calculation of t Critical
Degree of Freedom = n1 + n2 - 2 = 7 + 5 - 2 = 10
At 10 degrees of freedom and 5% level of significance,
t Critical for a two-tailed test = ±2.228

Conclusion
t Calculated (2.419) > t Critical (2.228)
The Null Hypothesis is rejected: the size of the tomato plants
differs between the two fields.
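As a cross-check, here is a minimal Python sketch of the pooled (equal-variance) two-sample t-test applied to the summary figures in the question. The function name `pooled_t` is hypothetical, the pooled-variance formula is the standard textbook form (assumed to be what the slide used), and the critical value 2.228 is taken from the t-table as quoted in the notes.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t-statistic from summary statistics (illustrative helper)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

t_stat, df = pooled_t(14, 1.91, 7, 11, 2.45, 5)

t_critical = 2.228                       # two-tailed, 5% level, 10 d.f. (table value from the notes)
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

With the rounded summary values this sketch gives a statistic of about 2.39, slightly below the 2.419 in the notes; the gap is probably due to rounding of the standard deviations or of intermediate steps. Either value exceeds 2.228, so the conclusion is unchanged.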
Paired Sample t-Test
Formula of Paired Sample t-Test

$t = \dfrac{\bar{D}}{S_{diff}/\sqrt{n}}$

Where,
D bar = Mean of Differences
S diff = Standard Deviation of Differences
D = Differences (Di = Xi - Yi)
n = Number of Observations
Question
Typing Speed of 9 Students was tested before and
after training. State at 5% Level of Significance
whether the training was effective from the
following Scores.

Student 1 2 3 4 5 6 7 8 9
Before 10 15 9 3 7 12 16 17 4
After 12 17 8 5 6 11 18 20 3
Solution
SN X Y D= (X-Y) D2 (D Square)
1 10 12 -2 4
2 15 17 -2 4
3 9 8 1 1
4 3 5 -2 4
5 7 6 1 1
6 12 11 1 1
7 16 18 -2 4
8 17 20 -3 9
9 4 3 1 1
ƩD = -7 ƩD2 = 29
Solution
Hypothesis Formulation
H0 = There is no impact of training on typing speed
H1 = There is a significant impact of training on typing speed

Calculation for Mean of Differences
D bar = ∑D/n = -7/9 = -0.778

Calculation for Degree of Freedom
n-1 = 9-1 = 8
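Solution (Cont. …)
Now applying the formula for the Standard Deviation of Differences
and then calculating the t-statistic:

t Calculated = -1.362

From the t-table, t Critical at the 5% significance level and
8 degrees of freedom for a two-tailed test is:

t Critical = ±2.306

Since -2.306 < -1.362 < +2.306, t Calculated lies in the acceptance region.
The Null Hypothesis is not rejected: the training had no significant
impact on typing speed.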
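The paired calculation can be sketched in Python as follows. It uses only the standard library; the critical value 2.306 is hard-coded from the t-table as quoted in the notes, and the sample standard deviation of the differences (dividing by n - 1) is assumed to be the S diff intended by the slide.

```python
import math
import statistics

before = [10, 15, 9, 3, 7, 12, 16, 17, 4]   # typing speed before training (X)
after = [12, 17, 8, 5, 6, 11, 18, 20, 3]    # typing speed after training (Y)

# D = X - Y for each student
diffs = [x - y for x, y in zip(before, after)]
n = len(diffs)

d_bar = statistics.mean(diffs)     # mean of differences
s_diff = statistics.stdev(diffs)   # standard deviation of differences (n - 1 in the denominator)

t_stat = d_bar / (s_diff / math.sqrt(n))

t_critical = 2.306                 # two-tailed, 5% level, 8 d.f. (table value from the notes)
reject_h0 = abs(t_stat) > t_critical

print(f"D bar = {d_bar:.3f}, S diff = {s_diff:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The statistic comes out near -1.36, matching the hand calculation up to rounding, and its absolute value is below 2.306, so H0 is not rejected.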
Chi-Square Test
● The chi-square test checks whether there is a significant difference between the observed values and the expected values.
● It was introduced by Karl Pearson in 1900.
● The test statistic does not use any estimate such as the sample mean or sample variance.
● Therefore it is a Non-Parametric Test.

Formula of Chi-Square Test: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where O = Observed Frequency and E = Expected Frequency
Question
Assembly elections are announced in three states. A
political party wants to know the proportion of its
supporters. The party conducted a sample survey of 1000 people
in each state and found that there are 300, 350 and 425
supporters in the three states.
Using the above data, find whether the proportion of supporters is
the same in all three states or not.
Solution
● Given in Question
Supporter State 1 State 2 State 3
Yes 300 350 425
No 700 650 575
● H0 = The proportion of supporters is the same across all three states.
● H1 = The proportion of supporters differs across the three states.
● Applying Chi-Square Test
Observed Frequency
Supporter         | State 1 | State 2 | State 3 | Row Total (RT)
Yes               | 300     | 350     | 425     | 1075
No                | 700     | 650     | 575     | 1925
Column Total (CT) | 1000    | 1000    | 1000    | 3000

● Expected Frequency = (CT x RT) / GT
● CT = Column Total
● RT = Row Total
● GT = Grand Total
● Sum of CT = Sum of RT = GT (this condition must be fulfilled in a Chi-Square Test)
Solution
O = Observed Frequency, E = Expected Frequency

O   | E      | (O-E)  | (O-E)²  | (O-E)²/E
300 | 358.33 | -58.33 | 3402.38 | 9.49
350 | 358.33 | -8.33  | 69.38   | 0.19
425 | 358.33 | 66.67  | 4444.88 | 12.40
700 | 641.67 | 58.33  | 3402.38 | 5.30
650 | 641.67 | 8.33   | 69.38   | 0.11
575 | 641.67 | -66.67 | 4444.88 | 6.93

Ʃ[(O-E)²/E] = 9.49 + 0.19 + 12.40 + 5.30 + 0.11 + 6.93 = 34.42
Calculated Value = 34.42

Now we will find the critical value from the Chi-Square Table.

Degree of Freedom = (Column - 1) x (Row - 1) = (C-1) x (R-1)
= (3-1) x (2-1) = 2 x 1 = 2
Degree of Freedom = 2

From the Chi-Square Table, the critical value at 5% Level of
Significance and 2 Degrees of Freedom is 5.991.
Critical Value = 5.991

34.42 > 5.991
Calculated Value > Critical Value
Therefore, the Null Hypothesis is rejected.
Conclusion:- Supporters are not in the same proportion across all three
states.
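The chi-square working above can be reproduced with a short Python sketch that builds each expected frequency from the row and column totals. It uses no external libraries; the critical value 5.991 is hard-coded from the chi-square table as quoted in the notes, and the variable names are illustrative.

```python
# Observed frequencies: rows = supporter (Yes, No), columns = State 1..3
observed = [
    [300, 350, 425],
    [700, 650, 575],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # Expected = (RT x CT) / GT
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)        # (R - 1) x (C - 1)

chi_critical = 5.991          # 5% level, 2 d.f. (table value from the notes)
reject_h0 = chi_sq > chi_critical

print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

Carried out without intermediate rounding, the statistic is about 34.43, marginally different from a total built from rounded table entries; it is far above 5.991 either way.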
ANOVA
(Analysis of Variance)
INTRODUCTION
Analysis of variance (ANOVA) is a
collection of statistical models and their
associated estimation procedures (such
as the "variation" among and between
groups) used to analyze the differences
among group means in a sample.

It is used to compare the means of more than two
population groups.

ANOVA was developed by the British statistician
Sir Ronald Aylmer Fisher (1890-1962).
Question
AIIMS, New Delhi conducted research on the effect of an anti-corona drug
on three groups of people in India belonging to three different states/UTs: NCT
of Delhi, Kerala and Assam. These states were chosen because of their
geographical locations. The drug cures a patient within a certain number of hours. The
patient data are given below.

NCT of Delhi | Kerala | Assam
2 hr.        | 4 hr.  | 2 hr.
4 hr.        | 3 hr.  | 4 hr.
3 hr.        | 7 hr.  | 4 hr.
7 hr.        | 3 hr.  | 10 hr.
4 hr.        | 3 hr.  | 5 hr.


Solution: It will be solved by One-Way ANOVA.

Given in Question
Total Number of Observations, n = 15

NCT of Delhi (X1) | Kerala (X2) | Assam (X3)
2                 | 4           | 2
4                 | 3           | 4
3                 | 7           | 4
7                 | 3           | 10
4                 | 3           | 5
∑X1 = 20          | ∑X2 = 20    | ∑X3 = 25

Formation of Hypothesis
H0 = There is no significant difference between the group means
H1 = There is a significant difference between the group means
Solution
Step-1: Take the total of all individual items in all samples.

$T = \sum X_{ij} = \sum X_1 + \sum X_2 + \sum X_3 = 20 + 20 + 25 = 65$

Step-2: Correction Factor $= T^2/n = 65^2/15 = 281.67$

Step-3: Sum of Squares Between $= \sum_j T_j^2/n_j - T^2/n$

= (20^2)/5 + (20^2)/5 + (25^2)/5 - 281.67
= (400/5 + 400/5 + 625/5) - 281.67
= 285 - 281.67 = 3.33
Step-4: Sum of Squares Within

$= \left\{\sum X_{ij}^2 - T^2/n\right\} - \left\{\sum_j T_j^2/n_j - T^2/n\right\}$

= {(4+16+9+49+16 + 16+9+49+9+9 + 4+16+16+100+25) - 281.67} - 3.33

= (347 - 281.67) - 3.33

= 65.33 - 3.33

= 62
Step-5: Sum of Squares Total
$= \sum X_{ij}^2 - T^2/n$
= (4+16+9+49+16 + 16+9+49+9+9 + 4+16+16+100+25) - 281.67
= 347 - 281.67
= 65.33
Source of Variation | Sum of Squares (SS) | Degree of Freedom (d.f.) | Mean Square (MS) | F-Ratio | Critical Value of F
Between Sample | 3.3 | C-1 = 3-1 = 2 | SS Between / d.f. between sample = 3.3/2 = 1.65 | MS Between / MS Within = 1.65/5.16 = 0.319 | 3.88 (from F-distribution table)
Within Sample | 62 | N-C = 15-3 = 12 | SS Within / d.f. within sample = 62/12 = 5.16 | |
Total | 65.3 | N-1 = 15-1 = 14 | | |
Graphical Representation
Figure: F-distribution curve with acceptance and rejection regions. F Calculated (0.319) lies in the acceptance region, to the left of F Critical (3.88).
Conclusion
Since the calculated F value is less than the critical F
value, we fail to reject the Null Hypothesis.
There is no significant difference between the group
means.
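Steps 1 to 5 and the F-ratio can be sketched in plain Python as below. The code follows the correction-factor method used in the solution; the critical value 3.88 is hard-coded from the F-table as quoted in the notes, and the variable names are illustrative.

```python
# Hours to cure, by state/UT (data from the question)
groups = {
    "NCT of Delhi": [2, 4, 3, 7, 4],
    "Kerala": [4, 3, 7, 3, 3],
    "Assam": [2, 4, 4, 10, 5],
}

all_values = [x for values in groups.values() for x in values]
n_total = len(all_values)          # N = 15
k = len(groups)                    # C = 3 groups

correction = sum(all_values) ** 2 / n_total                        # T^2 / n
ss_total = sum(x * x for x in all_values) - correction             # Step 5
ss_between = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction  # Step 3
ss_within = ss_total - ss_between                                  # Step 4

ms_between = ss_between / (k - 1)        # d.f. between = C - 1
ms_within = ss_within / (n_total - k)    # d.f. within = N - C
f_stat = ms_between / ms_within

f_critical = 3.88                  # 5% level, (2, 12) d.f. (table value from the notes)
reject_h0 = f_stat > f_critical

print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, F = {f_stat:.3f}, reject H0: {reject_h0}")
```

Without rounding the intermediate mean squares, F is about 0.323 rather than 0.319. Both are far below 3.88, so the decision not to reject H0 is unaffected.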
