ESE stats
1. The mean of 6, 8, x + 2, 10, 2x - 1, and 2 is 9. Find the value of x and also the
values of the observations x + 2 and 2x - 1 in the data.
(9, 11, 17)
3. The mean of the following distribution is 2.6. Find the value of p and also the
value of the frequency p - 1.
xi 0 1 2 3 4 5
fi 3 3 p 7 p-1 4
(2, 1)
4. If a die is rolled, then find the variance and standard deviation of the
possibilities.
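Problem 4 has no worked answer on this sheet. A minimal sketch, assuming a fair six-sided die and treating the six faces as the whole population (so the variance divides by 6, not 5); the variable names are illustrative only:

```python
from fractions import Fraction
import math

faces = [1, 2, 3, 4, 5, 6]           # assumption: a fair six-sided die
p = Fraction(1, len(faces))          # each face has probability 1/6

mean = sum(p * x for x in faces)                    # E[X] = 7/2
variance = sum(p * (x - mean) ** 2 for x in faces)  # Var(X) = 35/12
std_dev = math.sqrt(variance)                       # about 1.708

print(mean, variance, round(std_dev, 4))
```

Using exact fractions keeps the variance as 35/12 rather than a rounded decimal.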
5. Find the standard deviation of the average temperatures recorded over a five-
day period last winter: 18, 22, 19, 25, 12 (The mean = 19.2)
(Standard deviation for the temperatures recorded is 4.9; the variance is 23.7)
A survey of 36 students of a class was done to find out the mode of transport used by
them while commuting to the school. The collected data is shown in the table given
below. Represent the data in the form of a bar graph.
Mode of Transport Number of Students
Cycle 6
School Bus 16
Walking 10
Car 4
6. Construct a frequency distribution table for the following weights (in gm) of 30
oranges using the equal class intervals, one of them is 40-45 (45 not included). The
weights are: 31, 41, 46, 33, 44, 51, 56, 63, 71, 71, 62, 63, 54, 53, 51, 43, 36, 38, 54, 56, 66,
71, 74, 75, 46, 47, 59, 60, 61, 63.
ANS:
C.I. 30-35 35-40 40-45 45-50 50-55 55-60 60-65 65-70 70-75 75-80
Frequency 2 2 3 3 5 3 6 1 4 1
(a) 52.5
(b) 44 gm
(c) 10
(d) 65 - 70, 75 - 80
7. The box plot below was constructed from a collection of times taken to run
a 100 m sprint. Using the box plot, determine the range and interquartile range.
Ans :
Solution:
(i) 25
(ii) 20 – 25
(iii) 90
(iv)
10. Assume that we have increased the sample size to 80 in the example above and
derived similar values for the mean and standard deviation of returns. Estimate the
standard error of the sample mean.
A. 0.01
B. 0.02
C. 0.08
a) 0.9938
b) 0.9878
c) 0.3944
12. A radar unit is used to measure speeds of cars on a motorway. The speeds are
normally distributed with a mean of 90 km/hr and a standard deviation of 10 km/hr.
What is the probability that a car picked at random is travelling at more than 100
km/hr?
(The probability that a car selected at random has a speed greater than 100 km/hr is
equal to 0.1587)
13. For a certain type of computer, the length of time between charges of the battery is
normally distributed with a mean of 50 hours and a standard deviation of 15 hours.
John owns one of these computers and wants to know the probability that the length of
time will be between 50 and 70 hours.
(The probability that John's computer has a length of time between 50 and 70 hours is
equal to 0.4082.)
14. Calculate the correlation coefficient for the following data. X = 4, 8, 12, 16 and Y =
5, 10, 15, 20.
(Ans. 1)
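The answer to problem 14 can be checked with a short sketch of the Pearson formula. The function name pearson_r is illustrative, not taken from the sheet:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [4, 8, 12, 16]
y = [5, 10, 15, 20]
r = pearson_r(x, y)
print(round(r, 4))  # 1.0, because y = 1.25 * x exactly
```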
15. Find the value of the correlation coefficient from the data given in the
following table:
(Ans-0.5298)
16. The scores for some candidates in a test are 40, 45, 49, 53, 61, 65, 71, 79, 85,
91. What will be the percentile for the score 71?
(Ans-60)
17. The scores for some candidates in a test are 40, 45, 49, 53, 61, 65, 71, 79, 85,
91. What will be the score with a percentile value of 90?
(Ans-85)
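Problems 16 and 17 can be reproduced with the sketch below. It assumes one common textbook convention: the percentile of a score is the percentage of observations strictly below it, and the score at percentile P is the value at position P/100 x n in the sorted data. Other conventions (for example, interpolation) can give different values. The function names are illustrative.

```python
scores = [40, 45, 49, 53, 61, 65, 71, 79, 85, 91]

def percentile_rank(data, value):
    """Percent of observations strictly below `value`."""
    below = sum(1 for v in data if v < value)
    return 100 * below / len(data)

def score_at_percentile(data, pct):
    """Value at 1-based position pct/100 * n in the sorted data."""
    ordered = sorted(data)
    position = round(pct / 100 * len(ordered))
    return ordered[position - 1]

print(percentile_rank(scores, 71))      # 60.0
print(score_at_percentile(scores, 90))  # 85
```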
18.
Central Limit Theorem
Bootstrap
Confidence interval & Standard Error
1. Find the standard error of the estimate of the mean weight of high school football
players using the data given of weights of high school football players from your
school. Then find a 95% confidence interval for the data.
Ans.
Mean = 181.6 pounds, SD = 15.88
Standard error = 5.02 pounds
Confidence interval: we add and subtract 1.96 x 5.02 = 9.84 from the mean.
Therefore the interval runs from 171.76 to 191.44 pounds.
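The arithmetic for problem 1 can be sketched as follows. The sample size is not stated on this sheet; n = 10 is an assumption, chosen because 15.88 / sqrt(10) reproduces the quoted standard error of 5.02.

```python
import math

mean, sd = 181.6, 15.88   # values quoted in the answer
n = 10                    # assumption: not given on the sheet; matches SE = 5.02

se = sd / math.sqrt(n)    # standard error of the mean
margin = 1.96 * se        # 95% multiplier used in the answer
lower, upper = mean - margin, mean + margin

print(round(se, 2), round(lower, 2), round(upper, 2))
```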
2. Find the standard error of the estimate for the average number of children in a
household in your city by using the data collected from a sample of households in
your city. Then find a 95% confidence interval for the data.
Ans.
Mean = 2.23, SD = 1.669
Standard error = 0.59
Confidence interval : We add & subtract 1.96 x 0.59.
Therefore it is 1.09 & 3.4
Normal Distribution & Standard Normal distribution
19. X is a normally distributed variable with mean μ = 30 and standard deviation σ = 4.
Find a) P(x < 40), b) P(30 < x < 35)
Ans:
a) 0.9938
b) 0.3944
20. A radar unit is used to measure speeds of cars on a motorway. The speeds are normally
distributed with a mean of 90 km/hr and a standard deviation of 10 km/hr. What is the
probability that a car picked at random is travelling at more than 100 km/hr?
Ans: (The probability that a car selected at random has a speed greater than 100 km/hr is equal
to 0.1587)
21. For a certain type of computers, the length of time between charges of the battery is normally
distributed with a mean of 50 hours and a standard deviation of 15 hours. A student owns one of
these computers and wants to know the probability that the length of time will be between 50 and
70 hours.
Ans: (The probability that the student's computer has a length of time between 50 and 70 hours is
equal to 0.4082.)
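The three normal-distribution answers above can be checked with Python's standard-library statistics.NormalDist (Python 3.8 or later). This is a sketch; the variable names are illustrative. The exact value for problem 21 comes out near 0.4088, whereas the 0.4082 quoted above corresponds to rounding z to 1.33 before using a table.

```python
from statistics import NormalDist

# Problem 19: X ~ N(mean 30, sd 4)
x = NormalDist(mu=30, sigma=4)
p19a = x.cdf(40)               # P(X < 40)
p19b = x.cdf(35) - x.cdf(30)   # P(30 < X < 35)

# Problem 20: speeds ~ N(mean 90, sd 10)
p20 = 1 - NormalDist(90, 10).cdf(100)    # P(X > 100)

# Problem 21: time between charges ~ N(mean 50, sd 15)
battery = NormalDist(50, 15)
p21 = battery.cdf(70) - battery.cdf(50)  # P(50 < X < 70)

print(round(p19a, 4), round(p19b, 4), round(p20, 4), round(p21, 4))
```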
22.
Ans: 0.5948
23.
Ans: 711.24
24.
Ans: 0.5471
25.
Ans:274.32
26.
Ans:0.4401
27.
Ans:4067.5
T-distribution
1. If the sample mean and expected mean value of the marks obtained by 15 students
in a class test is 290 and 300 respectively. What is the t-score if the standard
deviation of the marks is 50?
Answer: T score of the marks is -0.7745
4. If the sample mean and expected mean value of the height of 16 friends is 170 and
165 respectively. What is the t-score if the standard deviation of the heights is 21.05?
Answer: T score of the height is 0.95.
QQ-plots
Binomial distribution
Exponential distribution
A mobile conversation follows an exponential distribution $f(x) = \frac{1}{3} e^{-x/3}$. What is the
probability that the conversation takes more than 5 minutes?
Poisson distribution
F distribution
Chi square distribution
Weibull distribution
10. Same as before, but this time jokers are included, and you counted 1662 cards, with
these results:
Spades 404
Hearts 420
Diamonds 400
Clubs 356
Jokers 82
a. How many jokers would you expect out of 1662 random cards? How many of
each suit?
b. Is it possible that the cards are really random? Or are the discrepancies too
large?
11. A genetics engineer was attempting to cross a tiger and a cheetah. She predicted a
phenotypic outcome of the traits she was observing to be in the following ratio 4
stripes only: 3 spots only: 9 both stripes and spots. When the cross was performed
and she counted the individuals she found 50 with stripes only, 41 with spots only
and 85 with both. According to the Chi-square test, did she get the predicted
outcome?
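A sketch of the chi-square goodness-of-fit calculation for problem 11, using only the counts and ratio given in the problem. The critical value 5.991 (df = 2, α = 0.05) is the standard table value; choosing α = 0.05 is an assumption, since the problem does not state a level.

```python
import math

observed = [50, 41, 85]   # stripes only, spots only, both
ratio = [4, 3, 9]         # predicted ratio from the problem

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]   # 44, 33, 99

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# With exactly 2 degrees of freedom the chi-square tail area is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)
critical = 5.991          # table value for df = 2, alpha = 0.05 (assumed level)

print(round(chi_sq, 3), round(p_value, 4), chi_sq > critical)
```

The statistic is about 4.74, below 5.991, so at the assumed 5% level the observed counts are consistent with the predicted 4:3:9 ratio.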
12. Let X= amount of time a shopkeeper spends with his customer follows exponential
distribution with the average amount of time equal to 4 minutes. Find the
probability that the shopkeeper is going to spend 5 minutes with the customer?
Solution.
13. The amount of time a student takes to solve any problem follows an exponential
distribution with the average amount of time equal to 8 minutes. What will be the
probability that he will take 5 minutes to solve the problem?
Solution.
14. Let X be a random variable with mean μ=20 and standard deviation σ=4. A sample
of size 64 is randomly selected from this population. What is the approximate
probability that the sample mean X̄ of the selected sample is less than 19?
15. In the first semester of the year 2003, the average return for a group of 251
investing companies was 4.5% and the standard deviation was 1.5%. If a sample of
40 companies is randomly selected from this group, what is the approximate
probability that the average return of the companies in this sample was
between 4% and 5% in the first semester of the year 2003?
16. A pension fund company carries out a study of a large group of mutual funds and
finds that their average return over a period of 5 years was 80% with a standard
deviation equal to 30%. If a sample of 50 mutual funds is randomly selected
from the group, what is the approximate probability that the sample had an average
return greater than 90% over the 5 year period?
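Problems 14 to 16 all use the Central Limit Theorem: the sample mean is treated as approximately normal with standard deviation σ/√n. A minimal sketch follows; the helper name sample_mean_dist is illustrative, and the finite-population correction for the 251 companies in problem 15 is deliberately ignored.

```python
import math
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Approximate sampling distribution of the mean under the CLT."""
    return NormalDist(mu, sigma / math.sqrt(n))

# Problem 14: P(sample mean < 19), mu = 20, sigma = 4, n = 64
p14 = sample_mean_dist(20, 4, 64).cdf(19)

# Problem 15: P(4 < sample mean < 5), mu = 4.5, sigma = 1.5, n = 40
d15 = sample_mean_dist(4.5, 1.5, 40)
p15 = d15.cdf(5) - d15.cdf(4)

# Problem 16: P(sample mean > 90), mu = 80, sigma = 30, n = 50
p16 = 1 - sample_mean_dist(80, 30, 50).cdf(90)

print(round(p14, 4), round(p15, 3), round(p16, 4))
```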
28.
Ans:20.9%
Module – 3 Practice problems part 1 (Hypothesis Testing & Type I & 2 errors)
Ha µ≠ 14 mg
2. The school principal wants to test if it is true what teachers say – that high
school juniors use the computer an average 3.2 hours a day. What are our
null and alternative hypotheses?
Ho µ = 3.2 hrs
Ha µ≠ 3.2 hrs
3. A researcher claims that black horses are, on average, more than 30 lbs
heavier than white horses, which average 1100 lbs. What is the null
hypothesis, and what kind of test is this?
The null hypothesis would be notated H0 : µ ≤ 1130 lbs. This is a right-tailed test, since the rejection
region would be in the right tail of the graph. Values sufficiently far above 1130 would indicate that the
null hypothesis should be rejected.
4. A package of gum claims that the flavor lasts more than 39 minutes. What
would be the null hypothesis of a test to determine the validity of the claim?
What sort of test is this?
The null hypothesis would be notated as H0 : µ ≤ 39. This is a right-tailed test, since the rejection
region would consist of values greater than 39.
5. What is the critical value $Z_{\alpha/2}$ for a 95% confidence level, assuming a two-tailed test?
A 95% confidence level means that a total of 5% of the area under the curve is considered the critical
region. Since this is a two-tailed test, 1/2 of 5% = 2.5% of the values would be in the left tail, and the
other 2.5% would be in the right tail. Looking up the Z-score associated with 0.025 on a reference
table, we find 1.96. Therefore, +1.96 is the critical value of the right tail and -1.96 is the critical value
of the left tail. The critical value for a 95% confidence level is Z = +/−1.96
6. Sketch the Z-score critical region for Example 5.
7. What would be the critical value for a right-tailed test with α = 0.01?
If α = 0.01, then the area under the curve representing H0, the null hypothesis, would be 99%,
since α (alpha) is the area of the rejection region. Using a Z-score reference table,
we find that the Z-score associated with 0.9900 is approximately 2.33. The critical value
is therefore Z = 2.33.
8. The school nurse thinks the average height of 7th graders has increased. The
average height of a 7th grader five years ago was 145 cm with a standard
deviation of 20 cm. She takes a random sample of 200 students and finds
that the average height of her sample is 147 cm. Are 7th graders now taller
than they were before? Conduct a single-tailed hypothesis test using a .05
significance level to evaluate the null and alternative hypotheses.
H0 : µ ≤ 145 Ha : µ > 145
Choose α = .05. The critical value for this one tailed test is z=1.64. This is a one-tailed test, and a z-
score of 1.64 cuts off 5% in the single tail. Any test statistic greater than 1.64 will be in the rejection
region
Next, we calculate the test statistic for the sample of 7th graders: $z = \frac{147 - 145}{20/\sqrt{200}} \approx 1.414$. The
calculated z-score of 1.414 is smaller than 1.64 and thus does not fall in the critical region. Our
decision is to fail to reject the null hypothesis and conclude that a sample mean of 147 could
plausibly have arisen by chance.
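The test in problem 8 can be sketched in a few lines, using the standard library's NormalDist for the critical value and p-value. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0, sigma = 145, 20       # historical mean and standard deviation (cm)
n, sample_mean = 200, 147
alpha = 0.05

z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # test statistic
critical = NormalDist().inv_cdf(1 - alpha)         # right-tailed critical value
p_value = 1 - NormalDist().cdf(z)                  # area to the right of z

reject_h0 = z > critical
print(round(z, 3), round(critical, 3), round(p_value, 4), reject_h0)
```

The p-value (about 0.08) exceeds 0.05, which agrees with the decision not to reject H0.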
9. A farmer is trying out a planting technique that he hopes will increase the
yield on his pea plants. The average number of pods on one of his pea plants
is 145 pods with a standard deviation of 100 pods. This year, after trying his
new planting technique, he takes a random sample of his plants and finds the
average number of pods to be 147. He wonders whether or not this is a
statistically significant increase. What are his hypotheses and the test
statistic?
H0 : µ ≤ 145 Ha : µ > 145
If we choose α = .05, the critical value will be 1.645. We will reject the null hypothesis if the test
statistic is greater than 1.645. The value of the test statistic is 0.24. This is less than 1.645, and so our decision
is to fail to reject H0. Based on our sample, we have no evidence that the mean is greater than 145.
10. The high school athletic director is asked if football players are doing as
well academically as the other student athletes. We know from a previous
study that the average GPA for the student athletes is 3.10. After an
initiative to help improve the GPA of student athletes, the athletic director
randomly samples 20 football players and finds that the average GPA of the
sample is 3.18 with a sample standard deviation of 0.54. Is there a
significant improvement? Use a 0.05 significance level.
H0 : µ = 3.10 Ha : µ ≠ 3.10
We know that we have 20 observations, so our degrees of freedom for this test is 19. Nineteen degrees
of freedom at the 0.05 significance level gives us a critical value of ± 2.093.
Thus, the athletic director can conclude that the mean academic performance of football players does
not differ from the mean performance of other student athletes.
11. Duracell manufactures batteries that the CEO claims will last an average of
300 hours under normal use. A researcher randomly selected 20 batteries
from the production line and tested these batteries. The tested batteries had a
mean life span of 270 hours with a standard deviation of 50 hours. Do we
have enough evidence to suggest that the claim of an average lifetime of 300
hours is false?
H0 : µ = 300 HA : µ ≠ 300
Standard Error: $SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{50}{\sqrt{20}} \approx 11.18$
We know that we have 20 batteries, so our degrees of freedom for this test is (20-1)= 19. Nineteen
degrees of freedom at the 0.05 significance level gives us a critical value of ± 2.093
The average battery life of the sample is significantly different from the average battery life claim by the
CEO.
12. You have just taken ownership of a pizza shop. The previous owner told you
that you would save money if you bought the mozzarella cheese in a 4.5
pound slab. Each time you purchase a slab of cheese, you weigh it to ensure
that you are receiving 72 ounces of cheese. The results of 7 random
measurements are 70, 69, 73, 68, 71, 69 and 71 ounces. Are these
differences due to chance or is the distributor giving you less cheese than
you deserve?
a. State the hypotheses.
b. Calculate the test statistic.
c. Would the null hypothesis be rejected at the 10% level? The 5% level?
The 1% level?
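Problem 12 has no worked answer here. A sketch of the one-sample t-test follows, assuming a left-tailed test (H0: µ = 72 oz against Ha: µ < 72 oz, since the concern is receiving less cheese). The critical values are copied from a standard t table for 6 degrees of freedom; the variable names are illustrative.

```python
import math
import statistics

weights = [70, 69, 73, 68, 71, 69, 71]   # ounces, from the problem
mu0 = 72                                  # claimed weight: 4.5 lb = 72 oz

n = len(weights)
mean = statistics.mean(weights)
s = statistics.stdev(weights)             # sample standard deviation (n - 1)
t = (mean - mu0) / (s / math.sqrt(n))     # test statistic, df = n - 1 = 6

# Left-tailed critical values for df = 6, taken from a standard t table.
critical = {0.10: -1.440, 0.05: -1.943, 0.01: -3.143}
decisions = {alpha: t < c for alpha, c in critical.items()}

print(round(t, 3), decisions)
```

Under these assumptions t is about -2.93, so H0 is rejected at the 10% and 5% levels but not at the 1% level.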
14. The average score on a test is 80 with a standard deviation of 10. With a
new teaching curriculum introduced it is believed that this score will
change. In a random sample of 38 students, the mean score was found to
be 88. With a 0.05 significance level, is there any evidence to support this
claim?
There is a difference in the scores after the new curriculum was introduced.
15. The average score of a class is 90. However, a teacher believes that the
average score might be lower. The scores of 6 students were randomly
measured. The mean was 82 with a standard deviation of 18. With a 0.05
significance level use hypothesis testing to check if this claim is true.
16. A stenographer claims that she can take dictation at the rate of 120 words
per minute. Can we reject her claim on the basis of 100 trials in which she
demonstrated a mean of 116 words with standard deviation of 15 words ?
Claim rejected
17. An automatic machine was designed to pack exactly 2 kg. of tea. A sample
of 100 packs was examined to test the machine. The average weight was
found to be 1.94 kg. with standard deviation of 0.10 kg. is the machine
working properly ?
The machine is not working properly
18. A sample of 600 persons selected at random from a large city shows that
there are 53% smokers. Is there any reason to doubt the hypothesis that
smokers and non-smokers are equal in number in the city ?
smokers and non-smokers are equal in numbers in that city
19. When flipped 1000 times, a coin landed 515 times heads up. Does it support
the hypothesis that the coin is unbiased ?
The hypothesis that the coin is unbiased is not rejected (z ≈ 0.95 < 1.96)
20. While throwing 5 die 40 times, a person got success 25 times - getting a 4
was called success. Can we consider the difference between expected value
and observed value as being significantly different ?
The dice is not unbiased
21. A patented medicine claimed that it is effective in curing 90% of the patients
suffering from malaria. From a sample of 200 patients using this medicine,
it was found that only 170 were cured. Determine whether the claim is right
or wrong. (Take 1% level of significance).
The claim is justified
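Problems 18, 19 and 21 are all one-sample tests of a proportion. A sketch of the z statistic under the normal approximation follows; the function name proportion_z is illustrative. The decision then comes from comparing |z| with the critical value for the chosen level (1.96 for a two-tailed 5% test, 2.576 for a two-tailed 1% test).

```python
import math

def proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    return (p_hat - p0) / se

z_smokers = proportion_z(0.53 * 600, 600, 0.5)   # problem 18
z_coin = proportion_z(515, 1000, 0.5)            # problem 19
z_medicine = proportion_z(170, 200, 0.9)         # problem 21

print(round(z_smokers, 3), round(z_coin, 3), round(z_medicine, 3))
```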
22. A random sample of 400 male students have average weight of 55 kg. Can
we say that the sample comes from a population with mean 58 kg. with a
variance of 9 kg. ?
23. A random sample of 400 tins of vegetable oil and labeled "5 kg. net weight"
has a mean net weight of 4.98 kg. with standard deviation of 0.22 kg. Do we
reject the hypothesis of net weight of 5 kg. per tin on the basis of this sample
at 1% level of significance ?
Accepted at 1% level of significance
25. Which of the following is a correct statement (in the context of hypothesis
tests)?
A. The Power of a test increases as the Type 2 error probability does
B. It is not possible to decrease both Type 1 error and Type 2 error at the same time.
C. The significance level is always equal to the probability of Type 2 error.
D. A test is significant if it fails to reject the null hypothesis.
26. Bottles of water have a label stating that the volume is 12 oz. A consumer
group suspects the bottles are under‐filled and plans to conduct a test. A
Type I error in this situation would mean
A. the consumer group concludes the bottles have less than 12 oz. when the mean actually is 12 oz.
B. the consumer group does not conclude the bottles have less than 12 oz. when the mean actually is
less than 12 oz.
C. the consumer group has evidence that the label is incorrect.
27. The owner of travel agency would like to determine whether or not the mean
age of the agency's customers is over 24. If so, he plans to alter the
destination of their special cruises and tours. If he concludes the mean age is
over 24 when it is not, he makes a _______ error. If he concludes the mean
age is not over 24 when it is, he makes a ______error.
A) Type II; Type II
B) Type I; Type I
C) Type I; Type II
D) Type II; Type I
30. A bottling company needs to produce bottles that will hold 12 ounces of
liquid. Periodically, the company gets complaints that their bottles are not
holding enough liquid. To test this claim, the bottling company randomly
samples 36 bottles. Suppose the p-value of this test turned out to be 0.0455.
State the proper conclusion.
A) At α = 0.085, fail to reject the null hypothesis.
B) At α = 0.035, accept the null hypothesis.
C) At α = 0.05, reject the null hypothesis.
D) At α = 0.025, reject the null hypothesis.
33. A weight reducing program that includes a strict diet and exercise claims on
its online advertisement that it can help an average overweight person lose
10 pounds in three months. Following the program’s method a group of
twelve overweight persons have lost 8.11, 5.7, 11.6, 12.9, 3.8, 5.9, 7.8, 9.1,
7.0, 8.2, 9.3 and 8.0 pounds in three months. Test at 5% level of significance
whether the program’s advertisement is overstating the reality.
35. A sample of 32 money market mutual funds was chosen on January 1, 1996
and the average annual rate of return over the past 30 days was found to be
3.23% and the sample standard deviation was 0.51%. A year earlier a
sample of 38 money-market funds showed an average rate of return of
4.36%. Is it reasonable to conclude (at α = 0.05) that money-market interest
rates declined during 1995?
Reject Ho
36. A large hotel chain is trying to decide whether to convert more of its rooms
into non-smoking rooms. In a random sample of 400 guests last year, 166
had requested the non-smoking rooms. This year 205 guests in a sample of
380 preferred the non-smoking rooms. Would you recommend that the hotel
chain convert more rooms to non-smoking? Support your recommendation
by testing the appropriate hypotheses at 0.01 level of significance.
Convert more rooms to Non-smoking
Module 3 part 2 : Practice problems on Anova
1. A clinical trial is run to compare weight loss programs and participants are
randomly assigned to one of the comparison programs and are counselled on
the details of the assigned program. Participants follow the assigned program
for 8 weeks. The outcome of interest is weight loss, defined as the difference
in weight measured at the start of the study (baseline) and weight measured at
the end of the study (8 weeks), measured in pounds.
ANSWER:
We reject H0 because 8.43 > 3.24. We have statistically significant evidence at α=0.05 to
show that there is a difference in mean weight loss among the four diets.
2. Calcium is an essential mineral that regulates the heart, is important for blood
clotting and for building healthy bones. The National Osteoporosis Foundation
recommends a daily calcium intake of 1000-1200 mg/day for adult men and
women. While calcium is contained in some foods, most adults do not get
enough calcium in their diets and take supplements. Unfortunately some of the
supplements have side effects such as gastric distress, making them difficult
for some patients to take on a regular basis.
We do not reject H0 because 1.395 < 3.68. We do not have statistically significant evidence at
α = 0.05 to show that there is a difference in mean calcium intake in patients with normal bone
density as compared to osteopenia and osteoporosis.
Observation A B C D
1 8 12 18 13
2 10 11 12 9
3 12 9 16 12
4 8 14 6 16
5 7 4 8 15
ANSWER:
As calculated F=1.2821<3.2389
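The F value for the four-group table above can be reproduced with a short one-way ANOVA sketch. The function name one_way_anova is illustrative; the critical value 3.2389 is the one quoted in the answer.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

a = [8, 10, 12, 8, 7]
b = [12, 11, 9, 14, 4]
c = [18, 12, 16, 6, 8]
d = [13, 9, 12, 16, 15]

f_stat, df1, df2 = one_way_anova([a, b, c, d])
print(round(f_stat, 4), df1, df2)  # 1.2821 3 16
```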
Observation A B C
1 8 7 6
2 10 7 8
3 6 8 10
4 7 9 6
5 9 8 4
6 - 5 5
7 - - 7
ANSWER:
As calculated F=1.0564<3.6823
Observation A B C
1 25 31 24
2 30 39 30
3 36 38 28
4 38 42 25
5 31 35 28
ANSWER:
As calculated F=7.5>3.8853
ANSWER:
F 0.284805
7. Do TWO WAY ANOVA
ANSWER:
F (MSC/MSE) 5.526316
F (MSB/MSE) 3.157895
9. What is the difference between one way & two way ANOVA test?
Is given below :
Sr. no. = serial number of the word; Letters = number of letters in the word.
Sr. no. Letters Sr. no. Letters Sr. no. Letters Sr. no. Letters
1 4 44 2 87 4 130 2
2 5 45 3 88 4 131 3
3 3 46 6 89 6 132 5
4 5 47 2 90 5 133 3
5 5 48 9 91 4 134 4
6 3 49 3 92 2 135 5
7 3 50 2 93 2 136 2
8 7 51 9 94 10 137 3
9 7 52 3 95 7 138 2
10 5 53 4 96 3 139 7
11 4 54 6 97 6 140 3
12 4 55 2 98 4 141 5
13 9 56 3 99 2 142 4
14 1 57 3 100 5 143 6
15 3 58 2 101 2 144 4
16 6 59 1 102 4 145 3
17 9 60 5 103 3 146 4
18 2 61 11 104 2 147 8
19 7 62 2 105 1 148 4
20 3 63 4 106 6 149 2
21 9 64 3 107 5 150 3
22 2 65 2 108 2 151 4
23 3 66 4 109 6 152 3
24 11 67 4 110 8 153 2
25 4 68 2 111 2 154 3
26 3 69 8 112 6 155 5
27 3 70 1 113 10 156 6
28 3 71 7 114 2 157 4
29 7 72 2 115 6 158 4
30 5 73 4 116 6 159 3
31 3 74 5 117 4 160 4
32 2 75 2 118 6 161 2
33 3 76 1 119 3 162 2
34 7 77 5 120 5 163 3
35 2 78 7 121 3 164 2
36 1 79 5 122 6 165 3
37 5 80 3 123 3 166 6
38 5 81 5 124 4 167 6
39 3 82 3 125 3 168 2
40 7 83 4 126 9 169 2
41 7 84 4 127 4 170 9
42 4 85 5 128 4 171 4
43 6 86 5 129 11 172 2
Sr. no. Letters Sr. no. Letters Sr. no. Letters
173 3 216 5 259 6
174 10 217 4 260 3
175 4 218 4 261 3
176 5 219 3 262 6
177 4 220 4 263 5
178 3 221 4 264 3
179 6 222 7 265 6
180 4 223 2 266 4
181 4 224 8 267 3
182 4 225 4 268 5
183 3 226 2
184 2 227 4
185 5 228 6
186 8 229 7
187 2 230 4
188 2 231 5
189 6 232 4
190 3 233 5
191 2 234 3
192 2 235 4
193 2 236 4
194 4 237 2
195 9 238 4
196 2 239 4
197 3 240 4
198 5 241 6
199 4 242 5
200 9 243 3
201 6 244 5
202 2 245 4
203 4 246 1
204 4 247 3
205 5 248 5
206 7 249 2
207 4 250 7
208 2 251 3
209 4 252 4
210 9 253 10
211 8 254 2
212 2 255 3
213 4 256 6
214 5 257 2
215 3 258 3
USE ABOVE DATA TO SOLVE THE PROBLEMS GIVEN BELOW :
Q.1 Find Mean, Mode, Median, Variance, Standard Deviation of the above
population.
Q. 2 Find 10th, 25th, 50th, 75th , 90th percentile for the above data.
Q. 3 Plot Bar chart & Histogram for the above population
Q. 4 Plot a scatter plot for the above population. Find the correlation coefficient
between col. 1 & col. 2
Q. 5 Draw a box plot for the above population.
Q. 6 Print the word numbers whose
a. Letters are less than or equal to 4
b. Letters are less than or equal to 10
Q. 7 Calculate the Z score for [4,5,6,6,6,7,8,12,13,13,14,18]
Q. 8 Draw a scatter plot & find the correlation coefficient for the following data :
x y
14.2 215
16.4 325
11.9 185
15.2 332
18.5 406
22.1 522
19.4 412
25.1 614
23.4 544
18.1 421
Q9. A clinical trial is run to compare weight loss programs and participants are
randomly assigned to one of the comparison programs and are counselled on the
details of the assigned program. Participants follow the assigned program for 8
weeks. The outcome of interest is weight loss, defined as the difference in weight
measured at the start of the study (baseline) and weight measured at the end of
the study (8 weeks), measured in pounds. (one way Anova)
Low Calorie Low Fat Low Carbohydrate Control
8 2 3 2
9 4 5 2
6 3 4 -1
7 5 2 0
3 1 3 3
ANSWER:
We reject H0 because 8.43 > 3.24. We have statistically significant evidence at α=0.05 to
show that there is a difference in mean weight loss among the four diets.
Observation A B C
1 8 7 6
2 10 7 8
3 6 8 10
4 7 9 6
5 9 8 4
6 - 5 5
7 - - 7
Q 11.
col 1 col 2 col 3
Block-1 75 75 90
Block-2 70 70 70
Block-3 50 55 75
Block-4 65 60 85
Block-5 80 65 80
Block-6 65 65 65
ANSWER:
F (MSC/MSE) 5.526316
F (MSB/MSE) 3.157895
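The two F ratios above can be reproduced with a sketch of two-way ANOVA without replication (one observation per block and column). Variable names are illustrative.

```python
table = [            # rows = Block-1..Block-6, columns = col 1..col 3
    [75, 75, 90],
    [70, 70, 70],
    [50, 55, 75],
    [65, 60, 85],
    [80, 65, 80],
    [65, 65, 65],
]

r, c = len(table), len(table[0])
grand = sum(map(sum, table)) / (r * c)
row_means = [sum(row) / c for row in table]
col_means = [sum(row[j] for row in table) / r for j in range(c)]

ss_total = sum((x - grand) ** 2 for row in table for x in row)
ss_rows = c * sum((m - grand) ** 2 for m in row_means)   # blocks
ss_cols = r * sum((m - grand) ** 2 for m in col_means)   # columns
ss_error = ss_total - ss_rows - ss_cols

ms_rows = ss_rows / (r - 1)
ms_cols = ss_cols / (c - 1)
ms_error = ss_error / ((r - 1) * (c - 1))

f_cols = ms_cols / ms_error   # MSC / MSE
f_rows = ms_rows / ms_error   # MSB / MSE
print(round(f_cols, 6), round(f_rows, 6))
```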
Ans: So F critical value = 3.5225. Since F critical is greater than the F value, we
6. Suppose that you are working in a research company and want to the
Ans: F Critical Value = 3.137. Since the F critical > F value, the null hypothesis
cannot be rejected.
7. A statistician was carrying out F-Test. He got the F statistic as 2.38. The
degrees of freedom obtained by him were 8 and 3. Find out the F value
from the F Table and determine whether we can reject the null
hypothesis at 5% level of significance (one-tailed test).
Ans: The F critical value obtained from the table is 8.845. Since the F statistic
(2.38) is less than the F table value (8.845), we cannot reject the null
hypothesis.
8. The bank has a Head Office in Delhi and a branch at Mumbai. There are
long customer queues at one office, while customer queues are short at
the other office. The Operations Manager of the bank wonders if the
customers at one branch are more variable than the number of
customers at another branch. A research study of customers is carried
out by him.
The variance of Delhi Head Office customers is 31, and that for the Mumbai
branch is 20. The sample size for Delhi Head Office is 11, and that for the
Mumbai branch is 21. Carry out a two-tailed F-test with a level of significance
of 10%.
Since F critical is greater than the F value, we cannot reject the null hypothesis.
9. Two random samples were drawn from two normal populations and their
values are given below. Test whether the two populations have the same
variance at 5% level of significance.
A B
16 14
17 16
25 24
26 28
32 32
34 35
38 37
40 42
42 43
45
47
Kruskal-Wallis test
Ans: Calculated χ2 value is greater than the critical value of χ2 for a 0.05
significance level. χ2 calculated > χ2 critical, hence reject the null hypothesis.
11.A researcher wants to know whether or not three drugs have different
effects on knee pain, so he recruits 30 individuals who all experience
similar knee pain and randomly splits them up into three groups to receive
either Drug 1, Drug 2, or Drug 3.
After one month of taking the drug, the researcher asks each individual to rate
their knee pain on a scale of 1 to 100, with 100 indicating the most severe pain.
Ans: Since the p-value of the test (0.21342) is not less than 0.05, we fail to
reject the null hypothesis.
Vaccine Antibodies (μg/ml)
A 1232
A 751
A 339
A 848
A 447
A 542
– –
B 302
B 57
B 521
B 278
B 176
B 201
– –
C 839
C 342
C 473
C 1128
C 242
C 475
Ans: Here we see that the p-value is ~0.026 which is less than the cutoff 0.05,
so we reject the null hypothesis: the medians are not the same across all three
groups, at least one of them has a different median than the others. This
means that the vaccines do not perform equally well because the resulting
antibody production is not the same for each vaccine. We draw the same
conclusion as we did above when we performed the calculation ourselves!
Again we emphasize that the Kruskal-Wallis test can only tell us that at least
one of the vaccines performs differently than the others. It cannot tell us which
vaccine(s) that is(are).
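The p-value quoted for the vaccine data can be reproduced by hand-computing the Kruskal-Wallis H statistic. This sketch assigns average ranks to ties but applies no tie correction (the data has no ties), and it relies on the fact that with three groups (df = 2) the chi-square tail area is exp(-H/2). Variable names are illustrative.

```python
import math

groups = {
    "A": [1232, 751, 339, 848, 447, 542],
    "B": [302, 57, 521, 278, 176, 201],
    "C": [839, 342, 473, 1128, 242, 475],
}

# Rank all observations together, smallest value gets rank 1.
pooled = sorted(v for values in groups.values() for v in values)

def rank(v):
    """Average 1-based rank of v in the pooled sample."""
    first = pooled.index(v) + 1
    count = pooled.count(v)
    return first + (count - 1) / 2

n_total = len(pooled)
rank_term = sum(
    sum(rank(v) for v in values) ** 2 / len(values)
    for values in groups.values()
)
h_stat = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Three groups give df = 2, where the chi-square tail area is exp(-H / 2).
p_value = math.exp(-h_stat / 2)
print(round(h_stat, 3), round(p_value, 3))
```

The rank sums are 77, 29 and 65 for A, B and C, giving H of about 7.30 and a p-value of about 0.026.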
76 80 70
90 80 85
84 67 52
95 59 93
57 91 86
72 94 79
68 80
Ans: Since H calc < χ2 critical, we accept the null hypothesis. We can say that there is
no difference in the result obtained by using the three training methods.
14. In a study, 12 participants were divided into three groups of 4 each. They
were subjected to three different conditions: A (Low Noise), B (Average Noise), and
C (Loud Noise). They were given a test, and the errors committed by them on the
test were noted and are given in the table below.
Ans: Since the critical value is more than the calculated value, we accept the null
hypothesis that the three conditions A (Low Noise), B (Average Noise), and C (Loud
Noise) do not differ from each other. Therefore, in this experiment there was no
difference in the groups' performance based on the noise level.
15. A state court administrator asked the 24 court coordinators in the state’s
three largest counties to rate their relative need for training in case flow
management on a Likert scale (1 to 7).
1 = no training need
7 = critical training need
Training Need of Court Coordinators
Ans: The critical chi-square table value of H for α = 0.05, and df = 2, is 5.991
Since 4.42 < 5.991, the null hypothesis is accepted. There is no difference in the
training needs of the court coordinators in the three counties.
16.Original data is displayed in the table below. Is there a difference between
groups 1, 2 and 3 using alpha = 0.05?
Gr-1 Gr-2 Gr-3
27 20 34
2 8 31
4 14 3
18 36 23
7 21 30
9 22 6
Friedman Test
Ans: So, it is concluded that the cleanup system affected the THMs of drinking
water.
18. 7 random people were given 3 different drugs and for each person, the
reaction time corresponding to the drugs were noted. Test the claim at
the 5% significance level that all the 3 drugs have the same probability
distribution.
Drug A Drug B Drug C
2. Find the simple linear regression equation that fits the given data and coefficient of
determination.
Hour Temp
2 21
4 27
6 29
8 64
10 86
12 92
Answer: y = -3.533 + 8.1x
coefficient of determination = r2 = 0.917 = 91.7%
3. Sales data of 10 months for a coffee house situated near a prime location of a city
comprising the number of customers (in hundreds) and monthly sales (in Thousand
Rupees) are given below:
Sr No. of Monthly
customers sales
(in (In
hundreds) Thousand
Rs.)
1 6 1
2 6.1 6
3 6.2 8
4 6.3 10
5 6.5 11
6 7.1 20
7 7.6 21
8 7.8 22
9 8 23
10 8.1 25
Find the simple linear regression equation and coefficient of determination that fits the
given data.
Answer: y = -52.6 + 9.656x
4. A survey was conducted to relate the time required to deliver a proper presentation
on a topic , to the performance of the student with the scores he/she receives. The
following Table shows the matched data:
Hours Score
0.5 57
0.75 64
1 59
1.25 68
1.5 74
1.75 76
2 79
2.25 83
2.5 85
2.75 86
3 88
3.25 89
3.5 90
3.75 94
4 96
Find the regression equation and coefficient of determination that will predict a
student’s score if we know how many hours the student studied.
Answer: y = 54.772 +10.857x
Coefficient of determination: 0.9460
5. Find the simple linear regression equation that fits the given data and coefficient of
determination.
X Y
1 2
2 4
3 6
4 4
5 5
Answer: y = 2.4 + 0.6x
Coefficient of determination: 0.4091
6. Find the simple linear regression equation that fits the given data and coefficient of
determination.
X Y
-2 -1
1 1
3 2
Answer: y = 23/38x + 5/19
Coefficient of determination: 0.9944
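Because the answer to this question is given as exact fractions, it can be reproduced with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

xs = [Fraction(-2), Fraction(1), Fraction(3)]
ys = [Fraction(-1), Fraction(1), Fraction(2)]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products about the means, kept as exact fractions.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope = sxy / sxx                      # 23/38
intercept = mean_y - slope * mean_x    # 5/19
r_squared = sxy ** 2 / (sxx * syy)     # 529/532

print(slope, intercept, r_squared, round(float(r_squared), 4))
```

The exact value of r² is 529/532, which rounds to the 0.9944 quoted in the answer.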
7. Find the simple linear regression equation that fits the given data and coefficient of
determination.
X Y
0 2
1 3
2 5
3 4
4 6
Answer: y = 0.9x + 2.2
Coefficient of determination: 0.81
8. Find the simple linear regression equation that fits the given data and coefficient of
determination.
X Y
1 3
2 4
3 5
4 7
Answer: y = 1.3x + 1.5
Coefficient of determination: 0.9657
9. Find the simple linear regression equation that fits the given data and coefficient of
determination.
X Y
2 69
9 98
5 82
5 77
3 71
7 84
1 55
8 94
6 84
2 64
is regressed with least squares regression to y = a0 + a1x. The value of a1 most nearly is
27.480
28.956
32.625
40.000
Answer: 32.625
12. An instructor gives the same y vs x data as given below to four students and asks
them to regress the data with least squares regression to y = a0 + a1x.
x 1 10 20 30 40
y 1 100 400 600 1200
Each student comes up with a different answer for the straight-line regression
model. Only one is correct. The correct model is
y = 60x - 1200
y = 30x - 200
y = -139.43 + 29.684x
y = 1 + 22.782x
Answer: y = -139.43 + 29.684x
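One way to check which of the four candidate models is the least-squares line is to compare their sums of squared residuals: the least-squares line is, by definition, the one that minimizes this sum. The sketch below assumes the first data row is x and the second is y; the names are illustrative.

```python
xs = [1, 10, 20, 30, 40]
ys = [1, 100, 400, 600, 1200]

# Candidate models as (a0, a1) pairs for y = a0 + a1 * x.
candidates = {
    "y = 60x - 1200": (-1200.0, 60.0),
    "y = 30x - 200": (-200.0, 30.0),
    "y = -139.43 + 29.684x": (-139.43, 29.684),
    "y = 1 + 22.782x": (1.0, 22.782),
}

def sum_squared_residuals(a0, a1):
    """Sum of (observed - predicted)^2 over all data points."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))

sse = {label: sum_squared_residuals(a0, a1) for label, (a0, a1) in candidates.items()}
best = min(sse, key=sse.get)

for label, value in sse.items():
    print(f"{label}: SSE = {value:.1f}")
print("smallest SSE:", best)
```

The model with the smallest sum of squared residuals matches the stated answer.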
13. The process of constructing a mathematical model or function that can be used to
predict or determine one variable by another variable is called
A. regression B. correlation C. residual D. outlier plot
Ans: A
16. The difference between the actual Y value and the predicted Y value found using a
regression equation is called the
A. slope B. residual C. outlier D. scatter plot
Ans: B
X 5 7 4 15 12 9
Y 8 9 12 26 16 13
MULTIPLE REGRESSION
26. In the context of multiple linear regression, explain what overfitting and
multicollinearity are.
Ans.
y x1 x2
-3.7 3 8
3.5 4 5
2.5 5 7
11.5 6 3
5.7 2 1
28. Find out what is the relation between the distance covered by an UBER driver and
the age of the driver and the number of years of experience of the driver.
Distance Age Experience (in years)
32513 18 5
27897 20 7
29929 22 8
20159 23 6
21554 23 7
28466 25 5
27842 2 8
22671 28 6
32214 29 5
34550 32 7
20920 37 9
33714 41 6
26998 46 7
34294 49 8
21912 53 6
Answer:
The regression equation for the above example will be
y = 31216.5 + (13.24*X1) - (585.46*X2)
In this example, the dependent variable is the distance covered by the UBER driver,
and the independent variables are the age of the driver (X1) and the driver's number
of years of driving experience (X2).
29. Find out what is the relation between the GPA of a class of students and the number
of hours of study and the height of the students.
GPA Height Study Hours
2.9 66 7
3.16 57 7
3.62 64.5 6
2 62 7
3.45 69.5 8
2.8 65 9
3.63 63 6
2.81 68 5
3.33 59.5 4
2.75 64 10
3.86 69 7
Answer:
The regression equation for the above example will be
y = 1.38 + (0.038*X1) - (0.1*X2)
In this example, the dependent variable is the GPA, and the independent variables are
the height of the students (X1) and the study hours (X2).
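A two-predictor regression such as this one can be sketched by solving the normal equations (XᵀX)b = Xᵀy. The sketch below uses plain Python with a small Gaussian-elimination helper, taking height as X1 and study hours as X2 to follow the column order of the table. The function names are illustrative, and a numerical library would normally be preferred for larger problems.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def multiple_regression(y, x1, x2):
    """Least-squares coefficients [b0, b1, b2] for y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]     # design matrix with intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

gpa = [2.9, 3.16, 3.62, 2, 3.45, 2.8, 3.63, 2.81, 3.33, 2.75, 3.86]
height = [66, 57, 64.5, 62, 69.5, 65, 63, 68, 59.5, 64, 69]
study_hours = [7, 7, 6, 7, 8, 9, 6, 5, 4, 10, 7]

b0, b1, b2 = multiple_regression(gpa, height, study_hours)
print(f"y = {b0:.2f} + ({b1:.3f} * X1) + ({b2:.3f} * X2)")
```

The printed coefficients can be compared with the stated regression equation. The same function applies to the other two-predictor questions once their data are entered.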
30. Find out what is the relation between the salary of a group of employees in an
organization and the number of years of experience and the age of the employees.
Answer:
The regression equation for the above example will be
y = 41350.4 - (60.266*X1) - (891.1*X2)
In this example, the dependent variable is the salary, and the independent variables
are the experience and age of the employees.