Examples Regression

Uploaded by

ghania azhar


Linear Regression Examples STAT 314

1. The data below show the sugar content of a fruit (SUGAR) for different numbers of days after
picking (DAYS).
Days Sugar
0 7.9
1 12.0
3 9.5
4 11.3
5 11.8
6 11.3
7 4.2
8 0.4
a. Obtain the estimated regression line to predict sugar content based on the number of days the
fruit is left on the tree. Also create the regression ANOVA table.

Step 1 : Scatterplot

[Figure: Fruit Scatterplot (Sugar versus Days)]

Since we see a slightly linear pattern, linear regression may be appropriate (Assumption 1
is met, but barely).

Step 2 : Compute the Sums of Squares

Let x be DAYS and y be SUGAR.
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 200 - \frac{(34)^2}{8} = 200 - 144.5 = 55.5$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 245.1 - \frac{(34)(68.4)}{8} = 245.1 - 290.7 = -45.6$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 709.08 - \frac{(68.4)^2}{8} = 709.08 - 584.82 = 124.26 = \text{SSTo}$$

Step 3 : Compute the Least-Squares Linear Regression Equation
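$$b = \frac{S_{xy}}{S_{xx}} = \frac{-45.6}{55.5} = -0.8216$$

$$a = \bar{y} - b\bar{x} = \frac{68.4}{8} - (-0.8216)\left(\frac{34}{8}\right) = 12.0418$$

$$\text{SSRegr} = \frac{(S_{xy})^2}{S_{xx}} = \frac{(-45.6)^2}{55.5} = 37.46595$$

$$\hat{y} = a + bx = 12.0418 + (-0.8216)x = 12.0418 - 0.8216x$$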
ANOVA Table
Source df SS MS F-statistic p-value
Regression 1 37.46595 37.46595 2.590 p > 0.10
Error 6 86.79405 14.46568
Total 7 124.26000
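
The hand computation in Steps 2 and 3 can be cross-checked numerically. The following is a minimal Python sketch that is not part of the original handout; the variable names are illustrative, and only the standard library is assumed.

```python
# Fruit data from Problem 1: days after picking (x) and sugar content (y).
days = [0, 1, 3, 4, 5, 6, 7, 8]
sugar = [7.9, 12.0, 9.5, 11.3, 11.8, 11.3, 4.2, 0.4]
n = len(days)

# Corrected sums of squares and cross-products.
sxx = sum(x * x for x in days) - sum(days) ** 2 / n
sxy = sum(x * y for x, y in zip(days, sugar)) - sum(days) * sum(sugar) / n
syy = sum(y * y for y in sugar) - sum(sugar) ** 2 / n  # SSTo

# Least-squares slope and intercept.
b = sxy / sxx
a = sum(sugar) / n - b * sum(days) / n

# ANOVA decomposition: SSTo = SSRegr + SSE, with 1 and n - 2 degrees of freedom.
ss_regr = sxy ** 2 / sxx
ss_error = syy - ss_regr
f_stat = (ss_regr / 1) / (ss_error / (n - 2))

print(f"b = {b:.4f}, a = {a:.4f}")
print(f"SSRegr = {ss_regr:.5f}, SSE = {ss_error:.5f}, F = {f_stat:.3f}")
```

Working from the raw data rather than from rounded intermediate values avoids small discrepancies in the last decimal place of the slope.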
b. Calculate and plot the residuals against DAYS. Do the residuals suggest a fault in the model?
Days Sugar Predicted (ŷ = 12.0418 − 0.8216x) Residual (y − ŷ)
0 7.9 12.0418 -4.1418
1 12.0 11.2202 0.7798
3 9.5 9.5770 -0.0770
4 11.3 8.7554 2.5446
5 11.8 7.9338 3.8662
6 11.3 7.1122 4.1878
7 4.2 6.2906 -2.0906
8 0.4 5.4690 -5.0690

[Figure: Residual Plot (Residual versus Days)]

The residuals do not seem to be randomly scattered with an even variability (there is a slight increase in
variance, so Assumption 3 is not met). They are negative at day 0, positive for days 4 to 6, and negative
again at days 7 and 8; this curved pattern indicates that the relationship may be nonlinear (a fault in the
model). The fault is also reflected in the ANOVA test for the slope, which indicates that DAYS is not
useful as a linear predictor of SUGAR (fail to reject H0; p-value > 0.10).
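
The residual calculation in part (b) can be reproduced as follows. This is a minimal Python sketch under the assumption that the line is refit from the raw data, so the values may differ from a hand-rounded table in the last decimal places; the names are illustrative.

```python
days = [0, 1, 3, 4, 5, 6, 7, 8]
sugar = [7.9, 12.0, 9.5, 11.3, 11.8, 11.3, 4.2, 0.4]
n = len(days)

x_bar = sum(days) / n
y_bar = sum(sugar) / n
sxx = sum((x - x_bar) ** 2 for x in days)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, sugar))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed value minus fitted value.
residuals = [y - (a + b * x) for x, y in zip(days, sugar)]

for x, e in zip(days, residuals):
    # The sign pattern across days is what hints at curvature.
    print(f"day {x}: residual {e:+.4f}")
```

Least-squares residuals always sum to zero, so the diagnostic information lies in how their signs and sizes are arranged along the x-axis, not in their total.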
2. It is generally believed that taller persons make better basketball players because they are better able
to put the ball into the basket. The table below lists the heights of a sample of 25 non-basketball
athletes and the number of successful baskets made in a 60-second time period.
Obs. Height Goals Obs. Height Goals Obs. Height Goals
1 71 15 10 74 18 19 78 22
2 74 19 11 71 13 20 79 23
3 70 11 12 72 15 21 72 16
4 71 15 13 73 17 22 75 20
5 69 12 14 72 16 23 76 21
6 73 17 15 71 15 24 74 19
7 72 15 16 75 20 25 70 13
8 75 19 17 71 15
9 72 16 18 75 19
a. Perform a regression relating GOALS to HEIGHT to ascertain if there is such a relationship
and, if there is, estimate the nature of that relationship. Use the regression ANOVA table to
assess the usefulness of HEIGHT as a predictor of GOALS.

Step 1 : Scatterplot

[Figure: Basket Goals Scatterplot (Goals versus Height)]


Since we see a linear pattern, linear regression may be appropriate (Assumption 1 is met).
Step 2 : Compute the Sums of Squares
Let x be HEIGHT and y be GOALS.
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 133{,}373 - \frac{(1825)^2}{25} = 148$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 30{,}912 - \frac{(1825)(421)}{25} = 179$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 7321 - \frac{(421)^2}{25} = 231.36 = \text{SSTo}$$
Step 3 : Compute the Least-Squares Linear Regression Equation
$$b = \frac{S_{xy}}{S_{xx}} = \frac{179}{148} = 1.2095$$

$$a = \bar{y} - b\bar{x} = \frac{421}{25} - (1.2095)\left(\frac{1825}{25}\right) = -71.4535$$

$$\text{SSRegr} = \frac{(S_{xy})^2}{S_{xx}} = 216.49324$$

$$\hat{y} = a + bx = -71.4535 + 1.2095x$$
ANOVA Table
Source df SS MS F-statistic p-value
Regression 1 216.49324 216.49324 334.931 < 0.001
Error 23 14.86676 0.64638
Total 24 231.36
Since the p-value is extremely small, HEIGHT is useful as a predictor of GOALS.
The relationship the regression suggests is an increase of 1.2 goals for every extra inch of height.
b. Estimate the number of goals to be made by an athlete who is 60 inches tall. How much
confidence can be assigned to that estimate?
ŷ = −71.4535 + 1.2095(60) = 1.1165
An athlete who is 60 inches (5 feet) tall is predicted to make only 1.1165 goals on average in 60 seconds.
Very little confidence can be assigned to this estimate: the heights in the sample range from 69 to 79
inches, so 60 inches lies far outside the data, and the prediction is implausible (even a short athlete
would almost certainly make more than 1 goal in 60 seconds). This is an example of why we should
not extrapolate (predict for x-values that are outside of the range of our original sample).
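
The extrapolation warning can be built into a prediction helper. The sketch below is illustrative only: the function name `predict_goals` and the range constants are assumptions introduced here, with the range taken from the smallest and largest heights in the data table (69 and 79 inches).

```python
# Fitted line from part (a); heights in the sample run from 69 to 79 inches.
A, B = -71.4535, 1.2095
HEIGHT_MIN, HEIGHT_MAX = 69, 79


def predict_goals(height):
    """Return (prediction, is_extrapolation) for a height in inches."""
    is_extrapolation = not (HEIGHT_MIN <= height <= HEIGHT_MAX)
    return A + B * height, is_extrapolation


for h in (60, 74):
    goals, extrap = predict_goals(h)
    note = "extrapolation, do not trust" if extrap else "within sample range"
    print(f"height {h}: predicted goals {goals:.4f} ({note})")
```

Returning the flag alongside the number keeps the caller aware that the arithmetic still works outside the sampled range even though the estimate is no longer supported by the data.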
3. It has been argued that many cases of infant mortality are attributable to teenage mothers who, for
various reasons, do not receive proper prenatal care. From the Statistical Abstract of the United
States we have statistics on the teenage birth rate (per 1000) and the infant mortality rate (per 1000
live births) for the 48 contiguous states. The data are given below, where TEEN denotes the
birthrate for teenage mothers and MORT denotes the infant mortality rate.
State Teen Mort State Teen Mort State Teen Mort
AL 17.4 13.3 MA 8.3 8.5 OH 13.3 10.6
AR 19.0 10.3 MD 11.7 11.7 OK 15.6 10.4
AZ 13.8 9.4 ME 11.6 8.8 OR 10.9 9.4
CA 10.9 8.9 MI 12.3 11.4 PA 11.3 10.2
CO 10.2 8.6 MN 7.3 9.2 RI 10.3 9.4
CT 8.8 9.1 MO 13.4 10.7 SC 16.6 13.2
DE 13.2 11.5 MS 20.5 12.4 SD 9.7 13.3
FL 13.8 11.0 MT 10.1 9.6 TN 17.0 11.0
GA 17.0 12.5 NB 8.9 10.1 TX 15.2 9.5
IA 9.2 8.5 NC 15.9 11.5 UT 9.3 8.6
ID 10.8 11.3 ND 8.0 8.4 VA 12.0 11.1
IL 12.5 12.1 NH 7.7 9.1 VT 9.2 10.0
IN 14.0 11.3 NJ 9.4 9.8 WA 10.4 9.8
KS 11.5 8.9 NM 15.3 9.5 WI 9.9 9.2
KY 17.4 9.8 NV 11.9 9.1 WV 17.1 10.2
LA 16.8 11.9 NY 9.7 10.7 WY 10.7 10.8
a. Perform a regression to estimate MORT using TEEN as the independent variable. Do the
results confirm the stated hypothesis? Interpret the results. Use a regression ANOVA table.
Step 1 : Scatterplot

[Figure: Teenage Mothers Scatterplot (Mort versus Teen)]

Since we see a somewhat linear pattern, linear regression may be appropriate
(Assumption 1 is probably met).
Step 2 : Compute the Sums of Squares
Let x be TEEN and y be MORT.
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7929.88 - \frac{(596.8)^2}{48} = 509.6667$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 6276.68 - \frac{(596.8)(495.6)}{48} = 114.72$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 5202.72 - \frac{(495.6)^2}{48} = 85.65 = \text{SSTo}$$
Step 3 : Compute the Least-Squares Linear Regression Equation
$$b = \frac{S_{xy}}{S_{xx}} = \frac{114.72}{509.6667} = 0.2251$$

$$a = \bar{y} - b\bar{x} = \frac{495.6}{48} - (0.2251)\left(\frac{596.8}{48}\right) = 7.5263$$

$$\text{SSRegr} = \frac{(S_{xy})^2}{S_{xx}} = 25.82213$$

$$\hat{y} = a + bx = 7.5263 + 0.2251x$$
ANOVA Table
Source df SS MS F-statistic p-value
Regression 1 25.82213 25.82213 19.854 < 0.001
Error 46 59.82787 1.30061
Total 47 85.65000
There seems to be an increase in infant mortality with increased teenage birth rate (TEEN is a useful
predictor of MORT with a positive slope—small p-value). These results seem to confirm the stated
hypothesis.
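
Because Step 2 quotes the raw sums for all 48 states, the ANOVA table can be rebuilt without retyping the data. The helper below is a hypothetical sketch (the name `anova_from_sums` and the dictionary keys are not from the handout) that works from those sums alone.

```python
def anova_from_sums(n, sum_x, sum_y, sum_xx, sum_xy, sum_yy):
    """Simple linear regression ANOVA quantities computed from raw sums."""
    sxx = sum_xx - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    ss_total = sum_yy - sum_y ** 2 / n
    ss_regr = sxy ** 2 / sxx
    ss_error = ss_total - ss_regr
    ms_error = ss_error / (n - 2)
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": sum_y / n - slope * sum_x / n,
        "SSRegr": ss_regr,
        "SSE": ss_error,
        "F": ss_regr / ms_error,
    }


# Sums quoted in Step 2 for the 48 states (x = TEEN, y = MORT).
result = anova_from_sums(48, 596.8, 495.6, 7929.88, 6276.68, 5202.72)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```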
b. Construct a residual plot. Comment on the results.

[Figure: Teenage Mothers Residual Plot (Residual versus Predicted Value of Mort)]

The residuals seem to be randomly scattered with an even variability (Assumption 3 is met). There
may be one outlier since there is a large residual (South Dakota—high mortality rate but low teen birth
rate).

[Note that sometimes a residual plot uses the predicted values on the x-axis instead of the predictor
values. Interpretations of this type of plot are similar to those of the usual residual plot method.]
4. An experimenter is testing a new pressure gauge against a standard (a gauge known to be accurate)
by taking three readings each at 50, 100, 150, 200, and 250 pounds per square inch. The purpose
of the experiment is to ascertain the precision and accuracy of the new gauge. The data are shown
below.
Standard Gauge 50 100 150 200 250
New Gauge (reading 1) 48 100 154 201 247
New Gauge (reading 2) 44 100 154 200 245
New Gauge (reading 3) 46 106 154 205 246
As we saw in Example 7.3 both precision and accuracy are important factors in determining the
effectiveness of a measuring instrument. Perform the appropriate analysis to determine the
effectiveness of this instrument. However, this device has a shortcoming that is of a slightly
different nature. Perform the appropriate ANOVA table analyses to find the shortcoming.
Step 1 : Scatterplot

[Figure: New Instrument Scatterplot (New Gauge versus Standard Gauge)]

Since we see a linear pattern, linear regression may be appropriate (Assumption 1 is met).
Step 2 : Compute the Sums of Squares
Let x be STANDARD and y be NEW GAUGE.
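$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 412{,}500 - \frac{(2250)^2}{15} = 75{,}000$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 412{,}500 - \frac{(2250)(2250)}{15} = 75{,}000$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 412{,}716 - \frac{(2250)^2}{15} = 75{,}216 = \text{SSTo}$$
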
Step 3 : Compute the Least-Squares Linear Regression Equation
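$$b = \frac{S_{xy}}{S_{xx}} = \frac{75{,}000}{75{,}000} = 1$$

$$a = \bar{y} - b\bar{x} = \frac{2250}{15} - (1)\left(\frac{2250}{15}\right) = 0$$

$$\text{SSRegr} = \frac{(S_{xy})^2}{S_{xx}} = 75{,}000$$

$$\hat{y} = a + bx = 0 + (1)x = x$$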
ANOVA Table
Source df SS MS F-statistic p-value
Regression 1 75000 75000 4513.889 < 0.001
Error 13 216 16.61538
Total 14 75216
The new gauge is a useful (p-value is very small) and accurate instrument (slope of line is 1).
[Figure: New Instrument Residual Plot (Residual versus Predicted Value of New Gauge)]


The residuals seem to have an uneven variability (Assumption 3 is not met), and the residuals seem to
have a definite concave-down parabolic pattern (Not linear—Assumption 1 is not met). Even though
the scatterplot looks nearly linear, the residuals show that a quadratic component should be added to
the model. This curvature illustrates the shortcoming of this measuring instrument—the new gauge is
precise for the middle values but imprecise for the smaller and larger values. Therefore, the new
gauge is accurate (since the slope of the regression line is 1), but it is not precise (uneven variability).
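
The curvature and the uneven spread described above can be checked directly from the readings. The following Python sketch is illustrative (the dictionary layout and names are assumptions); it relies on the fitted line ŷ = x, so each residual is simply the new-gauge reading minus the standard pressure.

```python
# Three new-gauge readings at each standard pressure (psi), from the table.
readings = {
    50: [48, 44, 46],
    100: [100, 100, 106],
    150: [154, 154, 154],
    200: [201, 200, 205],
    250: [247, 245, 246],
}

mean_residual = {}
residual_range = {}
for standard, values in readings.items():
    # With y-hat = x, residual = reading - standard.
    residuals = [v - standard for v in values]
    mean_residual[standard] = sum(residuals) / len(residuals)
    residual_range[standard] = max(residuals) - min(residuals)

for standard in readings:
    print(standard, mean_residual[standard], residual_range[standard])
```

The mean residual rises from the low end to the middle pressure and falls again toward the high end, which is the concave-down shape seen in the residual plot, while the range of the three readings differs from one pressure to the next.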

5. For which of the following sets of data points is it reasonable to determine a regression line?

The idea behind finding a regression line is based on the assumption that the data points
are actually scattered about a straight line. Only the left data set appears to be scattered
about a straight line. Thus, it is reasonable to determine a regression line only for the left set
of data.

6. Suppose r² = 1 for a data set.

a. What can you say about SSE?

$$r^2 = \frac{\text{SSRegr}}{\text{SSTo}} = \frac{\text{SSTo} - \text{SSE}}{\text{SSTo}} = 1$$
Therefore, SSE must equal 0.
b. What can you say about SSRegr?

$$r^2 = \frac{\text{SSRegr}}{\text{SSTo}} = \frac{\text{SSTo} - \text{SSE}}{\text{SSTo}} = 1$$
Therefore, SSRegr must equal SSTo.
c. What can you say about the utility of the regression equation for making predictions?

The regression is extremely useful for making predictions since there is a perfect linear
relationship between the explanatory and response variables.
7. Suppose r² = 0 for a data set.
a. What can you say about SSE?

$$r^2 = \frac{\text{SSRegr}}{\text{SSTo}} = \frac{\text{SSTo} - \text{SSE}}{\text{SSTo}} = 0$$
Therefore, SSE must equal SSTo.
b. What can you say about SSRegr?

$$r^2 = \frac{\text{SSRegr}}{\text{SSTo}} = \frac{\text{SSTo} - \text{SSE}}{\text{SSTo}} = 0$$
Therefore, SSRegr must equal 0.
c. What can you say about the utility of the regression equation for making predictions?

The regression is totally useless for making predictions since there is absolutely no linear
relationship between the explanatory and response variables.
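
The two extreme cases in Problems 6 and 7 can be illustrated with tiny made-up data sets. The numbers below are invented purely for illustration and do not come from the handout; the function name `r_squared` is likewise an assumption.

```python
def r_squared(xs, ys):
    """Return (r^2, SSRegr, SSE) for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # SSTo
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_regr = sxy ** 2 / sxx
    ss_error = syy - ss_regr
    return ss_regr / syy, ss_regr, ss_error


xs = [1, 2, 3, 4]
perfect = r_squared(xs, [2, 4, 6, 8])   # points exactly on y = 2x
no_trend = r_squared(xs, [1, 2, 2, 1])  # symmetric pattern, so Sxy = 0

print("perfect line:", perfect)
print("no linear trend:", no_trend)
```

In the second data set there is a clear pattern (up, then down), yet the fitted slope is zero; r² = 0 rules out a linear relationship only, not every kind of relationship.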

8. The figures below show three residual plots. For each plot, decide whether the graph suggests a
violation of one or more of the assumptions for regression analysis. Provide a detailed explanation
for your answers.

a. The graph does not suggest a violation of one or more of the assumptions for
regression inferences; all points are randomly scattered in a horizontal band.

b. Assumption (1) appears to be violated since the points seem to form a (slight) curve
indicating that the data do not follow a straight-line pattern.

c. Assumption (3) appears to be violated since the points form a funnel shape indicating
non-constant variability.
Extensive Linear Regression Examples
1. Ten Corvettes between 1 and 6 years old were randomly selected from the classified ads of The
Arizona Republic. The following data were obtained, where x denotes age, in years, and y denotes
price, in hundreds of dollars.

x 6 6 6 2 2 5 4 5 1 4
y 125 115 130 260 219 150 190 163 260 160

a. Discuss what it would mean for the assumptions of regression analysis to be satisfied by the
variables under consideration.

If the assumptions for regression inferences are satisfied for a model relating a
Corvette’s age to its price, this means that there are constants α, β, and σ such that, for
each age x, the prices for Corvettes of that age are normally distributed with mean α + βx
and standard deviation σ.

b. Determine the regression equation for the data.

x y xy x² y²
6 125 750 36 15,625
6 115 690 36 13,225
6 130 780 36 16,900
2 260 520 4 67,600
2 219 438 4 47,961
5 150 750 25 22,500
4 190 760 16 36,100
5 163 815 25 26,569
1 260 260 1 67,600
4 160 640 16 25,600
Total: 41 1772 6403 199 339,680

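$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 6403 - \frac{41(1772)}{10} = -862.2$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 199 - \frac{(41)^2}{10} = 30.9$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 339{,}680 - \frac{(1772)^2}{10} = 25{,}681.6$$

$$\bar{x} = \frac{\sum x}{n} = \frac{41}{10} = 4.1 \qquad \bar{y} = \frac{\sum y}{n} = \frac{1772}{10} = 177.2$$

$$b = \frac{S_{xy}}{S_{xx}} = \frac{-862.2}{30.9} = -27.9029 \qquad a = \bar{y} - b\bar{x} = 177.2 - (-27.9029)(4.1) = 291.602$$
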

Therefore, the regression equation is: ŷ = 291.602 − 27.9029x.
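
The same coefficients can be obtained directly from the ten data points. This is a minimal Python sketch, assuming only the standard library; the helper `predict` is a hypothetical convenience function, not something defined in the handout.

```python
ages = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]                        # x, in years
prices = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]  # y, in $100s
n = len(ages)

sum_x, sum_y = sum(ages), sum(prices)
sxy = sum(x * y for x, y in zip(ages, prices)) - sum_x * sum_y / n
sxx = sum(x * x for x in ages) - sum_x ** 2 / n

b = sxy / sxx
a = sum_y / n - b * sum_x / n


def predict(age):
    """Predicted price in hundreds of dollars for a given age in years."""
    return a + b * age


print(f"y-hat = {a:.3f} + ({b:.4f})x")
print(f"2-year-old: {predict(2):.2f}, 3-year-old: {predict(3):.2f}")
```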

c. Graph the regression equation and the data points.

For x = 1, ŷ = 291.602 – 27.9029(1) = 263.6991.

For x = 3, ŷ = 291.602 – 27.9029(3) = 207.8933.
[Figure: Corvettes, Price ($100) versus Age (years), showing the data points and the fitted line ŷ = 291.602 − 27.9029x]

d. Describe the apparent relationship between age and price for Corvettes.

The price for Corvettes tends to decrease as they get older (as age increases).

e. What does the slope of the regression line represent in terms of Corvette prices?

The slope indicates that Corvettes depreciate an estimated $2,790.29 per year.

f. Use the regression equation obtained in part (b) to predict the price of a 2-year-old Corvette; a
3-year-old Corvette.

For a 2-year-old Corvette, ŷ = 291.602 – 27.9029(2) = 235.7962 or $23,579.62.

For a 3-year-old Corvette, ŷ = 291.602 – 27.9029(3) = 207.8933 or $20,789.33.

g. Identify the predictor and response variables.

The predictor variable is age. The response variable is price.

h. Identify outliers and potential influential observations.

There do not appear to be any outliers or potential influential observations.

i. Compute SSTo, SSRegr, and SSE.

$$\text{SSTo} = S_{yy} = 25{,}681.6 \qquad \text{SSRegr} = \frac{(S_{xy})^2}{S_{xx}} = \frac{(-862.2)^2}{30.9} = 24{,}057.9$$

$$\text{SSE} = \text{SSTo} - \text{SSRegr} = 25{,}681.6 - 24{,}057.9 = 1{,}623.7$$
j. Compute the coefficient of determination, r².
$$r^2 = \frac{\text{SSRegr}}{\text{SSTo}} = \frac{24{,}057.9}{25{,}681.6} = 0.9367$$
k. Determine the percentage of the total variation in the observed y-values that is explained by the
regression, and interpret your result.
About 93.67% of the variation in the price data is explained by age.
l. State how useful the regression equation appears to be for making predictions.
The regression equation appears to be very useful for making predictions since the
value of r² is close to 1.
m. Compute the linear correlation coefficient, r.

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-862.2}{\sqrt{(30.9)(25{,}681.6)}} = -0.967872$$
n. Interpret the value of r in terms of the linear relationship between the two variables in question.
The above value of r suggests a strong negative linear correlation since the value is
negative and close to -1.
o. Discuss the graphical interpretation of the value of r and check that it is consistent with the
graph you obtained above.

Since the above value of r suggests a strong negative linear correlation, the data points
should be clustered closely about a negatively sloping regression line. This is
consistent with the graph obtained above.
p. Square r and compare the result with the value of the coefficient of determination (r²) you
obtained above.
$$(r)^2 = (-0.967872)^2 \approx 0.9368$$
This value matches the coefficient of determination that was calculated above, up to rounding in the last digit.
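
The agreement between r squared and SSRegr/SSTo is an algebraic identity, which the following sketch confirms numerically. It is an illustrative Python check with assumed variable names, not part of the original solution.

```python
import math

ages = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
prices = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]
n = len(ages)
x_bar, y_bar = sum(ages) / n, sum(prices) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in prices)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

# Linear correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# Coefficient of determination computed as SSRegr / SSTo.
coef_det = (sxy ** 2 / sxx) / syy

print(f"r = {r:.6f}, r^2 = {r * r:.4f}, SSRegr/SSTo = {coef_det:.4f}")
```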

q. At the 10% significance level, do the data provide sufficient evidence to conclude that the slope
of the population regression line is not 0 and, hence, that age is useful as a predictor of price for
Corvettes?

Step 1: Hypotheses
H0 : β = 0 (Age is not a useful predictor of price.)
Ha : β ≠ 0 (Age is a useful predictor of price.)
Step 2: Significance Level
α = 0.10
Step 3: Critical Value(s) and Rejection Region(s)
$$\pm t_{\alpha/2,\ df = n-2} = \pm t_{0.05,\ df = 8} = \pm 1.86$$
Reject the null hypothesis if T ≤ -1.86 or if T ≥ 1.86 (p-value ≤ 0.10).
Step 4: Test Statistic
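$$s_\varepsilon = \sqrt{\frac{\text{SSE}}{n-2}} = \sqrt{\frac{1{,}623.7}{8}} = 14.2465$$

$$T = \frac{b - 0}{s_\varepsilon / \sqrt{S_{xx}}} = \frac{-27.9029 - 0}{14.2465 / \sqrt{30.9}} = -10.8873$$
(p-value < 2(0.001) = 0.002)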
Step 5: Conclusion
Since -10.8873 ≤ -1.860 (p-value < 0.002 ≤ 0.10), we shall reject the null
hypothesis.
Step 6: State conclusion in words
At the α = 0.10 level of significance, there exists enough evidence to
conclude that the slope of the population regression line is not zero and,
hence, that age is useful as a predictor of price for Corvettes.
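
The test statistic in Step 4 can be recomputed from the data as a check. In this sketch the critical value 1.860 is simply the table value quoted in Step 3 and is hard-coded as an assumption; no statistical library is used, so no exact p-value is produced.

```python
import math

ages = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
prices = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]
n = len(ages)
x_bar, y_bar = sum(ages) / n, sum(prices) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in prices)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

b = sxy / sxx
sse = syy - sxy ** 2 / sxx
s_eps = math.sqrt(sse / (n - 2))             # standard error of the estimate
t_stat = (b - 0) / (s_eps / math.sqrt(sxx))  # test of H0: beta = 0

T_CRITICAL = 1.860  # t(0.05, df = 8), table value quoted in Step 3
reject_h0 = abs(t_stat) >= T_CRITICAL

print(f"s_eps = {s_eps:.4f}, T = {t_stat:.4f}, reject H0: {reject_h0}")
```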

r. Obtain a point estimate for the mean price of all 4-year-old Corvettes.

ŷ* = 291.602 − 27.9029(4) = 179.9904 = $17,999.04

s. Determine a 90% confidence interval for the mean price of all 4-year-old Corvettes.

$$\hat{y}^* \pm t_{\alpha/2,\ df = n-2} \cdot s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

$$179.9904 \pm 1.86 \cdot 14.2465 \sqrt{\frac{1}{10} + \frac{(4 - 4.1)^2}{30.9}}$$

[171.5974 to 188.3834]
We can be 90% confident that the mean price of all four-year-old Corvettes is
somewhere between $17,159.74 and $18,838.34.

t. Find the predicted price of a randomly selected 4-year-old Corvette.

ŷ* = 291.602 − 27.9029(4) = 179.9904 = $17,999.04 [same as in part (r)]

u. Determine a 90% prediction interval for the price of a randomly selected 4-year-old Corvette.

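$$\hat{y}^* \pm t_{\alpha/2,\ df = n-2} \cdot s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

$$179.9904 \pm 1.860 \cdot 14.2465 \sqrt{1 + \frac{1}{10} + \frac{(4 - 4.1)^2}{30.9}}$$

[152.1947 to 207.7861]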
We can be 90% certain that the price of a randomly selected four-year-old Corvette is
somewhere between $15,219.47 and $20,778.61.
v. Draw a graph showing both the 90% confidence interval from part (s) and the 90% prediction
interval from part (u).

w. Why is the prediction interval wider than the confidence interval?

The error in the estimate of the mean price of four-year-old Corvettes is due only to the
fact that the population regression line is being estimated by a sample regression line;
whereas, the error in the prediction of the price of a randomly selected four-year-old
Corvette is due to that fact plus the variation in prices for four-year-old Corvettes.
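
The difference between the two intervals comes down to the extra "1 +" under the square root. The sketch below recomputes both half-widths from the summary values in the worked solution; the variable names are illustrative and the critical value is the table value used above.

```python
import math

# Summary values taken from the worked solution above.
n, x_bar, sxx = 10, 4.1, 30.9
a, b = 291.602, -27.9029
s_eps = 14.2465
t_crit = 1.860  # t(0.05, df = 8) from the t table
x_star = 4

y_hat = a + b * x_star
var_factor = 1 / n + (x_star - x_bar) ** 2 / sxx

# Uncertainty in the estimated mean response only.
ci_half = t_crit * s_eps * math.sqrt(var_factor)
# Same uncertainty plus the scatter of an individual price about the mean.
pi_half = t_crit * s_eps * math.sqrt(1 + var_factor)

print(f"CI: {y_hat - ci_half:.4f} to {y_hat + ci_half:.4f}")
print(f"PI: {y_hat - pi_half:.4f} to {y_hat + pi_half:.4f}")
```

Because the added 1 does not shrink as n grows, the prediction interval stays wide even for large samples, whereas the confidence interval for the mean narrows.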

x. At the 5% significance level, do the data provide sufficient evidence to conclude that age and
price of Corvettes are negatively linearly correlated?

Step 1: Hypotheses
H0 : ρ = 0 (Age and price are not linearly correlated.)
Ha : ρ < 0 (Age and price are negatively linearly correlated.)
Step 2: Significance Level
α = 0.05
Step 3: Critical Value(s) and Rejection Region(s)
$$-t_{\alpha,\ df = n-2} = -t_{0.05,\ df = 8} = -1.86$$
Reject the null hypothesis if T ≤ -1.86 (p-value ≤ 0.05).
Step 4: Test Statistic
$$T = \frac{r}{\sqrt{\dfrac{1 - r^2}{n-2}}} = \frac{-0.967872}{\sqrt{\dfrac{1 - (-0.967872)^2}{8}}} = -10.8874$$
(p-value < 0.001)
Step 5: Conclusion
Since -10.8874 ≤ -1.86 (p-value < 0.001 ≤ 0.05), we shall reject the null
hypothesis.
Step 6: State conclusion in words
At the α = 0.05 level of significance, there exists enough evidence to
conclude that age and price for Corvettes are negatively linearly correlated.
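
The correlation test statistic here and the slope test statistic from part (q) are algebraically the same quantity, so the small difference between −10.8873 and −10.8874 is only rounding. The following illustrative Python sketch, with assumed variable names, computes both from the sums of squares in part (b).

```python
import math

n = 10
sxx, syy, sxy = 30.9, 25681.6, -862.2  # from part (b)

# Slope-based statistic: T = b / (s_eps / sqrt(Sxx)).
b = sxy / sxx
s_eps = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
t_slope = b / (s_eps / math.sqrt(sxx))

# Correlation-based statistic: T = r / sqrt((1 - r^2) / (n - 2)).
r = sxy / math.sqrt(sxx * syy)
t_corr = r / math.sqrt((1 - r ** 2) / (n - 2))

print(f"T (slope) = {t_slope:.4f}, T (correlation) = {t_corr:.4f}")
```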
2. The National Center for Health Statistics publishes data on heights and weights in Vital and Health
Statistics. A random sample of 11 males age 18–24 years gave the following data, where x denotes
height, in inches, and y denotes weight, in pounds.
x 65 67 71 71 66 75 67 70 71 69 69
y 175 133 185 163 126 198 153 163 159 151 155
a. Discuss what it would mean for the assumptions of regression analysis to be satisfied by the
variables under consideration.

If the assumptions for regression inferences are satisfied for a model relating an 18–24-
year-old male’s height to his weight, this means that there are constants α, β, and σ such
that, for each height x, the weights of 18–24-year-old males of that height are normally
distributed with mean α + βx and standard deviation σ.

b. Determine the regression equation for the data.

x y xy x2 y2
65 175 11,375 4,225 30,625
67 133 8,911 4,489 17,689
71 185 13,135 5,041 34,225
71 163 11,573 5,041 26,569
66 126 8,316 4,356 15,876
75 198 14,850 5,625 39,204
67 153 10,251 4,489 23,409
70 163 11,410 4,900 26,569
71 159 11,289 5,041 25,281
69 151 10,419 4,761 22,801
69 155 10,695 4,761 24,025
761 1761 122,224 52,729 286,273

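$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 122{,}224 - \frac{761(1761)}{11} = 394.8182$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 52{,}729 - \frac{(761)^2}{11} = 81.6364$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 286{,}273 - \frac{(1761)^2}{11} = 4{,}352.91$$

$$\bar{x} = \frac{\sum x}{n} = \frac{761}{11} = 69.1818 \qquad \bar{y} = \frac{\sum y}{n} = \frac{1761}{11} = 160.0909$$

$$b = \frac{S_{xy}}{S_{xx}} = \frac{394.8182}{81.6364} = 4.8363 \qquad a = \bar{y} - b\bar{x} = 160.0909 - (4.8363)(69.1818) = -174.4930$$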
Therefore, the regression equation is: ŷ = −174.4930 + 4.8363x.
c. Graph the regression equation and the data points.
For x = 65, ŷ = -174.4930 + 4.8363(65) = 139.8665.
For x = 74, ŷ = -174.4930 + 4.8363(74) = 183.3932.
[Figure: 18–24-Year-Old Males, Weight (pounds) versus Height (inches), showing the data points and the fitted line ŷ = −174.4930 + 4.8363x]

d. Describe the apparent relationship between height and weight for 18–24-year-old males.

Taller 18–24-year-old males tend to weigh more than shorter ones (or weight tends to
increase as height increases).

e. What does the slope of the regression line represent in terms of heights and weights for 18–24-
year-old males?

The weights of 18–24-year-old males increase an estimated 4.8363 pounds for each
increase in height of one inch.

f. Use the regression equation obtained in part (b) to predict the weight of an 18–24-year-old male
who is 67 inches tall; 73 inches tall.

For a 67-inch-tall male, ŷ = -174.4930 + 4.8363(67) = 149.5391 pounds.

For a 73-inch-tall male, ŷ = -174.4930 + 4.8363(73) = 178.5569 pounds.

g. Identify the predictor and response variables.

The predictor variable is height. The response variable is weight.

h. Identify outliers and potential influential observations.

The observation (65, 175) appears to be an outlier since it is far away from the
regression line. The observation (75, 198) seems to be a potential influential
observation since it is to the right of the cluster of the rest of the points.
i. Should the above regression equation be used to predict the weight of an 18–24-year-old male
who is 68 inches tall? 60 inches tall? Explain your answers.

It is acceptable to use the regression equation to predict the weight of an 18–24-year-old
male who is 68 inches tall since that height lies within the range of the heights in the
sample data. It is not acceptable (and would be extrapolation) to use the regression
equation to predict the weight of an 18–24-year-old male who is 60 inches tall since that
height lies outside the range of the heights in the sample data (the range of heights upon
which the regression equation is based).

j. For which heights is it reasonable to use the regression equation to predict weight?

It is reasonable to use the regression equation to predict weight for heights between 65
and 75 inches, inclusive.

k. Compute SSTo, SSRegr, and SSResid.

$$\text{SSTo} = S_{yy} = 4{,}352.91 \qquad \text{SSRegr} = \frac{(S_{xy})^2}{S_{xx}} = \frac{(394.8182)^2}{81.6364} = 1{,}909.46$$

$$\text{SSResid} = \text{SSTo} - \text{SSRegr} = 4{,}352.91 - 1{,}909.46 = 2{,}443.45$$

l. Compute the coefficient of determination, r².

$$r^2 = \frac{\text{SSRegr}}{\text{SSTo}} = \frac{1{,}909.46}{4{,}352.91} = 0.4387$$

m. Determine the percentage of the total variation in the observed y-values that is explained by the
regression, and interpret your result.

About 43.87% of the variation in the weight data is explained by height.

n. State how useful the regression equation appears to be for making predictions.

The regression equation appears to be moderately useful for making predictions since
the value of r² is close to 0.5.

o. Compute the linear correlation coefficient, r.

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{394.8182}{\sqrt{(81.6364)(4{,}352.91)}} = 0.662319$$
p. Interpret the value of r in terms of the linear relationship between the two variables in question.
The above value of r suggests a moderate positive linear correlation since the value is
positive and close to 0.5.
q. Discuss the graphical interpretation of the value of r and check that it is consistent with the
graph you obtained above.

Since the above value of r suggests a moderate positive linear correlation, the data
points should be clustered moderately closely about a positively sloping regression
line. This is consistent with the graph obtained above.
r. Square r and compare the result with the value of the coefficient of determination (r²) you
obtained above.

$$(r)^2 = (0.662319)^2 = 0.4387 = r^2$$
This value matches the coefficient of determination that was calculated above.

s. Compute and interpret the standard error of the estimate, sε.

$$s_\varepsilon = \sqrt{\frac{\text{SSResid}}{n-2}} = \sqrt{\frac{2{,}443.45}{9}} = 16.4771$$
Roughly speaking, on the average, the predicted weight of an 18–24-year-old male in
the sample differs from the observed weight by about 16.4771 pounds.
t. Interpret the result from part (s) if the assumptions for regression analysis hold.
Presuming that the variables height (x) and weight (y) for 18–24-year-old males satisfy
Assumptions (1)–(3) for regression analysis, the standard error of the estimate sε =
16.4771 pounds provides an estimate for the common population standard deviation σ
of weights for all 18–24-year-old males of any particular height.
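
The standard error of the estimate can be recomputed from the residuals themselves. This is a minimal Python sketch with illustrative names; it refits the line from the raw data, so results agree with the hand calculation only up to rounding.

```python
import math

heights = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
weights = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 155]
n = len(heights)
x_bar, y_bar = sum(heights) / n, sum(weights) / n

sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
b = sxy / sxx
a = y_bar - b * x_bar

# Residuals and the residual sum of squares.
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]
ss_resid = sum(e * e for e in residuals)

# Standard error of the estimate: n - 2 degrees of freedom.
s_eps = math.sqrt(ss_resid / (n - 2))

print(f"SSResid = {ss_resid:.2f}, s_eps = {s_eps:.4f}")
print("largest residual:", max(residuals, key=abs))
```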
u. Obtain the residuals and create a residual plot.
Height (x) Residual (e)
65 35.13
67 -16.54
71 16.12
71 -5.88
66 -18.70
75 9.77
67 3.46
70 -1.05
71 -9.88
69 -8.21
69 -4.21

[Figure: 18–24-year-old Males residual plot (Residual versus Height in inches)]

v. Decide whether it is reasonable to consider that the assumptions for regression analysis are met
by the variables in question. (The answer here is subjective, especially in view of the extremely
small sample sizes.)

It appears reasonable to consider the assumptions for regression inferences met for the
variables height and weight since we see a random scatter in a horizontal band in the
residual plot. However, there is a potential outlier (65, 175) (e = 35.13) which could cast
some doubt on the assumptions.
w. Do the data provide sufficient evidence to conclude that the slope of the population regression
line is not 0 and, hence, that height is useful as a predictor of weight for 18–24-year-old males?
Use α = 0.10.

Step 1: Hypotheses
H0 : β = 0 (Height is not a useful predictor of weight.)
Ha : β ≠ 0 (Height is a useful predictor of weight.)
Step 2: Significance Level
α = 0.10
Step 3: Critical Value(s) and Rejection Region(s)
$$\pm t_{\alpha/2,\ df = n-2} = \pm t_{0.05,\ df = 9} = \pm 1.83$$
Reject the null hypothesis if T ≤ -1.83 or if T ≥ 1.83 (p-value ≤ 0.10).
Step 4: Test Statistic
$$T = \frac{b}{s_\varepsilon / \sqrt{S_{xx}}} = \frac{4.8363}{16.4771 / \sqrt{81.6364}} = 2.6520$$
(0.02 = 2(0.01) < p-value < 2(0.025) = 0.050)
Step 5: Conclusion
Since 2.6520 ≥ 1.83 (0.02 < p-value < 0.05 ≤ 0.10), we shall reject the null
hypothesis.
Step 6: State conclusion in words
At the α = 0.10 level of significance, there exists enough evidence to
conclude that the slope of the population regression line is not zero and,
hence, that height is useful as a predictor of weight for 18–24-year-old males.

x. Obtain a 90% confidence interval for the slope, β, of the population regression line that relates
weight to height for males age 18–24. Be sure to interpret your result.

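$$b - t_{\alpha/2,\ df = n-2} \cdot \frac{s_\varepsilon}{\sqrt{S_{xx}}} \quad \text{to} \quad b + t_{\alpha/2,\ df = n-2} \cdot \frac{s_\varepsilon}{\sqrt{S_{xx}}}$$

$$4.8363 - 1.83 \cdot \frac{16.4771}{\sqrt{81.6364}} \quad \text{to} \quad 4.8363 + 1.83 \cdot \frac{16.4771}{\sqrt{81.6364}}$$

[1.4990 pounds to 8.1736 pounds]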
We can be 90% confident that, for 18–24-year-old males, the increase in mean weight
per one inch increase in height is somewhere between 1.4990 pounds and 8.1736
pounds.
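
The interval for the slope can be verified with a few lines of arithmetic. The sketch below uses the summary values from the worked solution and the rounded table value 1.83; the names are illustrative assumptions.

```python
import math

# Summary values from the worked solution (height/weight example).
b, s_eps, sxx = 4.8363, 16.4771, 81.6364
t_crit = 1.83  # t(0.05, df = 9), the rounded table value used above

se_b = s_eps / math.sqrt(sxx)  # standard error of the slope
lower = b - t_crit * se_b
upper = b + t_crit * se_b

print(f"SE(b) = {se_b:.4f}")
print(f"90% CI for slope: {lower:.4f} to {upper:.4f}")
```

Since the interval does not contain 0, it leads to the same conclusion as the two-sided slope test at the 10% level in part (w).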
y. Do the data provide sufficient evidence to conclude that the variables height and weight are
positively linearly correlated for 18–24-year-old males? Perform the required hypothesis test at
the 5% significance level.

Step 1: Hypotheses
H0 : ρ = 0 (Height and weight are not linearly correlated.)
Ha : ρ > 0 (Height and weight are positively linearly correlated.)
Step 2: Significance Level
α = 0.05
Step 3: Critical Value(s) and Rejection Region(s)
$$t_{\alpha,\ df = n-2} = t_{0.05,\ df = 9} = 1.83$$
Reject the null hypothesis if T ≥ 1.83 (p-value ≤ 0.05).
Step 4: Test Statistic
$$T = \frac{r}{\sqrt{\dfrac{1 - r^2}{n-2}}} = \frac{0.662319}{\sqrt{\dfrac{1 - (0.662319)^2}{9}}} = 2.6520$$
(0.01 < p-value < 0.025)
Step 5: Conclusion
Since 2.6520 ≥ 1.83 (0.01 < p-value < 0.025 ≤ 0.05), we shall reject the null
hypothesis.
Step 6: State conclusion in words
At the α = 0.05 level of significance, there exists enough evidence to
conclude that height and weight are positively linearly correlated for 18–24-
year-old males.

Business Analytics - The Science of Data Driven Decision Making PDF
22% (9)
Business Analytics - The Science of Data Driven Decision Making PDF
3 pages
Ms Data Science S, 24 (WEEK# 4)
No ratings yet
Ms Data Science S, 24 (WEEK# 4)
23 pages
Squares and Square Roots Bingo
100% (1)
Squares and Square Roots Bingo
34 pages
Data Science Bootcamp - 16!05!2022
No ratings yet
Data Science Bootcamp - 16!05!2022
13 pages
Divisibility Rules From 1 To 13 - Division Rules in Maths
No ratings yet
Divisibility Rules From 1 To 13 - Division Rules in Maths
1 page
Math - Complex Numbers Bingo With Answers
No ratings yet
Math - Complex Numbers Bingo With Answers
4 pages
Expanded Notation Decimals PDF
No ratings yet
Expanded Notation Decimals PDF
2 pages
Linear Functions Slides
100% (1)
Linear Functions Slides
77 pages
Quiz Module 2 Probability and Probability Distributions PDF
0% (1)
Quiz Module 2 Probability and Probability Distributions PDF