DAI 101 Tutorial 5
Linear, Lasso, and Ridge Regression; Bias and Variance
Q.1 A set of observations of the independent variable (x) and the corresponding dependent
variable (y) is given below.
x: 5, 2, 4, 3
y: 16, 10, 13, 12
Based on the data, the coefficient a of the linear regression model y = a + bx is estimated as
6.1; the coefficient b is ___________________________.
Ans. b = 1.9
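A quick numerical check of this answer (a verification sketch only, not part of the question; the data are taken from the table above):

import numpy as np

x = np.array([5, 2, 4, 3])
y = np.array([16, 10, 13, 12])

# Least-squares slope and intercept: b = Sxy / Sxx, a = ybar - b * xbar
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(a, b)  # approximately 6.1 and 1.9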
Q.2 For a bivariate data set on (x, y), if the means, standard deviations, and correlation
coefficient are x̄ = 1, ȳ = 2, Sx = 3, Sy = 9, and r = 0.8, then the regression line
of y on x is:
1. y = 1 + 2.4(x - 1)
2. y = 2 + 0.27(x - 1)
3. y = 2 + 2.4(x - 1)
4. y = 1 + 0.27(x - 2)
Ans. 3. y = 2 + 2.4(x - 1)
Q.3 In the regression model y = a + bx, where x̄ = 2.50, ȳ = 5.50, and a = 1.50 (x̄ and ȳ
denote the means of the variables x and y, and a is a constant), which one of the following
values of the parameter b of the model is correct?
1. 1.75
2. 1.60
3. 2.00
4. 2.50
Ans. 2. 1.60
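Both Q.2 and Q.3 follow from the fact that the fitted line passes through the point of means (x̄, ȳ), with slope b = r·Sy/Sx in Q.2 and b solved from ȳ = a + b·x̄ in Q.3. A two-line check, purely for illustration:

# Q.2: slope of the regression line of y on x from the summary statistics
print(0.8 * 9 / 3)            # 2.4, so the line is y = 2 + 2.4(x - 1)
# Q.3: since ybar = a + b * xbar, b = (ybar - a) / xbar
print((5.50 - 1.50) / 2.50)   # 1.6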
Q.4 Dimension reduction methods have the goal of using the correlation structure among the
predictor variables to accomplish which of the following:
A. To reduce the number of predictor components
B. To help ensure that these components are dependent
C. To provide a framework for interpretability of the results
D. To help ensure that these components are independent
E. To increase the number of predictor components
Choose the correct answer from the options given below.
1. A, B, D and E only
2. A, C and D only
3. A, B, C and E only
4. B, C, D and E only
Q.5 If a constant 60 is subtracted from each of the values of X and Y, then the regression
coefficient is
1. reduced by 60
2. increased by 60
3. 1/60 th of the original regression coefficient
4. not changed
Q.6 What are the assumptions of linear regression?
a) The relationship between the input(s) and the output should be linear, and the model
should be linear in its parameters
b) The input variables should not be linearly related to one another; if multiple input
variables are linearly related, this is called collinearity
c) The errors should be independent of one another; if there is any relationship between
the errors, this is called autocorrelation
d) All of the above
Q.7 Which of the following functions of the coefficients is added as the penalty term to the
loss function in Lasso regression?
a) Squared magnitude
b) Absolute value of magnitude
c) Number of non-zero entries
d) None of the above
Answer: b) Absolute value of magnitude
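The contrast between the Lasso (L1) and Ridge (L2) penalties asked about here and in Q.18 and Q.19 can be seen directly with scikit-learn. This is only an illustrative sketch; the synthetic data and the value of alpha are assumptions, not part of the questions.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

alpha = 0.5
lasso = Lasso(alpha=alpha).fit(X, y)  # penalty: alpha * sum(|coefficient|)
ridge = Ridge(alpha=alpha).fit(X, y)  # penalty: alpha * sum(coefficient ** 2)

# The absolute-value (L1) penalty can drive some coefficients exactly to zero,
# while the squared (L2) penalty only shrinks them towards zero.
print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)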
Q.8 What is the primary goal of Ridge Regression?
a) To eliminate outliers
b) To minimize the coefficient values of the predictors
c) To maximize the model's accuracy on the training data
d) To prevent overfitting by adding a penalty to the size of the coefficients
Q.9 What does the penalty in Ridge Regression target?
a) The error terms.
b) The intercept term
c) The coefficient values of the predictors
d) The number of predictors
Q.10 Ridge Regression is especially useful when:
a) There is a low number of predictors.
b) Predictors are highly correlated.
c) There is a linear relationship between predictors and the target variable.
d) All predictors are independent.
Q.11 What happens as the penalty parameter α increases in Ridge Regression?
a) The model complexity increases.
b) The coefficients become exactly zero.
c) The variance increases, and the bias decreases.
d) The coefficients are shrunk towards zero.
Q.12 Which of the following is a limitation of Ridge Regression?
a) It cannot handle large datasets.
b) It assumes a linear relationship between predictors and the target variable.
c) It can only be used for classification problems
d) It eliminates all predictors from the model
Q.13 How does Ridge Regression handle multicollinearity among predictors?
a) By increasing the model's variance
b) By removing correlated predictors from the model
c) By penalizing the size of the coefficients, thus reducing the impact of
multicollinearity
d) By converting the multicollinear predictors into independent predictors
Q.14 Why might Ridge Regression not completely eliminate a predictor from a model?
a) Because it only adjusts the intercept
b) Because it focuses on minimizing the error term
c) Because the penalty term only shrinks coefficients towards zero without setting
them to exactly zero
d) Because it only works with categorical variables
Q.15 In Ridge Regression, if the penalty parameter α is set too high, what might be a
consequence?
a) The model may become too complex.
b) The model may become too biased, underfitting the data
c) The coefficients will all become exactly zero.
d) The model will perfectly fit the training data.
Q.16 What is a common problem in Linear Regression that Ridge Regression specifically
addresses?
A. Lack of fit
B. Overfitting
C. Underfitting
D. Irreversible transformations
Q.17 What are the limitations of Lasso Regression? (Select two)
(A) If the number of features (p) is greater than the number of observations (n), Lasso
will pick at most n features as non-zero, even if all features are relevant
(B) If there are two or more highly collinear feature variables, then LASSO regression
selects one of them randomly, which is not good for the interpretation of the data
(C) Lasso can be used to select important features of a dataset
(D) The difference between Ridge and Lasso regression is that Lasso tends to shrink coefficients
to absolute zero, whereas Ridge never sets the value of a coefficient to
absolute zero
Q.18: What is the penalty term for Ridge regression?
(A) the sum of the square of the magnitude of the coefficients
(B) the sum of the square root of the magnitude of the coefficients
(C) the absolute sum of the coefficients
(D) the sum of the coefficients
Q.19: What is the penalty term for Lasso regression?
(A) the square of the magnitude of the coefficients
(B) the square root of the magnitude of the coefficients
(C) the absolute sum of the coefficients
(D) the sum of the coefficients
Q.20: For Ridge Regression, if the regularization parameter = 0, what does it mean?
(A) Large coefficients are not penalized
(B) Overfitting problems are not accounted for
(C) The loss function is the same as the ordinary least squares loss function
(D) All of the above
Q.21: For Ridge Regression, if the regularization parameter is very high, which options
are true? (Select two)
(A) Large coefficients are significantly penalized
(B) Can lead to a model that is too simple and ends up underfitting the data
(C) Large coefficients are not penalized
(D) Can lead to a model that is too simple and ends up overfitting the data
Q.22: For Lasso Regression, if the regularization parameter = 0, what does it mean?
(A) The loss function is the same as the ordinary least squares loss function
(B) Can be used to select important features of a dataset
(C) Shrinks the coefficients of less important features to exactly 0
(D) All of the above
Q.23: For Lasso Regression, if the regularization parameter is very high, which options
are true? (Select two)
(A) Can be used to select important features of a dataset
(B) Shrinks the coefficients of less important features to exactly 0
(C) The loss function is the same as the ordinary least squares loss function
(D) The loss function is the same as the Ridge Regression loss function
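A small sketch of the behaviour asked about in Q.20 through Q.23, again using scikit-learn; the synthetic data and the alpha values are illustrative assumptions only.

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=200)

ols = LinearRegression().fit(X, y)

# Regularization parameter = 0: the penalty vanishes, so the Ridge loss
# reduces to the ordinary least squares loss (Q.20, Q.22).
ridge_zero = Ridge(alpha=0.0).fit(X, y)  # coefficients match OLS

# Very large regularization parameter: large coefficients are heavily
# penalized, giving a very simple model that tends to underfit (Q.21, Q.23).
ridge_big = Ridge(alpha=1e4).fit(X, y)   # coefficients shrunk close to 0
lasso_big = Lasso(alpha=1.0).fit(X, y)   # less important coefficients exactly 0

print(ols.coef_)
print(ridge_zero.coef_)
print(ridge_big.coef_)
print(lasso_big.coef_)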
Q.24 If the regression equation is y = 23.6 − 54.2x, then 23.6 is the _____ while −54.2
is the ____ of the regression line.
a) Slope, intercept
b) Slope, regression coefficient
c) Intercept, slope
d) Radius, intercept
Q.25 Which of the following correctly represents the relationship between total error, bias,
and variance?
a) Total Error = Bias² + Variance + Irreducible Error
b) Total Error = Bias + Variance
c) Total Error = √(Bias² + Variance)
d) Total Error = Bias × Variance
Answer: a) Total Error = Bias² + Variance + Irreducible Error
REASON: This is the fundamental bias-variance decomposition formula. The bias term is
squared to account for both positive and negative biases contributing positively to total error.
Irreducible error represents inherent noise in the data that cannot be eliminated by any model.
Q.26 For a given dataset with 1000 samples, a model shows bias of 0.4 and variance of 0.3.
Calculate the total error (assuming irreducible error is 0.1):
a) 0.56
b) 0.7
c) 0.8
d) 0.56
Answer: a) 0.56
REASON: Using the bias-variance decomposition formula: Total Error = Bias² + Variance +
Irreducible Error. Calculation: Total Error = (0.4)² + 0.3 + 0.1 = 0.16 + 0.3 + 0.1 = 0.56.
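The arithmetic in Q.26, Q.29, and Q.31 all uses this same decomposition; a short check (the round() calls are only to avoid floating-point noise):

# Total Error = Bias^2 + Variance + Irreducible Error
def total_error(bias, variance, irreducible):
    return bias ** 2 + variance + irreducible

print(round(total_error(0.4, 0.3, 0.1), 2))  # Q.26: 0.56
print(round(0.64 - 0.36 - 0.04, 2))          # Q.29: Bias^2 = 0.24
print(round(0.81 - 0.36 - 0.05, 2))          # Q.31: Variance = 0.4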
Q.27 A model exhibits high bias and low variance. This situation most likely indicates:
a) Overfitting
b) Underfitting
c) Perfect fitting
d) Random fitting
Answer: b) Underfitting
REASON: High bias means the model oversimplifies the problem and fails to capture the
underlying pattern in the data, while low variance indicates stable performance across
different datasets. This combination typically occurs with underfit models that are too simple
for the data's complexity.
Q.28 Given a polynomial regression model with bias = 0.25 and variance = 0.36, what
happens to these values if we increase the polynomial degree?
a) Both bias and variance increase
b) Bias decreases, variance increases
c) Both bias and variance decrease
d) Bias increases, variance decreases
Answer: b) Bias decreases, variance increases
REASON: Increasing the polynomial degree allows the model to fit the training data more
closely, reducing bias. However, greater complexity also makes the model more sensitive to
fluctuations in training data, leading to increased variance, illustrating the bias-variance
tradeoff.
Q.29 If a model has MSE = 0.64 and variance = 0.36, assuming irreducible error is 0.04,
what is the bias²?
a) 0.24
b) 0.28
c) 0.32
d) 0.36
Answer: a) 0.24
REASON: Using the formula MSE = Bias² + Variance + Irreducible Error, we have 0.64 =
Bias² + 0.36 + 0.04. Solving for Bias² gives Bias² = 0.64 - 0.36 - 0.04 = 0.24.
Q.30 In k-Nearest Neighbors (k-NN), as k increases:
a) Bias increases, variance decreases
b) Bias decreases, variance increases
c) Both bias and variance increase
d) Both bias and variance decrease
Answer: a) Bias increases, variance decreases
REASON: As k increases, the model averages over more neighbors, leading to smoother
decision boundaries. This results in higher bias (as it may fail to capture finer patterns) while
lowering variance, as the model becomes less sensitive to individual data points.
Q.31 Calculate the variance of a model if its total error is 0.81, bias² is 0.36, and irreducible
error is 0.05:
a) 0.4
b) 0.45
c) 0.5
d) 0.55
Answer: a) 0.4
REASON: Applying the bias-variance decomposition formula: Total Error = Bias² + Variance
+ Irreducible Error. We compute it as follows: 0.81 = 0.36 + Variance + 0.05 ⇒ Variance =
0.81 - 0.36 - 0.05 = 0.4.
Q.32 A model shows bias of 0.3 and variance of 0.25. If we double the training data size,
what's the most likely effect?
a) Bias remains same, variance reduces
b) Bias reduces, variance remains same
c) Both remain same
d) Both reduce by half
Answer: a) Bias remains same, variance reduces
REASON: Bias is tied to model complexity and architecture, which is not changed by data
size. In contrast, increasing the size of training data typically helps reduce variance, making
the model's predictions more stable and less sensitive to the specifics of the training set.
Q.33 For a neural network with 100 neurons, if we increase the neurons to 200, typically:
a) Bias increases, variance decreases
b) Bias decreases, variance increases
c) Both remain unchanged
d) Both decrease proportionally
Answer: b) Bias decreases, variance increases
REASON: Increasing the number of neurons enhances the model's capacity to learn complex
patterns, leading to lower bias as it can fit training data better. However, with increased
capacity, the model becomes more sensitive to fluctuations in the training data, hence
increasing variance.