Assignment 8


Introduction to Machine Learning


Prof. B. Ravindran
1. (2 marks) The figure below shows a Bayesian Network with 9 variables, all of which are binary.

Which of the following is/are always true for the above Bayesian Network?
(a) P(A, B | G) = P(A | G) P(B | G)
(b) P(A, I) = P(A) P(I)
(c) P(B, H | E, G) = P(B | E, G) P(H | E, G)
(d) P(C | B, F) = P(C | F)
Sol. (b)
Refer to the lecture.
2. (2 marks) Consider the following data for 20 budget phones, 30 mid-range phones, and 20
high-end phones:

Type        Single SIM   5G Compatibility   NFC   Total

Budget          15               5            0     20
Mid-Range       20              20           15     30
High End        15              15           15     20

Consider a phone with 2 SIM card slots and NFC but no 5G compatibility. Calculate the
probabilities of this phone being a budget phone, a mid-range phone, and a high-end phone
using the Naive Bayes method. What is the correct ordering of the phone types, from highest
to lowest probability?

(a) Budget, Mid-Range, High End


(b) Budget, High End, Mid-Range
(c) Mid-Range, High End, Budget
(d) High End, Mid-Range, Budget
Sol. (c)
P(Class | x1, x2, x3) ∝ P(Class) · P(x1 | Class) · P(x2 | Class) · P(x3 | Class)
P(Budget | dual SIM, no 5G, NFC) ∝ 20/70 · 5/20 · 15/20 · 0/20 = 0
P(Mid-Range | dual SIM, no 5G, NFC) ∝ 30/70 · 10/30 · 10/30 · 15/30 ≈ 0.0238
P(High End | dual SIM, no 5G, NFC) ∝ 20/70 · 5/20 · 5/20 · 15/20 ≈ 0.0134
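The ranking above can be reproduced with a short script. The counts are taken directly from the table; the variable names (`counts`, `ranking`, and so on) are illustrative, not part of the original solution.

```python
# Counts from the table above: how many phones of each class have each
# feature, plus the class total.
counts = {
    "Budget":    {"total": 20, "single_sim": 15, "five_g": 5,  "nfc": 0},
    "Mid-Range": {"total": 30, "single_sim": 20, "five_g": 20, "nfc": 15},
    "High End":  {"total": 20, "single_sim": 15, "five_g": 15, "nfc": 15},
}
n_phones = sum(c["total"] for c in counts.values())  # 70 phones overall

# Query: dual SIM (i.e. not single SIM), no 5G, has NFC.
scores = {}
for cls, c in counts.items():
    t = c["total"]
    prior = t / n_phones
    p_dual_sim = (t - c["single_sim"]) / t
    p_no_5g = (t - c["five_g"]) / t
    p_nfc = c["nfc"] / t
    # Naive Bayes: unnormalized posterior is the product of prior and
    # per-feature likelihoods (features assumed conditionally independent).
    scores[cls] = prior * p_dual_sim * p_no_5g * p_nfc

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # → ['Mid-Range', 'High End', 'Budget']
```

Note that the Budget score is exactly zero because no budget phone in the table has NFC; in practice this is the case where Laplace smoothing would be applied.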
3. (2 marks) Consider the following dataset where outlook, temperature, humidity, and wind are
independent features, and play is the dependent feature.

Find the probability that the student will not play, given x = (Outlook = sunny, Tempera-
ture = 66, Humidity = 90, Windy = True), using the Naive Bayes method. (Assume the
continuous features are modeled as Gaussian distributions.)

(a) 0.0001367
(b) 0.0000358
(c) 0.0000236
(d) 1

Sol. (a)

P(x | no) · P(no) = P(sunny | no) · P(Temperature = 66 | no) · P(Humidity = 90 | no)
· P(Windy = True | no) · P(no)
= 3/5 · 0.0279 · 0.0381 · 3/5 · 5/14 ≈ 0.0001367
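The arithmetic above can be checked with a small sketch. The Gaussian likelihoods 0.0279 and 0.0381 are taken as given from the solution; in practice they would come from `gaussian_pdf` (a helper defined here for illustration) evaluated with class-conditional means and standard deviations estimated from the training table.

```python
import math

def gaussian_pdf(x, mean, std):
    # Gaussian likelihood used by Naive Bayes for continuous features
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Factors for the "no" class, taken from the solution above. The two Gaussian
# values would normally be gaussian_pdf(66, ...) and gaussian_pdf(90, ...)
# with class-conditional parameters fit to the training data.
p_sunny_no = 3 / 5
p_temp_no  = 0.0279   # P(Temperature = 66 | no)
p_hum_no   = 0.0381   # P(Humidity = 90 | no)
p_windy_no = 3 / 5
p_no       = 5 / 14

score_no = p_sunny_no * p_temp_no * p_hum_no * p_windy_no * p_no
print(round(score_no, 7))  # → 0.0001367
```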
4. Considering their respective loss functions, which of Gradient Boosting and AdaBoost is
less susceptible to outliers?
(a) AdaBoost
(b) Gradient Boost
(c) On average, both are equally susceptible.
Sol. (b)
Gradient Boosting (as discussed in the lecture) uses a least-squares loss, while AdaBoost
uses an exponential loss. The exponential loss penalizes badly misclassified points, such as
outliers, far more heavily, so AdaBoost is more sensitive to them, whereas Gradient Boosting
with squared loss is comparatively less affected.
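A quick numeric comparison illustrates the point: for a margin m = y · f(x), the exponential loss e^(−m) grows much faster than a squared-style loss (1 − m)² as a point becomes badly misclassified (large negative m). This is only a sketch of the loss shapes, not either algorithm's full training procedure.

```python
import math

# Loss incurred on a point with margin m = y * f(x); a large negative m
# means the point is badly misclassified (e.g. a mislabeled outlier).
for m in (-1.0, -3.0, -5.0):
    exp_loss = math.exp(-m)     # exponential loss (AdaBoost)
    sq_loss = (1.0 - m) ** 2    # squared-error-style loss
    print(f"m={m}: exponential={exp_loss:.2f}, squared={sq_loss:.2f}")
```

At m = −5 the exponential loss is roughly 148 versus 36 for the squared loss, so a few extreme points can dominate AdaBoost's reweighting.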
5. How do you prevent overfitting in random forest models?
(a) Increasing Tree Depth.
(b) Increasing the number of variables sampled at each split.
(c) Increasing the number of trees.
(d) All of the above.
Sol. (c)
Refer to the lecture.
6. A dataset with two classes is plotted below.

Does the data satisfy the Naive Bayes assumption?

(a) Yes
(b) No
(c) The given data is insufficient
(d) None of these
Sol. (a)
From the plot, we can infer that the features of the data are independent given the class.
7. Ensembling in a random forest classifier helps in achieving:
(a) reduction of bias error
(b) reduction of variance error
(c) reduction of data dimension
(d) none of the above
Sol. (b)
Averaging the predictions of many decorrelated trees reduces the variance of the ensemble,
while the bias stays roughly that of an individual tree.
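A minimal simulation (illustrative only; not from the lecture) shows the variance-reduction effect: averaging many independent, noisy "tree" predictions shrinks the variance of the ensemble roughly as σ²/n while leaving the mean unchanged.

```python
import random

random.seed(0)
TRUE_VALUE = 1.0

def noisy_tree_prediction():
    # stand-in for one low-bias, high-variance tree: unbiased but noisy
    return TRUE_VALUE + random.gauss(0.0, 1.0)

def forest_prediction(n_trees):
    # the "forest" averages the individual tree predictions
    return sum(noisy_tree_prediction() for _ in range(n_trees)) / n_trees

def empirical_variance(n_trees, trials=2000):
    preds = [forest_prediction(n_trees) for _ in range(trials)]
    mean = sum(preds) / trials
    return sum((p - mean) ** 2 for p in preds) / trials

print(empirical_variance(1))   # close to 1.0 (a single tree's variance)
print(empirical_variance(25))  # much smaller, roughly 1.0 / 25
```

Real random-forest trees are correlated (they share training data), which is why the feature subsampling at each split matters: it decorrelates the trees and lets the averaging come closer to this idealized σ²/n reduction.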
