ML Papers


Course Code: 20CS2032                          Duration: 3 hrs
Course Name: MACHINE LEARNING TECHNIQUES       Max. Marks: 100

Q. No.  Questions  CO  BL  Marks
PART – A (10 X 1 = 10 MARKS)
(Answer all the questions)
1. Identify the type of machine learning approach involved in the following problem statement: "To study a bank credit dataset and make a decision about whether to approve the loan of an applicant based on his profile." CO1 U 1
2. Define reinforcement learning. CO1 R 1
3. Assume the mean and standard deviation are 13.25 and 4.6, respectively. Estimate the normalized value of 20 using the Z-score normalization technique. CO2 U 1
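A quick numeric check for Q3 (an illustrative sketch; the function name is not from the paper):

```python
def z_score(value, mean, std):
    # Z-score normalization: how many standard deviations the value lies from the mean
    return (value - mean) / std

print(round(z_score(20, 13.25, 4.6), 2))  # 6.75 / 4.6 ≈ 1.47
```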
4. List at least two benefits of performing feature selection on a dataset before using it in a machine learning algorithm. CO2 R 1
5. Consider 3 input neurons with weight values (0.3, 0.2, 0.3) and input vector (2, 1, 1). If the following thresholding activation function is used, calculate the output of a single-layer perceptron: CO3 A 1
   Output = 1 if Net Input > 0.5, else 0
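Q5 can be checked with a short sketch (helper name is illustrative): the net input is 0.3·2 + 0.2·1 + 0.3·1 = 1.1, which exceeds the 0.5 threshold.

```python
def perceptron_output(weights, inputs, threshold=0.5):
    # Weighted sum followed by a step (thresholding) activation
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > threshold else 0

print(perceptron_output((0.3, 0.2, 0.3), (2, 1, 1)))  # net = 1.1 > 0.5, so output 1
```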
6. Summarize the techniques to handle imbalanced dataset. CO3 U 1
7. Differentiate soft and hard clustering. CO4 U 1
8. Assume two cluster centroids C1: (9, 2) and C2: (3, 4) in the K-Means clustering algorithm. Determine the cluster to which the new data point (4, 3) belongs. CO4 A 1
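A minimal sketch for Q8 (names are illustrative): assign the point to the centroid at the smaller Euclidean distance.

```python
import math

def nearest_cluster(point, centroids):
    # Returns the 1-based index of the closest centroid (Euclidean distance)
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances)) + 1

# (4, 3) is about 5.10 from C1(9, 2) and about 1.41 from C2(3, 4)
print(nearest_cluster((4, 3), [(9, 2), (3, 4)]))  # 2
```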
9. Distinguish between the classification and regression tree algorithms. CO5 U 1
10. List the performance evaluation metrics for a regression task. CO6 R 1
PART – B (6 X 3 = 18 MARKS)
(Answer all the questions)
11. Apply the Find-S algorithm to the following dataset and derive the most specific hypothesis that fits all the positive instances. CO1 A 3

    Color   Toughness  Fungus  Appearance  Poisonous
    Green   Hard       No      Wrinkled    Yes
    Green   Hard       Yes     Smooth      No
    Brown   Soft       No      Wrinkled    No
    Orange  Hard       No      Wrinkled    Yes
    Green   Soft       Yes     Smooth      Yes
    Green   Hard       Yes     Wrinkled    Yes
    Orange  Hard       No      Wrinkled    Yes
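The Find-S procedure in Q11 can be sketched as follows (an illustration, not a model answer): start from the first positive example and replace any attribute that disagrees with a later positive example by '?'.

```python
def find_s(examples):
    # examples: (attribute-tuple, label); only positive ("Yes") rows are used
    hypothesis = None
    for attrs, label in examples:
        if label != "Yes":
            continue
        if hypothesis is None:
            hypothesis = list(attrs)
        else:
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attrs)]
    return hypothesis

data = [(("Green", "Hard", "No", "Wrinkled"), "Yes"),
        (("Green", "Hard", "Yes", "Smooth"), "No"),
        (("Brown", "Soft", "No", "Wrinkled"), "No"),
        (("Orange", "Hard", "No", "Wrinkled"), "Yes"),
        (("Green", "Soft", "Yes", "Smooth"), "Yes"),
        (("Green", "Hard", "Yes", "Wrinkled"), "Yes"),
        (("Orange", "Hard", "No", "Wrinkled"), "Yes")]
print(find_s(data))
```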
12. Apply the wavelet transform and convert the following dataset to wavelet coefficients: S = [2, 2, 0, 2, 3, 5, 4, 4] CO2 A 3
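Q12 can be cross-checked with a short Haar-transform sketch; note that textbook conventions vary (this one uses pairwise averages (a+b)/2 and differences (a-b)/2, while others scale by sqrt(2)):

```python
def haar_coefficients(signal):
    # Repeatedly replace the signal by pairwise averages, collecting the
    # pairwise differences (detail coefficients) produced at each level
    s, details = list(signal), []
    while len(s) > 1:
        avgs = [(s[i] + s[i + 1]) / 2 for i in range(0, len(s), 2)]
        dets = [(s[i] - s[i + 1]) / 2 for i in range(0, len(s), 2)]
        details = dets + details
        s = avgs
    return s + details  # overall average first, then details, coarse to fine

print(haar_coefficients([2, 2, 0, 2, 3, 5, 4, 4]))
```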
13. Illustrate the multiclass classification approaches with suitable example. CO3 U 3
14. Analyze the given figure; classify and describe the type of data points p, q and r for the given value of minpts = 4. CO4 An 3
    [Figure showing the data points p, q and r is not reproduced in this extract.]
15. Consider the given dataset and calculate the weighted average for the 'Outlook' attribute using the decision tree algorithm. CO5 An 3

    Outlook   Temperature  Windy  Play Golf
    Sunny     Hot          FALSE  No
    Overcast  Hot          FALSE  Yes
    Sunny     Mild         TRUE   No
    Sunny     Cool         TRUE   Yes
    Overcast  Hot          TRUE   Yes
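For Q15, "weighted average" is assumed here to mean the weighted average entropy of the partitions induced by the attribute; a sketch under that assumption:

```python
import math

def entropy(labels):
    n = len(labels)
    counts = {l: labels.count(l) for l in set(labels)}
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def weighted_entropy(rows, attr_index, label_index=-1):
    # Weighted average of the entropy of each subset induced by the attribute
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    n = len(rows)
    return sum(len(g) / n * entropy(g) for g in groups.values())

golf = [("Sunny", "Hot", "FALSE", "No"), ("Overcast", "Hot", "FALSE", "Yes"),
        ("Sunny", "Mild", "TRUE", "No"), ("Sunny", "Cool", "TRUE", "Yes"),
        ("Overcast", "Hot", "TRUE", "Yes")]
print(round(weighted_entropy(golf, 0), 3))  # 3/5 * H(1/3, 2/3) + 2/5 * 0 ≈ 0.551
```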
16. Discuss the importance of the grid search optimization technique in tuning the hyperparameters of machine learning models. CO6 U 3
PART – C (6 X 12 = 72 MARKS)
(Answer any five Questions from Q. No. 17 to 23, Q. No. 24 is Compulsory)
17. a. Analyze the given dataset and derive the specific and general hypotheses using the Candidate Elimination Algorithm. CO1 An 6

    S.No  Citation  Size    Library  Price       Edition  Buy
    1     Some      Small   No       Affordable  One      No
    2     Many      Big     No       Expensive   Many     Yes
    3     Many      Medium  No       Expensive   Few      Yes
    4     Many      Small   No       Affordable  Many     Yes
b. Discuss the concept of geometric and logical models of machine learning techniques with proper examples. CO1 R 6

18. a. A professor interviewed 7 students to find out the number of hours they spent studying for the final exam. Their responses were: 5, 7, 8, 9, 9, 11, and 13. Calculate the mean, standard deviation, and IQR. CO2 An 6
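Q18(a) can be verified with the standard library; note that the IQR depends on the quartile convention (`statistics.quantiles` defaults to the exclusive method), and both sample and population standard deviations are shown since the question does not specify one:

```python
import statistics

hours = [5, 7, 8, 9, 9, 11, 13]
mean = statistics.mean(hours)                  # 62/7 ≈ 8.857
sample_sd = statistics.stdev(hours)            # sample standard deviation
pop_sd = statistics.pstdev(hours)              # population standard deviation
q1, _, q3 = statistics.quantiles(hours, n=4)   # quartiles (exclusive method)
iqr = q3 - q1
print(round(mean, 3), round(sample_sd, 3), round(pop_sd, 3), iqr)
```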
b. In real-world data, tuples with missing and noisy values for some attributes are a common occurrence. Describe the various methods for handling these problems. CO2 U 6
19. Apply the Naive Bayes classifier on the given dataset and determine the class output of the test instance <Sunny, Cool, High, Strong>. CO3 A 12

    Outlook   Temperature  Humidity  Wind    Play Tennis
    Sunny     Hot          High      Weak    No
    Sunny     Hot          High      Strong  No
    Overcast  Hot          High      Weak    Yes
    Rain      Mild         High      Weak    Yes
    Rain      Cool         Normal    Weak    Yes
    Rain      Cool         Normal    Strong  No
    Overcast  Cool         Normal    Strong  Yes
    Sunny     Mild         High      Weak    No
    Sunny     Cool         Normal    Weak    Yes
    Rain      Mild         Normal    Weak    Yes
    Sunny     Mild         Normal    Strong  Yes
    Overcast  Mild         High      Strong  Yes
    Overcast  Hot          Normal    Weak    Yes
    Rain      Mild         High      Strong  No
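Q19 can be sanity-checked with a frequency-count Naive Bayes sketch (no smoothing, matching the usual hand calculation of P(c) times the product of P(x_i | c)):

```python
def naive_bayes(rows, test):
    # rows: attribute tuples with the class label last; frequency estimates, no smoothing
    labels = [r[-1] for r in rows]
    scores = {}
    for c in set(labels):
        subset = [r for r in rows if r[-1] == c]
        score = len(subset) / len(rows)          # prior P(c)
        for i, v in enumerate(test):
            score *= sum(1 for r in subset if r[i] == v) / len(subset)
        scores[c] = score
    return max(scores, key=scores.get)

tennis = [("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
          ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
          ("Rain","Cool","Normal","Weak","Yes"), ("Rain","Cool","Normal","Strong","No"),
          ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
          ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
          ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
          ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No")]
print(naive_bayes(tennis, ("Sunny", "Cool", "High", "Strong")))  # No
```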

20. a. Apply the perceptron learning algorithm to the given OR gate and update the weights for one iteration. Assume initial weights W1 = 0.4, W2 = 0.1, threshold >= 0.3, and learning rate = 0.2. CO3 A 6

    X1  X2  Y
    0   0   0
    0   1   1
    1   0   1
    1   1   1
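Q20(a) can be traced programmatically; a sketch assuming no bias term (only W1 and W2 are given) and output 1 when the net input meets the threshold:

```python
def train_one_epoch(samples, weights, lr=0.2, threshold=0.3):
    # Perceptron rule: w <- w + lr * (target - output) * x
    w = list(weights)
    for x, target in samples:
        net = sum(wi * xi for wi, xi in zip(w, x))
        output = 1 if net >= threshold else 0
        w = [wi + lr * (target - output) * xi for wi, xi in zip(w, x)]
    return w

or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_one_epoch(or_gate, (0.4, 0.1))
print([round(wi, 2) for wi in w])  # only the (0, 1) sample forces an update
```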
b. Calculate the TPR and FPR for thresholds greater than 0.9, 0.8, 0.7, 0.6, 0.34 and plot the ROC curve for the calculated values. CO3 A 6

    Tuple  Class  Probability
    1      P      0.94
    2      P      0.82
    3      N      0.75
    4      P      0.65
    5      P      0.62
    6      N      0.58
    7      N      0.66
    8      N      0.44
21. a. Analyze the given distance matrix and construct a dendrogram using the single linkage agglomerative clustering algorithm. CO4 An 6

         A     B     C     D     E     F
    A    0
    B    0.23  0
    C    0.22  0.14  0
    D    0.37  0.19  0.13  0
    E    0.34  0.14  0.28  0.23  0
    F    0.24  0.24  0.10  0.22  0.39  0
b. Explain the working principle of Self Organizing Map algorithm. CO4 U 6

22. a. Cluster the following eight points ((x, y) representing locations) using the K-Means algorithm to find the three cluster centers after the first iteration. CO4 A 6
    A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
    Initial cluster centers: A1(2, 10), A4(5, 8) and A7(1, 2).
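Q22(a)'s first iteration (assignment, then centroid update) can be reproduced with a short sketch (assumes no cluster ends up empty, which holds for this data):

```python
import math

def kmeans_iteration(points, centers):
    # Assign each point to its nearest center, then recompute each center
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[i].append(p)
    return [tuple(sum(coord) / len(cl) for coord in zip(*cl)) for cl in clusters]

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
print(kmeans_iteration(points, [(2, 10), (5, 8), (1, 2)]))
```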
b. State why ensemble methods may improve the accuracy of a classifier. Explain the working principle of Bagging and Boosting. CO3 U 6

23. Apply the decision tree algorithm on the following dataset and identify the root node of the tree. (Note: input attributes: Age, Sex, BP, and Cholesterol; output attribute: Drug) CO5 A 12

    Age         Sex  BP      Cholesterol  Drug
    Young       F    High    Normal       Drug A
    Young       F    High    High         Drug A
    Middle-age  F    High    Normal       Drug B
    Senior      F    Normal  Normal       Drug B
    Senior      M    Low     Normal       Drug B
    Senior      M    Low     High         Drug A
    Middle-age  M    Low     High         Drug B
    Young       F    Normal  Normal       Drug A
    Young       M    Low     Normal       Drug B
    Senior      M    Normal  Normal       Drug B
    Young       M    Normal  High         Drug B
    Middle-age  F    Normal  High         Drug B
    Middle-age  N    High    Normal       Drug B
    Senior      F    Normal  High         Drug B
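For Q23, the root is the attribute with the highest information gain; a sketch (entropy-based, as in ID3, using the dataset exactly as printed):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    # Gain = H(labels) - weighted average entropy of the attribute's partitions
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder

# (Age, Sex, BP, Cholesterol, Drug) rows, as printed in the question
drug = [("Young","F","High","Normal","A"), ("Young","F","High","High","A"),
        ("Middle-age","F","High","Normal","B"), ("Senior","F","Normal","Normal","B"),
        ("Senior","M","Low","Normal","B"), ("Senior","M","Low","High","A"),
        ("Middle-age","M","Low","High","B"), ("Young","F","Normal","Normal","A"),
        ("Young","M","Low","Normal","B"), ("Senior","M","Normal","Normal","B"),
        ("Young","M","Normal","High","B"), ("Middle-age","F","Normal","High","B"),
        ("Middle-age","N","High","Normal","B"), ("Senior","F","Normal","High","B")]
names = ["Age", "Sex", "BP", "Cholesterol"]
root = max(range(4), key=lambda i: info_gain(drug, i))
print(names[root])  # Age
```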
COMPULSORY QUESTION
24. a. Explain the performance metrics used to evaluate a multiclass classifier in detail. Calculate the precision per class, recall per class, accuracy, and F1-score for the given confusion matrix (rows: actual class; columns: predicted class). CO6 A 6

                Predicted
                A   B   C   D
    Actual  A   9   1   0   0
            B   1  15   3   1
            C   5   0  24   1
            D   0   4   1  15
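Q24(a)'s per-class figures follow directly from the matrix (rows = actual, columns = predicted); a sketch:

```python
def multiclass_metrics(cm):
    # cm[i][j] = count of actual class i predicted as class j
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]
    return accuracy, precision, recall, f1

cm = [[9, 1, 0, 0], [1, 15, 3, 1], [5, 0, 24, 1], [0, 4, 1, 15]]
acc, prec, rec, f1 = multiclass_metrics(cm)
print(acc)  # 63 correct of 80 = 0.7875
```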
b. Illustrate the various cross validation strategies with suitable examples. CO6 U 6
CO – COURSE OUTCOME BL – BLOOM’S LEVEL

COURSE OUTCOMES
CO1 Recall the concepts, mathematical background, applicability, limitations of existing machine
learning techniques.
CO2 Explain the simple feature engineering steps.
CO3 Apply suitable linear / nonlinear / probabilistic machine learning algorithms for a given task.
CO4 Demonstrate the working principle of distance based algorithms to handle unlabeled data.
CO5 Distinguish tree and rule based machine learning algorithms and appropriately apply to the suitable
application.
CO6 Evaluate the performance of machine learning models using suitable metrics.

Assessment Pattern as per Bloom’s Taxonomy


CO / P R U A An E C Total
CO1 7 1 3 6 - - 17
CO2 1 7 3 6 - - 17
CO3 - 10 25 - - - 34
CO4 - 7 7 9 - - 23
CO5 - 1 12 3 - - 16
CO6 1 9 6 - - - 16
124
Course Code: 20CS2032                          Duration: 3 hrs
Course Name: MACHINE LEARNING TECHNIQUES       Max. Marks: 100

Q. No.  Questions  CO  BL  Marks
PART – A (10 X 1 = 10 MARKS)
(Answer all the questions)
1. Define Version Space. CO1 R 1
2. Differentiate between supervised and unsupervised learning with suitable examples. CO1 U 1
3. Write the formula for finding the Z-score of a number. CO2 A 1
4. Define univariate, bivariate and multivariate analysis. CO2 R 1
5. List the most popular methods of ensemble learning. CO3 R 1
6. Recall the two types of multiclass classification technique. CO3 R 1
7. State the reason why the KNN algorithm is called a lazy learner algorithm. CO4 R 1
8. Analyze the type of clustering shown in the representation below. CO4 A 1
   [Figure showing the clustering representation is not reproduced in this extract.]
9. Name the criteria for selecting the best attribute in a decision tree that separates the data into different classes most effectively. CO5 R 1
10. Define grid search. CO6 R 1
PART – B (6 X 3 = 18 MARKS)
(Answer all the questions)
11. State the significant role of machine learning models in various applications. CO1 R 3
12. List the two methods of implementing feature reduction and explain information gain based feature selection. CO2 R 3
13. State the purpose of SVM. CO3 R 3
14. Identify the different types of hierarchical clustering. CO4 R 3
15. Differentiate between the classification and regression tree algorithms. CO5 U 3
16. Describe ridge regression. CO6 U 3
PART – C (6 X 12 = 72 MARKS)
(Answer any five Questions from Q. No. 17 to 23, Q. No. 24 is Compulsory)
17. a. Apply the Candidate Elimination Algorithm for the given dataset and derive the specific and general hypotheses. CO1 A 6

    S.No  Citation  Size    Library  Price       Edition  Buy
    1     Some      Small   No       Affordable  One      No
    2     Many      Big     No       Expensive   Many     Yes
    3     Many      Medium  No       Expensive   Few      Yes
    4     Many      Small   No       Affordable  Many     Yes
b. Discuss the geometric and probabilistic models with suitable examples. CO1 U 6

18. a. Describe the different feature selection methods and explain any four methods. CO2 U 6
b. Summarize the various data preprocessing techniques with examples. CO2 U 6

19. a. Develop a single layer perceptron for the AND gate using the perceptron training rule, assuming weights 1.3 and 0.7, threshold = 1 and learning rate n = 0.5, for one iteration. CO3 A 6
b. Calculate the regression coefficient, Ypred and error value by analyzing the data given below. CO3 A 6

    Price (Rs)       10  12  13  12  16  15
    Amount Demanded  40  38  43  45  37  43
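Q19(b)'s least-squares line can be checked with a sketch; "regression coefficient" is taken here to mean the slope b in y = a + bx:

```python
def least_squares(xs, ys):
    # Slope b = S_xy / S_xx, intercept a = mean(y) - b * mean(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

price = [10, 12, 13, 12, 16, 15]
demand = [40, 38, 43, 45, 37, 43]
a, b = least_squares(price, demand)
print(a, b)  # y = 44.25 - 0.25x
```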

20. a. Analyze the given distance matrix and draw the dendrogram using single linkage agglomerative clustering. CO4 An 6

         A     B     C     D     E     F
    A    0
    B    0.23  0
    C    0.22  0.14  0
    D    0.37  0.19  0.13  0
    E    0.34  0.14  0.28  0.23  0
    F    0.24  0.24  0.10  0.22  0.39  0

b. Illustrate the clustering of the following eight points ((x, y) representing locations) using the K-Means algorithm to find the three cluster centers after the first iteration. CO4 A 6
    A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
    Initial cluster centers: A1(2, 10), A4(5, 8) and A7(1, 2).
21. a. Apply the decision tree algorithm on the following dataset and identify the root node of the tree. (Note: input attributes: Weather, Temperature, Humidity and Wind; output attribute: Play) CO5 A 12

    Day  Weather  Temperature  Humidity  Wind    Play
    1    Sunny    Hot          High      Weak    No
    2    Cloudy   Hot          High      Weak    Yes
    3    Sunny    Mild         Normal    Strong  Yes
    4    Cloudy   Mild         High      Strong  Yes
    5    Rainy    Mild         High      Strong  No
    6    Rainy    Cool         Normal    Strong  No
    7    Rainy    Mild         High      Weak    Yes
    8    Sunny    Hot          High      Strong  No
    9    Cloudy   Hot          Normal    Weak    Yes
    10   Rainy    Mild         High      Strong  No

22. a. Apply the chi-square test on the 2x2 contingency table to examine the relationship between gender and preferred reading at a significance level of 0.001. CO2 A 6

                 Female  Male  Total
    Fiction      250     200   450
    Non-fiction  50      1000  1050
    Total        300     1200  1500

    Use the statistical table for reference:

    DF   P = 0.05  P = 0.01  P = 0.001
    1    3.84      6.64      10.83
    2    5.99      9.21      13.82
    3    7.82      11.35     16.27
    4    9.49      13.28     18.47
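Q22(a) can be verified numerically (a sketch; expected counts are row total times column total divided by the grand total):

```python
def chi_square(observed):
    # Pearson chi-square statistic for a contingency table of observed counts
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (o - expected) ** 2 / expected
    return stat

stat = chi_square([[250, 200], [50, 1000]])
print(round(stat, 2))  # ≈ 507.94, far above the 0.001 critical value of 10.83 (df = 1)
```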
b. Calculate the sensitivity, precision, recall, F1-score and specificity for the given confusion matrix. CO3 A 6

                Predicted Yes  Predicted No
    Actual Yes  95             5
    Actual No   5              45

23. a. Explain the Bagging algorithm of ensemble learning with a diagram. CO3 U 6
b. Explain the different linkage methods in hierarchical clustering with diagrams. CO4 U 6
COMPULSORY QUESTION
24. a. Explain the Gradient Descent approach with an example. CO6 U 6
b. Explain the leave-one-out and leave-p-out cross validation techniques. CO6 U 6
CO – COURSE OUTCOME BL – BLOOM’S LEVEL

COURSE OUTCOMES
CO1 Recall the concepts, mathematical background, applicability, limitations of existing machine learning techniques.
CO2 Explain the simple feature engineering steps.
CO3 Apply suitable linear / nonlinear / probabilistic machine learning algorithms for a given task.
CO4 Demonstrate the working principle of distance based algorithms to handle unlabeled data.
CO5 Distinguish tree and rule based machine learning algorithms and appropriately apply to the suitable application.
CO6 Evaluate the performance of machine learning models using suitable metrics.

Assessment Pattern as per Bloom’s Taxonomy


CO / P R U A An E C Total
CO1 4 7 6 - - - 17
CO2 4 12 7 - - - 23
CO3 5 6 18 - - - 29
CO4 4 6 7 6 - - 23
CO5 1 3 12 - - - 16
CO6 1 15 - - - - 16
124
