Final 2015 W

Machine Learning 1 — WS2015/2016 — Module IN2064 Final Exam · Page 1
Machine Learning 1 — Final Exam
1 Preliminaries
• Please write your immatriculation number but not your name on every page you hand in.
• The exam is closed book. You may, however, use one A4 sheet of notes, handwritten before the
exam.
• The exam is limited to 2 × 60 minutes.
• If a question says “Describe in 2–3 sentences”, “Show your work”, “Explain your answer” or
similar, these all mean the same: give a succinct description or explanation.
• This exam consists of 4 pages, 12 problems. You can earn up to 24 points.
Problem 0 [$\sqrt{-4}$ points] We suffer from gender bias. It’s not clear which gender we favour in scoring,
but it certainly—so science tells us—influences our grading skills. Help yourself by not writing your
name on your sheets. Just fill in your immatriculation number on every sheet you hand in. Make sure
it is easily readable.
2 Linear Algebra and Probability Theory
For the next two exercises, let X and Y be two random variables with the joint cumulative distribution
function (cdf)
$$F_{X,Y}(x, y) = \begin{cases} 1 - e^{-x} - e^{-y} + e^{-x-y} & x, y \ge 0 \\ 0 & \text{else} \end{cases}$$
Problem 1 [3 points] Determine the marginal probability density functions (pdfs) $f_X$ and $f_Y$.
Identify the marginal distributions.
Problem 2 [2 points] Are X and Y independent? Prove your claim.
imat:
3 kNN
Problem 3 [1 point] You are testing three new sensors. You would like to classify them using the
nearest neighbour approach with Euclidean distance ($d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$).
Sensor    Output     Nearest Neighbour (current)   Nearest Neighbour (expected)
sensor1   [1, 150]   sensor2                       sensor3
sensor2   [2, 110]   sensor3                       sensor1 or sensor3
sensor3   [1, 100]   sensor2                       sensor1
Your sensors output two parameters each. This output is in the above table. Since this is a controlled
test of the sensors, you know that sensor1 and sensor3 belong to the same group and that sensor2
belongs to a different group. We can see that sensor2 is causing trouble for the classification of the
other sensors. We would expect the output to look like the fourth column in the above table. How
can you fix this problem without changing your distance measure from Euclidean?
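The "current" column of the table can be reproduced with a few lines of Python. This is only a minimal sketch of the setup: the helper name `nearest_neighbour` and the dictionary layout are illustrative assumptions, and the code does not attempt the fix the problem asks for.

```python
import math

# Sensor outputs copied from the table above.
outputs = {
    "sensor1": (1, 150),
    "sensor2": (2, 110),
    "sensor3": (1, 100),
}


def nearest_neighbour(name, points):
    """Return the name of the closest other point under Euclidean distance.

    Hypothetical helper, introduced only for this illustration.
    """
    others = (other for other in points if other != name)
    return min(others, key=lambda other: math.dist(points[name], points[other]))


# Nearest neighbour of every sensor on the raw, unmodified outputs.
current = {name: nearest_neighbour(name, outputs) for name in outputs}
print(current)
```

On the raw outputs the second parameter dominates every distance, which is why the result matches the third column of the table rather than the fourth.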
Problem 4 [1 point] In general, especially with data which might have many uninformative
parameters, why can the use of Euclidean distance as the distance measure be problematic (even after
the problem from above has been fixed)?
4 Gaussian Processes
We have a data set with inputs $x \in \mathbb{R}^1$. You are given a Gaussian process $f \sim \mathcal{GP}$ with mean function
$m(x) = 0$, covariance function $k(x, x')$, and noisy observations with noise $\epsilon \sim \mathcal{N}(0, \sigma_y^2)$.
Problem 5 [2 points] Assume the covariance function is $k(x, x') = (xx' + 1)^2$ and the observation
inputs $x_1 = -\frac{1}{2}$, $x_2 = 2$ are given. Write down the distribution $p(f(x_1), f(x_2))$. What is the
relationship between $f(x_1)$ and $f(x_2)$? Describe your reasoning.
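Under a zero-mean GP prior, the function values at any finite set of inputs are jointly Gaussian, with a covariance matrix obtained by evaluating the kernel on all pairs of inputs. The sketch below assembles that matrix for the polynomial kernel of Problem 5. The helper names `poly_kernel` and `gram_matrix` are assumptions made for this illustration, and the inputs are hypothetical values rather than the exam's $x_1$, $x_2$.

```python
def poly_kernel(x, x_prime):
    """Covariance function k(x, x') = (x x' + 1)^2 from Problem 5."""
    return (x * x_prime + 1) ** 2


def gram_matrix(kernel, inputs):
    """Evaluate the kernel on every pair of inputs (hypothetical helper)."""
    return [[kernel(a, b) for b in inputs] for a in inputs]


# Hypothetical inputs, chosen only for illustration (not the exam's x1, x2).
inputs = [0.0, 1.0]

mean = [0.0 for _ in inputs]          # m(x) = 0 at every input
K = gram_matrix(poly_kernel, inputs)  # prior covariance of (f(0), f(1))
print(mean, K)
```

The off-diagonal entry of `K` is the prior covariance between the two function values, which is the quantity that determines how they are related.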
Problem 6 [2 points] We have a squared exponential (SE) kernel. Fig. 1 shows the GP models
obtained with different values of $\sigma_y^2$. Which model is best? What makes the other two worse?
Explain your answer.
[Figure 1: three panels (A, B, C), each plotting y against x for x from −5 to 5 and showing the data, the prediction, and the confidence interval.]
Figure 1: Gaussian Processes
5 Neural networks
Problem 7 [3 points] A neural network with activation functions tanh(·) in the hidden units is
initialised with all parameters (weights, biases) set to 0. Can it learn? Explain your answer.
Problem 8 [1 point] A neural network with activation functions tanh(·) in the hidden units is
initialised with all parameters (weights, biases) set to 1. Can it learn? Explain your answer.
6 Unsupervised learning
[Figure 2: four panels (A, B, C, D), one per dataset.]
Figure 2: Four datasets
Problem 9 [3 points] PCA was performed on datasets A, B and C (Fig. 2). The three results are:
                          Result 1           Result 2           Result 3
normalised eigenvalues    [0.5, 0.5]         [0.95, 0.05]       [0.99, 0.01]
normalised eigenvectors   [ 0.78   0.63]     [ 0.96   0.27]     [ 0.71   0.71]
                          [ 0.63  −0.78]     [−0.27   0.96]     [ 0.71  −0.71]
Unfortunately the results got mixed up. Which result corresponds to which dataset (A, B, C)? Explain
your answer.
Problem 10 [1 point] What would the result on dataset D (Fig. 2) look like? Use the same notation
as in Problem 9.
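For reference, the notation of Problem 9 can be produced for a two-dimensional dataset without any libraries. The sketch below assumes that "normalised eigenvalues" means the eigenvalues of the sample covariance matrix divided by their sum, and that "normalised eigenvectors" means unit-length eigenvectors; the exam does not state either definition. The function name `pca_2d` and the data points are hypothetical and unrelated to datasets A to D.

```python
import math


def pca_2d(points):
    """PCA of 2-D points via the closed-form eigendecomposition of a
    symmetric 2x2 matrix (hypothetical helper for illustration)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Eigenvalues, largest first.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    eigenvalues = [half_trace + radius, half_trace - radius]

    # Assumed normalisation: divide by the sum so the values add up to 1.
    total = sum(eigenvalues)
    normalised = [lam / total for lam in eigenvalues]

    # Eigenvector of the largest eigenvalue; the second one is orthogonal.
    if b != 0:
        v1 = (b, eigenvalues[0] - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    length = math.hypot(*v1)
    v1 = (v1[0] / length, v1[1] / length)
    v2 = (-v1[1], v1[0])
    return normalised, [v1, v2]


# Hypothetical data lying roughly along a diagonal line.
points = [(-2, -1), (-1, -1), (0, 0), (1, 1), (2, 1)]
normalised, vectors = pca_2d(points)
print(normalised, vectors)
```

Each returned vector is one eigenvector, and the sign of an eigenvector is arbitrary, so a result may differ from a printed one by a factor of −1.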
7 Kernels
Consider the following algorithm.
Algorithm 1: Counting something

input : Character string x of length m (one-based indexing)
input : Character string y of length n (one-based indexing)
output: A number s ∈ R
s ← 0;
for i ← 1 to m do
    for j ← 1 to n do
        if x[i] == y[j] then
            s ← s + 1;
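A direct Python transcription of Algorithm 1 might look as follows. It is a sketch under one assumption: the comparison is taken to be between x[i] and y[j], since j ranges over the length of y. The function name `algorithm_1` is made up for this illustration.

```python
def algorithm_1(x: str, y: str) -> int:
    """Line-by-line transcription of Algorithm 1.

    Python strings are zero-based, so the loops run over 0..m-1 and
    0..n-1 instead of 1..m and 1..n. The comparison x[i] == y[j] is an
    assumption about the intended pseudocode.
    """
    s = 0
    for i in range(len(x)):
        for j in range(len(y)):
            if x[i] == y[j]:
                s += 1
    return s


print(algorithm_1("aab", "ab"))
```

The nested loops visit all m · n index pairs, so the running time is O(mn) for strings of lengths m and n.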
Problem 11 [1 point] Explain, in no more than two sentences, what the above algorithm is doing.
Problem 12 [4 points] Let S denote the set of strings over a finite alphabet of size v. Define a
function K : S × S → R as the output of running Algorithm 1 on a pair of strings x, y. Show that
K(x, y) is a valid kernel.