
EE 6106: Online Learning and Optimisation

Homework 1A
Questions
1. Prove the following inequalities:
   ln x ≤ x − 1 for x > 0.
   1 − ax ≥ (1 − a)^x for 0 ≤ a, x ≤ 1.
   − ln(1 − x) ≤ 1 + 2x for 0 < x < 0.5.
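As a warm-up, here is one possible route to the first inequality (a sketch only; the other two yield to similar calculus or convexity arguments):

```latex
% Sketch for ln x <= x - 1: let g(x) = ln x - (x - 1). Then
% g'(x) = 1/x - 1 is positive on (0,1) and negative on (1, inf),
% so g is maximised at x = 1, where g(1) = 0. Hence, for all x > 0,
\[
  \ln x - (x - 1) \;\le\; g(1) \;=\; 0 .
\]
```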

2. For T rounds, you have to predict a Boolean variable. In each round, after you make
   your prediction, the ‘true’ value is revealed. We are interested in the best strategy,
   defined as the one that minimises the number of mistakes in the worst case. Clearly,
   in the worst case, a deterministic strategy can make T mistakes. What is the optimal
   randomised strategy? This suggests that in the ‘prediction with experts’ setting,
   counting the total number of mistakes is not a meaningful benchmark. Why?
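For intuition on this question, here is a minimal Python sketch (the function names and parameter values are my own, purely illustrative): an adversary that always announces the opposite of the prediction forces any deterministic predictor to err in every round, while a uniformly random predictor errs with probability 1/2 per round no matter what the adversary reveals.

```python
import random

def adversary_vs_deterministic(predict, T=100):
    """Against a deterministic predictor, revealing the opposite of each
    prediction forces a mistake in every one of the T rounds."""
    mistakes, history = 0, []
    for _ in range(T):
        guess = predict(history)
        truth = 1 - guess                  # adversary flips the prediction
        mistakes += (guess != truth)
        history.append(truth)
    return mistakes

def random_predictor_mistakes(T=100, trials=2000):
    """A uniformly random predictor errs with probability 1/2 per round
    against any fixed adversarial bit, so about T/2 mistakes on average."""
    total = sum(
        sum(random.randint(0, 1) != 0 for _ in range(T))
        for _ in range(trials)
    )
    return total / trials

print(adversary_vs_deterministic(lambda h: 0))   # 100: a mistake every round
print(random_predictor_mistakes())               # about 50
```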
3. Let Xt be a binary 0/1 sequence. There are N experts and expert i predicts Yi,t for
   Xt. Recall the MAJORITY algorithm that predicts

      X̂t = 1{ Σ_i Yi,t ≥ Σ_i (1 − Yi,t) },

   where 1{·} is the indicator function. In class, we showed that if there is a perfect
   expert, the regret is upper bounded by log2 N. Now suppose that the best expert
   makes up to m mistakes.
   (a) Assume that it is known that the best expert will make up to m mistakes.
       Describe a suitable modification to the algorithm described in class and
       obtain a bound on the regret for this case.
   (b) Now assume that you do not know that there is such a best expert. What
       algorithm would you use and what regret would you get? Compare with the
       result from the previous part.
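For part (b), one standard candidate is the deterministic Weighted Majority algorithm of Littlestone and Warmuth. A minimal Python sketch follows; the choice β = 1/2 and the bound quoted in the docstring are the textbook versions and may differ in detail from the class treatment.

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Deterministic Weighted Majority (Littlestone-Warmuth).

    expert_preds[t][i] is expert i's 0/1 prediction in round t and
    outcomes[t] is the true bit.  Every expert that errs has its weight
    multiplied by beta.  The number of mistakes is at most
    (ln N + m ln(1/beta)) / ln(2/(1+beta)), where m is the best expert's
    mistake count -- no prior knowledge of m is needed.
    """
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if vote1 >= vote0 else 0
        mistakes += (guess != y)
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes
```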
4. Let Xt be a sequence of IID Bernoulli(p) random variables with p > 0.5.
(a) Consider the oracle which knows the value of p. What should be the oracle’s
strategy and what is the expected number of mistakes it makes in t rounds?
   (b) A forecasting algorithm that does not know p applies the majority algorithm on
       the past data to predict X̂t. Specifically,

          X̂t = 1{ Σ_{s=1}^{t−1} Xs ≥ Σ_{s=1}^{t−1} (1 − Xs) },

       i.e., the forecast for round t is the majority value in rounds 1, · · · , t − 1, with
       randomisation if there is no majority. First, obtain the probability of the
       forecaster making a mistake in round t. Then characterise the expected number
       of total mistakes that the forecaster makes in t rounds. Also characterise the
       regret, computed as the difference between the expected number of mistakes of
       the forecaster and that of the oracle.
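A quick empirical check (an illustrative sketch; the values of p, T, and the identifiers are my own): the oracle that always predicts 1 makes (1 − p)T expected mistakes in T rounds, and the majority forecaster should approach this for large t, with the gap between the two averages estimating the regret.

```python
import random

def simulate(p=0.7, T=1000, trials=2000):
    """Compare the oracle (always predict 1 when p > 0.5) with the
    forecaster that predicts the majority bit of rounds 1..t-1,
    breaking ties uniformly at random."""
    oracle_err = forecaster_err = 0
    for _ in range(trials):
        ones = zeros = 0
        for _ in range(T):
            x = 1 if random.random() < p else 0
            oracle_err += (x != 1)
            if ones > zeros:
                guess = 1
            elif zeros > ones:
                guess = 0
            else:
                guess = random.randint(0, 1)   # randomise on a tie
            forecaster_err += (guess != x)
            ones += x
            zeros += 1 - x
    print("oracle:    ", oracle_err / trials)      # about (1 - p) * T
    print("forecaster:", forecaster_err / trials)  # slightly larger

simulate()
```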
5. Here is a different kind of an online learning problem. x1, . . . , xn are Boolean variables;
   let x := [x1, . . . , xn]. A function f(x) = xi1 ∨ xi2 ∨ · · · ∨ xir is to be learned as follows.
   1. An example x is revealed. This is called an instance of x.
   2. The algorithm predicts f(x).
   3. The true value of f(x) is revealed and the algorithm uses this to adapt the
      prediction algorithm.
   Consider the following learning/prediction algorithm.
   1. Initialise w1, . . . , wn to 1.
   2. Receive x and predict f̂(x) = 1 if
         w1 x1 + w2 x2 + · · · + wn xn ≥ n,
      and predict f̂(x) = 0 otherwise.
   3. Receive the true value f(x). If f(x) ≠ f̂(x), then
      (a) If f̂(x) = 0 and f(x) = 1, then for each xi = 1 in x, double wi.
      (b) If f̂(x) = 1 and f(x) = 0, then for each xi = 1 in x, halve wi.
   4. Go to step 2 above.
   Prove the claim that the algorithm makes at most 2 + 3r(1 + log n) mistakes. You
   can obtain this by bounding the number of mistakes of each of the two types in
   step 3.
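The procedure above is Littlestone's Winnow algorithm. A direct Python transcription (the identifiers and the small usage example are my own) can help in separating the two mistake types when working out the proof.

```python
import itertools

def predict(w, x, n):
    """Step 2: predict 1 iff the weighted sum reaches the threshold n."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= n else 0

def run(instances, f, n):
    """Online loop of the question; returns the number of mistakes."""
    w = [1.0] * n                        # step 1: initialise all weights to 1
    mistakes = 0
    for x in instances:
        f_hat, f_true = predict(w, x, n), f(x)
        if f_hat != f_true:              # step 3: update only on a mistake
            mistakes += 1
            if f_true == 1:              # 3(a): missed positive, so double
                w = [wi * 2 if xi else wi for wi, xi in zip(w, x)]
            else:                        # 3(b): false positive, so halve
                w = [wi / 2 if xi else wi for wi, xi in zip(w, x)]
    return mistakes

# Example: f(x) = x1 ∨ x3 (so r = 2) over n = 4 variables, all 16 instances.
f = lambda x: x[0] or x[2]
print(run(list(itertools.product([0, 1], repeat=4)), f, 4))
```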
6. In the EWMA algorithm discussed in class, we chose the value of the learning
   parameter β = √((8 ln K)/T ), where we needed to know T, the maximum number of
   rounds. If we don’t know T, then we can use a time-varying learning parameter
   βt = √((8 ln K)/t). Rework the proof and obtain the regret at time t with this
   choice of time-varying βt.
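For experimenting with this choice, here is a minimal sketch of the exponentially weighted forecaster's weight computation with βt = √((8 ln K)/t); this is my own illustration of one standard anytime variant, and the way the weights enter the forecast should follow the definitions used in class.

```python
import math

def ewma_weights(expert_losses):
    """expert_losses[t][i] is expert i's loss in round t, assumed in [0, 1].
    Returns, for each round, the normalised weights computed with the
    time-varying learning parameter beta_t = sqrt(8 ln K / t)."""
    K = len(expert_losses[0])
    cum = [0.0] * K                       # cumulative losses L_{i,t-1}
    weights_per_round = []
    for t, losses in enumerate(expert_losses, start=1):
        beta = math.sqrt(8 * math.log(K) / t)
        scores = [math.exp(-beta * L) for L in cum]
        z = sum(scores)
        weights_per_round.append([s / z for s in scores])
        cum = [L + l for L, l in zip(cum, losses)]
    return weights_per_round
```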
7. In the EWMA algorithm discussed in class, we chose to use Hoeffding’s Lemma
in calculating Wt+1 /Wt . Recall that there are other concentration inequalities that
provide tighter bounds when additional information about the random variables is
available. Specifically, investigate Bernstein’s Inequality and describe the conditions
under which this can be used in deriving the regret bounds. Work out the regret for
this case.
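For reference, one standard form of Bernstein's inequality (statements vary slightly across texts; check against the version assumed in class):

```latex
% Bernstein's inequality (one standard form): let X_1, ..., X_n be
% independent, E[X_i] = 0, |X_i| <= M almost surely, and
% v = sum_i E[X_i^2]. Then for every t > 0,
\[
  \Pr\!\Big(\sum_{i=1}^{n} X_i > t\Big)
  \;\le\; \exp\!\left(-\frac{t^2/2}{\,v + M t/3\,}\right).
\]
```

The variance term v is the kind of additional information the question refers to: when v is small, this bound can improve on what Hoeffding's Lemma gives.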
