Tutorial
Machine Learning: a tour through some favorite results, directions, and open problems
(a 1-semester course in 2 hrs)
Your guide: Avrim Blum
Carnegie Mellon University
(no tipping)
• “uniform convergence”: with enough data, every h in H has empirical error close to its true error, simultaneously.
• Gives hope for local optimization over training data.
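To make the fragment concrete, here is a minimal sketch (my illustration, not from the tutorial) of uniform convergence for a finite class of threshold functions; the grid H, the 0.5 target threshold, and the sample sizes are all assumptions chosen for the demo:

```python
import random

def uniform_convergence_demo(ms=(10, 100, 1000, 10000), seed=0):
    """Max gap between empirical and true error over a finite class H.
    Data: x ~ Uniform[0,1], label y = 1[x >= 0.5]  (assumed for the demo).
    Class: thresholds h_t(x) = 1[x >= t], so true_err(h_t) = |t - 0.5|."""
    rng = random.Random(seed)
    H = [i / 100 for i in range(101)]          # a finite grid of thresholds
    for m in ms:
        xs = [rng.random() for _ in range(m)]
        ys = [x >= 0.5 for x in xs]
        gap = max(
            abs(sum(((x >= t) != y) for x, y in zip(xs, ys)) / m - abs(t - 0.5))
            for t in H)
        print(f"m={m:6d}  max over H of |emp_err - true_err| = {gap:.3f}")

uniform_convergence_demo()
```

The printed gap shrinks as m grows: empirical error tracks true error for all 101 hypotheses at once, which is what makes optimizing over the training data meaningful.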
Tighter measures of size(H)
• Suppose we want to separate data using a line in the plane (see the counting sketch below).
(Unlike most competitive ratios or approximation bounds, these numbers are pretty good.)
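One way to see why a tighter measure than |H| helps: lines in the plane form an infinite class, but they induce few distinct labelings on a fixed sample (the growth function). The rough empirical sketch below is my addition; the Gaussian point set and random-line sampling are assumptions, and sampling only lower-bounds the true count:

```python
import random

def count_line_labelings(points, trials=200_000, seed=0):
    """Empirically lower-bound the growth function: the number of distinct
    +/- labelings that halfplanes a*x + b*y + c > 0 induce on the points.
    Random sampling may miss rare labelings, so this is only a lower bound."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        a, b, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        seen.add(tuple(a * x + b * y + c > 0 for x, y in points))
    return len(seen)

m = 10
rng = random.Random(42)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
found = count_line_labelings(pts)
# Cover's counting argument: m points in general position admit exactly
# m^2 - m + 2 = 92 line-separable labelings, vs 2^m = 1024 possible.
print(f"labelings found: {found} (<= {m*m - m + 2}), out of 2^{m} = {2**m}")
```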
Analysis
• Say at time t we have fraction F_t of weight on experts that made a mistake.
• So, we have probability F_t of making a mistake, and we remove an εF_t fraction of the total weight.
– W_final = n(1 - εF_1)(1 - εF_2)...
– ln(W_final) = ln(n) + Σ_t ln(1 - εF_t) ≤ ln(n) - ε Σ_t F_t
(using ln(1-x) < -x)
= ln(n) - εM. (Σ_t F_t = E[# mistakes] = M)
• If the best expert makes m mistakes, then ln(W_final) > ln((1-ε)^m) = m ln(1-ε).
• Now solve: ln(n) - εM > m ln(1-ε), i.e.,
M < (m ln(1/(1-ε)) + ln(n)) / ε ≈ (1 + ε/2)m + (1/ε) ln(n).
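A minimal sketch of the algorithm this analysis refers to, assuming the standard randomized weighted majority update (multiply each wrong expert's weight by (1-ε)); the synthetic experts and the roughly-90%-accurate expert 0 are made up for the check:

```python
import math
import random

def randomized_weighted_majority(predictions, truths, eps=0.1):
    """Follow expert i with probability w_i / W; multiply each wrong
    expert's weight by (1 - eps). Returns E[# mistakes] = sum_t F_t."""
    n = len(predictions[0])
    w = [1.0] * n
    expected_mistakes = 0.0
    for preds, y in zip(predictions, truths):
        total = sum(w)
        # F_t = fraction of weight on experts that are wrong this round
        F_t = sum(wi for wi, p in zip(w, preds) if p != y) / total
        expected_mistakes += F_t              # we err with probability F_t
        w = [wi * (1 - eps) if p != y else wi for wi, p in zip(w, preds)]
    return expected_mistakes

# Synthetic check against the bound just derived.
rng = random.Random(1)
T, n, eps = 500, 10, 0.1
truths = [rng.randrange(2) for _ in range(T)]
# expert 0 is usually right; the other experts guess at random
preds = [[y if (i == 0 and rng.random() < 0.9) else rng.randrange(2)
          for i in range(n)] for y in truths]
m = sum(p[0] != y for p, y in zip(preds, truths))  # expert 0's mistake count
M = randomized_weighted_majority(preds, truths, eps)
bound = (m * math.log(1 / (1 - eps)) + math.log(n)) / eps
print(f"expert 0: {m} mistakes, E[RWM mistakes]: {M:.1f}, bound: {bound:.1f}")
```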
A fun application
• n buckets. (Think of them as startup companies.)
• You are standing in one of them.
• At each time step, a ball falls into one of the buckets. If it's your bucket, you get $1.
• Then you can choose to move if you want.
• Game ends when the fullest bucket has d balls.
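One hedged way to play: treat each bucket as an expert and run the weighted-majority-style update from the analysis above, penalizing every bucket the ball misses. The simulation below assumes that strategy and a fixed, made-up ball distribution (the real game allows an adversary); `bucket_game` and its parameters are illustrative names, not from the tutorial:

```python
import random

def bucket_game(n=10, d=100, eps=0.1, seed=0):
    """Bucket game via multiplicative weights: each bucket is an 'expert',
    and every bucket the ball misses has its weight cut by (1 - eps).
    You stand in a bucket drawn with probability proportional to weight."""
    rng = random.Random(seed)
    weights = [1.0] * n
    counts = [0] * n                         # balls caught by each bucket
    earnings = 0
    ball_probs = [i + 1 for i in range(n)]   # assumed: bucket i favored ~ (i+1)
    while max(counts) < d:                   # game ends at d balls in one bucket
        my_bucket = rng.choices(range(n), weights=weights)[0]
        ball = rng.choices(range(n), weights=ball_probs)[0]
        counts[ball] += 1
        if ball == my_bucket:
            earnings += 1
        weights = [w * (1 - eps) if j != ball else w
                   for j, w in enumerate(weights)]
    return earnings, max(counts)

earned, fullest = bucket_game()
print(f"earned ${earned}; game ended with {fullest} balls in the fullest bucket")
```

By the mistake bound above, the strategy's earnings track d (the fullest bucket's total) up to the usual multiplicative-weights loss terms.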
Nice properties:
• <f,f> = Σ_x Pr_D(x) f(x)f(x) = 1 (for ±1-valued f).
• <f,g> = Σ_x Pr_D(x) f(x)g(x)
= Pr_D[f(x) = g(x)] − Pr_D[f(x) ≠ g(x)].
“orthogonal” = “pairwise uncorrelated”. E.g.,
under the uniform distribution, all 2^n parity functions
are orthogonal (so they form a basis).
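A small brute-force check of these properties, assuming ±1-valued parities χ_S(x) = (-1)^(Σ_{i∈S} x_i) and the uniform distribution on {0,1}^n:

```python
import itertools

def parity(S):
    """chi_S(x) = (-1)^(sum of x_i, i in S): a +/-1-valued parity function."""
    return lambda x: -1 if sum(x[i] for i in S) % 2 else 1

def inner(f, g, n):
    """<f,g> = sum_x Pr(x) f(x) g(x) under the uniform distribution on {0,1}^n."""
    pts = list(itertools.product([0, 1], repeat=n))
    return sum(f(x) * g(x) for x in pts) / len(pts)

n = 3
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
fs = [parity(S) for S in subsets]
for i, f in enumerate(fs):
    for j, g in enumerate(fs):
        assert inner(f, g, n) == (1.0 if i == j else 0.0)
print(f"all {len(fs)} parities on n={n} bits: <f,f>=1 and <f,g>=0 for f != g")
```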
SQ dimension
The SQ dimension of a concept class C, over distribution
D, is the size of the largest subset S of C
of “nearly orthogonal” functions:
– For all f ≠ g in S, |<f,g>| ≤ 1/|S|.
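A sketch of checking this witness condition: parities are exactly orthogonal under the uniform distribution, so any set of them passes |<f,g>| ≤ 1/|S|, witnessing that the SQ dimension of parities is 2^n. The helper name `witnesses_sq_dim` is illustrative:

```python
import itertools

def inner(f, g, n):
    pts = list(itertools.product([0, 1], repeat=n))
    return sum(f(x) * g(x) for x in pts) / len(pts)

def witnesses_sq_dim(fs, n):
    """Check the witness condition: |<f,g>| <= 1/|S| for all f != g in S."""
    s = len(fs)
    return all(abs(inner(f, g, n)) <= 1.0 / s
               for f, g in itertools.combinations(fs, 2))

n = 3
parities = [(lambda S: lambda x: -1 if sum(x[i] for i in S) % 2 else 1)(S)
            for r in range(n + 1)
            for S in itertools.combinations(range(n), r)]
print(witnesses_sq_dim(parities, n))   # True -> SQ-dim of parities >= 2^n
```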