5.4 MLBasics Estimators
Topics in Basics of ML
Deep Learning
Deep Learning Srihari
Point Estimation
• Point Estimation is the attempt to provide the
single best prediction of some quantity of
interest
– Quantity of interest can be:
• A single parameter
• A vector of parameters
– E.g., weights in linear regression
• A whole function
Function Estimation
• Point estimation can also refer to estimation of the
relationship between input and target variables
– Referred to as function estimation
• Here we predict a variable y given input x
– We assume f(x) is the relationship between x and y
• We may assume y=f(x)+ε
– Where ε stands for the part of y not predictable from x
– We are interested in approximating f with a model f̂
• Function estimation is the same as estimating a parameter θ
– where f̂ is a point estimator in function space
• Ex: in polynomial regression we are either estimating a
parameter w or estimating a function mapping from x to y
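The polynomial regression remark above can be made concrete with a minimal sketch. It assumes a quadratic ground truth `f`, Gaussian noise for ε, and a least-squares fit; these choices, and the names `w_hat` and `f_hat`, are illustrative and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth f(x); the data follow y = f(x) + eps.
def f(x):
    return 1.0 - 2.0 * x + 0.5 * x**2

x = rng.uniform(-1.0, 1.0, size=200)
y = f(x) + rng.normal(0.0, 0.1, size=x.shape)

# View 1: estimate the parameter vector w by least squares
# on the polynomial features (1, x, x^2).
X = np.vander(x, N=3, increasing=True)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# View 2: the same w_hat defines a function estimate f_hat: x -> y.
def f_hat(x_new):
    features = np.vander(np.atleast_1d(x_new), N=3, increasing=True)
    return features @ w_hat

print("w_hat:", w_hat)
print("f_hat(0.5):", f_hat(0.5)[0], "f(0.5):", f(0.5))
```

Both views describe the same fitted object: `w_hat` is a point in parameter space, and `f_hat` is the corresponding point in function space.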
1. Bias of an estimator
• The bias of an estimator $\hat{\theta}_m = g(x^{(1)}, \ldots, x^{(m)})$ for
parameter $\theta$ is defined as
$$\mathrm{bias}(\hat{\theta}_m) = E[\hat{\theta}_m] - \theta$$
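$$
\begin{aligned}
\mathrm{bias}(\hat{\theta}_m)
&= E\left[\frac{1}{m}\sum_{i=1}^{m} x^{(i)}\right] - \theta \\
&= \frac{1}{m}\sum_{i=1}^{m} E\left[x^{(i)}\right] - \theta \\
&= \frac{1}{m}\sum_{i=1}^{m}\sum_{x^{(i)}=0}^{1}\left(x^{(i)}\,\theta^{x^{(i)}}(1-\theta)^{(1-x^{(i)})}\right) - \theta \\
&= \frac{1}{m}\sum_{i=1}^{m}(\theta) - \theta = \theta - \theta = 0
\end{aligned}
$$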
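The derivation above concludes that the sample mean of Bernoulli samples is an unbiased estimator of θ. A small Monte Carlo sketch can check this numerically; the values of `theta`, `m`, and `trials` are arbitrary illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3        # true Bernoulli mean (illustrative value)
m = 20             # training set size
trials = 100_000   # number of simulated training sets

# Each row is one training set of m Bernoulli(theta) samples.
samples = rng.binomial(1, theta, size=(trials, m))

# Estimator: theta_hat_m = (1/m) * sum_i x^(i), one value per training set.
theta_hat = samples.mean(axis=1)

# Monte Carlo estimate of bias(theta_hat_m) = E[theta_hat_m] - theta.
bias_estimate = theta_hat.mean() - theta
print(f"estimated bias: {bias_estimate:+.5f}")
```

The printed value should be close to zero, up to simulation noise that shrinks as `trials` grows.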
$$\mathrm{MSE} = E\left[(\hat{\theta}_m - \theta)^2\right] = \mathrm{Bias}(\hat{\theta}_m)^2 + \mathrm{Var}(\hat{\theta}_m)$$
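The decomposition of mean squared error into squared bias plus variance can be verified empirically. The sketch below assumes Bernoulli data and compares the sample mean with a deliberately biased "shrunk" estimator; the shrunk estimator is a hypothetical example introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

theta, m, trials = 0.3, 20, 50_000
samples = rng.binomial(1, theta, size=(trials, m))

estimators = {
    "sample mean": samples.mean(axis=1),
    # Hypothetical biased estimator: pulls the estimate toward 0.5.
    "shrunk mean": (samples.sum(axis=1) + 1) / (m + 2),
}

results = {}
for name, est in estimators.items():
    bias = est.mean() - theta            # E[theta_hat] - theta
    var = est.var()                      # Var(theta_hat), population form
    mse = np.mean((est - theta) ** 2)    # E[(theta_hat - theta)^2]
    results[name] = (bias, var, mse)
    print(f"{name}: bias^2 + var = {bias**2 + var:.6f}, mse = {mse:.6f}")
```

For each estimator the two printed numbers agree up to floating-point error, since the identity also holds for the empirical moments used here.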
Underfit-Overfit : Bias-Variance
The relationship of bias and variance to capacity mirrors that of
underfitting and overfitting: as capacity increases, bias tends to
decrease and variance tends to increase
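One way to see the link between capacity and the bias-variance trade-off is to refit polynomials of increasing degree on many resampled training sets. This is a rough sketch under assumptions not in the slides: the ground truth `sin(2πx)`, the noise level `sigma`, the fixed input grid, and the degrees 1, 3, and 9 are all illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def f(x):
    return np.sin(2 * np.pi * x)   # assumed ground-truth function

x = np.linspace(0.0, 1.0, 20)      # fixed inputs; only the noise is resampled
sigma, trials = 0.3, 500

summary = {}
for degree in (1, 3, 9):           # polynomial degree stands in for capacity
    preds = np.empty((trials, x.size))
    for t in range(trials):
        y = f(x) + rng.normal(0.0, sigma, size=x.size)
        preds[t] = Polynomial.fit(x, y, degree)(x)
    # Squared bias and variance of the fitted function, averaged over x.
    bias_sq = np.mean((preds.mean(axis=0) - f(x)) ** 2)
    variance = np.mean(preds.var(axis=0))
    summary[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

In this setup the degree-1 model shows high bias and low variance (underfitting), while the degree-9 model shows low bias and higher variance (overfitting).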
Consistency
• So far we have discussed the behavior of an
estimator for a fixed training set size
• We are also interested in the behavior of the
estimator as the training set grows
• As the number of data points m in the training set
grows, we would like our point estimates to
converge to the true value of the parameters:
$$\operatorname{plim}_{m \to \infty} \hat{\theta}_m = \theta$$
– Symbol plim indicates convergence in probability
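Convergence in probability means that for any ε > 0, P(|θ̂m − θ| > ε) → 0 as m → ∞. The sketch below estimates that probability by simulation for the Bernoulli sample mean; the values of `theta`, `eps`, `trials`, and the training set sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

theta, eps, trials = 0.3, 0.05, 2_000

freqs = {}
for m in (10, 100, 1_000, 10_000):
    # A Binomial(m, theta) draw divided by m is the sample mean of
    # m Bernoulli(theta) samples; draw one per simulated training set.
    theta_hat = rng.binomial(m, theta, size=trials) / m
    # Fraction of training sets whose estimate misses theta by more than eps.
    freqs[m] = np.mean(np.abs(theta_hat - theta) > eps)
    print(f"m = {m:>6}: P(|theta_hat - theta| > {eps}) ~ {freqs[m]:.4f}")
```

The estimated probability falls toward zero as m grows, which is the behavior consistency requires of this estimator.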