5.4 MLBasics Estimators

This document discusses machine learning basics related to estimators, bias, and variance. It defines key concepts such as point estimation, bias, variance, and standard error. Point estimators are functions used to estimate unknown parameters or relationships from data. Bias measures how far an estimator deviates from the true value on average, while variance captures how much an estimator varies between samples. Understanding these concepts is important for evaluating learning algorithms and generalization.

Deep Learning Srihari

Machine Learning Basics:

Estimators, Bias and Variance
Sargur N. Srihari
srihari@cedar.buffalo.edu

This is part of lecture slides on Deep Learning:

http://www.cedar.buffalo.edu/~srihari/CSE676

Topics in Basics of ML

1.  Learning Algorithms

2.  Capacity, Overfitting and Underfitting
3.  Hyperparameters and Validation Sets
4.  Estimators, Bias and Variance
5.  Maximum Likelihood Estimation
6.  Bayesian Statistics
7.  Supervised Learning Algorithms
8.  Unsupervised Learning Algorithms
9.  Stochastic Gradient Descent
10. Building a Machine Learning Algorithm
11. Challenges Motivating Deep Learning

Topics in Estimators, Bias, Variance

0. Statistical tools useful for generalization
1.  Point estimation
2.  Bias
3.  Variance and Standard Error
4.  Bias-Variance tradeoff to minimize MSE
5.  Consistency


Statistics provides tools for ML

•  The field of statistics provides many tools for the ML goal of solving a task not only on the training set but also on new, unseen data (generalization)
•  Foundational concepts include
–  Parameter estimation
–  Bias
–  Variance
•  These concepts formally characterize generalization, overfitting and underfitting

Point Estimation
•  Point Estimation is the attempt to provide the
single best prediction of some quantity of
interest
–  Quantity of interest can be:
•  A single parameter
•  A vector of parameters
–  E.g., weights in linear regression
•  A whole function


Point estimator or Statistic

•  To distinguish estimates of parameters from their true value, a point estimate of a parameter $\theta$ is denoted $\hat{\theta}$
•  Let $\{x^{(1)}, \ldots, x^{(m)}\}$ be $m$ independent and identically distributed data points
–  Then a point estimator or statistic is any function of the data:
$$\hat{\theta}_m = g(x^{(1)}, \ldots, x^{(m)})$$
•  The definition does not require $g$ to return a value close to the true $\theta$
–  A good estimator is a function whose output is close to the true underlying $\theta$ that generated the data
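As a rough illustration of the definition above, the sketch below treats three Python functions as point estimators of a Gaussian mean. The function names, the true mean of 3.0 and the sample size are assumptions made for this example and do not come from the slides. All three are statistics, since each is a function of the data, but only the first is a good estimator.

```python
import numpy as np

def sample_mean(x):
    """Average of all data points: a sensible estimator of the mean."""
    return float(np.mean(x))

def first_point(x):
    """Uses only the first data point: a valid but noisy estimator."""
    return float(x[0])

def always_zero(x):
    """Ignores the data values: still a statistic by the definition above."""
    return 0.0

rng = np.random.default_rng(0)
theta = 3.0  # true mean (assumed for the demo)
data = rng.normal(loc=theta, scale=1.0, size=1000)

# Apply each estimator g to the same dataset.
estimates = {g.__name__: g(data) for g in (sample_mean, first_point, always_zero)}
for name, value in estimates.items():
    print(f"{name:12s} -> {value:.3f}")
```

Rerunning with a different seed should leave `sample_mean` close to 3.0, while `first_point` moves around by about one standard deviation and `always_zero` never changes.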

Function Estimation
•  Point estimation can also refer to estimating the relationship between input and target variables
–  This is referred to as function estimation
•  Here we predict a variable $y$ given an input $x$
–  We assume $f(x)$ is the relationship between $x$ and $y$
•  We may assume $y = f(x) + \epsilon$
–  where $\epsilon$ stands for the part of $y$ that is not predictable from $x$
–  We are interested in approximating $f$ with a model $\hat{f}$
•  Function estimation is the same as estimating a parameter $\theta$
–  where $\hat{f}$ is a point estimator in function space
•  Ex: polynomial regression can be viewed either as estimating a parameter $w$ or as estimating a function mapping from $x$ to $y$
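The two views of polynomial regression can be sketched with NumPy. The quadratic `f`, the noise level and the sample size below are assumptions chosen for the demo. `np.polyfit` returns the estimated parameter vector, and wrapping it in `np.poly1d` gives the corresponding estimated function.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """True relationship between x and y (an assumption made for this demo)."""
    return 1.0 + 2.0 * x - 0.5 * x**2

# Observations y = f(x) + eps, with eps the part of y not predictable from x.
x = np.linspace(-2.0, 2.0, 200)
y = f(x) + rng.normal(scale=0.1, size=x.shape)

# View 1: estimate the parameter vector w (highest-degree coefficient first).
w_hat = np.polyfit(x, y, deg=2)

# View 2: the same fit defines an estimated function f_hat.
f_hat = np.poly1d(w_hat)

print("w_hat =", w_hat)
print("max |f_hat - f| on the grid =", np.max(np.abs(f_hat(x) - f(x))))
```

Both outputs describe the same estimate: `w_hat` is a point in parameter space, and `f_hat` is the matching point in function space.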

Properties of Point Estimators

•  Most commonly studied properties of point
estimators are:
1.  Bias
2.  Variance
•  They inform us about the estimators


1. Bias of an estimator
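•  The bias of an estimator $\hat{\theta}_m = g(x^{(1)}, \ldots, x^{(m)})$ for parameter $\theta$ is defined as
$$\operatorname{bias}(\hat{\theta}_m) = \mathbb{E}[\hat{\theta}_m] - \theta$$
•  The estimator is unbiased if $\operatorname{bias}(\hat{\theta}_m) = 0$
–  which implies that $\mathbb{E}[\hat{\theta}_m] = \theta$
•  An estimator is asymptotically unbiased if
$$\lim_{m \to \infty} \operatorname{bias}(\hat{\theta}_m) = 0$$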
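When the expectation is hard to work out by hand, the bias can be approximated by simulation: draw many datasets, apply the estimator to each, and compare the average estimate with the true parameter. The helper `monte_carlo_bias`, the Bernoulli mean of 0.3 and the "shrunk" estimator below are hypothetical, introduced only to contrast an unbiased estimator with one that is biased but asymptotically unbiased.

```python
import numpy as np

def monte_carlo_bias(estimator, sampler, theta, m, trials=20_000, seed=0):
    """Approximate bias = E[estimator(data)] - theta by averaging over datasets."""
    rng = np.random.default_rng(seed)
    estimates = [estimator(sampler(rng, m)) for _ in range(trials)]
    return float(np.mean(estimates)) - theta

theta = 0.3  # true Bernoulli mean (assumed for the demo)

def bernoulli_sampler(rng, m):
    """Draw m i.i.d. Bernoulli(theta) samples."""
    return rng.binomial(n=1, p=theta, size=m)

def shrunk_mean(x):
    """Hypothetical estimator sum(x) / (m + 1); its exact bias is -theta / (m + 1)."""
    return x.sum() / (len(x) + 1)

bias_mean = monte_carlo_bias(np.mean, bernoulli_sampler, theta, m=10)
bias_shrunk_10 = monte_carlo_bias(shrunk_mean, bernoulli_sampler, theta, m=10)
bias_shrunk_1000 = monte_carlo_bias(shrunk_mean, bernoulli_sampler, theta, m=1000)

print(f"sample mean, m=10:   {bias_mean:+.4f}")
print(f"shrunk mean, m=10:   {bias_shrunk_10:+.4f} (theory {-theta / 11:+.4f})")
print(f"shrunk mean, m=1000: {bias_shrunk_1000:+.4f} (theory {-theta / 1001:+.4f})")
```

The simulated values carry Monte Carlo noise, so they should be read as approximations of the exact biases rather than as proofs.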


Examples of Estimator Bias

•  We look at common estimators of the following
parameters to determine whether there is bias:
–  Bernoulli distribution: mean $\theta$
–  Gaussian distribution: mean $\mu$
–  Gaussian distribution: variance $\sigma^2$


Estimator of Bernoulli mean

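•  The Bernoulli distribution for a binary variable $x \in \{0, 1\}$ with mean $\theta$ has the form
$$P(x; \theta) = \theta^{x} (1 - \theta)^{1 - x}$$
•  The estimator for $\theta$ given samples $\{x^{(1)}, \ldots, x^{(m)}\}$ is
$$\hat{\theta}_m = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$
•  To determine whether this estimator is biased, compute
$$\begin{aligned}
\operatorname{bias}(\hat{\theta}_m) &= \mathbb{E}[\hat{\theta}_m] - \theta \\
&= \mathbb{E}\left[ \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \right] - \theta \\
&= \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}[x^{(i)}] - \theta \\
&= \frac{1}{m} \sum_{i=1}^{m} \sum_{x^{(i)}=0}^{1} \left( x^{(i)} \theta^{x^{(i)}} (1 - \theta)^{(1 - x^{(i)})} \right) - \theta \\
&= \frac{1}{m} \sum_{i=1}^{m} (\theta) - \theta = \theta - \theta = 0
\end{aligned}$$
–  Since $\operatorname{bias}(\hat{\theta}_m) = 0$, we say that the estimator is unbiased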

Estimator of Gaussian mean

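•  Samples $\{x^{(1)}, \ldots, x^{(m)}\}$ are independently and identically distributed according to $p(x^{(i)}) = \mathcal{N}(x^{(i)}; \mu, \sigma^2)$
–  The sample mean is an estimator of the mean parameter
$$\hat{\mu}_m = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$
–  To determine the bias of the sample mean:
$$\begin{aligned}
\operatorname{bias}(\hat{\mu}_m) &= \mathbb{E}[\hat{\mu}_m] - \mu \\
&= \mathbb{E}\left[ \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \right] - \mu \\
&= \frac{1}{m} \sum_{i=1}^{m} \mathbb{E}[x^{(i)}] - \mu \\
&= \frac{1}{m} \sum_{i=1}^{m} \mu - \mu = 0
\end{aligned}$$
–  Thus the sample mean is an unbiased estimator of the Gaussian mean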

Estimator for Gaussian variance

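•  The sample variance is
$$\hat{\sigma}^2_m = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} - \hat{\mu}_m \right)^2$$
•  We are interested in computing
$$\operatorname{bias}(\hat{\sigma}^2_m) = \mathbb{E}[\hat{\sigma}^2_m] - \sigma^2$$
•  We begin by evaluating the expectation:
$$\mathbb{E}[\hat{\sigma}^2_m] = \frac{m - 1}{m} \sigma^2$$
•  Thus the bias of $\hat{\sigma}^2_m$ is $-\sigma^2 / m$
•  Thus the sample variance is a biased estimator
•  The unbiased sample variance estimator is
$$\tilde{\sigma}^2_m = \frac{1}{m - 1} \sum_{i=1}^{m} \left( x^{(i)} - \hat{\mu}_m \right)^2$$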
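The bias of $-\sigma^2/m$ can be checked numerically. The sketch below assumes a true variance of 4 and a small sample size of 5 so the effect is visible; both values are picked only for illustration. NumPy's `var` divides by `m - ddof`, so `ddof=0` gives the biased estimator and `ddof=1` the unbiased one.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0      # true variance (assumed for the demo)
m = 5             # small sample size so the bias is visible
trials = 200_000  # number of simulated datasets

# Each row is one dataset of m Gaussian samples.
x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(trials, m))

biased = x.var(axis=1, ddof=0)    # divides by m
unbiased = x.var(axis=1, ddof=1)  # divides by m - 1

bias_biased = biased.mean() - sigma2
bias_unbiased = unbiased.mean() - sigma2

print(f"bias of 1/m estimator:     {bias_biased:+.3f} (theory {-sigma2 / m:+.3f})")
print(f"bias of 1/(m-1) estimator: {bias_unbiased:+.3f} (theory +0.000)")
```

With these settings the first bias should land near -0.8 and the second near 0, up to Monte Carlo noise.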

2. Variance and Standard Error

•  Another property of an estimator:
–  How much we expect the estimator to vary as a
function of the data sample
•  Just as we computed the expectation of the
estimator to determine its bias, we can compute
its variance
•  The variance of an estimator is simply $\operatorname{Var}(\hat{\theta})$, where the random variable is the training set
•  The square root of the variance is called the standard error, denoted $\operatorname{SE}(\hat{\theta})$

Importance of Standard Error

•  It measures how we would expect the estimate
to vary as we obtain different samples from the
same distribution
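•  The standard error of the mean is given by
$$\operatorname{SE}(\hat{\mu}_m) = \sqrt{\operatorname{Var}\left[ \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \right]} = \frac{\sigma}{\sqrt{m}}$$
–  where $\sigma^2$ is the true variance of the samples $x^{(i)}$
–  The standard error is often estimated using an estimate of $\sigma$
•  Neither the square root of the sample variance nor the square root of the unbiased variance estimator gives an unbiased estimate of the standard deviation; both tend to underestimate it, but the approximation is reasonable for large $m$
–  The square root of the unbiased variance estimator is less of an underestimate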
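A short simulation can confirm the $\sigma/\sqrt{m}$ formula and show the plug-in estimate used in practice. The values of `mu`, `sigma` and `m` are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, m = 0.0, 2.0, 100  # assumed values for the demo

# Theoretical standard error of the mean.
se_theory = sigma / np.sqrt(m)

# Empirical check: standard deviation of the sample mean across many datasets.
means = rng.normal(mu, sigma, size=(20_000, m)).mean(axis=1)
se_empirical = means.std()

# In practice sigma is unknown, so it is estimated from a single dataset.
x = rng.normal(mu, sigma, size=m)
se_estimated = x.std(ddof=1) / np.sqrt(m)

print(f"theory:    {se_theory:.4f}")
print(f"empirical: {se_empirical:.4f}")
print(f"estimated: {se_estimated:.4f}")
```

The empirical value should match the theoretical 0.2 closely, while the single-dataset estimate fluctuates more because it depends on one sample of 100 points.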

Standard Error in Machine Learning

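•  We often estimate generalization error by computing the sample mean of the error on the test set
–  The number of examples in the test set determines the accuracy of this estimate
–  Since the mean will be approximately normally distributed (by the Central Limit Theorem), we can use the standard error to compute the probability that the true expectation falls in any chosen interval
•  Ex: the 95% confidence interval centered on the mean $\hat{\mu}_m$ is
$$\left( \hat{\mu}_m - 1.96\,\operatorname{SE}(\hat{\mu}_m),\ \hat{\mu}_m + 1.96\,\operatorname{SE}(\hat{\mu}_m) \right)$$
•  It is common to say that ML algorithm A is better than ML algorithm B if
–  the upper bound of the 95% confidence interval for the error of A is less than the lower bound of the 95% confidence interval for the error of B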
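The comparison rule can be sketched as follows. The helper `error_confidence_interval` and the two simulated classifiers, with roughly 10% and 20% error on 2000 test examples, are hypothetical and stand in for real per-example test losses.

```python
import numpy as np

def error_confidence_interval(errors, z=1.96):
    """Return (mean, lower, upper) of a normal-approximation CI for the mean error."""
    errors = np.asarray(errors, dtype=float)
    mean = errors.mean()
    se = errors.std(ddof=1) / np.sqrt(errors.size)  # estimated standard error
    return mean, mean - z * se, mean + z * se

rng = np.random.default_rng(0)
n_test = 2000
# Hypothetical per-example 0/1 losses for two classifiers.
errors_a = rng.binomial(1, 0.10, size=n_test)  # A: about 10% error
errors_b = rng.binomial(1, 0.20, size=n_test)  # B: about 20% error

mean_a, lo_a, hi_a = error_confidence_interval(errors_a)
mean_b, lo_b, hi_b = error_confidence_interval(errors_b)

# A is declared better only if the two intervals do not overlap.
a_better = hi_a < lo_b

print(f"A: {mean_a:.3f} [{lo_a:.3f}, {hi_a:.3f}]")
print(f"B: {mean_b:.3f} [{lo_b:.3f}, {hi_b:.3f}]")
print("A better than B:", a_better)
```

This rule is conservative: two algorithms whose intervals overlap are not declared different, even if a paired test on the same examples might separate them.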

Confidence Intervals for error

95% confidence intervals for error estimate


Trading-off Bias and Variance

•  Bias and Variance measure two different
sources of error of an estimator
•  Bias measures the expected deviation from the
true value of the function or parameter
•  Variance provides a measure of the expected
deviation that any particular sampling of the
data is likely to cause


Negotiating the bias-variance tradeoff

•  How to choose between two algorithms, one
with a large bias and another with a large
variance?

–  Most common approach is to use cross-validation

–  Alternatively we can minimize Mean Squared Error
which incorporates both bias and variance


Mean Squared Error

•  The Mean Squared Error of an estimate is
$$\operatorname{MSE} = \mathbb{E}\left[ (\hat{\theta}_m - \theta)^2 \right] = \operatorname{Bias}(\hat{\theta}_m)^2 + \operatorname{Var}(\hat{\theta}_m)$$
•  Minimizing the MSE keeps both bias and variance in check

Figure: As capacity increases, bias (dotted) tends to decrease and variance (dashed) tends to increase
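The decomposition can be verified numerically on the two Gaussian variance estimators discussed earlier. The true variance of 4 and sample size of 5 are assumptions for the demo. Because bias, variance and MSE are all computed from the same simulated estimates, the identity holds up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, m, trials = 4.0, 5, 200_000  # assumed values for the demo

# Each row is one dataset of m Gaussian samples.
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m))

results = {}
for name, ddof in (("divide by m", 0), ("divide by m-1", 1)):
    est = x.var(axis=1, ddof=ddof)       # one variance estimate per dataset
    bias = est.mean() - sigma2           # empirical bias
    var = est.var()                      # empirical variance of the estimator
    mse = np.mean((est - sigma2) ** 2)   # empirical mean squared error
    results[name] = (bias, var, mse)
    print(f"{name:14s} bias^2={bias**2:.3f} var={var:.3f} "
          f"bias^2+var={bias**2 + var:.3f} mse={mse:.3f}")
```

In this setting the biased estimator has the lower MSE: its smaller variance more than offsets its squared bias, which is one concrete instance of the tradeoff.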


Underfit-Overfit : Bias-Variance
The relationship of bias and variance to capacity is similar to the relationship of underfitting and overfitting to capacity

[Figures: bias and variance as functions of capacity; model complexity as a function of capacity]

Both have a U-shaped curve of generalization error as a function of capacity


Consistency
•  So far we have discussed the behavior of an estimator for a fixed training set size
•  We are also interested in the behavior of the estimator as the training set grows
•  As the number of data points $m$ in the training set grows, we would like our point estimates to converge to the true value of the parameters:
$$\operatorname{plim}_{m \to \infty} \hat{\theta}_m = \theta$$
–  The symbol plim indicates convergence in probability

Weak and Strong Consistency

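•  $\operatorname{plim}_{m \to \infty} \hat{\theta}_m = \theta$ means that for any $\epsilon > 0$, $P(|\hat{\theta}_m - \theta| > \epsilon) \to 0$ as $m \to \infty$
•  This condition is also known as weak consistency
•  Strong consistency refers to almost sure convergence of $\hat{\theta}_m$ to $\theta$; it implies weak consistency, but weak consistency does not imply it
•  Almost sure convergence of a sequence of random variables $x^{(1)}, x^{(2)}, \ldots$ to a value $x$ occurs when
$$p\left( \lim_{m \to \infty} x^{(m)} = x \right) = 1$$
•  Consistency ensures that the bias induced by the estimator diminishes as $m$ grows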
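Weak consistency can be illustrated by estimating $P(|\hat{\theta}_m - \theta| > \epsilon)$ for growing $m$. The sketch below uses the sample mean of Bernoulli data, with `theta`, `eps` and the list of sample sizes chosen as assumptions for the demo. A simulation like this only suggests convergence in probability; it does not prove it.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, eps, trials = 0.3, 0.05, 5_000  # assumed values for the demo

deviation_prob = {}
for m in (10, 100, 1_000, 10_000):
    # Sample mean of m Bernoulli(theta) draws, for many independent datasets.
    theta_hat = rng.binomial(n=m, p=theta, size=trials) / m
    # Fraction of datasets whose estimate misses theta by more than eps.
    deviation_prob[m] = float(np.mean(np.abs(theta_hat - theta) > eps))
    print(f"m={m:6d}  P(|theta_hat - theta| > {eps}) ~ {deviation_prob[m]:.4f}")
```

The estimated probability should shrink toward zero as `m` increases, matching the definition for this particular choice of $\epsilon$.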