INTRODUCTION TO MACHINE LEARNING VIA
LINEAR REGRESSION
Study of key concepts based on supervised learning.
3.1 Supervised learning
Problem formulation: given a training set D containing N training points (x_n, t_n), n = 1, ..., N.
- x_n are independent variables (covariates, domain points, explanatory variables)
- t_n are dependent variables (labels, responses)
[Figure: scatter plot of a training set of N = 10 points (x_n, t_n).]
Goal: predict t for an unobserved domain point x. To do so we need to make assumptions about the mechanism generating the data (inductive bias); this is the difference between memorizing and learning.
3.2 Statistical inference
Task: predict the r.v. t given the observation x and the known joint distribution p(x, t).
Define a non-negative loss function ℓ(t, t̂): the cost (loss, risk) if the correct value is t and the estimate is t̂.
ℓ_q loss: ℓ_q(t, t̂) = |t − t̂|^q
Examples: quadratic loss ℓ_2(t, t̂) = (t − t̂)², 0-1 loss ℓ_0(t, t̂) = 1(t̂ ≠ t).
Generalization loss (risk): average over p(x, t) of the loss of the prediction t̂(x):
L_p(t̂) = E_{(x,t)~p(x,t)}[ ℓ(t, t̂(x)) ]
The optimal prediction t̂*(x) is obtained by minimizing the posterior expected loss:
t̂*(x) = argmin_{t̂} E_{t~p(t|x)}[ ℓ(t, t̂) | x = x ]
Only the posterior distribution p(t|x) is needed; the joint distribution p(x, t) is not required.
- For ℓ_2: t̂*(x) = E[t | x = x], since
  E_{t~p(t|x)}[(t − t̂)²] = E[t²|x] − 2 t̂ E[t|x] + t̂²
  and d/dt̂ (...) = −2 E[t|x] + 2 t̂ = 0  ⇒  t̂ = E[t|x].
Example: p(t|x) = 0.8 δ(t − x) + 0.2 δ(t + x)  ⇒  t̂*(x) = 0.8·x + 0.2·(−x) = 0.6x.
The performance of a predictor t̂(x) is measured by the difference between L_p(t̂) and the minimum generalization loss L_p(t̂*) of the optimal predictor.
In the following, three learning approaches for t̂(x) are discussed: the frequentist approach, the Bayesian approach, and the MDL approach.
3.3 Frequentist approach
Assumption: the training data points (x_n, t_n) ∈ D are drawn i.i.d. from a true but unknown distribution p(x, t): (x_n, t_n) ~ p(x, t), n = 1, ..., N.
Since the distribution is unknown, the optimal inference strategy from Section 3.2 cannot be applied.
Two ways to approach the problem of the unknown distribution:
- Separate learning and inference: learn an approximation p̂(t|x) of the distribution based on the data and use
  t̂_D(x) = argmin_{t̂} E_{t~p̂(t|x)}[ ℓ(t, t̂) ]    (1)
- Direct inference via empirical risk minimization (ERM): learn an approximation t̂_D(·) of the optimal decision as
  t̂_D(·) = argmin_{t̂(·)} L_D(t̂(·))
  with the empirical risk (loss)
  L_D(t̂(·)) = (1/N) Σ_{n=1}^N ℓ(t_n, t̂(x_n))
Remark: in contrast to the generalization loss, where the expectation is taken over the true distribution, here we take the average over the available data.
We first look at a linear regression example. Assume p(x, t) = p(x) p(t|x) with x ~ Unif(0, 1) and p(t|x) = N(t | sin(2πx), 0.1).
[Figure: training set of N = 10 points together with the underlying function sin(2πx).]
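As an aside, a minimal sketch (Python/NumPy, not from the notes) of how such a training set can be generated under the assumed distribution; the seed and sample size are illustrative:

```python
import numpy as np

def generate_data(N, noise_var=0.1, rng=None):
    """Draw N pairs (x_n, t_n) with x ~ Unif(0, 1) and t | x ~ N(sin(2*pi*x), noise_var)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.0, 1.0, size=N)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(noise_var), size=N)
    return x, t

x_train, t_train = generate_data(N=10, rng=np.random.default_rng(0))
```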
If the distribution is known, the optimal predictor under the ℓ_2 loss is t̂*(x) = E[t|x] = sin(2πx).
Minimum generalization loss:
L_p(t̂*) = ∫∫ (t − sin(2πx))² p(x, t) dx dt
         = E[ E[t²|x] − 2 sin(2πx) E[t|x] + sin²(2πx) ]
         = E[ E[t²|x] − (E[t|x])² ] = Var(t|x) = 0.1
3.3.1 Discriminative vs. generative models
Formulate a hypothesis class (family of parametric probabilistic models) and learn the parameters of the model which best fits the data.
Model the label t as a polynomial of the domain point x plus Gaussian noise (corruption):
μ(x, w) = Σ_{j=0}^{M} w_j x^j = w^T φ(x)
with weight vector w = [w_0, ..., w_M]^T and feature vector φ(x) = [1, x, x², ..., x^M]^T, where M is the model order.
Now define the parametric probabilistic model
p(t | x, θ) = N(t | μ(x, w), β⁻¹),  model parameters θ = (w, β),
where β is the precision (inverse variance).
Discriminative model: learn the conditional distribution p(t | x, θ) by learning the parameter vector θ directly from the data; the estimator in (1) can then be calculated directly. The model discriminates t based on the inputs x. This is the main focus in this section.
Generative probabilistic model: learn the joint distribution p(t, x) parameterized by θ, i.e. p(x, t | θ).
Remark: this models also the distribution p(x) of the covariates; the model provides a realization of x by computing the marginal distribution p(x | θ). Use Bayes' theorem to obtain p(t | x, θ) and then the estimate t̂(x) as in (1).
3.3.2 ML learning
Assume the model order M is fixed; we want to learn the model parameters θ from the N data points.
Discriminative model:
p(t_D | x_D, w, β) = Π_{n=1}^N p(t_n | x_n, w, β) = Π_{n=1}^N N(t_n | μ(x_n, w), β⁻¹)
Take the log on both sides to obtain the log-likelihood (LL) function:
ln p(t_D | x_D, w, β) = Σ_{n=1}^N ln p(t_n | x_n, w, β) = Σ_{n=1}^N ln N(t_n | μ(x_n, w), β⁻¹)
The ML learning problem is defined as the minimization of the negative LL (NLL), which is a function of the model parameters only:
min_{w,β} −(1/N) Σ_{n=1}^N ln p(t_n | x_n, w, β)
(cross-entropy / log-loss criterion)
Why? By the strong law of large numbers,
−(1/N) Σ_{n=1}^N ln p(t_n | x_n, w, β) → E_{(x,t)~p(x,t)}[ −ln p(t | x, w, β) ]
= E_{x~p(x)}[ KL( p(t|x) || p(t|x, w, β) ) ] + term that does not depend on (w, β)
(expected cross-entropy). The ML problem therefore attempts to make the model-based p(t | x, w, β) close to the actual p(t|x).
Under the ℓ_2 loss the problem requires only learning the a posteriori mean μ(x, w); the β term can be ignored.
Then:
min_w L_D(w)    (2)
with the training loss
L_D(w) = (1/N) Σ_{n=1}^N (t_n − w^T φ(x_n))² = (1/N) ||t_D − Φ_D w||²    (3)
where t_D = [t_1, ..., t_N]^T and Φ_D = [φ(x_1), ..., φ(x_N)]^T is an N × (M+1) matrix. Problem (2) can be solved in closed form.
Minimization of (3) is a least-squares (LS) problem with the solution
w_ML = (Φ_D^T Φ_D)⁻¹ Φ_D^T t_D    (overdetermined case N ≥ M + 1)
(Φ_D^T Φ_D)⁻¹ Φ_D^T is also known under the name pseudo-inverse.
- Differentiating the NLL with respect to β yields 1/β_ML = L_D(w_ML).
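A minimal sketch of the ML solution above (Python/NumPy; polynomial features, LS solution via np.linalg.lstsq, and 1/β_ML as the training loss; names are illustrative):

```python
import numpy as np

def features(x, M):
    """Feature matrix Phi_D with rows phi(x_n) = [1, x_n, ..., x_n^M]."""
    return np.vander(np.asarray(x), M + 1, increasing=True)

def ml_fit(x, t, M):
    """Return w_ML (least-squares solution of (3)) and 1/beta_ML = L_D(w_ML)."""
    Phi = features(x, M)
    w_ml, *_ = np.linalg.lstsq(Phi, np.asarray(t), rcond=None)
    train_loss = np.mean((np.asarray(t) - Phi @ w_ml) ** 2)  # L_D(w_ML)
    return w_ml, train_loss

# Example usage on synthetic data as in the example above.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, np.sqrt(0.1), 10)
w_ml, beta_ml_inv = ml_fit(x, t, M=3)
```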
Overfitting and underfitting
Assumption: ℓ_2 loss. Going back to the example with p(t|x) = N(t | sin(2πx), 0.1) and the optimal but unknown predictor t̂*(x) = sin(2πx).
How does this compare with the ML predictor t̂(x) = μ(x, w_ML) trained on D with N = 10?
[Figure: ML fits μ(x, w_ML) for model orders M = 1, M = 3 and M = 9, together with the training points.]
- M = 1: the predictor underfits the data (large training loss).
- M = 9: the predictor overfits the data (small training loss, but large generalization loss, L_p(w_ML) ≫ L_D(w_ML)); the model memorizes the training set only.
- M = 3: good choice.
[Figure: training and generalization loss vs. model order M, showing the underfitting and overfitting regimes.]
What happens for a large training set?
[Figure: training loss L_D and test/generalization loss vs. training set size N for fixed M.]
Remark: if N is large enough compared to the number of parameters in w, L_D(w) ≈ L_p(w), i.e. the weight vector w_ML that minimizes L_D also approximately minimizes L_p:
L_D(w_ML) ≈ L_p(w_ML) ≈ L_p(t̂*)   for large N (we make this precise later).
L_D(w_ML) increases with N, as it becomes more difficult to fit the data.
Error analysis
Two error types: bias and estimation error. Write the generalization loss as follows:
L_p(w_ML) − L_p(t̂*) = [ L_p(w_ML) − L_p(w*) ] + [ L_p(w*) − L_p(t̂*) ]
where L_p(w*) = min_w L_p(w) is the best generalization loss for the given model (hypothesis class).
- L_p(w*) − L_p(t̂*): (generalization) bias caused by the choice of the hypothesis class.
- L_p(w_ML) − L_p(w*): estimation error, since w_ML is only the ML estimate for finite N.
[Figure: square root of the loss vs. N: √L_p(w_ML) decreases toward √L_p(w*), with √L_p(t̂*) as lower bound.]
Validation and testing
Problem: how to select the model order M? The distribution p(x, t) is unknown and L_p(t̂) cannot be evaluated.
Solution: divide the available data into two parts, a training set and a hold-out (validation) set, and use the hold-out set to obtain the empirical average
L_p(ŵ) ≈ (1/N_v) Σ_{n=1}^{N_v} ℓ(t_n, μ(x_n, ŵ))
over a validation set of size N_v, where ŵ has been obtained from the training set.
This L_p estimate is used for model order selection. A test set is additionally needed to compute an estimate of L_p for the finally determined choice of M and θ.
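A sketch of hold-out model-order selection under the ℓ_2 loss (Python/NumPy; the split fraction and candidate orders are illustrative assumptions, not from the notes):

```python
import numpy as np

def holdout_select(x, t, orders, val_frac=0.3, rng=None):
    """Fit each candidate model order M on the training part and return the M
    with the smallest empirical l2 loss on the held-out validation part."""
    rng = np.random.default_rng() if rng is None else rng
    x, t = np.asarray(x), np.asarray(t)
    idx = rng.permutation(len(x))
    n_val = max(1, int(len(x) * val_frac))
    val, tr = idx[:n_val], idx[n_val:]
    best_M, best_loss = None, np.inf
    for M in orders:
        Phi_tr = np.vander(x[tr], M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi_tr, t[tr], rcond=None)
        Phi_val = np.vander(x[val], M + 1, increasing=True)
        loss = np.mean((t[val] - Phi_val @ w) ** 2)  # validation estimate of L_p
        if loss < best_loss:
            best_M, best_loss = M, loss
    return best_M, best_loss
```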
3.3.3 MAP learning
A priori information on the vector of weights w can be used to reduce the effects of overfitting: ||w_ML|| explodes for increasing M (cf. Section 2.3.4).
Remedy: apply an a priori distribution which puts less weight on large values, e.g.
w_j ~ N(0, α⁻¹) i.i.d. (variance α⁻¹).
ML: maximize the LL, i.e. p(t_D | x_D, w, β).
MAP: maximize
p(t_D, w | x_D, β) = p(w) Π_{n=1}^N p(t_n | x_n, w, β)
MAP learning criterion:
min_{w,β} −Σ_{n=1}^N ln p(t_n | x_n, w, β) − ln p(w)
With the training loss L_D(w) = (1/N) ||t_D − Φ_D w||² and λ = α/β we obtain
min_w L_D(w) + (λ/N) ||w||²
= ML criterion + regularization term R(w). The solution follows by standard LS analysis:
w_MAP = (λ I + Φ_D^T Φ_D)⁻¹ Φ_D^T t_D
As N grows, the contribution of the regularization term becomes negligible and the MAP estimate approaches the ML estimate.
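A sketch of the MAP (ridge) solution above (Python/NumPy); lam stands for λ = α/β, and lam = 0 recovers the ML/LS solution:

```python
import numpy as np

def map_fit(x, t, M, lam):
    """w_MAP = (lam*I + Phi_D^T Phi_D)^{-1} Phi_D^T t_D (ridge regression)."""
    Phi = np.vander(np.asarray(x), M + 1, increasing=True)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ np.asarray(t))
```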
[Figure: training loss L_D and generalization loss L_p vs. ln λ.]
Increasing λ has the same effect as reducing the model order M.
Other example for the a priori distribution: Laplace pdf, leading to the regularizer R(w) = ||w||_1 = Σ_{j=0}^{M} |w_j|.
The resulting MAP problem
min_w L_D(w) + (λ/N) ||w||_1
is known as LASSO (least absolute shrinkage and selection operator).
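The ℓ_1-regularized criterion has no closed-form solution; a sketch using scikit-learn's Lasso (the mapping alpha = λ/(2N) to match the criterion above, and fit_intercept=False because φ(x) already contains a constant feature, are choices of this sketch):

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fit(x, t, M, lam):
    """Approximately minimize L_D(w) + (lam/N)*||w||_1.
    sklearn's objective is (1/(2N))*||t - Phi w||^2 + alpha*||w||_1,
    which matches ours (up to a factor 1/2) for alpha = lam/(2N)."""
    Phi = np.vander(np.asarray(x), M + 1, increasing=True)
    model = Lasso(alpha=lam / (2 * len(x)), fit_intercept=False, max_iter=10000)
    model.fit(Phi, np.asarray(t))
    return model.coef_
```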
3.4 Bayesian approach
Frequentist approach: the existence of a true but unknown distribution p(x, t) is assumed; the ML/MAP problem tries to find a parameter vector w such that the model distribution p(t | x, w, β) is close to the true distribution.
Bayesian approach:
(i) The data points are jointly distributed according to a known distribution.
(ii) The model parameters w are jointly distributed with the data.
(We ignore β in the following, assuming only w is to be optimized.)
The approach is characterized by the joint distribution p(t_D, w, t | x_D, x) for a new domain point x and a new label t to be predicted.
The Bayesian model evaluates the a posteriori (posterior) distribution p(t | x_D, t_D, x) = p(t | D, x), given x and D, to predict the new label.
posteriori distribution obtained by manipulating
dis hi b da
By why the chain rule of cond probability
a p l bl c
p l ab l c p al b c we obtain
p I t bet I xp x I I tou l t s x pl t I ta tis uh
p
plusI xD y p I t b l exist pI t l D I
a p ri distribution
PI tis he t I xD x
ply pltblxb.ie pl t Ix y H
a penni Uist likelihood hist if new label
likelihood term p I ta l xD E
II Mtn Hulin uh B I
o
potty new label PLH x El NI t I felt It AY
The factorization can be graphically represented by a Bayesian network. We first drop the dependency on the domain points in (4):
p(t_D, w, t) = p(w) p(t_D | w) p(t | w)
Place a vertex for each involved r.v. and a directed edge from all conditioning r.v.s to the main r.v. of each distribution.
[Figure: Bayesian network with w pointing to t_1, ..., t_N and to the new label t.]
The Bayesian approach is an inference-based approach: the learning standpoint is hidden, as all quantities in the model are r.v.s.
We obtain the a posteriori distribution by manipulating the joint distribution:
p(t | D, x) = p(t_D, t | x_D, x) / p(t_D | x_D) = ∫ p(w | D) p(t | x, w) dw    (5)
where p(w | D) is the a posteriori distribution of the weights and p(t | D, x) is the predictive distribution for the new label.
For the weights, using Bayes' theorem:
p(w | D) = p(w) p(t_D | x_D, w) / p(t_D | x_D)
(a posteriori / posterior belief)
The computation is in general difficult, but for the example where p(t_D | x_D, w) and p(t | x, w) are Gaussian distributions, (5) involves the convolution of these distributions and (ℓ_2 loss assumed) we obtain
p(t | x, D) = N(t | μ(x, w_MAP), s²(x)) with
s²(x) = β⁻¹ (1 + φ(x)^T (λ I + Φ_D^T Φ_D)⁻¹ φ(x))
(cf. Bishop, eqs. (3.58), (3.59))
Here the optimal predictor under the ℓ_2 loss is the MAP predictor; this is not true in general.
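A sketch of the Gaussian predictive distribution above (Python/NumPy): it returns the mean μ(x, w_MAP) and the variance s²(x) for new inputs; β and λ are assumed given:

```python
import numpy as np

def bayes_predictive(x_train, t_train, x_new, M, lam, beta):
    """Predictive mean mu(x, w_MAP) and variance
    s^2(x) = (1/beta) * (1 + phi(x)^T (lam*I + Phi_D^T Phi_D)^{-1} phi(x))."""
    Phi = np.vander(np.asarray(x_train), M + 1, increasing=True)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    w_map = np.linalg.solve(A, Phi.T @ np.asarray(t_train))
    phi_new = np.vander(np.atleast_1d(x_new), M + 1, increasing=True)
    mean = phi_new @ w_map
    var = (1.0 / beta) * (1.0 + np.sum((phi_new @ np.linalg.inv(A)) * phi_new, axis=1))
    return mean, var
```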
3.4.1 Comparison with ML and MAP
Bayesian: the a posteriori distribution p(t | x, D) allows for a more refined prediction of the labels t. Compared to ML, where the predictive distribution p(t | x, w_ML) = N(t | μ(x, w_ML), β⁻¹) (similarly for MAP) has the same variance for all data points, in the Bayesian approach the accuracy of the prediction depends on the value of x, set by the uneven distribution of the data points x_n.
[Figure: predictive mean μ(x, w_MAP) with uncertainty band ± s(x) and the training points.]
Adaptive uncertainty for the Bayesian approach.
For N → ∞: w_MAP → w_ML and s²(x) → β⁻¹, i.e. the Bayesian approach approaches ML.
3.4.2 Marginal likelihood and model selection
Advantage: Bayesian model selection is possible without validation, via the marginal likelihood
p(t_D | x_D) = ∫ p(w) Π_{n=1}^N p(t_n | x_n, w) dw
A larger M can result in a smaller marginal likelihood, in contrast to the ML approach, where p(t_D | x_D, w_ML) can only increase with increasing M; this allows selecting the model order.
[Figure: marginal likelihood p(t_D | x_D) vs. model order M; its maximum indicates the selected model order.]
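A sketch estimating the log marginal likelihood by Monte Carlo averaging of the likelihood over prior samples (Python/NumPy; a closed form also exists for this Gaussian model, e.g. Bishop eq. (3.86), but is not reproduced here; the number of samples S is an arbitrary choice):

```python
import numpy as np

def log_marginal_likelihood_mc(x, t, M, alpha, beta, S=20000, rng=None):
    """Estimate log p(t_D | x_D), i.e. the log of the average of
    prod_n N(t_n | w^T phi(x_n), 1/beta) over prior samples w ~ N(0, alpha^{-1} I),
    using log-sum-exp for numerical stability."""
    rng = np.random.default_rng() if rng is None else rng
    x, t = np.asarray(x), np.asarray(t)
    Phi = np.vander(x, M + 1, increasing=True)
    W = rng.normal(0.0, 1.0 / np.sqrt(alpha), size=(S, M + 1))  # samples from the prior
    resid = t[None, :] - W @ Phi.T                               # (S, N) residuals
    log_lik = np.sum(0.5 * np.log(beta / (2 * np.pi)) - 0.5 * beta * resid**2, axis=1)
    return np.logaddexp.reduce(log_lik) - np.log(S)
```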
3.5 Minimum description length (MDL)
Compression: for a r.v. x ∈ X with distribution p(x), a lossless compression scheme can be designed which needs ⌈−log p(x)⌉ bits to represent x.
- MDL selects a model which losslessly compresses D to the shortest possible description.
- In the simplest case, the model order M is selected which minimizes the description length
  −Σ_{n=1}^N log p(t_n | x_n, w_ML, β_ML) + C(M)
  where the first term is the number of bits needed to describe the labels under the model and C(M) is the smallest number of bits needed to describe the quantized parameters w_ML, β_ML.
- The second term acts as a regularizer.
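A sketch of the MDL criterion above (Python/NumPy), under the simplifying assumption that each parameter costs a fixed number of bits, so C(M) = bits_per_param · (M + 2) for w_ML and β_ML; this cost model is an illustrative choice, not from the notes:

```python
import numpy as np

def description_length(x, t, M, bits_per_param=16):
    """Description length in bits: -sum_n log2 p(t_n | x_n, w_ML, beta_ML) + C(M)."""
    x, t = np.asarray(x), np.asarray(t)
    Phi = np.vander(x, M + 1, increasing=True)
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    resid = t - Phi @ w_ml
    beta_ml = 1.0 / np.mean(resid**2)                  # 1/beta_ML = L_D(w_ML)
    log2_lik = (0.5 * np.log2(beta_ml / (2 * np.pi))
                - 0.5 * beta_ml * resid**2 / np.log(2))
    data_bits = -np.sum(log2_lik)                      # bits to describe the labels
    C_M = bits_per_param * (M + 2)                     # bits to describe (w_ML, beta_ML)
    return data_bits + C_M
```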
Not discussed in class: see Chapter 2.6 in the text; read about the pitfalls associated with learned distributions (Chapter 2.7).