Introduction to Statistical Learning
(ISLR 2.1)
Yingbo Li
Clemson University
MATH 8050
Yingbo Li (Clemson) Intro to Statistical Learning MATH 8050 1 / 16
Outline
1 Why Estimate f
2 How to Estimate f
3 Trade-Off: Prediction Accuracy and Model Interpretability
4 Supervised vs Unsupervised Learning
5 Regression vs Classification
Why Estimate f
The Advertising data
For n = 200 different markets
Sales: sales of the product in this market (Y )
TV: advertising budget for TV (X1 )
Radio: advertising budget for radio (X2 )
Newspaper: advertising budget for newspaper (X3 )
Why Estimate f
We believe that there is a relationship between Y and X
Y : output variable, response
X = (X1 , X2 , X3 ): input variables, predictors
[Figure: scatterplots of Sales against the TV, Radio, and Newspaper advertising budgets]
Why Estimate f
Model the relationship between Y and X
The regression function
Y = f(X) + ε
f: unknown function
ε: random error with mean zero, i.e., E(ε) = 0.
In the Advertising example:
f (X1 , X2 , X3 ) = E(Y | X1 , X2 , X3 )
Statistical learning, and this course, are all about how to estimate f .
Why?
Prediction.
Inference.
Why Estimate f
Prediction
If we can get a good estimate for f, we can make accurate predictions for
the response Y , based on a new value of X.
For a new market, given the three media budgets, what are the sales?
We just want to predict sales, not to learn which medium is more important.
Suppose our estimate of f is f̂. Then for input X, the output Y is predicted as
Ŷ = f̂(X)
Mean squared error:
E(Y − Ŷ)² = E[f(X) − f̂(X)]² + Var(ε)
The first term is the reducible error; the second is the irreducible error.
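This decomposition can be checked numerically. The sketch below assumes a hypothetical true function f(x) = 2 + 3x, Gaussian errors with Var(ε) = 0.25, and a deliberately imperfect estimate f̂; none of these come from the Advertising data.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true regression function (an assumption for this sketch)
    return 2.0 + 3.0 * x

n = 100_000
x = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, 0.5, n)        # random error, Var(eps) = 0.25
y = f(x) + eps

def f_hat(x):
    # A deliberately imperfect estimate of f
    return 2.1 + 2.9 * x

mse = np.mean((y - f_hat(x)) ** 2)               # E(Y - Yhat)^2
reducible = np.mean((f(x) - f_hat(x)) ** 2)      # E[f(X) - fhat(X)]^2
irreducible = 0.25                               # Var(eps), known by construction
print(mse, reducible + irreducible)              # the two sides nearly agree
```

Improving f̂ shrinks the reducible term toward zero, but the mean squared error can never fall below Var(ε).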
Why Estimate f
Inference
We are often interested in understanding the relationship between Y
and each of X1, . . . , Xp. For example:
1 Which predictors actually affect the response?
2 Is the relationship positive or negative?
3 Is the relationship a simple linear one or is it more complicated?
How much impact does the TV budget have on sales?
Which media generate the biggest boost in sales?
How to Estimate f
How to estimate f
Use the training data and a statistical method to estimate f .
We have observed a set of training data
{(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )},
where each xi = (xi,1 , xi,2 , . . . , xi,p )′ is a vector of p predictor values, and yi is a scalar response.
Statistical learning methods:
- parametric
- non-parametric
How to Estimate f
Income vs Education, Seniority
[Figure: 3D plot of Income as a function of Years of Education and Seniority]
How to Estimate f
Parametric methods
Parametric methods reduce the problem of estimating f to one of estimating a (finite)
set of parameters. A two-step, model-based approach:
1 Come up with a model (some functional form assumption about f ).
The most common example is a linear model.
f (X) = β0 + β1 X1 + β2 X2 + · · · + βp Xp
- We only need to estimate p + 1 parameters β0 , β1 , . . . , βp .
- Although it is almost never correct, a linear model often serves as a good and interpretable approximation to the unknown true f (X).
2 Use the training data to fit the model.
Estimate the unknown parameters, yielding β̂0 , β̂1 , . . . , β̂p .
- The most common approach is ordinary least squares (OLS).
- We will see later that other approaches can be superior.
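As a sketch of step 2, OLS can be carried out with numpy's least-squares solver. The data below are simulated from hypothetical coefficients (an assumption for this illustration), not taken from the Advertising data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated training data with p = 2 predictors (hypothetical coefficients)
n, p = 200, 2
X = rng.uniform(0.0, 100.0, (n, p))
beta_true = np.array([5.0, 0.05, 0.10])          # beta_0, beta_1, beta_2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 1.0, n)

# OLS: minimize ||y - X_design @ beta||^2
X_design = np.column_stack([np.ones(n), X])      # prepend an intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)                                  # estimates of beta_0, beta_1, beta_2
```

With n = 200 observations and only p + 1 = 3 parameters, the estimates land close to the true coefficients.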
How to Estimate f
A linear model f̂L (X) = β0 + β1 X gives a reasonable fit here.
A quadratic model f̂Q (X) = β0 + β1 X + β2 X² fits slightly better.
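A minimal sketch of this comparison, assuming hypothetical data simulated from a quadratic: both models are fit by least squares via numpy.polyfit, and the quadratic attains the lower training MSE.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data with genuine curvature (an assumption for this sketch)
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 0.5 * x + 0.2 * x**2 + rng.normal(0.0, 1.0, x.size)

mse = {}
for degree in (1, 2):                            # linear vs quadratic fit
    coefs = np.polyfit(x, y, degree)             # least-squares polynomial fit
    mse[degree] = np.mean((y - np.polyval(coefs, x)) ** 2)
print(mse)                                       # training MSE drops at degree 2
```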
How to Estimate f
A linear regression fit to the Income data
[Figure: a linear regression plane fit to the Income data over Years of Education and Seniority]
Income = β0 + β1 × Education + β2 × Seniority
How to Estimate f
Non-parametric methods
They do not make explicit assumptions about the functional form of f .
Advantage: they can accurately fit a wider range of possible shapes of f .
Disadvantage: they require a large n to obtain an accurate estimate.
[Figure: two thin-plate spline fits to the Income data]
A smooth thin-plate spline fit is flexible; a rough thin-plate spline fit overfits the training data.
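Thin-plate splines need a specialized fitting routine; as a dependency-light sketch of the non-parametric idea, the example below uses k-nearest-neighbors averaging (a different non-parametric method, used here only for illustration) on hypothetical data from a sine curve. No functional form for f is assumed in the fit.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training data from a nonlinear f; the fit assumes nothing about its form
x_train = rng.uniform(0.0, 2.0 * np.pi, 300)
y_train = np.sin(x_train) + rng.normal(0.0, 0.2, x_train.size)

def knn_predict(x0, k=15):
    """Average the responses of the k training points nearest to x0."""
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[idx].mean()

x_grid = np.linspace(0.0, 2.0 * np.pi, 50)
y_hat = np.array([knn_predict(x0) for x0 in x_grid])
```

A small k gives a rough, overfit curve; a large k gives a smoother one — the same flexibility trade-off as the two spline fits above.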
Trade-Off: Prediction Accuracy and Model Interpretability
Some trade-offs
Prediction accuracy vs model interpretability
Linear models are easy to interpret; thin-plate splines are not.
Good fit vs over-fit
A model that overfits the training data may not predict well.
[Figure: trade-off between flexibility and interpretability. From high interpretability and low flexibility to low interpretability and high flexibility: Subset Selection and Lasso; Least Squares; Generalized Additive Models and Trees; Bagging, Boosting, and Support Vector Machines]
Supervised vs Unsupervised Learning
Supervised vs unsupervised learning
Supervised learning: both X and Y are available
Unsupervised learning: only X is available; there is no Y .
- Example: market segmentation, where we try to divide potential customers into groups based on their characteristics.
- A common approach is clustering.
[Figure: two clustering examples plotted in the (X1, X2) plane]
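The market-segmentation example can be sketched with a minimal K-means (Lloyd's algorithm) on hypothetical two-group data; only the inputs (X1, X2) are used, and no response Y appears anywhere.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two hypothetical customer groups in the (X1, X2) plane
group_a = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[8.0, 6.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# Minimal K-means (Lloyd's algorithm) with K = 2 clusters,
# initialized deterministically with one point from each end of the array
centers = X[[0, -1]].copy()
for _ in range(20):
    # Assign each point to its nearest center, then recompute the centers
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
print(centers)   # one center near each group mean
```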
Regression vs Classification
Regression vs classification
Regression: Y is continuous (quantitative).
- Predicting the value of the Dow in 6 months.
- Predicting the value of a given house based on various inputs.
Classification: Y is categorical (qualitative).
- Will the Dow be up (U) or down (D) in 6 months?
- Is this email spam or not?