2018-02 MSU Data Science
Contact:
o E-mail: mail@sebastianraschka.com
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt
Machine learning is used & useful (almost) anywhere
3 Types of Learning
o Supervised
o Unsupervised
o Reinforcement
Working with Labeled Data
Supervised Learning
[Figure: two panels. Regression: fit y ("output") as a function of x ("input") and predict the output for a new input "?". Classification: separate the classes in the x1-x2 feature plane and predict the class of a new point "?".]
Working with Unlabeled Data
Unsupervised Learning
[Figure: two panels. Clustering: group similar, unlabeled points. Compression: represent the data with fewer dimensions.]
Topics
Simple Linear Regression

ŷ = w0 + w1x

[Figure: a fitted line through data points (xi, yi), where x is the explanatory variable and y the response variable; w0 is the intercept, the slope is w1 = Δy/Δx, and |ŷ − y| is the vertical offset between the fitted line and an observed point.]
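A minimal sketch of fitting such a line with scikit-learn (the toy data below is made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# toy data: y depends roughly linearly on x
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

reg = LinearRegression()
reg.fit(x, y)
print(reg.intercept_, reg.coef_)   # w0 (intercept) and w1 (slope)
print(reg.predict([[6.0]]))        # y-hat for a new x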
Data Representation

Columns: features (explanatory variables, independent variables, covariates, predictors, variables, inputs, attributes)
Rows: training examples (instances, samples)
Targets: y (target variable, response variable, dependent variable, labels, ground truth)

X = [ x1,0  x1,1  ...  x1,m        y = [ y1
      x2,0  x2,1  ...  x2,m              y2
      x3,0  x3,1  ...  x3,m              y3
       .     .          .                 .
       .     .          .                 .
      xn,0  xn,1  ...  xn,m ]            yn ]
“Basic” Supervised Learning Workflow

[Workflow figure:
1. Split the dataset (data + labels) into training data/labels and test data/labels.
2. Feed the training data, training labels, and hyperparameter values into the learning algorithm to fit a model.
3. Let the model predict the test data and compare the predictions with the test labels to measure performance.
4. Fit the final model on the complete dataset (data + labels) with the chosen hyperparameter values.]
Jupyter Notebook
Topics
Scikit-learn API

class SupervisedEstimator(...):

    def __init__(self, hyperparam, ...):
        ...

    def fit(self, X, y):
        ...
        return self

    def predict(self, X):
        ...
        return y_pred

    def score(self, X, y):
        ...
        return score

    ...
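A minimal sketch of this estimator API in action (the dataset and model choice below are just for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# load a small example dataset and create a train/test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

clf = LogisticRegression()          # __init__ takes the hyperparameters
clf.fit(X_train, y_train)           # fit returns self
y_pred = clf.predict(X_test)        # predict returns the class labels
print(clf.score(X_test, y_test))    # score returns the mean accuracy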
Iris Dataset

features (columns), samples (rows):

    sepal length [cm]  sepal width [cm]  petal length [cm]  petal width [cm]  class
1   5.1                3.5               1.4                0.2               setosa
2   4.9                3.0               1.4                0.2               setosa
Linear Regression Recap

[Figure: input values x1 ... xm and a bias unit (constant 1) are weighted by the coefficients w1 ... wm and w0; the net input function sums them to z, and an activation function maps z to the predicted output y.]
Linear Regression Recap

[Same figure; here the activation function is the identity function, so the predicted output equals the net input z.]
Logistic Regression, a Generalized Linear Model (a Classifier)

[Figure: the same architecture, but the activation function (the logistic sigmoid) maps the net input to a predicted probability, and a unit step function thresholds that probability into the predicted class label.]
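A small NumPy sketch of this forward pass (the weights and inputs below are made-up numbers):

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

x = np.array([1.5, -0.5])          # input values x1, x2
w = np.array([0.8, -1.2])          # weight coefficients w1, w2
w0 = 0.1                           # bias unit weight

z = w0 + np.dot(x, w)              # net input
proba = sigmoid(z)                 # predicted probability (logistic activation)
label = int(proba >= 0.5)          # unit step: threshold at 0.5
print(z, proba, label)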
A “Lazy Learner:” K-Nearest Neighbors Classifier

[Figure: a query point "?" in the x1-x2 feature plane; among its k nearest neighbors the class counts are 1×, 1×, and 3×, so the majority vote predicts the class that occurs 3 times.]
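A minimal scikit-learn sketch of this classifier (dataset and k = 3 are arbitrary choices for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

knn = KNeighborsClassifier(n_neighbors=3)   # hyperparameter: number of neighbors k
knn.fit(X_train, y_train)                   # "lazy" learner: mostly just stores the training data
print(knn.predict(X_test[:5]))              # majority vote among the 3 nearest neighbors
print(knn.score(X_test, y_test))            # classification accuracy on the test set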
Jupyter Notebook
There are many, many more classification and regression algorithms ...
http://scikit-learn.org/stable/supervised_learning.html
Topics
Categorical Variables

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1
Encoding Categorical Variables (Ordinal vs Nominal)

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1
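A pandas sketch of encoding the table above (a minimal example; the size ordering M < L < XL is an assumption about the ordinal feature):

import pandas as pd

# example DataFrame from the slide
df = pd.DataFrame({
    'color': ['red', 'blue', 'green'],
    'size': ['M', 'XL', 'L'],
    'price': [10.49, 15.00, 12.99],
    'classlabel': [0, 1, 1]})

# ordinal feature: sizes have a natural order, so map them to integers
size_mapping = {'M': 1, 'L': 2, 'XL': 3}
df['size'] = df['size'].map(size_mapping)

# nominal feature: colors have no order, so one-hot encode them
df = pd.get_dummies(df, columns=['color'])
print(df)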
Feature Normalization
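Two common flavors, sketched with scikit-learn (the tiny array is just for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0]])

# z-score standardization: zero mean and unit variance per feature column
stdsc = StandardScaler().fit(X_train)    # fit the scaling parameters on training data only
X_train_std = stdsc.transform(X_train)

# min-max scaling: squash each feature column into the [0, 1] range
mms = MinMaxScaler().fit(X_train)
X_train_norm = mms.transform(X_train)

print(X_train_std)
print(X_train_norm)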
Scikit-learn API

class UnsupervisedEstimator(...):

    def __init__(self, ...):
        ...

    def fit(self, X):
        ...
        return self

    def transform(self, X):
        ...
        return X_transf

    def predict(self, X):
        ...
        return pred
Scikit-learn Pipelines

[Figure: a Pipeline chains preprocessing steps (e.g., scaling) with a final estimator; fit is called once on the training data to fit all steps, and predict on the test data applies the fitted transformations before predicting the class labels.]
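A minimal sketch of such a pipeline (scaling followed by a classifier; the concrete steps are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)         # fits the scaler and the classifier on the training data
print(pipe.score(X_test, y_test))  # test data is scaled with the training-set parameters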
Jupyter Notebook
Topics
Dimensionality Reduction – why?

[Figure: feature measurements (in cm) and plots of predictive performance, e.g. as a function of the number of features used.]
Recursive Feature Elimination

available features: [ f1 f2 f3 f4 ]
fit model → weights [ w1 w2 w3 w4 ], remove the feature with the lowest weight, repeat
fit model → weights [ w1 w2 w4 ], remove the feature with the lowest weight, repeat
fit model → weights [ w1 w4 ], remove the feature with the lowest weight, repeat
remaining: [ w4 ]
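A minimal scikit-learn sketch of recursive feature elimination (the estimator and the number of features to keep are arbitrary choices here):

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# repeatedly fit the model and drop the feature with the smallest weight
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # 1 = selected; higher ranks were eliminated earlier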
Sequential Feature Selection

available features: [ f1 f2 f3 f4 ]
candidate subsets: [ f1 f2 ] [ f1 f3 ] [ f1 f4 ] → fit a model on each, pick the best (here [ f1 f3 ]), repeat
candidate subsets: [ f1 f3 f2 ] [ f1 f3 f4 ] → fit a model on each, pick the best, repeat
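A sketch of forward selection using the SequentialFeatureSelector from mlxtend (assuming mlxtend is installed; the estimator and k_features values are illustrative):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=3)
# forward selection: grow the feature subset one best feature at a time
sfs = SFS(knn, k_features=2, forward=True, floating=False,
          scoring='accuracy', cv=5)
sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_)   # indices of the selected features
print(sfs.k_score_)         # cross-validated score of that subset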
Principal Component Analysis

[Figure: data points in the x1-x2 plane with two orthogonal principal component directions, PC1 and PC2; PC1 points along the direction of largest variance.]
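A minimal scikit-learn sketch (projecting onto the first two principal components; the dataset is illustrative):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)   # PCA directions are scale-sensitive

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)            # rows projected onto PC1 and PC2
print(pca.explained_variance_ratio_)        # share of variance captured by each PC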
Jupyter Notebook
Topics
“Basic” Supervised Learning Workflow

[Workflow figure, as before:
1. Split the dataset (data + labels) into training data/labels and test data/labels.
2. Fit a model on the training data/labels with the learning algorithm and hyperparameter values.
3. Predict the test data and compare with the test labels to measure performance.
4. Fit the final model on the complete dataset with the chosen hyperparameter values.]
Holdout Method and Hyperparameter Tuning (steps 1-3)

[Workflow figure:
1. Split the dataset (data + labels) into training data/labels, validation data/labels, and test data/labels.
2. For each candidate set of hyperparameter values, fit a model on the training data/labels with the learning algorithm, predict the validation data, and compare against the validation labels to obtain a validation performance per model.
3. Select the best hyperparameter values, i.e., the model with the best validation performance.]
Holdout Method and Hyperparameter Tuning (steps 4-6)

[Workflow figure:
4. Refit the model on the combined training + validation data/labels with the best hyperparameter values.
5. Predict the test data and compare against the test labels to estimate the model's performance.
6. Fit the final model on the complete dataset (data + labels) with the best hyperparameter values.]
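A minimal sketch of steps 1-6 with scikit-learn splits and a small grid of k values for a k-NN classifier (all concrete choices here are illustrative):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) split into training, validation, and test sets
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=123, stratify=y_tmp)

# 2) + 3) try different hyperparameter values, pick the best on the validation set
best_k, best_score = None, -np.inf
for k in [1, 3, 5, 7]:
    score = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_valid, y_valid)
    if score > best_score:
        best_k, best_score = k, score

# 4) refit on training + validation data with the best hyperparameter value
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_tmp, y_tmp)

# 5) estimate the generalization performance on the test set
print(best_k, model.score(X_test, y_test))

# 6) before deployment, refit the final model on the complete dataset (X, y)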
K-fold Cross-Validation

[Figure: the training data is split into K folds (here K = 5). In each of the K iterations, one fold is held out as the validation fold and the remaining folds serve as training folds; the learning algorithm fits a model with the given hyperparameter values on the training folds, the model predicts the validation fold, and the predictions are compared against the validation fold labels. The K performance estimates (Performance 1 ... Performance K) are then averaged into a single cross-validation performance.]
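A minimal sketch of estimating performance via K-fold cross-validation with scikit-learn (K = 5, model choice illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: 5 models, each validated on a different held-out fold
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores)          # one performance estimate per fold
print(scores.mean())   # averaged cross-validation performance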
K-fold Cross-Validation Workflow (steps 1-3)

[Workflow figure:
1. Split the dataset (data + labels) into training data/labels and test data/labels.
2. Run K-fold cross-validation on the training data/labels for each candidate set of hyperparameter values, fitting models with the learning algorithm.
3. Refit the model on the whole training set with the best hyperparameter values.]
K-fold Cross-Validation Workflow (steps 4-5)

[Workflow figure:
4. Predict the test data and compare against the test labels to estimate the model's performance.
5. Fit the final model on the complete dataset (data + labels) with the best hyperparameter values.]
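A minimal sketch of this workflow using GridSearchCV, which runs the K-fold loop over a hyperparameter grid and refits on the full training set (all concrete values are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) training/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

# 2) + 3) K-fold cross-validation over the hyperparameter grid,
#         then refit on the whole training set with the best values (refit=True by default)
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [1, 3, 5, 7]},
                  cv=5)
gs.fit(X_train, y_train)

# 4) estimate the generalization performance on the test set
print(gs.best_params_, gs.score(X_test, y_test))

# 5) before deployment, refit the final model on the complete dataset (X, y)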
More info about model evaluation (one of the most important topics in ML):
https://sebastianraschka.com/blog/index.html
• Model evaluation, model selection, and algorithm selection in machine learning Part I - The basics
• Model evaluation, model selection, and algorithm selection in machine learning Part II - Bootstrapping and uncertainties
• Model evaluation, model selection, and algorithm selection in machine learning Part III - Cross-validation and hyperparameter tuning
Jupyter Notebook
BONUS SLIDES
https://www.tensorflow.org
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
(Preliminary White Paper, November 9, 2015)
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng (Google Research)
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf

[Slide shows the first page of the white paper, including "Figure 1: Example TensorFlow code fragment" and its computation graph (MatMul, Add, ReLU nodes with inputs W, x, b). From the abstract: TensorFlow is an interface for expressing machine learning algorithms and an implementation for executing such algorithms; a computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. It builds on experience with DistBelief, which more than 50 teams at Google and other Alphabet companies have used to deploy deep neural networks in products including Google Search, advertising, speech recognition, Google Photos, Google Maps and StreetView, Google Translate, and YouTube.]
Tensors?

[Excerpt from https://sebastianraschka.com/pdf/books/dlb/appendix_g_tensorflow.pdf, truncated: "... at performing highly parallelized numerical computations. In addition, TensorFlow also supports distributed systems as well as mobile computing platforms, including Android and Apple's iOS."]

But what is a tensor? In simplifying terms, we can think of tensors as multidimensional arrays of numbers, as a generalization of scalars, vectors, and matrices. The number of such array dimensions is called the rank of the tensor, which is not to be confused with the dimensions of a matrix. For instance, an m × n matrix, where m is the number of rows and n is the number of columns, would be a special case of a rank-2 tensor. A visual explanation of tensors and their ranks is given in the figure below.

[Figure: tensors of different ranks, with example element indices such as [0, 0] (rank 2) and [0, 2, 1] (rank 3).]
GPUs
Vectorization

X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))

[Figure: diagrams of the matrix product of X and W, computing the net inputs for all training examples in one step rather than looping over the examples one at a time.]
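A sketch of why this matters: the loop and the single matrix product below compute the same result, but the vectorized version runs far faster for realistic sizes (the shapes are made up):

import numpy as np

num_train_examples, num_features, num_hidden = 1000, 50, 10
X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))

# naive version: one dot product per training example
Z_loop = np.array([np.dot(x, W) for x in X])

# vectorized version: a single matrix-matrix product for all examples
Z_vec = np.dot(X, W)

print(np.allclose(Z_loop, Z_vec))   # True -- identical results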
Computation Graphs

a(x, w, b) = relu(w*x + b)

[Figure: the expression as a computation graph; inputs w and x feed a multiplication node u = w*x, u and b feed an addition node v = u + b, and v feeds a = relu(v).]
Computation Graphs

import tensorflow as tf

g = tf.Graph()
with g.as_default() as g:
    # inputs defined as constants here (x = 3, w = 2, b = 1, the values
    # shown on the following slides)
    x = tf.constant(3., name='x')
    w = tf.constant(2., name='w')
    b = tf.constant(1., name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)

print(x, w, b, u, v, a)
Computation Graphs

[Figure: the graph again with concrete values w = 2 and b = 1 flowing through u = w*x, v = u + b, a = relu(v); evaluating only the node b yields b_res:]

print(b_res)
1.0
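A minimal sketch of how such a graph is evaluated in TensorFlow 1.x (assuming the graph g with the constants x = 3, w = 2, b = 1 from the earlier slide):

with tf.Session(graph=g) as sess:
    b_res = sess.run(b)   # fetching only b executes just that part of the graph
    a_res = sess.run(a)   # full forward pass: relu(2*3 + 1)

print(b_res)   # 1.0
print(a_res)   # 7.0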
Computation Graphs

[Figure: forward pass with x = 3, w = 2, b = 1 gives u = w*x = 6, v = u + b = 7, and a = relu(v) = 7. The backward pass multiplies local derivatives along the graph:
∂a/∂v = 1, ∂v/∂u = 1, ∂v/∂b = 1, ∂u/∂w = x = 3
∂a/∂b = ∂a/∂v · ∂v/∂b = 1 · 1 = 1
∂a/∂w = ∂a/∂v · ∂v/∂u · ∂u/∂w = 1 · 1 · 3 = 3]

https://github.com/rasbt/pydata-annarbor2017-dl-tutorial
g = tf.Graph()
with g.as_default() as g:
    x = tf.constant(3., name='x')
    w = tf.constant(2., name='w')
    b = tf.constant(1., name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    # symbolic derivatives of a with respect to w and b
    d_a_w = tf.gradients(a, w)
    d_a_b = tf.gradients(a, b)

# evaluating d_a_w and d_a_b in a session yields [3.0] and [1.0]
http://pytorch.org
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.autograd import grad

# the same computation graph as before, now in PyTorch (Variable API)
x = Variable(torch.Tensor([3]))
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([1]), requires_grad=True)

u = x * w
v = u + b
a = F.relu(v)

https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/master/code/ch12/images/12_02.png
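The imported grad function can then return the same derivatives as tf.gradients above; a minimal sketch continuing the code on this slide:

# d a/d w and d a/d b via the chain rule, as in the TensorFlow example
d_a_w, = grad(a, w, retain_graph=True)   # tensor([ 3.])
d_a_b, = grad(a, b)                      # tensor([ 1.])
print(d_a_w, d_a_b)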
TensorFlow (graph-building excerpt):

g = tf.Graph()
with g.as_default():

    # Input data
    tf_x = tf.placeholder(tf.float32, [None, n_input], name='features')
    tf_y = tf.placeholder(tf.float32, [None, n_classes], name='targets')

    # Model parameters
    weights = {
        'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
        'out': tf.Variable(tf.truncated_normal([n_hidden_2, n_classes], stddev=0.1))
    }
    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'out': tf.Variable(tf.zeros([n_classes]))
    }

    # Multilayer perceptron
    layer_1 = tf.add(tf.matmul(tf_x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']

PyTorch (equivalent model definition):

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        logits = self.linear_out(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas

model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)
Thanks for attending!
Contact:
o E-mail: mail@sebastianraschka.com
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt