2018-02 MSU Data Science
Contact:
o E-mail: mail@sebastianraschka.com
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt
Machine learning is used & useful (almost) anywhere
3 Types of Learning
o Supervised
o Unsupervised
o Reinforcement
Working with Labeled Data
Supervised Learning
[Figure: two panels. Regression: fit y ("output") as a function of x ("input") and predict the output for a new input "?". Classification: separate the classes in the x1-x2 feature plane and predict the class of a new point "?".]
Working with Unlabeled Data
Unsupervised Learning
[Figure: two panels. Clustering: group similar, unlabeled points. Compression: represent the data with fewer dimensions.]
Topics
Simple Linear Regression

ŷ = w0 + w1x

[Figure: a fitted line through data points (xi, yi), where x is the explanatory variable and y the response variable; w0 is the intercept, the slope is w1 = Δy/Δx, and |ŷ − y| is the vertical offset between the fitted line and an observed point.]
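A minimal sketch of fitting such a line with scikit-learn (the toy data below is made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# toy data: y depends roughly linearly on x
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

reg = LinearRegression()
reg.fit(x, y)
print(reg.intercept_, reg.coef_)   # w0 (intercept) and w1 (slope)
print(reg.predict([[6.0]]))        # y-hat for a new x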
Data Representation

Columns: features (explanatory variables, independent variables, covariates, predictors, variables, inputs, attributes)
Rows: training examples (instances, samples)
Targets: y (target variable, response variable, dependent variable, labels, ground truth)

X = [ x1,0  x1,1  ...  x1,m        y = [ y1
      x2,0  x2,1  ...  x2,m              y2
      x3,0  x3,1  ...  x3,m              y3
       .     .          .                 .
       .     .          .                 .
      xn,0  xn,1  ...  xn,m ]            yn ]
“Basic” Supervised Learning Workflow

[Workflow figure:
1. Split the dataset (data + labels) into training data/labels and test data/labels.
2. Feed the training data, training labels, and hyperparameter values into the learning algorithm to fit a model.
3. Let the model predict the test data and compare the predictions with the test labels to measure performance.
4. Fit the final model on the complete dataset (data + labels) with the chosen hyperparameter values.]
Jupyter Notebook
Topics
Scikit-learn API

class SupervisedEstimator(...):

    def __init__(self, hyperparam, ...):
        ...

    def fit(self, X, y):
        ...
        return self

    def predict(self, X):
        ...
        return y_pred

    def score(self, X, y):
        ...
        return score

    ...
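A minimal sketch of this estimator API in action (the dataset and model choice below are just for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# load a small example dataset and create a train/test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

clf = LogisticRegression()          # __init__ takes the hyperparameters
clf.fit(X_train, y_train)           # fit returns self
y_pred = clf.predict(X_test)        # predict returns the class labels
print(clf.score(X_test, y_test))    # score returns the mean accuracy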
Iris Dataset

features (columns), samples (rows):

    sepal length [cm]  sepal width [cm]  petal length [cm]  petal width [cm]  class
1   5.1                3.5               1.4                0.2               setosa
2   4.9                3.0               1.4                0.2               setosa
Linear Regression Recap

[Figure: input values x1 ... xm and a bias unit (constant 1) are weighted by the coefficients w1 ... wm and w0; the net input function sums them to z, and an activation function maps z to the predicted output y.]
Linear Regression Recap

[Same figure; here the activation function is the identity function, so the predicted output equals the net input z.]
Logistic Regression, a Generalized Linear Model (a Classifier)

[Figure: the same architecture, but the activation function (the logistic sigmoid) maps the net input to a predicted probability, and a unit step function thresholds that probability into the predicted class label.]
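A small NumPy sketch of this forward pass (the weights and inputs below are made-up numbers):

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

x = np.array([1.5, -0.5])          # input values x1, x2
w = np.array([0.8, -1.2])          # weight coefficients w1, w2
w0 = 0.1                           # bias unit weight

z = w0 + np.dot(x, w)              # net input
proba = sigmoid(z)                 # predicted probability (logistic activation)
label = int(proba >= 0.5)          # unit step: threshold at 0.5
print(z, proba, label)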
A “Lazy Learner:” K-Nearest Neighbors Classifier

[Figure: a query point "?" in the x1-x2 feature plane; among its k nearest neighbors the class counts are 1×, 1×, and 3×, so the majority vote predicts the class that occurs 3 times.]
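A minimal scikit-learn sketch of this classifier (dataset and k = 3 are arbitrary choices for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

knn = KNeighborsClassifier(n_neighbors=3)   # hyperparameter: number of neighbors k
knn.fit(X_train, y_train)                   # "lazy" learner: mostly just stores the training data
print(knn.predict(X_test[:5]))              # majority vote among the 3 nearest neighbors
print(knn.score(X_test, y_test))            # classification accuracy on the test set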
Jupyter Notebook
There are many, many more classification and regression algorithms ...
http://scikit-learn.org/stable/supervised_learning.html
Topics
Categorical Variables

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1
Encoding Categorical Variables (Ordinal vs Nominal)

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1
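A pandas sketch of encoding the table above (a minimal example; the size ordering M < L < XL is an assumption about the ordinal feature):

import pandas as pd

# example DataFrame from the slide
df = pd.DataFrame({
    'color': ['red', 'blue', 'green'],
    'size': ['M', 'XL', 'L'],
    'price': [10.49, 15.00, 12.99],
    'classlabel': [0, 1, 1]})

# ordinal feature: sizes have a natural order, so map them to integers
size_mapping = {'M': 1, 'L': 2, 'XL': 3}
df['size'] = df['size'].map(size_mapping)

# nominal feature: colors have no order, so one-hot encode them
df = pd.get_dummies(df, columns=['color'])
print(df)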
Feature Normalization
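Two common flavors, sketched with scikit-learn (the tiny array is just for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0]])

# z-score standardization: zero mean and unit variance per feature column
stdsc = StandardScaler().fit(X_train)    # fit the scaling parameters on training data only
X_train_std = stdsc.transform(X_train)

# min-max scaling: squash each feature column into the [0, 1] range
mms = MinMaxScaler().fit(X_train)
X_train_norm = mms.transform(X_train)

print(X_train_std)
print(X_train_norm)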
Scikit-learn API

class UnsupervisedEstimator(...):

    def __init__(self, ...):
        ...

    def fit(self, X):
        ...
        return self

    def transform(self, X):
        ...
        return X_transf

    def predict(self, X):
        ...
        return pred
Scikit-learn Pipelines

[Figure: a Pipeline chains preprocessing steps (e.g., scaling) with a final estimator; fit is called once on the training data to fit all steps, and predict on the test data applies the fitted transformations before predicting the class labels.]
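A minimal sketch of such a pipeline (scaling followed by a classifier; the concrete steps are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)         # fits the scaler and the classifier on the training data
print(pipe.score(X_test, y_test))  # test data is scaled with the training-set parameters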
Jupyter Notebook
Topics
Dimensionality Reduction – why?

[Figure: feature measurements (in cm) and plots of predictive performance, e.g. as a function of the number of features used.]
Recursive Feature Elimination

available features: [ f1 f2 f3 f4 ]
fit model → weights [ w1 w2 w3 w4 ], remove the feature with the lowest weight, repeat
fit model → weights [ w1 w2 w4 ], remove the feature with the lowest weight, repeat
fit model → weights [ w1 w4 ], remove the feature with the lowest weight, repeat
remaining: [ w4 ]
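A minimal scikit-learn sketch of recursive feature elimination (the estimator and the number of features to keep are arbitrary choices here):

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# repeatedly fit the model and drop the feature with the smallest weight
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # 1 = selected; higher ranks were eliminated earlier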
Sequential Feature Selection

available features: [ f1 f2 f3 f4 ]
candidate subsets: [ f1 f2 ] [ f1 f3 ] [ f1 f4 ] → fit a model on each, pick the best (here [ f1 f3 ]), repeat
candidate subsets: [ f1 f3 f2 ] [ f1 f3 f4 ] → fit a model on each, pick the best, repeat
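A sketch of forward selection using the SequentialFeatureSelector from mlxtend (assuming mlxtend is installed; the estimator and k_features values are illustrative):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=3)
# forward selection: grow the feature subset one best feature at a time
sfs = SFS(knn, k_features=2, forward=True, floating=False,
          scoring='accuracy', cv=5)
sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_)   # indices of the selected features
print(sfs.k_score_)         # cross-validated score of that subset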
Principal Component Analysis

[Figure: data points in the x1-x2 plane with two orthogonal principal component directions, PC1 and PC2; PC1 points along the direction of largest variance.]
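A minimal scikit-learn sketch (projecting onto the first two principal components; the dataset is illustrative):

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)   # PCA directions are scale-sensitive

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)            # rows projected onto PC1 and PC2
print(pca.explained_variance_ratio_)        # share of variance captured by each PC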
Jupyter Notebook
Topics
“Basic” Supervised Learning Workflow

[Workflow figure, as before:
1. Split the dataset (data + labels) into training data/labels and test data/labels.
2. Fit a model on the training data/labels with the learning algorithm and hyperparameter values.
3. Predict the test data and compare with the test labels to measure performance.
4. Fit the final model on the complete dataset with the chosen hyperparameter values.]
Holdout Method and Hyperparameter Tuning (steps 1-3)

[Workflow figure:
1. Split the dataset (data + labels) into training data/labels, validation data/labels, and test data/labels.
2. For each candidate set of hyperparameter values, fit a model on the training data/labels with the learning algorithm, predict the validation data, and compare against the validation labels to obtain a validation performance per model.
3. Select the best hyperparameter values, i.e., the model with the best validation performance.]
Holdout Method and Hyperparameter Tuning (steps 4-6)

[Workflow figure:
4. Refit the model on the combined training + validation data/labels with the best hyperparameter values.
5. Predict the test data and compare against the test labels to estimate the model's performance.
6. Fit the final model on the complete dataset (data + labels) with the best hyperparameter values.]
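A minimal sketch of steps 1-6 with scikit-learn splits and a small grid of k values for a k-NN classifier (all concrete choices here are illustrative):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) split into training, validation, and test sets
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=123, stratify=y_tmp)

# 2) + 3) try different hyperparameter values, pick the best on the validation set
best_k, best_score = None, -np.inf
for k in [1, 3, 5, 7]:
    score = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_valid, y_valid)
    if score > best_score:
        best_k, best_score = k, score

# 4) refit on training + validation data with the best hyperparameter value
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_tmp, y_tmp)

# 5) estimate the generalization performance on the test set
print(best_k, model.score(X_test, y_test))

# 6) before deployment, refit the final model on the complete dataset (X, y)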
K-fold Cross-Validation

[Figure: the training data is split into K folds (here K = 5). In each of the K iterations, one fold is held out as the validation fold and the remaining folds serve as training folds; the learning algorithm fits a model with the given hyperparameter values on the training folds, the model predicts the validation fold, and the predictions are compared against the validation fold labels. The K performance estimates (Performance 1 ... Performance K) are then averaged into a single cross-validation performance.]
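A minimal sketch of estimating performance via K-fold cross-validation with scikit-learn (K = 5, model choice illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: 5 models, each validated on a different held-out fold
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores)          # one performance estimate per fold
print(scores.mean())   # averaged cross-validation performance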
K-fold Cross-Validation Workflow (steps 1-3)

[Workflow figure:
1. Split the dataset (data + labels) into training data/labels and test data/labels.
2. Run K-fold cross-validation on the training data/labels for each candidate set of hyperparameter values, fitting models with the learning algorithm.
3. Refit the model on the whole training set with the best hyperparameter values.]
K-fold Cross-Validation Workflow (steps 4-5)

[Workflow figure:
4. Predict the test data and compare against the test labels to estimate the model's performance.
5. Fit the final model on the complete dataset (data + labels) with the best hyperparameter values.]
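A minimal sketch of this workflow using GridSearchCV, which runs the K-fold loop over a hyperparameter grid and refits on the full training set (all concrete values are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1) training/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

# 2) + 3) K-fold cross-validation over the hyperparameter grid,
#         then refit on the whole training set with the best values (refit=True by default)
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [1, 3, 5, 7]},
                  cv=5)
gs.fit(X_train, y_train)

# 4) estimate the generalization performance on the test set
print(gs.best_params_, gs.score(X_test, y_test))

# 5) before deployment, refit the final model on the complete dataset (X, y)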
More info about model evaluation (one of the most important topics in ML):
https://sebastianraschka.com/blog/index.html
• Model evaluation, model selection, and algorithm selection in machine learning Part I - The basics
• Model evaluation, model selection, and algorithm selection in machine learning Part II - Bootstrapping and uncertainties
• Model evaluation, model selection, and algorithm selection in machine learning Part III - Cross-validation and hyperparameter tuning
Jupyter Notebook
BONUS SLIDES
https://www.tensorflow.org
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
(Preliminary White Paper, November 9, 2015)
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng (Google Research)
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf

[Slide shows the first page of the white paper, including "Figure 1: Example TensorFlow code fragment" and its computation graph (MatMul, Add, ReLU nodes with inputs W, x, b). From the abstract: TensorFlow is an interface for expressing machine learning algorithms and an implementation for executing such algorithms; a computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. It builds on experience with DistBelief, which more than 50 teams at Google and other Alphabet companies have used to deploy deep neural networks in products including Google Search, advertising, speech recognition, Google Photos, Google Maps and StreetView, Google Translate, and YouTube.]
Tensors?

[Excerpt from https://sebastianraschka.com/pdf/books/dlb/appendix_g_tensorflow.pdf, truncated: "... at performing highly parallelized numerical computations. In addition, TensorFlow also supports distributed systems as well as mobile computing platforms, including Android and Apple's iOS."]

But what is a tensor? In simplifying terms, we can think of tensors as multidimensional arrays of numbers, as a generalization of scalars, vectors, and matrices. The number of such array dimensions is called the rank of the tensor, which is not to be confused with the dimensions of a matrix. For instance, an m × n matrix, where m is the number of rows and n is the number of columns, would be a special case of a rank-2 tensor. A visual explanation of tensors and their ranks is given in the figure below.

[Figure: tensors of different ranks, with example element indices such as [0, 0] (rank 2) and [0, 2, 1] (rank 3).]
GPUs
Vectorization

X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))

[Figure: diagrams of the matrix product of X and W, computing the net inputs for all training examples in one step rather than looping over the examples one at a time.]
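A sketch of why this matters: the loop and the single matrix product below compute the same result, but the vectorized version runs far faster for realistic sizes (the shapes are made up):

import numpy as np

num_train_examples, num_features, num_hidden = 1000, 50, 10
X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))

# naive version: one dot product per training example
Z_loop = np.array([np.dot(x, W) for x in X])

# vectorized version: a single matrix-matrix product for all examples
Z_vec = np.dot(X, W)

print(np.allclose(Z_loop, Z_vec))   # True -- identical results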
Computation Graphs

a(x, w, b) = relu(w*x + b)

[Figure: the expression as a computation graph; inputs w and x feed a multiplication node u = w*x, u and b feed an addition node v = u + b, and v feeds a = relu(v).]
Computation Graphs

import tensorflow as tf

g = tf.Graph()
with g.as_default() as g:
    # inputs defined as constants here (x = 3, w = 2, b = 1, the values
    # shown on the following slides)
    x = tf.constant(3., name='x')
    w = tf.constant(2., name='w')
    b = tf.constant(1., name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)

print(x, w, b, u, v, a)
Computation Graphs

[Figure: the graph again with concrete values w = 2 and b = 1 flowing through u = w*x, v = u + b, a = relu(v); evaluating only the node b yields b_res:]

print(b_res)
1.0
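A minimal sketch of how such a graph is evaluated in TensorFlow 1.x (assuming the graph g with the constants x = 3, w = 2, b = 1 from the earlier slide):

with tf.Session(graph=g) as sess:
    b_res = sess.run(b)   # fetching only b executes just that part of the graph
    a_res = sess.run(a)   # full forward pass: relu(2*3 + 1)

print(b_res)   # 1.0
print(a_res)   # 7.0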
Computation Graphs

[Figure: forward pass with x = 3, w = 2, b = 1 gives u = w*x = 6, v = u + b = 7, and a = relu(v) = 7. The backward pass multiplies local derivatives along the graph:
∂a/∂v = 1, ∂v/∂u = 1, ∂v/∂b = 1, ∂u/∂w = x = 3
∂a/∂b = ∂a/∂v · ∂v/∂b = 1 · 1 = 1
∂a/∂w = ∂a/∂v · ∂v/∂u · ∂u/∂w = 1 · 1 · 3 = 3]

https://github.com/rasbt/pydata-annarbor2017-dl-tutorial
g = tf.Graph()
with g.as_default() as g:
    x = tf.constant(3., name='x')
    w = tf.constant(2., name='w')
    b = tf.constant(1., name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    # symbolic derivatives of a with respect to w and b
    d_a_w = tf.gradients(a, w)
    d_a_b = tf.gradients(a, b)

# evaluating d_a_w and d_a_b in a session yields [3.0] and [1.0]
http://pytorch.org
import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.autograd import grad

# the same computation graph as before, now in PyTorch (Variable API)
x = Variable(torch.Tensor([3]))
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([1]), requires_grad=True)

u = x * w
v = u + b
a = F.relu(v)

https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/master/code/ch12/images/12_02.png
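The imported grad function can then return the same derivatives as tf.gradients above; a minimal sketch continuing the code on this slide:

# d a/d w and d a/d b via the chain rule, as in the TensorFlow example
d_a_w, = grad(a, w, retain_graph=True)   # tensor([ 3.])
d_a_b, = grad(a, b)                      # tensor([ 1.])
print(d_a_w, d_a_b)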
TensorFlow (graph-building excerpt):

g = tf.Graph()
with g.as_default():

    # Input data
    tf_x = tf.placeholder(tf.float32, [None, n_input], name='features')
    tf_y = tf.placeholder(tf.float32, [None, n_classes], name='targets')

    # Model parameters
    weights = {
        'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
        'out': tf.Variable(tf.truncated_normal([n_hidden_2, n_classes], stddev=0.1))
    }
    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'out': tf.Variable(tf.zeros([n_classes]))
    }

    # Multilayer perceptron
    layer_1 = tf.add(tf.matmul(tf_x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']

PyTorch (equivalent model definition):

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        logits = self.linear_out(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas

model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)
Thanks for attending!
Contact:
o E-mail: mail@sebastianraschka.com
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt