Deep Learning Toolbox™
User's Guide
R2018b
Deep Networks
1
Deep Learning in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
What Is Deep Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Try Deep Learning in 10 Lines of MATLAB Code . . . . . . . . . . . 1-5
Start Deep Learning Faster Using Transfer Learning . . . . . . . 1-7
Train Classifiers Using Features Extracted from
Pretrained Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on
the Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41
Batch Normalization Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 1-46
ReLU Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-47
Cross Channel Normalization (Local Response Normalization)
Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-48
Max and Average Pooling Layers . . . . . . . . . . . . . . . . . . . . . 1-48
Dropout Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-49
Fully Connected Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-49
Output Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-50
Create Forward Functions . . . . . . . . . . . . . . . . . . . . . . . . . 1-100
Create Backward Function . . . . . . . . . . . . . . . . . . . . . . . . . 1-102
Completed Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-104
GPU Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-105
Check Validity of Layer Using checkLayer . . . . . . . . . . . . . . 1-106
Include Custom Layer in Network . . . . . . . . . . . . . . . . . . . . 1-107
List of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-143
Generated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-145
Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-146
Build Networks with Deep Network Designer . . . . . . . . . . . . . 2-16
Open the App and Import Networks . . . . . . . . . . . . . . . . . . . 2-16
Create and Edit Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
Check Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Export Network for Training . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
Neural Network Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
One Layer of Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
Multiple Layers of Neurons . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
Input and Output Processing Functions . . . . . . . . . . . . . . . . 4-15
Create, Configure, and Initialize Multilayer Shallow Neural
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
Other Related Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Initializing Weights (init) . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Neural Network Time-Series Utilities . . . . . . . . . . . . . . . . . . . 6-42
Control Systems
7
Introduction to Neural Network Control Systems . . . . . . . . . . 7-2
Radial Basis Neural Networks
8
Introduction to Radial Basis Neural Networks . . . . . . . . . . . . . 8-2
Important Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . 8-2
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
Create a Self-Organizing Map Neural Network
(selforgmap) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
Training (learnsomb) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
Advanced Topics
11
Neural Networks with Parallel and GPU Computing . . . . . . . 11-2
Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Modes of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Distributed Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Single GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-5
Distributed GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Parallel Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-10
Parallel Availability, Fallbacks, and Feedback . . . . . . . . . . . 11-10
Optimize Neural Network Training Speed and Memory . . . . 11-12
Memory Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Fast Elliot Sigmoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Deploy Training of Neural Networks . . . . . . . . . . . . . . . . . . . 11-68
Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-22
Biases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-24
Input Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-25
Layer Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-27
Bibliography
14
Deep Learning Toolbox Bibliography . . . . . . . . . . . . . . . . . . . . 14-2
Mathematical Notation
A
Mathematics and Code Equivalents . . . . . . . . . . . . . . . . . . . . . . A-2
Mathematics Notation to MATLAB Notation . . . . . . . . . . . . . . A-2
Figure Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Code Notes
C
Deep Learning Toolbox Data Conventions . . . . . . . . . . . . . . . . . C-2
Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
1
Deep Networks
Deep Learning in MATLAB
In this section...
“What Is Deep Learning?” on page 1-2
“Try Deep Learning in 10 Lines of MATLAB Code” on page 1-5
“Start Deep Learning Faster Using Transfer Learning” on page 1-7
“Train Classifiers Using Features Extracted from Pretrained Networks” on page 1-8
“Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on the Cloud” on page 1-8
Deep Learning Toolbox provides simple MATLAB commands for creating and
interconnecting the layers of a deep neural network. Examples and pretrained networks
make it easy to use MATLAB for deep learning, even without knowledge of advanced
computer vision algorithms or neural networks.
For a free hands-on introduction to practical deep learning methods, see Deep Learning
Onramp.
To learn more about deep learning application areas, including automated driving, see
“Deep Learning Applications”.
To choose whether to use a pretrained network or create a new deep network, consider
the scenarios in this table.
Deep learning uses neural networks to learn useful representations of features directly
from data. Neural networks combine multiple nonlinear processing layers, using simple
elements operating in parallel and inspired by biological nervous systems. Deep learning
models can achieve state-of-the-art accuracy in object classification, sometimes exceeding
human-level performance.
You train models using a large set of labeled data and neural network architectures that
contain many layers, usually including some convolutional layers. Training these models
is computationally intensive and you can usually accelerate training by using a high
performance GPU. This diagram shows how convolutional neural networks combine layers
that automatically learn features from many images to classify new images.
Many deep learning applications use image files, and sometimes millions of image files. To
access many image files for deep learning efficiently, MATLAB provides the
imageDatastore function. Use this function to:
• Automatically read batches of images for faster processing in machine learning and
computer vision applications
• Import data from image collections that are too large to fit in memory
• Label your image data automatically based on folder names
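For example, a minimal sketch of creating a datastore from a folder of images organized into one subfolder per class (the folder name pathToImages is a placeholder):
imds = imageDatastore('pathToImages', ...
    'IncludeSubfolders',true, ...   % Read images from all subfolders
    'LabelSource','foldernames');   % Label each image by its folder name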
Try Deep Learning in 10 Lines of MATLAB Code
1 Run these commands to get the downloads if needed, connect to the webcam, and get a pretrained neural network.
camera = webcam; % Connect to the camera
net = alexnet; % Load the neural network
The webcam and alexnet functions provide a link to help you download the free add-
ons using Add-On Explorer. Alternatively, see Deep Learning Toolbox Model for
AlexNet Network and MATLAB Support Package for USB Webcams.
You can use alexnet to classify images. AlexNet is a pretrained convolutional neural
network (CNN) that has been trained on more than a million images and can classify
images into 1000 object categories (for example, keyboard, mouse, coffee mug,
pencil, and many animals).
2 To show and classify live images, run the following code. Point the webcam at an
object and the neural network reports what class of object it thinks the webcam is
showing. It keeps classifying images until you press Ctrl+C. The code resizes the
image for the network using imresize.
while true
im = snapshot(camera); % Take a picture
image(im); % Show the picture
im = imresize(im,[227 227]); % Resize the picture for alexnet
label = classify(net,im); % Classify the picture
title(char(label)); % Show the class label
drawnow
end
In this example, the network correctly classifies a coffee mug. Experiment with
objects in your surroundings to see how accurate the network is.
To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.
To get the code to extend this example to show the probability scores of classes, see
“Classify Webcam Images Using Deep Learning”.
For next steps in deep learning, you can use the pretrained network for other tasks. Solve
new classification problems on your image data with transfer learning or feature
extraction. For examples, see “Start Deep Learning Faster Using Transfer Learning” on
page 1-7 and “Train Classifiers Using Features Extracted from Pretrained Networks”
on page 1-8. To try other pretrained networks, see “Pretrained Convolutional Neural
Networks” on page 1-21.
Start Deep Learning Faster Using Transfer Learning
For example, if you take a network trained on thousands or millions of images, you can
retrain it for new object detection using only hundreds of images. You can effectively fine-
tune a pretrained network with much smaller data sets than the original training data. If
you have a very large dataset, then transfer learning might not be faster than training a
new network.
For an interactive example, see “Transfer Learning with Deep Network Designer” on page
2-2.
For programmatic examples, see “Get Started with Transfer Learning”, “Transfer
Learning Using AlexNet”, and “Train Deep Learning Network to Classify New Images”.
Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on the Cloud
Training deep networks is extremely computationally intensive and you can usually
accelerate training by using a high performance GPU. If you do not have a suitable GPU,
you can train on one or more CPU cores instead. You can train a convolutional neural
network on a single GPU or CPU, or on multiple GPUs or CPU cores, or in parallel on a
cluster. Using GPU or parallel options requires Parallel Computing Toolbox.
You do not need multiple computers to solve problems using data sets too large to fit in
memory. You can use the imageDatastore function to work with batches of data without
needing a cluster of machines. However, if you have a cluster available, it can be helpful
to take your code to the data repository rather than moving large amounts of data around.
To learn more about deep learning hardware and memory settings, see “Deep Learning
with Big Data on GPUs and in Parallel” on page 1-13.
See Also
Related Examples
• “Classify Webcam Images Using Deep Learning”
• “Transfer Learning with Deep Network Designer” on page 2-2
• “Train Deep Learning Network to Classify New Images”
• “Pretrained Convolutional Neural Networks” on page 1-21
• “Create Simple Deep Learning Network for Classification”
• “Deep Learning with Big Data on GPUs and in Parallel” on page 1-13
• “Deep Learning, Semantic Segmentation, and Detection” (Computer Vision System
Toolbox)
• “Classify Text Data Using Deep Learning”
• “Deep Learning Tips and Tricks” on page 1-60
Try Deep Learning in 10 Lines of MATLAB Code
1 Run these commands to get the downloads if needed, connect to the webcam, and get
a pretrained neural network.
If you need to install the webcam and alexnet add-ons, a message from each
function appears with a link to help you download the free add-ons using Add-On
Explorer. Alternatively, see Deep Learning Toolbox Model for AlexNet Network and
MATLAB Support Package for USB Webcams.
camera = webcam; % Connect to the camera
net = alexnet; % Load the neural network
After you install Deep Learning Toolbox Model for AlexNet Network, you can use it to
classify images. AlexNet is a pretrained convolutional neural network (CNN) that has
been trained on more than a million images and can classify images into 1000 object
categories (for example, keyboard, mouse, coffee mug, pencil, and many animals).
2 Run the following code to show and classify live images. Point the webcam at an
object and the neural network reports what class of object it thinks the webcam is
showing. It keeps classifying images until you press Ctrl+C. The code resizes the
image for the network using imresize.
while true
im = snapshot(camera); % Take a picture
image(im); % Show the picture
im = imresize(im,[227 227]); % Resize the picture for alexnet
label = classify(net,im); % Classify the picture
title(char(label)); % Show the class label
drawnow
end
In this example, the network correctly classifies a coffee mug. Experiment with
objects in your surroundings to see how accurate the network is.
To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.
To get the code to extend this example to show the probability scores of classes, see
“Classify Webcam Images Using Deep Learning”.
For next steps in deep learning, you can use the pretrained network for other tasks. Solve
new classification problems on your image data with transfer learning or feature
extraction. For examples, see “Start Deep Learning Faster Using Transfer Learning” on
page 1-7 and “Train Classifiers Using Features Extracted from Pretrained Networks” on
page 1-8. To try other pretrained networks, see “Pretrained Convolutional Neural
Networks” on page 1-21.
See Also
Related Examples
• “Classify Webcam Images Using Deep Learning”
• “Get Started with Transfer Learning”
Deep Learning with Big Data on GPUs and in Parallel
Tip GPU support is automatic if you have Parallel Computing Toolbox. By default, the
trainNetwork function uses a GPU if available.
If you have access to a machine with multiple GPUs, then simply specify the training
option 'ExecutionEnvironment','multi-gpu'.
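For example, a minimal sketch of such a call (the solver choice 'sgdm' is illustrative):
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu'); % Use all available GPUs on the local machine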
You do not need multiple computers to solve problems using data sets too large to fit in
memory. You can use the augmentedImageDatastore function to work with batches of
data without needing a cluster of machines. For an example, see “Train Network with
Augmented Images”. However, if you have a cluster available, it can be helpful to take
your code to the data repository rather than moving large amounts of data around.
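As an illustration, a hedged sketch of wrapping an existing datastore so that batches are resized on the fly (the variable imds is assumed to be an imageDatastore created earlier):
augimds = augmentedImageDatastore([227 227],imds); % Deliver 227-by-227 batches without loading all images into memory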
Tip To learn more, see “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2.
All functions for deep learning training, prediction, and validation in Deep Learning
Toolbox perform computations using single-precision, floating-point arithmetic. Functions
for deep learning include trainNetwork, predict, classify, and activations. The
software uses single-precision arithmetic when you train networks using both CPUs and
GPUs.
Convolutional neural networks are typically trained iteratively using batches of images.
This is done because the whole dataset is too large to fit into GPU memory. For optimum
performance, you can experiment with the MiniBatchSize option that you specify with
the trainingOptions function.
The optimal batch size depends on your exact network, dataset, and GPU hardware. When
training with multiple GPUs, each image batch is distributed between the GPUs. This
effectively increases the total GPU memory available, allowing larger batch sizes.
Because it improves the significance of each batch, you can increase the learning rate. A
good general guideline is to increase the learning rate proportionally to the increase in
batch size. Depending on your application, a larger batch size and learning rate can speed
up training without a decrease in accuracy, up to some limit.
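For example, a sketch of scaling the learning rate with the mini-batch size; the baseline values (batch size 64, learning rate 0.01) are illustrative assumptions:
miniBatchSize = 128;                % Twice the assumed baseline of 64
learnRate = 0.01*miniBatchSize/64;  % Increase the learning rate proportionally
options = trainingOptions('sgdm', ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',learnRate);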
Using multiple GPUs can speed up training significantly. To decide if you expect multi-
GPU training to deliver a performance gain, consider the following factors:
• How long is the iteration on each GPU? If each GPU iteration is short, then the added
overhead of communication between GPUs can dominate. Try increasing the
computation per iteration by using a larger batch size.
• Are all the GPUs on a single machine? Communication between GPUs on different
machines introduces a significant communication delay. You can mitigate this if you
have suitable hardware. For more information, see “Advanced Support for Fast Multi-
Node GPU Communication” on page 3-5.
To learn more, see “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2
and “Select Particular GPUs to Use for Training” on page 3-7.
You can accelerate training by using multiple GPUs on a single machine or in a cluster of
machines with multiple GPUs. Train a single network using multiple GPUs, or train
multiple models at once on the same data.
For more information on the complete cloud workflow, see “Deep Learning in Parallel and
in the Cloud”.
You can fine-tune the training computation and data dispatch loads between workers by
specifying the 'WorkerLoad' name-value pair argument of trainingOptions. For
advanced options, you can try modifying the number of workers of the parallel pool. For
more information, see “Specify Your Parallel Preferences” (Parallel Computing Toolbox).
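For instance, a sketch of an uneven split across a pool of three workers (the load vector is illustrative):
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','parallel', ... % Use the current parallel pool
    'WorkerLoad',[2 2 1]);                 % Relative mini-batch share per worker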
See Also
trainNetwork | trainingOptions
Related Examples
• “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2
Construct Deep Network Using Autoencoders
Load the sample data.
[X,T] = wine_dataset;
Train an autoencoder with a hidden layer of size 10 and a linear transfer function for the
decoder. Set the L2 weight regularizer to 0.001, sparsity regularizer to 4 and sparsity
proportion to 0.05.
hiddenSize = 10;
autoenc1 = trainAutoencoder(X,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin');
features1 = encode(autoenc1,X);
Train a second autoencoder using the features from the first autoencoder. Do not scale
the data.
hiddenSize = 10;
autoenc2 = trainAutoencoder(features1,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin',...
'ScaleData',false);
features2 = encode(autoenc2,features1);
Train a softmax layer for classification using the features, features2, from the second
autoencoder, autoenc2.
softnet = trainSoftmaxLayer(features2,T,'LossFunction','crossentropy');
Stack the encoders and the softmax layer to form a deep network.
deepnet = stack(autoenc1,autoenc2,softnet);
Train the deep network on the training data, estimate the wine types, and view the results with a confusion matrix.
deepnet = train(deepnet,X,T);
wine_type = deepnet(X);
plotconfusion(T,wine_type);
Pretrained Convolutional Neural Networks
In this section...
“Load Pretrained Networks” on page 1-22
“Compare Pretrained Networks” on page 1-23
“Feature Extraction” on page 1-25
“Transfer Learning” on page 1-25
“Import and Export Networks” on page 1-26
You can take a pretrained image classification network that has already learned to extract
powerful and informative features from natural images and use it as a starting point to
learn a new task. The pretrained networks are trained on more than a million images and
can classify images into 1000 object categories, such as keyboard, coffee mug, pencil, and
many animals. The training images are a subset of the ImageNet database [1], which is
used in ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [2]. Using a
pretrained network with transfer learning is typically much faster and easier than
training a network from scratch.
You can use previously trained networks for the following tasks:
Classification: Apply pretrained networks directly to classification problems. To classify a new image, use classify. For an example showing how to use a pretrained network for classification, see “Classify Image Using GoogLeNet”.
Feature Extraction: Use a pretrained network as a feature extractor by using the layer activations as features. You can use these activations as features to train another machine learning model, such as a support vector machine (SVM). For more information, see “Feature Extraction” on page 1-25. For an example, see “Feature Extraction Using AlexNet”.
Transfer Learning: Take layers from a network trained on a large data set and fine-tune them on a new data set. For more information, see “Transfer Learning” on page 1-25. For a simple example, see “Get Started with Transfer Learning”. To try more pretrained networks, see “Train Deep Learning Network to Classify New Images”.
Compare Pretrained Networks
Tip To get started with transfer learning, try choosing one of the faster networks, such as SqueezeNet or GoogLeNet. You can then iterate quickly and try out different settings such as data preprocessing steps and training options. Once you have a feel for which settings work well, try a more accurate network, such as Inception-v3 or a ResNet, and see if that improves your results.
Use the plot below to compare the ImageNet validation accuracy with the time required
to make a prediction using the network. A good network has a high accuracy and is fast.
The plot displays the classification accuracy versus the prediction time when using a
modern GPU (an NVIDIA TITAN Xp) and a mini-batch size of 64. The prediction time is
measured relative to the fastest network. The area of each marker is proportional to the
size of the network on disk.
A network is Pareto efficient if there is no other network that is better on all the metrics
being compared, in this case accuracy and prediction time. The set of all Pareto efficient
networks is called the Pareto frontier. The Pareto frontier contains all the networks that
are not worse than another network on both metrics. The plot connects the networks that
are on the Pareto frontier in the plane of accuracy and prediction time. All networks
except AlexNet, VGG-16, VGG-19, and DenseNet-201 are on the Pareto frontier.
Note The plot below only shows an indication of the relative speeds of the different
networks. The exact prediction and training iteration times depend on the hardware and
mini-batch size that you use.
The classification accuracy on the ImageNet validation set is the most common way to
measure the accuracy of networks trained on ImageNet. Networks that are accurate on
ImageNet are also often accurate when you apply them to other natural image data sets
using transfer learning or feature extraction. This generalization is possible because the
networks have learned to extract powerful and informative features from natural images
that generalize to other similar data sets. However, high accuracy on ImageNet does not
always transfer directly to other tasks, so it is a good idea to try multiple networks.
If you want to perform prediction using constrained hardware or distribute networks over
the Internet, then also consider the size of the network on disk and in memory.
Network Accuracy
There are multiple ways to calculate the classification accuracy on the ImageNet
validation set and different sources use different methods. Sometimes an ensemble of
multiple models is used and sometimes each image is evaluated multiple times using
multiple crops. Sometimes the top-5 accuracy instead of the standard (top-1) accuracy is
quoted. Because of these differences, it is often not possible to directly compare the
accuracies from different sources. The accuracies of pretrained networks in Deep
Learning Toolbox are standard (top-1) accuracies using a single model and single central
image crop.
Feature Extraction
Feature extraction is an easy and fast way to use the power of deep learning without
investing time and effort into training a full network. Because it only requires a single
pass over the training images, it is especially useful if you do not have a GPU. You extract
learned image features using a pretrained network, and then use those features to train a
classifier, such as a support vector machine using fitcsvm.
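As a sketch of this workflow, assuming AlexNet ('fc7' is one of its layer names), a datastore imdsTrain and labels trainingLabels, and the Statistics and Machine Learning Toolbox; fitcecoc wraps binary SVM learners for multiclass problems:
net = alexnet;                                  % Pretrained network
features = activations(net,imdsTrain,'fc7', ... % Extract activations from layer 'fc7'
    'OutputAs','rows');                         % One feature vector per image
classifier = fitcecoc(features,trainingLabels); % Train a multiclass SVM on the features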
Try feature extraction when your new data set is very small. Since you only train a simple
classifier on the extracted features, training is fast. It is also unlikely that fine-tuning
deeper layers of the network improves the accuracy since there is little data to learn
from.
• If your data is very similar to the original data, then the more specific features
extracted deeper in the network are likely to be useful for the new task.
• If your data is very different from the original data, then the features extracted deeper
in the network might be less useful for your task. Try training the final classifier on
more general features extracted from an earlier network layer. If the new data set is
large, then you can also try training a network from scratch.
ResNets are often the best feature extractors [4], independently of their ImageNet
accuracies. For an example showing how to use a pretrained network for feature
extraction, see “Feature Extraction Using AlexNet”.
Transfer Learning
You can fine-tune deeper layers in the network by training the network on your new data
set with the pretrained network as a starting point. Fine-tuning a network with transfer
learning is often faster and easier than constructing and training a new network. The
network has already learned a rich set of image features, but when you fine-tune the
network it can learn features specific to your new data set. If you have a very large data
set, then transfer learning might not be faster than training from scratch.
Tip Fine-tuning a network often gives the highest accuracy. For very small data sets
(fewer than about 20 images per class), try feature extraction.
Fine-tuning a network is slower and requires more effort than simple feature extraction,
but since the network can learn to extract a different set of features, the final network is
often more accurate. Fine-tuning usually works better than feature extraction as long as
the new data set is not very small, because then the network has data to learn new
features from. For examples showing how to perform transfer learning, see “Transfer
Learning with Deep Network Designer” on page 2-2 and “Train Deep Learning
Network to Classify New Images”.
Import and Export Networks
Export trained networks to the ONNX model format by using the exportONNXNetwork
function. You can then import the ONNX model to other deep learning frameworks, such
as TensorFlow, that support ONNX model import. For more information, see
exportONNXNetwork.
Import pretrained networks from ONNX using importONNXNetwork and import network
architectures with or without weights using importONNXLayers.
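For example, a round-trip sketch (the file name myNet.onnx is a placeholder):
exportONNXNetwork(net,'myNet.onnx');     % Export a trained network to an ONNX file
net2 = importONNXNetwork('myNet.onnx', ...
    'OutputLayerType','classification'); % Import it back, specifying the output layer type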
References
[1] ImageNet. http://www.image-net.org
[2] Russakovsky, O., Deng, J., Su, H., et al. “ImageNet Large Scale Visual Recognition
Challenge.” International Journal of Computer Vision (IJCV). Vol 115, Issue 3,
2015, pp. 211–252
[4] Kornblith, Simon, Jonathon Shlens, and Quoc V. Le. "Do Better ImageNet Models
Transfer Better?." arXiv preprint arXiv:1805.08974 (2018).
See Also
alexnet | densenet201 | exportONNXNetwork | googlenet | importCaffeLayers |
importCaffeNetwork | importKerasLayers | importKerasNetwork |
importONNXLayers | importONNXNetwork | inceptionresnetv2 | inceptionv3 |
resnet101 | resnet18 | resnet50 | squeezenet | vgg16 | vgg19
Related Examples
• “Deep Learning in MATLAB” on page 1-2
• “Transfer Learning Using AlexNet”
• “Feature Extraction Using AlexNet”
• “Classify Image Using GoogLeNet”
• “Train Deep Learning Network to Classify New Images”
• “Visualize Features of a Convolutional Neural Network”
• “Visualize Activations of a Convolutional Neural Network”
• “Deep Dream Images Using AlexNet”
Learn About Convolutional Neural Networks
Convolutional neural networks are inspired by the biological structure of a visual
cortex, which contains arrangements of simple and complex cells [1]. These cells are
found to activate based on the subregions of a visual field. These subregions are called
receptive fields. Inspired by the findings of this study, the neurons in a convolutional
layer connect to the subregions of the layers before that layer instead of being fully-
connected as in other types of neural networks. The neurons are unresponsive to the
areas outside of these subregions in the image.
These subregions might overlap, hence the neurons of a ConvNet produce spatially-
correlated outcomes, whereas in other types of neural networks, the neurons do not share
any connections and produce independent outcomes.
The neurons in each layer of a ConvNet are arranged in a 3-D manner, transforming a 3-D
input to a 3-D output. For example, for an image input, the first layer (input layer) holds
the images as 3-D inputs, with the dimensions being height, width, and the color channels
of the image. The neurons in the first convolutional layer connect to the regions of these
images and transform them into a 3-D output. The hidden units (neurons) in each layer
learn nonlinear combinations of the original inputs, which is called feature extraction [2].
These learned features, also known as activations, from one layer become the inputs for
the next layer. Finally, the learned features become the inputs to the classifier or the
regression function at the end of the network.
The architecture of a ConvNet can vary depending on the types and numbers of layers included. The types and number of layers depend on the particular application or data. For example, if you have categorical responses, you must have a classification function and a classification layer, whereas if your response is continuous, you must have a regression layer at the end of the network. A smaller network with only one or two convolutional layers might be sufficient to learn from a small number of grayscale images. On the other hand, for more complex data with millions of color images, you might need a more complicated network with multiple convolutional and fully connected layers.
You can concatenate the layers of a convolutional neural network in MATLAB in the
following way:
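The original listing does not survive in this extract; a minimal sketch of such a layer array, in the spirit of the digit-classification example later in this chapter, is:
layers = [
    imageInputLayer([28 28 1])      % 28-by-28 grayscale input
    convolution2dLayer(5,20)        % 20 filters of size 5-by-5
    reluLayer                       % Nonlinearity
    maxPooling2dLayer(2,'Stride',2) % Down-sample by 2
    fullyConnectedLayer(10)         % One output per class
    softmaxLayer
    classificationLayer];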
After defining the layers of your network, you must specify the training options using the
trainingOptions function. For example,
options = trainingOptions('sgdm');
Then, you can train the network with your training data using the trainNetwork
function. The data, layers, and training options become the inputs to the training
function. For example,
convnet = trainNetwork(data,layers,options);
References
[1] Hubel, H. D. and Wiesel, T. N. “Receptive Fields of Single Neurones in the Cat’s Striate Cortex.” Journal of Physiology. Vol 148, pp. 574–591, 1959.
[2] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts: The MIT Press, 2012.
See Also
trainNetwork | trainingOptions
More About
• “Deep Learning in MATLAB” on page 1-2
• “Specify Layers of Convolutional Neural Network” on page 1-40
• “Set Up Parameters and Train Convolutional Neural Network” on page 1-55
List of Deep Learning Layers
To learn how to create networks from layers for different tasks, see the following
examples.
Layer Functions
Use the following functions to create different layer types. Alternatively, you can import
layers from Caffe and Keras, or you can define your own custom layers. To import layers
from Caffe and Keras, use importCaffeLayers and importKerasLayers respectively.
To learn how to define your own custom layers, see “Define Custom Deep Learning
Layers” on page 1-78.
Input Layers
imageInputLayer: An image input layer inputs images to a network and applies data normalization.
sequenceInputLayer: A sequence input layer inputs sequence data to a network.
roiInputLayer (Computer Vision System Toolbox™): An ROI input layer inputs images to a Fast R-CNN object detection network.
Convolution and Fully Connected Layers
convolution2dLayer: A 2-D convolutional layer applies sliding convolutional filters to the input.
transposedConv2dLayer: A transposed 2-D convolution layer upsamples feature maps.
fullyConnectedLayer: A fully connected layer multiplies the input by a weight matrix and then adds a bias vector.
Sequence Layers
sequenceInputLayer: A sequence input layer inputs sequence data to a network.
lstmLayer: An LSTM layer learns long-term dependencies between time steps in time series and sequence data.
bilstmLayer: A bidirectional LSTM (BiLSTM) layer learns bidirectional long-term dependencies between time steps of time series or sequence data. These dependencies can be useful when you want the network to learn from the complete time series at each time step.
wordEmbeddingLayer (Text Analytics Toolbox™): A word embedding layer maps word indices to vectors.
Activation Layers
reluLayer: A ReLU layer performs a threshold operation on each element of the input, where any value less than zero is set to zero.
leakyReluLayer: A leaky ReLU layer performs a threshold operation, where any input value less than zero is multiplied by a fixed scalar.
clippedReluLayer: A clipped ReLU layer performs a threshold operation, where any input value less than zero is set to zero and any value above the clipping ceiling is set to that clipping ceiling.
preluLayer on page 1-95 (Custom layer example): A PReLU layer performs a threshold operation, where for each channel, any input value less than zero is multiplied by a scalar learned at training time.
Normalization, Dropout, and Cropping Layers
batchNormalizationLayer: A batch normalization layer normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers.
crossChannelNormalizationLayer: A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization.
dropoutLayer: A dropout layer randomly sets input elements to zero with a given probability.
crop2dLayer (Computer Vision System Toolbox): A 2-D crop layer applies 2-D cropping to the input.
Pooling and Unpooling Layers
averagePooling2dLayer: An average pooling layer performs down-sampling by dividing the input into rectangular pooling regions and computing the average values of each region.
maxPooling2dLayer: A max pooling layer performs down-sampling by dividing the input into rectangular pooling regions, and computing the maximum of each region.
maxUnpooling2dLayer: A max unpooling layer unpools the output of a max pooling layer.
Combination Layers
additionLayer: An addition layer adds inputs from multiple neural network layers element-wise.
depthConcatenationLayer: A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension).
Object Detection Layers
roiInputLayer (Computer Vision System Toolbox): An ROI input layer inputs images to a Fast R-CNN object detection network.
roiMaxPooling2dLayer (Computer Vision System Toolbox): An ROI max pooling layer outputs fixed size feature maps for every rectangular ROI within the input feature map. Use this layer to create a Fast or Faster R-CNN object detection network.
regionProposalLayer (Computer Vision System Toolbox): A region proposal layer outputs bounding boxes around potential objects in an image as part of the region proposal network (RPN) within Faster R-CNN.
rpnSoftmaxLayer (Computer Vision System Toolbox): A region proposal network (RPN) softmax layer applies a softmax activation function to the input. Use this layer to create a Faster R-CNN object detection network.
rpnClassificationLayer (Computer Vision System Toolbox): A region proposal network (RPN) classification layer classifies image regions as either object or background by using a cross entropy loss function. Use this layer to create a Faster R-CNN object detection network.
rcnnBoxRegressionLayer (Computer Vision System Toolbox): A box regression layer refines bounding box locations by using a smooth L1 loss function. Use this layer to create a Fast or Faster R-CNN object detection network.
Output Layers
softmaxLayer: A softmax layer applies a softmax function to the input.
classificationLayer: A classification layer computes the cross entropy loss for multi-class classification problems with mutually exclusive classes.
regressionLayer: A regression layer computes the half-mean-squared-error loss for regression problems.
pixelClassificationLayer (Computer Vision System Toolbox): A pixel classification layer provides a categorical label for each image pixel.
rpnSoftmaxLayer (Computer Vision System Toolbox): A region proposal network (RPN) softmax layer applies a softmax activation function to the input. Use this layer to create a Faster R-CNN object detection network.
rpnClassificationLayer (Computer Vision System Toolbox): A region proposal network (RPN) classification layer classifies image regions as either object or background by using a cross entropy loss function. Use this layer to create a Faster R-CNN object detection network.
rcnnBoxRegressionLayer (Computer Vision System Toolbox): A box regression layer refines bounding box locations by using a smooth L1 loss function. Use this layer to create a Fast or Faster R-CNN object detection network.
weightedClassificationLayer on page 1-131 (Custom layer example): A weighted classification layer computes the weighted cross entropy loss for classification problems.
dicePixelClassificationLayer (Custom layer example): A Dice pixel classification layer computes the Dice loss for semantic segmentation problems.
sseClassificationLayer on page 1-120 (Custom layer example): A classification SSE layer computes the sum of squares error loss for classification problems.
maeRegressionLayer on page 1-109 (Custom layer example): A regression MAE layer computes the mean absolute error loss for regression problems.
See Also
trainNetwork | trainingOptions
More About
• “Learn About Convolutional Neural Networks” on page 1-29
• “Specify Layers of Convolutional Neural Network” on page 1-40
• “Set Up Parameters and Train Convolutional Neural Network” on page 1-55
• “Define Custom Deep Learning Layers” on page 1-78
• “Create Simple Deep Learning Network for Classification”
• “Sequence Classification Using Deep Learning”
• “Pretrained Convolutional Neural Networks” on page 1-21
• “Deep Learning in MATLAB” on page 1-2
Specify Layers of Convolutional Neural Network
The first step of creating and training a new convolutional neural network (ConvNet) is to
define the network architecture. This topic explains the details of ConvNet layers, and the
order they appear in a ConvNet. For a complete list of deep learning layers and how to
create them, see “List of Deep Learning Layers” on page 1-33. To learn about LSTM
networks for sequence classification and regression, see “Long Short-Term Memory
Networks” on page 1-154. To learn how to create your own custom layers, see “Define
Custom Deep Learning Layers” on page 1-78.
The network architecture can vary depending on the types and numbers of layers included. The types and number of layers depend on the particular application or data. For example, if you have categorical responses, you must have a softmax layer and a classification layer, whereas if your response is continuous, you must have a regression layer at the end of the network. A smaller network with only one or two convolutional layers might be sufficient to learn from a small number of grayscale images. On the other hand, for more complex data with millions of color images, you might need a more complicated network with multiple convolutional and fully connected layers.
To specify the architecture of a deep network with all layers connected sequentially,
create an array of layers directly. For example, to create a deep network which classifies
28-by-28 grayscale images into 10 classes, specify the layer array
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(3,16,'Padding',1)
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding',1)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
layers is an array of Layer objects. You can then use layers as an input to the training
function trainNetwork.
To specify the architecture of a neural network with all layers connected sequentially,
create an array of layers directly. To specify the architecture of a network where layers
can have multiple inputs or outputs, use a LayerGraph object.
Image Input Layer
An image input layer inputs images to a network and applies data normalization. Create an image input layer using imageInputLayer.
Specify the image size using the inputSize argument. The size of an image corresponds
to the height, width, and the number of color channels of that image. For example, for a
grayscale image, the number of channels is 1, and for a color image it is 3.
Convolutional Layer
A 2-D convolutional layer applies sliding convolutional filters to the input. Create a 2-D
convolutional layer using convolution2dLayer.
A convolutional layer consists of neurons that connect to subregions of the input images
or the outputs of the previous layer. The layer learns the features localized by these
regions while scanning through an image. When creating a layer using the
convolution2dLayer function, you can specify the size of these regions using the
filterSize input argument.
For each region, the trainNetwork function computes a dot product of the weights and
the input, and then adds a bias term. A set of weights that is applied to a region in the
image is called a filter. The filter moves along the input image vertically and horizontally,
repeating the same computation for each region. In other words, the filter convolves the
input.
This image shows a 3-by-3 filter scanning through the input. The lower map represents
the input and the upper map represents the output.
The step size with which the filter moves is called a stride. You can specify the step size
with the Stride name-value pair argument. The local regions that the neurons connect
to can overlap depending on the filterSize and 'Stride' values.
This image shows a 3-by-3 filter scanning through the input with a stride of 2. The lower
map represents the input and the upper map represents the output.
The number of weights in a filter is h * w * c, where h is the height, w is the width of the filter, and c is the number of channels in the input.
input is a color image, the number of color channels is 3. The number of filters
determines the number of channels in the output of a convolutional layer. Specify the
number of filters using the numFilters argument with the convolution2dLayer
function.
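For example, a sketch combining these arguments (the specific values are illustrative):
layer = convolution2dLayer(3,16, ... % 16 filters of size 3-by-3
    'Stride',2, ...                  % Move the filter 2 pixels at a time
    'Padding',1);                    % Pad the borders with one row/column of zeros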
Dilated Convolution
A dilated convolution is a convolution in which the filters are expanded by spaces inserted
between the elements of the filter. Specify the dilation factor using the
'DilationFactor' property.
Use dilated convolutions to increase the receptive field (the area of the input which the
layer can see) of the layer without increasing the number of parameters or computation.
The layer expands the filters by inserting zeros between each filter element. The dilation
factor determines the step size for sampling the input or equivalently the upsampling
factor of the filter. It corresponds to an effective filter size of (Filter Size – 1) .* Dilation
Factor + 1. For example, a 3-by-3 filter with the dilation factor [2 2] is equivalent to a 5-
by-5 filter with zeros between the elements.
This image shows a 3-by-3 filter dilated by a factor of two scanning through the input. The
lower map represents the input and the upper map represents the output.
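For instance, a sketch of the dilated filter described above:
layer = convolution2dLayer(3,16,'DilationFactor',[2 2]); % Effective filter size 5-by-5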
Feature Maps
As a filter moves along the input, it uses the same set of weights and the same bias for the
convolution, forming a feature map. Each feature map is the result of a convolution using
a different set of weights and a different bias. Hence, the number of feature maps is equal
to the number of filters. The total number of parameters in a convolutional layer is
((h*w*c + 1)*Number of Filters), where 1 is the bias.
Zero Padding
You can also apply zero padding to input image borders vertically and horizontally using
the 'Padding' name-value pair argument. Padding is rows or columns of zeros added to
the borders of an image input. By adjusting the padding, you can control the output size
of the layer.
This image shows a 3-by-3 filter scanning through the input with padding of size 1. The
lower map represents the input and the upper map represents the output.
Output Size
The output height and width of a convolutional layer is (Input Size – ((Filter Size –
1)*Dilation Factor + 1) + 2*Padding)/Stride + 1. This value must be an integer for the
whole image to be fully covered. If the combination of these parameters does not lead the
image to be fully covered, the software by default ignores the remaining part of the image
along the right and bottom edges in the convolution.
Number of Neurons
The product of the output height and width gives the total number of neurons in a feature
map, say Map Size. The total number of neurons (output size) in a convolutional layer is
Map Size*Number of Filters.
For example, suppose that the input image is a 32-by-32-by-3 color image. For a
convolutional layer with eight filters and a filter size of 5-by-5, the number of weights per
filter is 5 * 5 * 3 = 75, and the total number of parameters in the layer is (75 + 1) * 8 =
608. If the stride is 2 in each direction and padding of size 2 is specified, then each
feature map is 16-by-16. This is because (32 – 5 + 2 * 2)/2 + 1 = 16.5, and some of the
outermost zero padding to the right and bottom of the image is discarded. Finally, the
total number of neurons in the layer is 16 * 16 * 8 = 2048.
Usually, the results from these neurons pass through some form of nonlinearity, such as
rectified linear units (ReLU).
Learning Parameters
You can adjust the learning rates and regularization parameters for the layer using name-
value pair arguments while defining the convolutional layer. If you choose not to specify
these parameters, then trainNetwork uses the global training parameters defined with
the trainingOptions function. For details on global and layer training options, see “Set
Up Parameters and Train Convolutional Neural Network” on page 1-55.
Number of Layers
A convolutional neural network can consist of one or multiple convolutional layers. The
number of convolutional layers depends on the amount and complexity of the data.
Batch Normalization Layer
A batch normalization layer normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers. Create a batch normalization layer using batchNormalizationLayer.
The layer first normalizes the activations of each channel by subtracting the mini-batch
mean and dividing by the mini-batch standard deviation. Then, the layer shifts the input
by a learnable offset β and scales it by a learnable scale factor γ. β and γ are themselves
learnable parameters that are updated during network training.
Batch normalization layers normalize the activations and gradients propagating through a
neural network, making network training an easier optimization problem. To take full
advantage of this fact, you can try increasing the learning rate. Since the optimization
problem is easier, the parameter updates can be larger and the network can learn faster.
You can also try reducing the L2 and dropout regularization. With batch normalization
layers, the activations of a specific image are not deterministic, but instead depend on
which images happen to appear in the same mini-batch. To take full advantage of this
regularizing effect, try shuffling the training data before every training epoch. To specify
how often to shuffle the data during training, use the 'Shuffle' name-value pair
argument of trainingOptions.
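For example, a minimal sketch:
options = trainingOptions('sgdm', ...
    'Shuffle','every-epoch'); % Reshuffle the training data before each epoch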
ReLU Layer
Create a ReLU layer using reluLayer.
A ReLU layer performs a threshold operation to each element of the input, where any
value less than zero is set to zero.
f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}
The ReLU layer does not change the size of its input.
There are extensions of the standard ReLU layer that perform slightly different operations
and can improve performance for some applications. A leaky ReLU layer performs a
threshold operation, where any input value less than zero is multiplied by a fixed scalar.
Create a leaky ReLU layer using leakyReluLayer. A clipped ReLU layer performs a
threshold operation, where any input value less than zero is set to zero and any value
above the clipping ceiling is set to that clipping ceiling. This clipping prevents the output
from becoming too large. Create a clipped ReLU layer using clippedReluLayer.
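For example, sketches with illustrative scale and ceiling values:
layer1 = reluLayer;            % Standard ReLU
layer2 = leakyReluLayer(0.01); % Multiply negative inputs by 0.01
layer3 = clippedReluLayer(10); % Additionally clip outputs above 10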
Cross Channel Normalization (Local Response Normalization) Layer
Create a cross channel normalization layer using crossChannelNormalizationLayer.
This layer performs a channel-wise local response normalization. It usually follows the
ReLU activation layer. This layer replaces each element with a normalized value it obtains
using the elements from a certain number of neighboring channels (elements in the
normalization window). That is, for each element x in the input, trainNetwork computes a normalized value x' using

x' = \frac{x}{\left( K + \frac{\alpha \cdot ss}{\text{windowChannelSize}} \right)^{\beta}},
where K, α, and β are the hyperparameters in the normalization, and ss is the sum of
squares of the elements in the normalization window [2]. You must specify the size of the
normalization window using the windowChannelSize argument of the
crossChannelNormalizationLayer function. You can also specify the
hyperparameters using the Alpha, Beta, and K name-value pair arguments.
The previous normalization formula is slightly different than what is presented in [2]. You
can obtain the equivalent formula by multiplying the alpha value by the
windowChannelSize.
Max and Average Pooling Layers
An average pooling layer performs down-sampling by dividing the input into rectangular
pooling regions and computing the average values of each region. Create an average
pooling layer using averagePooling2dLayer.
Pooling layers follow the convolutional layers for down-sampling, hence, reducing the
number of connections to the following layers. They do not perform any learning
themselves, but reduce the number of parameters to be learned in the following layers.
They also help reduce overfitting.
A max pooling layer returns the maximum values of rectangular regions of its input. The size of the rectangular regions is determined by the poolSize argument of maxPooling2dLayer. For example, if poolSize equals [2,3], then the layer returns the maximum value in regions of height 2 and width 3. An average pooling layer outputs the average values of rectangular regions of its input. The size of the rectangular regions is determined by the poolSize argument of averagePooling2dLayer. For example, if poolSize is [2,3], then the layer returns the average value of regions of height 2 and width 3.
Pooling layers scan through the input horizontally and vertically in step sizes you can
specify using the 'Stride' name-value pair argument. If the pool size is smaller than or
equal to the stride, then the pooling regions do not overlap.
For nonoverlapping regions (Pool Size and Stride are equal), if the input to the pooling
layer is n-by-n, and the pooling region size is h-by-h, then the pooling layer down-samples
the regions by h [6]. That is, the output of a max or average pooling layer for one channel
of a convolutional layer is n/h-by-n/h. For overlapping regions, the output of a pooling
layer is (Input Size – Pool Size + 2*Padding)/Stride + 1.
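For example, sketches of nonoverlapping 2-by-2 pooling regions:
layer1 = maxPooling2dLayer(2,'Stride',2);     % Max pooling, regions do not overlap
layer2 = averagePooling2dLayer(2,'Stride',2); % Average pooling, regions do not overlap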
Dropout Layer
Create a dropout layer using dropoutLayer.
A dropout layer randomly sets input elements to zero with a given probability.
At prediction time the output of a dropout layer is equal to its input. At training time, the
operation corresponds to temporarily dropping a randomly chosen unit and all of its
connections from the network during training. So, for each new input element,
trainNetwork randomly selects a subset of neurons, forming a different layer
architecture. These architectures use common weights, but because the learning does not
depend on specific neurons and connections, the dropout layer might help prevent
overfitting [7], [2]. Similar to max or average pooling layers, no learning takes place in
this layer.
Fully Connected Layer
Create a fully connected layer using fullyConnectedLayer.
A fully connected layer multiplies the input by a weight matrix and then adds a bias
vector.
The convolutional (and down-sampling) layers are followed by one or more fully
connected layers.
As the name suggests, all neurons in a fully connected layer connect to all the neurons in
the previous layer. This layer combines all of the features (local information) learned by
the previous layers across the image to identify the larger patterns. For classification
problems, the last fully connected layer combines the features to classify the images. This
is the reason that the outputSize argument of the last fully connected layer of the
network is equal to the number of classes of the data set. For regression problems, the
output size must be equal to the number of response variables.
You can also adjust the learning rate and the regularization parameters for this layer
using the related name-value pair arguments when creating the fully connected layer. If
you choose not to adjust them, then trainNetwork uses the global training parameters
defined by the trainingOptions function. For details on global and layer training
options, see “Set Up Parameters and Train Convolutional Neural Network” on page 1-55.
A fully connected layer multiplies the input by a weight matrix W and then adds a bias
vector b.
If the input to the layer is a sequence (for example, in an LSTM network), then the fully
connected layer acts independently on each time step. For example, if the layer before the
fully connected layer outputs an array X of size D-by-N-by-S, then the fully connected
layer outputs an array Z of size outputSize-by-N-by-S. At time step t, the corresponding
entry of Z is
WX_t + b, where X_t denotes time step t of X.
Output Layers
Softmax and Classification Layers
A softmax layer applies a softmax function to the input. Create a softmax layer using
softmaxLayer.
A classification layer computes the cross entropy loss for multi-class classification
problems with mutually exclusive classes. Create a classification layer using
classificationLayer.
For classification problems, a softmax layer and then a classification layer must follow the
final fully connected layer.
The softmax function is

y_r(x) = \frac{\exp(a_r(x))}{\sum_{j=1}^{k} \exp(a_j(x))},

where 0 \le y_r \le 1 and \sum_{j=1}^{k} y_j = 1.
The softmax function is the output unit activation function after the last fully connected
layer for multi-class classification problems:
P(c_r \mid x, \theta) = \frac{P(x, \theta \mid c_r) \, P(c_r)}{\sum_{j=1}^{k} P(x, \theta \mid c_j) \, P(c_j)} = \frac{\exp(a_r(x, \theta))}{\sum_{j=1}^{k} \exp(a_j(x, \theta))},

where 0 \le P(c_r \mid x, \theta) \le 1 and \sum_{j=1}^{k} P(c_j \mid x, \theta) = 1. Moreover, a_r = \ln\left( P(x, \theta \mid c_r) \, P(c_r) \right), where P(x, \theta \mid c_r) is the conditional probability of the sample given class r, and P(c_r) is the class prior probability.
The softmax function is also known as the normalized exponential and can be considered
the multi-class generalization of the logistic sigmoid function [8].
For typical classification networks, the classification layer must follow the softmax layer.
In the classification layer, trainNetwork takes the values from the softmax function and
assigns each input to one of the K mutually exclusive classes using the cross entropy
function for a 1-of-K coding scheme [8]:
$$\text{loss} = -\sum_{i=1}^{N}\sum_{j=1}^{K} t_{ij}\,\ln y_{ij},$$
where N is the number of samples, K is the number of classes, $t_{ij}$ is the indicator that the ith sample belongs to the jth class, and $y_{ij}$ is the output for sample i for class j, which in this case is the value from the softmax function. That is, it is the probability that the network associates the ith input with class j.
Regression Layer
Create a regression layer using regressionLayer. A regression layer computes the half-mean-squared-error loss for regression problems. For a single observation, the mean squared error is given by

$$\mathrm{MSE} = \sum_{i=1}^{R} \frac{(t_i - y_i)^2}{R},$$
where R is the number of responses, ti is the target output, and yi is the network’s
prediction for the response variable corresponding to observation i.
The loss that the regression layer uses is the half-mean-squared error:

$$\text{loss} = \frac{1}{2}\sum_{i=1}^{R} \frac{(t_i - y_i)^2}{R}.$$
References
[1] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge,
Massachusetts: The MIT Press, 2012.
[2] Krizhevsky, A., I. Sutskever, and G. E. Hinton. "ImageNet Classification with Deep
Convolutional Neural Networks." Advances in Neural Information Processing
Systems. Vol 25, 2012.
[3] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D., et al. "Handwritten Digit Recognition with a Back-propagation Network." In Advances in Neural Information Processing Systems, 1990.

[4] LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. "Gradient-Based Learning Applied to Document Recognition." Proceedings of the IEEE. Vol. 86, pp. 2278–2324, 1998.
[5] Nair, V. and G. E. Hinton. "Rectified linear units improve restricted boltzmann machines." In Proc. 27th International Conference on Machine Learning, 2010.

[6] Nagi, J., F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, and L. M. Gambardella. "Max-Pooling Convolutional Neural Networks for Vision-Based Hand Gesture Recognition." IEEE International Conference on Signal and Image Processing Applications (ICSIPA2011), 2011.

[7] Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research. Vol. 15, pp. 1929–1958, 2014.
[8] Bishop, C. M. Pattern Recognition and Machine Learning. Springer, New York, NY,
2006.
[9] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network
training by reducing internal covariate shift." preprint, arXiv:1502.03167 (2015).
See Also
averagePooling2dLayer | batchNormalizationLayer | classificationLayer |
clippedReluLayer | convolution2dLayer | crossChannelNormalizationLayer |
dropoutLayer | fullyConnectedLayer | imageInputLayer | leakyReluLayer |
maxPooling2dLayer | regressionLayer | reluLayer | softmaxLayer |
trainNetwork | trainingOptions
More About
• “List of Deep Learning Layers” on page 1-33
• “Learn About Convolutional Neural Networks” on page 1-29
• “Set Up Parameters and Train Convolutional Neural Network” on page 1-55
• “Resume Training from Checkpoint Network” on page 1-71
Set Up Parameters and Train Convolutional Neural Network
After you define the layers of your neural network as described in “Specify Layers of
Convolutional Neural Network” on page 1-40, the next step is to set up the training
options for the network. Use the trainingOptions function to define the global training
parameters. To train a network, use the object returned by trainingOptions as an
input argument to the trainNetwork function. For example:
options = trainingOptions('adam');
trainedNet = trainNetwork(data,layers,options);
Layers with learnable parameters also have options for adjusting the learning
parameters. For more information, see “Set Up Parameters in Convolutional and Fully
Connected Layers” on page 1-58.
The 'adam' (derived from adaptive moment estimation) solver is often a good optimizer
to try first. You can also try the 'rmsprop' (root mean square propagation) and 'sgdm'
(stochastic gradient descent with momentum) optimizers and see if this improves
training. Different solvers work better for different problems. For more information about
the different solvers, see “Stochastic Gradient Descent”.
The solvers update the parameters using a subset of the data each step. This subset is
called a mini-batch. You can specify the size of the mini-batch by using the
'MiniBatchSize' name-value pair argument of trainingOptions. Each parameter
update is called an iteration. A full pass through the entire data set is called an epoch.
You can specify the maximum number of epochs to train for by using the 'MaxEpochs'
name-value pair argument of trainingOptions. The default value is 30, but you can
choose a smaller number of epochs for small networks or for fine-tuning and transfer
learning, where most of the learning is already done.
By default, the software shuffles the data once before training. You can change this
setting by using the 'Shuffle' name-value pair argument.
Tip If the mini-batch loss during training ever becomes NaN, then the learning rate is
likely too high. Try reducing the learning rate, for example by a factor of 3, and restarting
network training.
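A sketch that pulls these options together (all values here are illustrative, not recommendations):

options = trainingOptions('sgdm', ...
    'MiniBatchSize',128, ...
    'MaxEpochs',20, ...
    'InitialLearnRate',0.01, ...
    'Shuffle','every-epoch');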
Performing validation at regular intervals during training helps you to determine if your
network is overfitting to the training data. A common problem is that the network simply
"memorizes" the training data, rather than learning general features that enable the
network to make accurate predictions for new data. To check if your network is
overfitting, compare the training loss and accuracy to the corresponding validation
metrics. If the training loss is significantly lower than the validation loss, or the training
accuracy is significantly higher than the validation accuracy, then your network is
overfitting.
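For example, a sketch that adds validation to the training options (XValidation and YValidation are hypothetical variables holding held-out data):

options = trainingOptions('sgdm', ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',30);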
To save checkpoint networks during training, specify the 'CheckpointPath' name-value pair argument of trainingOptions. Then trainNetwork saves a checkpoint network at the end of every epoch. If training is interrupted, you can load the latest checkpoint file and resume from it. For example:
load net_checkpoint__351__2018_04_12__18_09_52.mat
You can then resume training by using the layers of the network as an input argument to
trainNetwork. For example:
trainNetwork(XTrain,YTrain,net.Layers,options)
You must manually specify the training options and the input data, because the
checkpoint network does not contain this information. For an example, see “Resume
Training from Checkpoint Network” on page 1-71.
Set Up Parameters in Convolutional and Fully Connected Layers
By default, the initial values of the weights of the convolutional and fully connected layers
are randomly generated from a Gaussian distribution with mean 0 and standard deviation
0.01. The initial biases are by default equal to 0. You can manually change the initial
weights and biases after you create the layers. For examples, see “Specify Initial Weights
and Biases in Convolutional Layer” and “Specify Initial Weights and Biases in Fully
Connected Layer”.
See Also
Convolution2dLayer | FullyConnectedLayer | trainNetwork | trainingOptions
More About
• “Learn About Convolutional Neural Networks” on page 1-29
• “Specify Layers of Convolutional Neural Network” on page 1-40
• “Create Simple Deep Learning Network for Classification”
• “Resume Training from Checkpoint Network” on page 1-71
Deep Learning Tips and Tricks
You can analyze your deep learning network using analyzeNetwork. The
analyzeNetwork function displays an interactive visualization of the network
architecture, detects errors and issues with the network, and provides detailed
information about the network layers. Use the network analyzer to visualize and
understand the network architecture, check that you have defined the architecture
correctly, and detect problems before training. Problems that analyzeNetwork detects
include missing or disconnected layers, mismatched or incorrect sizes of layer inputs, an
incorrect number of layer inputs, and invalid graph structures.
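For example, assuming layers is a layer array or layer graph that you have defined, a single call opens the analyzer:

analyzeNetwork(layers)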
Ideally, all classes have an equal number of observations. However, for some tasks,
classes can be imbalanced. For example, automotive datasets of street scenes tend to
have more sky, building, and road pixels than pedestrian and bicyclist pixels because the
sky, buildings, and roads cover more image area. If not handled correctly, this imbalance
can be detrimental to the learning process because the learning is biased in favor of the
dominant classes.
To balance the classes, you can use class weighting. For classification tasks, you can use the example custom classification layer provided in “Define Custom Weighted Classification Layer” on page 1-131.
Alternatively, you can balance the classes by resampling the training data, for example, by upsampling the less common classes or downsampling the more common classes.
For more information about preprocessing image data, see “Preprocess Images for Deep
Learning” on page 1-166.
To automatically resize training images to the network input size, you can create an augmented image datastore:

auimds = augmentedImageDatastore(inputSize,imds)
For more information about working with LSTM networks, see “Long Short-Term Memory
Networks” on page 1-154.
For more information, see “Scale Up Deep Learning in Parallel and in the Cloud” on page
3-2.
See Also
Deep Network Designer | analyzeNetwork | checkLayer | trainingOptions
More About
• “Pretrained Convolutional Neural Networks” on page 1-21
• “Preprocess Images for Deep Learning” on page 1-166
• “Transfer Learning with Deep Network Designer” on page 2-2
• “Train Deep Learning Network to Classify New Images”
• “Convert Classification Network into Regression Network”
Resume Training from Checkpoint Network
Load the sample data as a 4-D array. digitTrain4DArrayData loads the digit training
set as 4-D array data. XTrain is a 28-by-28-by-1-by-5000 array, where 28 is the height
and 28 is the width of the images. 1 is the number of channels and 5000 is the number of
synthetic images of handwritten digits. YTrain is a categorical vector containing the
labels for each observation.
[XTrain,YTrain] = digitTrain4DArrayData;
size(XTrain)
ans = 1×4
28 28 1 5000
Display 20 random training images.

figure;
perm = randperm(size(XTrain,4),20);
for i = 1:20
subplot(4,5,i);
imshow(XTrain(:,:,:,perm(i)));
end
Define the network architecture.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(3,8,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,16,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)
convolution2dLayer(3,32,'Padding','same')
batchNormalizationLayer
reluLayer
averagePooling2dLayer(7)
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
Specify training options for stochastic gradient descent with momentum (SGDM) and
specify the path for saving the checkpoint networks.
checkpointPath = pwd;
options = trainingOptions('sgdm', ...
'InitialLearnRate',0.1, ...
'MaxEpochs',20, ...
'Verbose',false, ...
'Plots','training-progress', ...
'Shuffle','every-epoch', ...
'CheckpointPath',checkpointPath);
Train the network. trainNetwork uses a GPU if one is available; otherwise, it uses the CPU. trainNetwork saves one checkpoint network each epoch and automatically assigns unique names to the checkpoint files.
net1 = trainNetwork(XTrain,YTrain,layers,options);
Suppose that training was interrupted and did not complete. Rather than restarting the
training from the beginning, you can load the last checkpoint network and resume
training from that point. trainNetwork saves the checkpoint files with file names of the
form net_checkpoint__195__2018_07_13__11_59_10.mat, where 195 is the
iteration number, 2018_07_13 is the date, and 11_59_10 is the time trainNetwork
saved the network. The checkpoint network has the variable name net.
load('net_checkpoint__195__2018_07_13__11_59_10.mat','net')
Specify the training options and reduce the maximum number of epochs. You can also
adjust other training options, such as the initial learning rate.
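The options code itself was lost at this page break; a minimal sketch consistent with the surrounding text (the specific MaxEpochs value is illustrative) is:

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.1, ...
    'MaxEpochs',15, ...
    'Verbose',false, ...
    'Plots','training-progress');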
Resume training using the layers of the checkpoint network you loaded with the new
training options. If the checkpoint network is a DAG network, then use
layerGraph(net) as the argument instead of net.Layers.
net2 = trainNetwork(XTrain,YTrain,net.Layers,options);
See Also
trainNetwork | trainingOptions
Related Examples
• “Create Simple Deep Learning Network for Classification”
More About
• “Learn About Convolutional Neural Networks” on page 1-29
Define Custom Deep Learning Layers
Tip This topic explains how to define custom deep learning layers for your problems. For
a list of built-in layers in Deep Learning Toolbox, see “List of Deep Learning Layers” on
page 1-33.
This topic explains the architecture of deep learning layers and how to define custom
layers to use for your problems.
To define a custom layer, choose the class to subclass based on the type of layer:

Type                    Description
Layer                   Define a custom deep learning layer and specify optional learnable parameters, forward functions, and a backward function.
ClassificationLayer     Define a custom classification output layer and specify a loss function.
RegressionLayer         Define a custom regression output layer and specify a loss function.
Layer Templates
You can use the following templates to define new layers.
This template outlines the structure of an intermediate layer with learnable parameters.
If the layer does not have learnable parameters, then you can omit the properties (Learnable) section. For an example showing how to define a layer with learnable parameters, see “Define a Custom Deep Learning Layer with Learnable Parameters” on page 1-95.
classdef myLayer < nnet.layer.Layer
    properties
        % (Optional) Layer properties.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the layer.
        end

        function Z = predict(layer, X)
            % Forward input data through the layer at prediction time and
            % output the result.
            %
            % Inputs:
            %     layer - Layer to forward propagate through
            %     X     - Input data
            % Output:
            %     Z     - Output of layer forward function
        end
    end
end
This template outlines the structure of a classification output layer with a loss function.
For an example showing how to define a classification output layer and specify a loss
function, see “Define a Custom Classification Output Layer” on page 1-120.
classdef myClassificationLayer < nnet.layer.ClassificationLayer
    properties
        % (Optional) Layer properties.
    end

    methods
        function layer = myClassificationLayer()
            % (Optional) Create a myClassificationLayer.
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the training
            % targets T.
            %
            % Inputs:
            %     layer - Output layer
            %     Y     - Predictions made by network
            %     T     - Training targets
            % Output:
            %     loss  - Loss between Y and T
        end
    end
end
This template outlines the structure of a regression output layer with a loss function. For
an example showing how to define a regression output layer and specify a loss function,
see “Define a Custom Regression Output Layer” on page 1-109.
classdef myRegressionLayer < nnet.layer.RegressionLayer
    properties
        % (Optional) Layer properties.
    end

    methods
        function layer = myRegressionLayer()
            % (Optional) Create a myRegressionLayer.
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the training
            % targets T.
            %
            % Output:
            %     loss - Loss between Y and T
        end
    end
end
During the forward pass of a network, the layer takes the output x of the previous layer,
applies a function, and then outputs (forward propagates) the result z to the next layer.
At the end of a forward pass, the network calculates the loss L between the predictions Y
and the true targets T.
During the backward pass of a network, each layer takes the derivatives of the loss with respect to its output z, computes the derivatives of the loss L with respect to its input x, and then outputs (backward propagates) the results to the previous layer. If the layer has learnable parameters, then the layer also computes the derivatives of the loss with respect to the layer weights (learnable parameters) W. The layer uses the derivatives of the weights to update the learnable parameters.
The following figure describes the flow of data through a deep neural network and
highlights the data flow through the layer.
Declare the layer properties in the properties section of the class definition.
If the layer has no other properties, then you can omit the properties section.
Learnable Parameters
Declare the layer learnable parameters in the properties (Learnable) section of the
class definition. If the layer has no learnable parameters, then you can omit the
properties (Learnable) section.
Optionally, you can specify the learning rate factor and the L2 factor of the learnable
parameters. By default, each learnable parameter has its learning rate factor and L2
factor set to 1.
For both built-in and user-defined layers, you can set and get the learn rate factors and L2
regularization factors using the following functions.
Function              Description
setLearnRateFactor    Set the learn rate factor of a learnable parameter.
setL2Factor           Set the L2 regularization factor of a learnable parameter.
getLearnRateFactor    Get the learn rate factor of a learnable parameter.
getL2Factor           Get the L2 regularization factor of a learnable parameter.
To specify the learning rate factor and the L2 factor of a learnable parameter, use the
syntaxes layer = setLearnRateFactor(layer,'MyParameterName',value) and
layer = setL2Factor(layer,'MyParameterName',value), respectively.
To get the value of the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes getLearnRateFactor(layer,'MyParameterName') and getL2Factor(layer,'MyParameterName'), respectively.
For example, this syntax sets the learn rate factor of the learnable parameter Alpha to
0.1.
layer = setLearnRateFactor(layer,'Alpha',0.1);
Forward Functions
A layer uses one of two functions to perform a forward pass: predict or forward. If the
forward pass is at prediction time, then the layer uses the predict function. If the
forward pass is at training time, then the layer uses the forward function. The forward
function has an additional output argument memory, which you can use during backward
propagation.
If you do not require two different functions for prediction time and training time, then
you do not need to create the forward function. By default, the layer uses predict at
training time.
The syntax for predict is Z = predict(layer,X), where X is the input data and Z is
the output of the layer forward function.
The syntax for forward is [Z, memory] = forward(layer, X), where X is the input data, Z is the output of the layer forward function, and memory is the memory value to use in backward propagation. memory is a required output argument and must return a value. If the layer does not require a memory value, then return an empty value [].
The dimensions of X depend on the output of the previous layer. Similarly, the output Z
must have the appropriate shape for the next layer.
Built-in layers output 4-D arrays with size h-by-w-by-c-by-N, except for LSTM layers and
sequence input layers, which output 3-D arrays of size D-by-N-by-S.
Fully connected, ReLU, dropout, and softmax layers also accept 3-D inputs. When these
layers get inputs of this shape, they then output 3-D arrays of size D-by-N-by-S.
Backward Function
The layer uses one function for a backward pass: backward. The backward function
computes the derivatives of the loss with respect to the input data and then outputs
(backward propagates) results to the previous layer. If the layer has learnable
parameters, then backward also computes the derivatives of the layer weights (learnable
parameters). During the backward pass, the layer automatically updates the learnable
parameters using these derivatives.
To calculate the derivatives of the loss, you can use the chain rule:
$$\frac{\partial L}{\partial x} = \sum_j \frac{\partial L}{\partial z_j}\,\frac{\partial z_j}{\partial x}, \qquad \frac{\partial L}{\partial W_i} = \sum_j \frac{\partial L}{\partial z_j}\,\frac{\partial z_j}{\partial W_i}.$$
The syntax for backward is [dLdX, dLdW1, …, dLdWn] = backward(layer, X, Z, dLdZ, memory), where X is the layer input data, Z is the output of the forward functions, dLdZ is the derivative of the loss with respect to Z, and memory is the memory output of forward.
The values of X and Z are the same as in the forward functions. The dimensions of dLdZ
are the same as the dimensions of Z.
The dimensions and data type of dLdX are the same as the dimensions and data type of X.
The dimensions and data types of dLdW1,…,dLdWn are the same as the dimensions and
data types of W1,…,Wn, respectively, where Wi is the ith learnable parameter.
During the backward pass, the layer automatically updates the learnable parameters
using the derivatives dLdW1,…,dLdWn.
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in
functions support gpuArray input arguments. If you call any of these functions with at
least one gpuArray input, then the function executes on the GPU and returns a
gpuArray output. For a list of functions that execute on a GPU, see “Run MATLAB
Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you
must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For
more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB”
(Parallel Computing Toolbox).
Check the validity of a custom layer by using the checkLayer function. The syntax is checkLayer(layer,validInputSize,'ObservationDimension',dim), where layer is an instance of the layer, validInputSize is a vector specifying the valid input size to the layer, and dim specifies the dimension of the observations in the layer input data. For large input sizes, the gradient checks take longer to run. To speed up the tests, specify a smaller valid input size.
For more information, see “Check Custom Layer Validity” on page 1-141.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the
current folder.
Create an instance of the layer and check its validity using checkLayer. Specify the valid
input size to be the size of a single observation of typical input to the layer. The layer
expects 4-D array inputs, where the first three dimensions correspond to the height,
width, and number of channels of the previous layer output, and the fourth dimension
corresponds to the observations.
layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)
Running nnet.checklayer.TestCase
.......... .....
Done nnet.checklayer.TestCase
__________
Test Summary:
15 Passed, 0 Failed, 0 Incomplete, 6 Skipped.
Time elapsed: 66.797 seconds.
Here, the function does not detect any issues with the layer.
You can use a custom layer in the same way as any other layer in Deep Learning Toolbox.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the
current folder.
Create a layer array including the custom layer preluLayer.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
preluLayer(20,'prelu')
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
The following figure describes the flow of data through a convolutional neural network
and an output layer.
Declare the layer properties in the properties section of the class definition.
• Name – Layer name, specified as a character vector or a string scalar. If you train a
series network with this layer and Name is set to '', then the software automatically
assigns a name to the layer at training time.
• Description – One-line description of the layer, specified as a character vector or a
string scalar. This description appears when the layer is displayed in a Layer array. If
you do not specify a layer description, then the software displays "Classification
Output" or "Regression Output".
• Type – Type of the layer, specified as a character vector or a string scalar. The value of
Type appears when the layer is displayed in a Layer array. If you do not specify a
layer type, then the software displays the layer class name.
• Classes – Classes of the output layer, specified as a categorical vector, string array,
cell array of character vectors, or 'auto'. If Classes is 'auto', then the software
automatically sets the classes at training time. If you specify the string array or cell
array of character vectors str, then the software sets the classes of the output layer
to categorical(str,str). The default value is 'auto'.
If the layer has no other properties, then you can omit the properties section.
Loss Functions
The output layer uses two functions to compute the loss and the derivatives:
forwardLoss and backwardLoss. The forwardLoss function computes the loss L. The
backwardLoss function computes the derivatives of the loss with respect to the
predictions.
The syntax for forwardLoss is loss = forwardLoss(layer, Y, T), where Y contains the predictions made by the network and T contains the training targets. The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T), where dLdY is the derivative of the loss with respect to the predictions Y. The output dLdY must be the same size as the layer input Y.
The size of Y depends on the output of the previous layer. To ensure that Y is the same
size as T, you must include a layer that outputs the correct size before the output layer.
For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can
include a fully connected layer of size K followed by a softmax layer before the output
layer.
For regression problems, the dimensions of T also depend on the type of problem.
For example, if the network defines an image regression network with one response and
has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.
The size of Y depends on the output of the previous layer. To ensure that Y is the same
size as T, you must include a layer that outputs the correct size before the output layer.
For example, for image regression with R responses, to ensure that Y is a 4-D array of the
correct size, you can include a fully connected layer of size R before the output layer.
The forwardLoss and backwardLoss functions have the following output arguments: forwardLoss outputs loss, the calculated loss between the predictions and the targets, and backwardLoss outputs dLdY, the derivative of the loss with respect to the predictions.
If you want to include a user-defined output layer after a built-in layer, then
backwardLoss must output dLdY with the size expected by the previous layer. Built-in
layers expect dLdY to be the same size as Y.
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in
functions support gpuArray input arguments. If you call any of these functions with at
least one gpuArray input, then the function executes on the GPU and returns a
gpuArray output. For a list of functions that execute on a GPU, see “Run MATLAB
Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you
must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For
more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB”
(Parallel Computing Toolbox).
You can use a custom output layer in the same way as any other output layer in Deep
Learning Toolbox. This section shows how to create and train a network for regression
using a custom output layer.
Define a custom mean absolute error regression layer. To create this layer, save the file
maeRegressionLayer.m in the current folder.
Create a layer array and include the custom regression output layer
maeRegressionLayer.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(1)
maeRegressionLayer('mae')]
layers =
6x1 Layer array with layers:
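The data-loading and training calls were lost at this page break; a sketch consistent with the verbose training output below and with the evaluation code that follows (which predicts rotation angles for the digits test set) is:

[XTrain,~,YTrain] = digitTrain4DArrayData;  % images and rotation angles
options = trainingOptions('sgdm');
net = trainNetwork(XTrain,YTrain,layers,options);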
|======================================================================================
| Epoch | Iteration | Time Elapsed | Mini-batch | Mini-batch | Base Learning
| | | (hh:mm:ss) | RMSE | Loss | Rate
|======================================================================================
| 1 | 1 | 00:00:00 | 25.52 | 22.0 | 0.010
| 2 | 50 | 00:00:04 | 12.67 | 10.2 | 0.010
| 3 | 100 | 00:00:08 | 12.23 | 9.9 | 0.010
| 4 | 150 | 00:00:12 | 11.56 | 8.9 | 0.010
| 6 | 200 | 00:00:17 | 11.72 | 8.7 | 0.010
| 7 | 250 | 00:00:21 | 11.63 | 7.8 | 0.010
| 8 | 300 | 00:00:25 | 11.09 | 8.3 | 0.010
| 9 | 350 | 00:00:30 | 9.48 | 6.9 | 0.010
| 11 | 400 | 00:00:34 | 9.86 | 7.4 | 0.010
| 12 | 450 | 00:00:38 | 8.14 | 6.0 | 0.010
| 13 | 500 | 00:00:43 | 8.46 | 6.6 | 0.010
| 15 | 550 | 00:00:47 | 7.76 | 5.1 | 0.010
| 16 | 600 | 00:00:51 | 10.24 | 7.8 | 0.010
| 17 | 650 | 00:00:56 | 8.24 | 6.1 | 0.010
| 18 | 700 | 00:01:00 | 7.93 | 5.9 | 0.010
| 20 | 750 | 00:01:04 | 7.94 | 5.6 | 0.010
| 21 | 800 | 00:01:09 | 7.51 | 5.2 | 0.010
| 22 | 850 | 00:01:13 | 7.94 | 6.4 | 0.010
| 24 | 900 | 00:01:18 | 7.16 | 5.3 | 0.010
| 25 | 950 | 00:01:22 | 8.71 | 6.7 | 0.010
| 26 | 1000 | 00:01:26 | 9.56 | 8.0 | 0.010
| 27 | 1050 | 00:01:30 | 7.65 | 5.8 | 0.010
| 29 | 1100 | 00:01:34 | 5.88 | 4.3 | 0.010
| 30 | 1150 | 00:01:38 | 7.19 | 5.4 | 0.010
| 30 | 1170 | 00:01:40 | 7.73 | 6.0 | 0.010
|======================================================================================
Evaluate the network performance by calculating the prediction error between the
predicted and actual angles of rotation.
[XTest,~,YTest] = digitTest4DArrayData;
YPred = predict(net,XTest);
predictionError = YTest - YPred;
Calculate the number of predictions within an acceptable error margin from the true
angles. Set the threshold to 10 degrees and calculate the percentage of predictions within
this threshold.
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(XTest,4);
accuracy = numCorrect/numTestImages
accuracy = 0.7840
See Also
assembleNetwork | checkLayer | getL2Factor | getLearnRateFactor |
setL2Factor | setLearnRateFactor
More About
• “Deep Learning in MATLAB” on page 1-2
• “Check Custom Layer Validity” on page 1-141
• “Define a Custom Deep Learning Layer with Learnable Parameters” on page 1-95
• “Define a Custom Classification Output Layer” on page 1-120
• “Define a Custom Regression Output Layer” on page 1-109
• “Define Custom Weighted Classification Layer” on page 1-131
Define a Custom Deep Learning Layer with Learnable Parameters
To define a custom deep learning layer, you can use the template provided in this
example, which takes you through the following steps:
1 Name the layer – Give the layer a name so it can be used in MATLAB.
2 Declare the layer properties – Specify the properties of the layer and which
parameters are learned during training.
3 Create a constructor function (optional) – Specify how to construct the layer and
initialize its properties. If you do not specify a constructor function, then the software
initializes the properties with [] at creation.
4 Create forward functions – Specify how data passes forward through the layer
(forward propagation) at prediction time and at training time.
5 Create a backward function – Specify the derivatives of the loss with respect to the
input data and the learnable parameters (backward propagation).
A PReLU layer performs a threshold operation, where for each channel, any input value less than zero is multiplied by a scalar learned at training time [1]. For values less than zero, a PReLU layer applies scaling coefficients $\alpha_i$ to each channel of the input. These coefficients form a learnable parameter, which the layer learns during training.
This figure from [1] compares the ReLU and PReLU layer functions.
Start by copying the intermediate layer template into a new file. This template outlines the structure of the layer class and the functions that define the layer behavior.

classdef myLayer < nnet.layer.Layer
    properties
        % (Optional) Layer properties.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the layer.
        end

        function Z = predict(layer, X)
            % Forward input data through the layer at prediction time and
            % output the result.
            %
            % Inputs:
            %     layer - Layer to forward propagate through
            %     X     - Input data
            % Output:
            %     Z     - Output of layer forward function
        end
    end
end
Next, rename the myLayer constructor function (the first function in the methods
section) so that it has the same name as the layer.
methods
function layer = preluLayer()
...
end
...
end
Save the layer class file in a new file named preluLayer.m. The file name must match
the layer name. To use the layer, you must save the file in the current folder or in a folder
on the MATLAB path.
If the layer has no other properties, then you can omit the properties section.
A PReLU layer does not require any additional properties, so you can remove the
properties section.
A PReLU layer has only one learnable parameter, the scaling coefficient $\alpha$. Declare this learnable parameter in the properties (Learnable) section and call the parameter Alpha.
properties (Learnable)
% Layer learnable parameters
% Scaling coefficient
Alpha
end
The PReLU layer constructor function requires only one input, the number of channels of
the expected input data. This input specifies the size of the learnable parameter Alpha.
Specify two input arguments named numChannels and name in the preluLayer function. Add a comment to the top of the function that explains the syntax of the function.

function layer = preluLayer(numChannels, name)
    % layer = preluLayer(numChannels, name) creates a PReLU layer
    % with numChannels channels and specifies the layer name.
    ...
end
Initialize the layer properties, including learnable parameters in the constructor function.
Replace the comment % Layer constructor function goes here with code that
initializes the layer properties.
Give the layer a one-line description by setting the Description property of the layer.
Set the description to describe the type of layer and its size.
For a PReLU layer, when the input values are negative, the layer multiplies each channel
of the input by the corresponding channel of Alpha. Initialize the learnable parameter
Alpha to be a random vector of size 1-by-1-by-numChannels. With the third dimension
specified as size numChannels, the layer can use element-wise multiplication of the input
in the forward function. Alpha is a property of the layer object, so you must assign the
vector to layer.Alpha.
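The constructor code did not survive extraction at this point; a sketch consistent with the description above (the description string is illustrative) is:

function layer = preluLayer(numChannels, name)
    % layer = preluLayer(numChannels, name) creates a PReLU layer
    % with numChannels channels and specifies the layer name.

    % Set layer name.
    layer.Name = name;

    % Set layer description.
    layer.Description = "PReLU with " + numChannels + " channels";

    % Initialize the scaling coefficient.
    layer.Alpha = rand([1 1 numChannels]);
end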
With this constructor function, the command preluLayer(3,'prelu') creates a PReLU layer with three channels and the name 'prelu'.
Create a function named predict that propagates the data forward through the layer at
prediction time and outputs the result. The syntax for predict is Z = predict(layer,
X), where X is the input data and Z is the output of the layer forward function. By default,
the layer uses predict as the forward function at training time. To use a different
forward function at training time, or retain a value required for the backward function,
you must also create a function named forward.
The dimensions of X depend on the output of the previous layer. Similarly, the output Z
must have the appropriate shape for the next layer.
Built-in layers output 4-D arrays with size h-by-w-by-c-by-N, except for LSTM layers and
sequence input layers, which output 3-D arrays of size D-by-N-by-S.
Fully connected, ReLU, dropout, and softmax layers also accept 3-D inputs. When these
layers get inputs of this shape, they then output 3-D arrays of size D-by-N-by-S.
The forward function propagates the data forward through the layer at training time and
also outputs a memory value. The syntax for forward is [Z, memory] =
forward(layer, X), where memory is the output memory value. You can use this value
as an input to the backward function.
The PReLU operation is given by

$$f(x_i) = \begin{cases} x_i & \text{if } x_i > 0 \\ \alpha_i x_i & \text{if } x_i \le 0, \end{cases}$$

where $x_i$ is the input of the nonlinear activation f on channel i, and $\alpha_i$ is the coefficient controlling the slope of the negative part. A PReLU layer does not require memory or a different forward function for training, so you can remove the forward function from the class file. Add a comment to the top of the function that explains the syntaxes of the function.
function Z = predict(layer, X)
    % Z = predict(layer, X) forwards the input data X through the
    % layer and outputs the result Z.
    Z = max(X,0) + layer.Alpha .* min(0,X);
end

Next, create a function named backward that returns the derivatives of the loss with respect to the input data and the learnable parameter. For this layer, the syntax is [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory).
The dimensions of X and Z are the same as in the forward functions. The dimensions of
dLdZ are the same as the dimensions of Z.
The dimensions and data type of dLdX are the same as the dimensions and data type of X.
The dimensions and data types of dLdW1,…,dLdWn are the same as the dimensions and
data types of W1,…,Wn, respectively, where Wi is the ith learnable parameter.
During the backward pass, the layer automatically updates the learnable parameters
using the derivatives dLdW1,…,dLdWn.
If you want to include a custom layer after a built-in layer in a network, then the layer
functions must accept inputs X which are the outputs of the previous layer, and backward
propagate dLdX with the same size as X. If you want to include a custom layer before a
built-in layer, then the forward functions must output arrays Z with the size expected by
the next layer. Similarly, backward must accept inputs dLdZ with the same size as Z.
The derivative of the loss with respect to the input data is

$$\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial f(x_i)}\,\frac{\partial f(x_i)}{\partial x_i},$$
where $\partial L / \partial f(x_i)$ is the gradient propagated from the deeper layer, and the gradient of the activation is

$$\frac{\partial f(x_i)}{\partial x_i} = \begin{cases} 1 & \text{if } x_i \ge 0 \\ \alpha_i & \text{if } x_i < 0. \end{cases}$$
The derivative of the loss with respect to the learnable parameter $\alpha_i$ is

$$\frac{\partial L}{\partial \alpha_i} = \sum_j \frac{\partial L}{\partial f(x_{ij})}\,\frac{\partial f(x_{ij})}{\partial \alpha_i},$$

where i indexes the channels, j indexes the elements over height, width, and observations, $\partial L / \partial f(x_i)$ is the gradient propagated from the deeper layer, and the gradient of the activation is

$$\frac{\partial f(x_i)}{\partial \alpha_i} = \begin{cases} 0 & \text{if } x_i \ge 0 \\ x_i & \text{if } x_i < 0. \end{cases}$$
In backward, replace the output dLdW with the output dLdAlpha. In backward, the input X corresponds to x, the input Z corresponds to $f(x_i)$, and the input dLdZ corresponds to $\partial L / \partial f(x_i)$. The output dLdX corresponds to $\partial L / \partial x_i$, and the output dLdAlpha corresponds to $\partial L / \partial \alpha_i$.
Add a comment to the top of the function that explains the syntaxes of the function.
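The backward code was lost at this page break; a sketch that implements the derivatives above (with the generic dLdW output renamed to dLdAlpha, as described) is:

function [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)
    % [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)
    % backward propagates the derivative of the loss function
    % through the layer.

    % Gradient of the loss with respect to the input data.
    dLdX = layer.Alpha .* dLdZ;
    dLdX(X > 0) = dLdZ(X > 0);

    % Gradient of the loss with respect to Alpha: sum over height,
    % width (dimensions 1 and 2), and observations (dimension 4).
    dLdAlpha = min(0,X) .* dLdZ;
    dLdAlpha = sum(sum(dLdAlpha,1),2);
    dLdAlpha = sum(dLdAlpha,4);
end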
Completed Layer
View the completed layer class file.
classdef preluLayer < nnet.layer.Layer
    % Example custom PReLU layer.

    properties (Learnable)
        % Layer learnable parameters
        % Scaling coefficient
        Alpha
    end

    methods
        function layer = preluLayer(numChannels, name)
            % layer = preluLayer(numChannels, name) creates a PReLU layer
            % with numChannels channels and specifies the layer name.
            layer.Name = name;
            layer.Description = "PReLU with " + numChannels + " channels";
            layer.Alpha = rand([1 1 numChannels]);
        end

        function Z = predict(layer, X)
            % Z = predict(layer, X) forwards the input data X through the
            % layer and outputs the result Z.
            Z = max(X,0) + layer.Alpha .* min(0,X);
        end

        function [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)
            % Backward propagate the derivative of the loss function.
            dLdX = layer.Alpha .* dLdZ;
            dLdX(X>0) = dLdZ(X>0);
            dLdAlpha = sum(sum(min(0,X).*dLdZ,1),2);
            dLdAlpha = sum(dLdAlpha,4);
        end
    end
end
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in
functions support gpuArray input arguments. If you call any of these functions with at
least one gpuArray input, then the function executes on the GPU and returns a
gpuArray output. For a list of functions that execute on a GPU, see “Run MATLAB
Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you
must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For
more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB”
(Parallel Computing Toolbox).
The MATLAB functions used in predict, forward, and backward all support gpuArray
inputs, so the layer is GPU compatible.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the
current folder.
Create an instance of the layer and check its validity using checkLayer. Specify the valid
input size to be the size of a single observation of typical input to the layer. The layer
expects 4-D array inputs, where the first three dimensions correspond to the height,
width, and number of channels of the previous layer output, and the fourth dimension
corresponds to the observations.
layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)
Running nnet.checklayer.TestCase
.......... .....
Done nnet.checklayer.TestCase
__________
Test Summary:
15 Passed, 0 Failed, 0 Incomplete, 6 Skipped.
Time elapsed: 66.797 seconds.
Here, the function does not detect any issues with the layer.
You can use the custom layer in the same way as any other layer in Deep Learning Toolbox. Load the training data.
[XTrain,YTrain] = digitTrain4DArrayData;
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the
current folder. Create a layer array including the custom layer preluLayer.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
preluLayer(20,'prelu')
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
options = trainingOptions('adam','MaxEpochs',10);
net = trainNetwork(XTrain,YTrain,layers,options);
Evaluate the network performance by predicting on new data and calculating the
accuracy.
[XTest,YTest] = digitTest4DArrayData;
YPred = classify(net,XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
accuracy = 0.9436
References
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into
rectifiers: Surpassing human-level performance on ImageNet classification." In
Proceedings of the IEEE international conference on computer vision, pp.
1026-1034. 2015.
See Also
assembleNetwork | checkLayer
More About
• “Deep Learning in MATLAB” on page 1-2
• “Check Custom Layer Validity” on page 1-141
• “Define Custom Deep Learning Layers” on page 1-78
• “Define Custom Weighted Classification Layer” on page 1-131
• “Define a Custom Classification Output Layer” on page 1-120
• “Define a Custom Regression Output Layer” on page 1-109
Define a Custom Regression Output Layer
Tip To create a regression output layer with mean squared error loss, use
regressionLayer. If you want to use a different loss function for your regression
problems, then you can define a custom regression output layer using this example as a
guide.
This example shows how to create a custom regression output layer with the mean
absolute error (MAE) loss.
To define a custom regression output layer, you can use the template provided in this
example, which takes you through the following steps:
1 Name the layer – Give the layer a name so it can be used in MATLAB.
2 Declare the layer properties – Specify the properties of the layer.
3 Create a constructor function – Specify how to construct the layer and initialize its
properties. If you do not specify a constructor function, then the software initializes
the properties with '' at creation.
4 Create a forward loss function – Specify the loss between the predictions and the
training targets.
5 Create a backward loss function – Specify the derivative of the loss with respect to
the predictions.
A regression MAE layer computes the mean absolute error loss for regression problems.
MAE loss is an error measure between two continuous random variables. For predictions
Y and training targets T, the MAE loss between Y and T is given by
$$L = \frac{1}{N}\sum_{n=1}^{N}\left(\frac{1}{R}\sum_{i=1}^{R}\left|Y_{ni} - T_{ni}\right|\right),$$
where N is the number of observations and R is the number of responses.
classdef myRegressionLayer < nnet.layer.RegressionLayer
    properties
        % (Optional) Layer properties.
    end

    methods
        function layer = myRegressionLayer()
            % (Optional) Create a myRegressionLayer.
        end
    end
end
Next, rename the myRegressionLayer constructor function (the first function in the
methods section) so that it has the same name as the layer.
methods
function layer = maeRegressionLayer()
...
end
...
end
Save the layer class file in a new file named maeRegressionLayer.m. The file name
must match the layer name. To use the layer, you must save the file in the current folder
or in a folder on the MATLAB path.
• Classes – Classes of the output layer, specified as a categorical vector, string array,
cell array of character vectors, or 'auto'. If Classes is 'auto', then the software
automatically sets the classes at training time. If you specify the string array or cell
array of character vectors str, then the software sets the classes of the output layer
to categorical(str,str). The default value is 'auto'.
If the layer has no other properties, then you can omit the properties section.
The layer does not require any additional properties, so you can remove the properties
section.
To initialize the Name property at creation, specify the input argument name. Add a
comment to the top of the function that explains the syntax of the function.
function layer = maeRegressionLayer(name)
% layer = maeRegressionLayer(name) creates a
% mean-absolute-error regression layer and specifies the layer
% name.
...
end
Replace the comment % Layer constructor function goes here with code that
initializes the layer properties.
Give the layer a one-line description by setting the Description property of the layer.
Set the Name property to the input argument name. Set the description to describe the
type of layer and its size.
Create a function named forwardLoss that returns the MAE loss between the predictions made by the network and the training targets. The syntax for forwardLoss is loss = forwardLoss(layer, Y, T), where Y is the output of the previous layer and T contains the training targets.
For regression problems, the dimensions of T also depend on the type of problem.
For example, if the network defines an image regression network with one response and
has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.
The size of Y depends on the output of the previous layer. To ensure that Y is the same
size as T, you must include a layer that outputs the correct size before the output layer.
For example, for image regression with R responses, to ensure that Y is a 4-D array of the
correct size, you can include a fully connected layer of size R before the output layer.
A regression MAE layer computes the mean absolute error loss for regression problems.
MAE loss is an error measure between two continuous random variables. For predictions
Y and training targets T, the MAE loss between Y and T is given by
$$L = \frac{1}{N}\sum_{n=1}^{N}\left(\frac{1}{R}\sum_{i=1}^{R}\left|Y_{ni} - T_{ni}\right|\right),$$
The inputs Y and T correspond to Y and T in the equation, respectively. The output loss
corresponds to L. To ensure that loss is scalar, output the mean loss over the mini-batch.
Add a comment to the top of the function that explains the syntaxes of the function.
function loss = forwardLoss(layer, Y, T)
    % loss = forwardLoss(layer, Y, T) returns the MAE loss between
    % the predictions Y and the training targets T.

    % Calculate MAE.
    R = size(Y,3);
    meanAbsoluteError = sum(abs(Y-T),3)/R;

    % Take mean over mini-batch.
    N = size(Y,4);
    loss = sum(meanAbsoluteError)/N;
end
Create a function named backwardLoss that returns the derivatives of the MAE loss with respect to the predictions Y. The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T), where Y is the output of the previous layer and T contains the training targets.
The derivative of the MAE loss with respect to the predictions Y is given by

$$\frac{\partial L}{\partial Y_i} = \frac{1}{NR}\,\operatorname{sign}(Y_i - T_i),$$
where N is the number of observations and R is the number of responses. Add a comment
to the top of the function that explains the syntaxes of the function.
function dLdY = backwardLoss(layer, Y, T)
% Returns the derivatives of the MAE loss with respect to the predictions Y
R = size(Y,3);
N = size(Y,4);
dLdY = sign(Y-T)/(N*R);
end
Completed Layer
View the completed regression output layer class file.
classdef maeRegressionLayer < nnet.layer.RegressionLayer
    % Example custom regression layer with mean-absolute-error loss.

    methods
        function layer = maeRegressionLayer(name)
            % layer = maeRegressionLayer(name) creates a
            % mean-absolute-error regression layer and specifies the layer
            % name.
            layer.Name = name;
            layer.Description = 'Mean absolute error';
        end

        function loss = forwardLoss(layer, Y, T)
            % loss = forwardLoss(layer, Y, T) returns the MAE loss between
            % the predictions Y and the training targets T.

            % Calculate MAE.
            R = size(Y,3);
            meanAbsoluteError = sum(abs(Y-T),3)/R;

            % Take mean over mini-batch.
            N = size(Y,4);
            loss = sum(meanAbsoluteError)/N;
        end

        function dLdY = backwardLoss(layer, Y, T)
            % Return the derivatives of the MAE loss with respect to the
            % predictions Y.
            R = size(Y,3);
            N = size(Y,4);
            dLdY = sign(Y-T)/(N*R);
        end
    end
end
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in
functions support gpuArray input arguments. If you call any of these functions with at
least one gpuArray input, then the function executes on the GPU and returns a
gpuArray output. For a list of functions that execute on a GPU, see “Run MATLAB
Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you
must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For
more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB”
(Parallel Computing Toolbox).
Define a custom mean absolute error regression layer. To create this layer, save the file
maeRegressionLayer.m in the current folder. Create an instance of the layer.
layer = maeRegressionLayer('mae');
Check that the layer is valid using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 1-by-1-by-R-by-N array inputs, where R is the number of responses and N is the number of observations in the mini-batch.
validInputSize = [1 1 10];
checkLayer(layer,validInputSize,'ObservationDimension',4);
Running nnet.checklayer.OutputLayerTestCase
.......... ...
Done nnet.checklayer.OutputLayerTestCase
__________
Test Summary:
13 Passed, 0 Failed, 0 Incomplete, 4 Skipped.
Time elapsed: 0.19366 seconds.
The test summary reports the number of passed, failed, incomplete, and skipped tests.
You can use the custom layer in the same way as any other output layer. Load the training data.

[trainImages,~,trainAngles] = digitTrain4DArrayData;
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(1)
maeRegressionLayer('mae')]
layers =
6x1 Layer array with layers:
options = trainingOptions('sgdm');
net = trainNetwork(trainImages,trainAngles,layers,options);
Evaluate the network performance by calculating the prediction error between the
predicted and actual angles of rotation.
[testImages,~,testAngles] = digitTest4DArrayData;
predictedTestAngles = predict(net,testImages);
predictionError = testAngles - predictedTestAngles;
Calculate the number of predictions within an acceptable error margin from the true
angles. Set the threshold to be 10 degrees and calculate the percentage of predictions
within this threshold.
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(testImages,4);
accuracy = numCorrect/numTestImages
accuracy = 0.7840
See Also
assembleNetwork | checkLayer | regressionLayer
More About
• “Deep Learning in MATLAB” on page 1-2
• “Define Custom Deep Learning Layers” on page 1-78
• “Define Custom Weighted Classification Layer” on page 1-131
• “Define a Custom Deep Learning Layer with Learnable Parameters” on page 1-95
Define a Custom Classification Output Layer
Tip To construct a classification output layer with cross entropy loss for k mutually
exclusive classes, use classificationLayer. If you want to use a different loss
function for your classification problems, then you can define a custom classification
output layer using this example as a guide.
This example shows how to define a custom classification output layer with the sum of
squares error (SSE) loss and use it in a convolutional neural network.
To define a custom classification output layer, you can use the template provided in this
example, which takes you through the following steps:
1 Name the layer – Give the layer a name so it can be used in MATLAB.
2 Declare the layer properties – Specify the properties of the layer.
3 Create a constructor function – Specify how to construct the layer and initialize its
properties. If you do not specify a constructor function, then the software initializes
the properties with '' at creation.
4 Create a forward loss function – Specify the loss between the predictions and the
training targets.
5 Create a backward loss function – Specify the derivative of the loss with respect to
the predictions.
A classification SSE layer computes the sum of squares error loss for classification
problems. SSE is an error measure between two continuous random variables. For
predictions Y and training targets T, the SSE loss between Y and T is given by
$$L = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}\bigl(Y_{ni} - T_{ni}\bigr)^2,$$

where N is the number of observations and K is the number of classes.
classdef myClassificationLayer < nnet.layer.ClassificationLayer
    properties
        % (Optional) Layer properties.
    end
    methods
        function layer = myClassificationLayer()
            % (Optional) Create a myClassificationLayer.
        end
    end
end
Next, rename the myClassificationLayer constructor function (the first function in the methods section) so that it has the same name as the layer.
methods
function layer = sseClassificationLayer()
...
end
...
end
Save the layer class file in a new file named sseClassificationLayer.m. The file
name must match the layer name. To use the layer, you must save the file in the current
folder or in a folder on the MATLAB path.
• Classes – Classes of the output layer, specified as a categorical vector, string array,
cell array of character vectors, or 'auto'. If Classes is 'auto', then the software
automatically sets the classes at training time. If you specify the string array or cell
array of character vectors str, then the software sets the classes of the output layer
to categorical(str,str). The default value is 'auto'.
If the layer has no other properties, then you can omit the properties section.
In this example, the layer does not require any additional properties, so you can remove
the properties section.
Specify the input argument name to assign to the Name property at creation. Add a
comment to the top of the function that explains the syntax of the function.
function layer = sseClassificationLayer(name)
% layer = sseClassificationLayer(name) creates a sum of squares
% error classification layer and specifies the layer name.
...
end
Replace the comment % Layer constructor function goes here with code that
initializes the layer properties.
Give the layer a one-line description by setting the Description property of the layer.
Set the Name property to the input argument name.
function layer = sseClassificationLayer(name)
    % layer = sseClassificationLayer(name) creates a sum of squares
    % error classification layer and specifies the layer name.
    layer.Name = name;
    layer.Description = 'Sum of squares error';
end
Create a function named forwardLoss that returns the SSE loss between the predictions made by the network and the training targets. The syntax for forwardLoss is loss = forwardLoss(layer, Y, T), where Y is the output of the previous layer and T represents the training targets.
The size of Y depends on the output of the previous layer. To ensure that Y is the same
size as T, you must include a layer that outputs the correct size before the output layer.
For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can
include a fully connected layer of size K followed by a softmax layer before the output
layer.
A classification SSE layer computes the sum of squares error loss for classification
problems. SSE is an error measure between two continuous random variables. For
predictions Y and training targets T, the SSE loss between Y and T is given by
$$L = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}\bigl(Y_{ni} - T_{ni}\bigr)^2,$$
The inputs Y and T correspond to Y and T in the equation, respectively. The output loss
corresponds to L. Add a comment to the top of the function that explains the syntaxes of
the function.
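The forwardLoss code itself did not survive extraction here; a minimal sketch consistent with the SSE formula above (and with the mini-batch averaging pattern of the earlier maeRegressionLayer example) is:

function loss = forwardLoss(layer, Y, T)
    % loss = forwardLoss(layer, Y, T) returns the SSE loss between
    % the predictions Y and the training targets T.

    % Calculate sum of squares over the class dimension.
    sumSquares = sum((Y-T).^2,3);

    % Take mean over the mini-batch.
    N = size(Y,4);
    loss = sum(sumSquares)/N;
end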
Create a function named backwardLoss that returns the derivatives of the SSE loss with respect to the predictions Y. The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T), where Y is the output of the previous layer and T represents the training targets.
The derivative of the SSE loss with respect to the predictions Y is given by
$$\frac{\partial L}{\partial Y_i} = \frac{2}{N}\bigl(Y_i - T_i\bigr),$$
where N is the number of observations. Add a comment to the top of the function that
explains the syntaxes of the function.
function dLdY = backwardLoss(layer, Y, T)
    % dLdY = backwardLoss(layer, Y, T) returns the derivatives of the
    % SSE loss with respect to the predictions Y.
    N = size(Y,4);
    dLdY = 2*(Y-T)/N;
end
Completed Layer
View the completed classification output layer class file.
classdef sseClassificationLayer < nnet.layer.ClassificationLayer
    % Example custom classification layer with sum of squares error loss.

    methods
        function layer = sseClassificationLayer(name)
            % layer = sseClassificationLayer(name) creates a sum of squares
            % error classification layer and specifies the layer name.
            layer.Name = name;
            layer.Description = 'Sum of squares error';
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the SSE loss between predictions Y and targets T.
            sumSquares = sum((Y-T).^2,3);
            N = size(Y,4);
            loss = sum(sumSquares)/N;
        end

        function dLdY = backwardLoss(layer, Y, T)
            % Return the derivatives of the SSE loss with respect to Y.
            N = size(Y,4);
            dLdY = 2*(Y-T)/N;
        end
    end
end
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in
functions support gpuArray input arguments. If you call any of these functions with at
least one gpuArray input, then the function executes on the GPU and returns a
gpuArray output. For a list of functions that execute on a GPU, see “Run MATLAB
Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you
must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For
more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB”
(Parallel Computing Toolbox).
The MATLAB functions used in forwardLoss and backwardLoss all support gpuArray inputs, so the layer is GPU compatible.
Define a custom sum-of-squares error classification layer. To create this layer, save the file
sseClassificationLayer.m in the current folder. Create an instance of the layer.
layer = sseClassificationLayer('sse');
Check that the layer is valid using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 1-by-1-by-K-by-N array inputs, where K is the number of classes and N is the number of observations in the mini-batch.
validInputSize = [1 1 10];
checkLayer(layer,validInputSize,'ObservationDimension',4);
Running nnet.checklayer.OutputLayerTestCase
.......... ...
Done nnet.checklayer.OutputLayerTestCase
__________
Test Summary:
13 Passed, 0 Failed, 0 Incomplete, 4 Skipped.
Time elapsed: 0.28916 seconds.
The test summary reports the number of passed, failed, incomplete, and skipped tests.
You can use the custom layer in the same way as any other output layer. Load the training data, and create a layer array that includes the custom classification output layer sseClassificationLayer.
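The load call is not visible in the extracted text; based on the training call below, which uses XTrain and YTrain, it is presumably:

[XTrain,YTrain] = digitTrain4DArrayData;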
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(10)
softmaxLayer
sseClassificationLayer('sse')]
layers =
7x1 Layer array with layers:
options = trainingOptions('sgdm');
net = trainNetwork(XTrain,YTrain,layers,options);
Evaluate the network performance by making predictions on new data and calculating the
accuracy.
[XTest,YTest] = digitTest4DArrayData;
YPred = classify(net, XTest);
accuracy = mean(YTest == YPred)
accuracy = 0.9856
See Also
assembleNetwork | checkLayer | classificationLayer
More About
• “Deep Learning in MATLAB” on page 1-2
• “Define Custom Deep Learning Layers” on page 1-78
• “Define Custom Weighted Classification Layer” on page 1-131
• “Define a Custom Deep Learning Layer with Learnable Parameters” on page 1-95
• “Define a Custom Regression Output Layer” on page 1-109
Define Custom Weighted Classification Layer
Tip To construct a classification output layer with cross entropy loss for k mutually
exclusive classes, use classificationLayer. If you want to use a different loss
function for your classification problems, then you can define a custom classification
output layer using this example as a guide.
This example shows how to define and create a custom weighted classification output
layer with weighted cross entropy loss. Use a weighted classification layer for
classification problems with an imbalanced distribution of classes. For an example
showing how to use a weighted classification layer in a network, see “Speech Command
Recognition Using Deep Learning”.
To define a custom classification output layer, you can use the template provided in this
example, which takes you through the following steps:
1 Name the layer – Give the layer a name so it can be used in MATLAB.
2 Declare the layer properties – Specify the properties of the layer.
3 Create a constructor function – Specify how to construct the layer and initialize its
properties. If you do not specify a constructor function, then the software initializes
the properties with '' at creation.
4 Create a forward loss function – Specify the loss between the predictions and the
training targets.
5 Create a backward loss function – Specify the derivative of the loss with respect to
the predictions.
A weighted classification layer computes the weighted cross entropy loss for classification
problems. Weighted cross entropy is an error measure between two continuous random
variables. For prediction scores Y and training targets T, the weighted cross entropy loss
between Y and T is given by
$$L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} w_i\,T_{ni}\,\log(Y_{ni}),$$

where N is the number of observations, K is the number of classes, and $w_i$ is the weight for class i.
classdef myClassificationLayer < nnet.layer.ClassificationLayer
    properties
        % (Optional) Layer properties.
    end
    methods
        function layer = myClassificationLayer()
            % (Optional) Create a myClassificationLayer.
        end
    end
end