Nnet - Ug 1 150 PDF

Deep Learning Toolbox™
User's Guide
Mark Hudson Beale

Martin T. Hagan
Howard B. Demuth
R2018b
How to Contact MathWorks
Latest news: www.mathworks.com
Sales and services: www.mathworks.com/sales_and_services
User community: www.mathworks.com/matlabcentral
Technical support: www.mathworks.com/support/contact_us
Phone: 508-647-7000
The MathWorks, Inc.

3 Apple Hill Drive
Natick, MA 01760-2098
Deep Learning Toolbox™ User's Guide
© COPYRIGHT 1992–2018 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by,
for, or through the federal government of the United States. By accepting delivery of the Program or
Documentation, the government hereby agrees that this software or documentation qualifies as commercial
computer software or commercial computer software documentation as such terms are used or defined in
FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this
Agreement and only those rights specified in this Agreement, shall pertain to and govern the use,
modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government)
and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the
government's needs or is inconsistent in any respect with federal procurement law, the government agrees
to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History
June 1992 First printing
April 1993 Second printing
January 1997 Third printing
July 1997 Fourth printing
January 1998 Fifth printing Revised for Version 3 (Release 11)
September 2000 Sixth printing Revised for Version 4 (Release 12)
June 2001 Seventh printing Minor revisions (Release 12.1)
July 2002 Online only Minor revisions (Release 13)
January 2003 Online only Minor revisions (Release 13SP1)
June 2004 Online only Revised for Version 4.0.3 (Release 14)
October 2004 Online only Revised for Version 4.0.4 (Release 14SP1)
October 2004 Eighth printing Revised for Version 4.0.4
March 2005 Online only Revised for Version 4.0.5 (Release 14SP2)
March 2006 Online only Revised for Version 5.0 (Release 2006a)
September 2006 Ninth printing Minor revisions (Release 2006b)
March 2007 Online only Minor revisions (Release 2007a)
September 2007 Online only Revised for Version 5.1 (Release 2007b)
October 2008 Online only Revised for Version 6.0.1 (Release 2008b)
March 2009 Online only Revised for Version 6.0.2 (Release 2009a)
September 2009 Online only Revised for Version 6.0.3 (Release 2009b)
April 2011 Online only Revised for Version 7.0.1 (Release 2011a)
September 2011 Online only Revised for Version 7.0.2 (Release 2011b)
October 2014 Online only Revised for Version 8.2.1 (Release 2014b)
Contents
Deep Networks
1
Deep Learning in MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
What Is Deep Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Try Deep Learning in 10 Lines of MATLAB Code . . . . . . . . . . . 1-5
Start Deep Learning Faster Using Transfer Learning . . . . . . . 1-7
Train Classifiers Using Features Extracted from
Pretrained Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on
the Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
Try Deep Learning in 10 Lines of MATLAB Code . . . . . . . . . . 1-10
Deep Learning with Big Data on GPUs and in Parallel . . . . . 1-13

Training with Multiple GPUs . . . . . . . . . . . . . . . . . . . . . . . . . 1-15
Deep Learning in the Cloud . . . . . . . . . . . . . . . . . . . . . . . . . 1-16
Fetch and Preprocess Data in Background . . . . . . . . . . . . . . 1-17
Construct Deep Network Using Autoencoders . . . . . . . . . . . . 1-18
Pretrained Convolutional Neural Networks . . . . . . . . . . . . . . 1-21

Load Pretrained Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
Compare Pretrained Networks . . . . . . . . . . . . . . . . . . . . . . . 1-23
Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
Import and Export Networks . . . . . . . . . . . . . . . . . . . . . . . . 1-26
Learn About Convolutional Neural Networks . . . . . . . . . . . . . 1-29
List of Deep Learning Layers . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33

Layer Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
Specify Layers of Convolutional Neural Network . . . . . . . . . . 1-40

Image Input Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41
v
Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41
Batch Normalization Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 1-46
ReLU Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-47
Cross Channel Normalization (Local Response Normalization)
Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-48
Max and Average Pooling Layers . . . . . . . . . . . . . . . . . . . . . 1-48
Dropout Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-49
Fully Connected Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-49
Output Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-50
Set Up Parameters and Train Convolutional Neural

Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-55
Specify Solver and Maximum Number of Epochs . . . . . . . . . 1-55
Specify and Modify Learning Rate . . . . . . . . . . . . . . . . . . . . 1-56
Specify Validation Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-57
Select Hardware Resource . . . . . . . . . . . . . . . . . . . . . . . . . . 1-57
Save Checkpoint Networks and Resume Training . . . . . . . . . 1-58
Set Up Parameters in Convolutional and Fully Connected
Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-58
Train Your Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-59
Deep Learning Tips and Tricks . . . . . . . . . . . . . . . . . . . . . . . . . 1-60

Choose Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . 1-60
Choose Training Options . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-62
Improve Training Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . 1-64
Fix Errors in Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-65
Prepare and Preprocess Data . . . . . . . . . . . . . . . . . . . . . . . . 1-67
Use Available Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-69
Resume Training from Checkpoint Network . . . . . . . . . . . . . . 1-71
Define Custom Deep Learning Layers . . . . . . . . . . . . . . . . . . . 1-78

Layer Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-79
Intermediate Layer Architecture . . . . . . . . . . . . . . . . . . . . . . 1-82
Check Validity of Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-86
Output Layer Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 1-88
Define a Custom Deep Learning Layer with Learnable

Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-95
Layer with Learnable Parameters Template . . . . . . . . . . . . . 1-96
Name the Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-97
Declare Properties and Learnable Parameters . . . . . . . . . . . 1-98
Create Constructor Function . . . . . . . . . . . . . . . . . . . . . . . . 1-99
vi Contents
Create Forward Functions . . . . . . . . . . . . . . . . . . . . . . . . . 1-100
Create Backward Function . . . . . . . . . . . . . . . . . . . . . . . . . 1-102
Completed Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-104
GPU Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-105
Check Validity of Layer Using checkLayer . . . . . . . . . . . . . . 1-106
Include Custom Layer in Network . . . . . . . . . . . . . . . . . . . . 1-107
Define a Custom Regression Output Layer . . . . . . . . . . . . . . 1-109

Regression Output Layer Template . . . . . . . . . . . . . . . . . . . 1-109
Name the Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-110
Declare Layer Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 1-111
Create Constructor Function . . . . . . . . . . . . . . . . . . . . . . . 1-112
Create Forward Loss Function . . . . . . . . . . . . . . . . . . . . . . 1-113
Create Backward Loss Function . . . . . . . . . . . . . . . . . . . . . 1-114
Completed Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-115
Check Output Layer Validity . . . . . . . . . . . . . . . . . . . . . . . . 1-116
Include Custom Regression Output Layer in Network . . . . . 1-117
Define a Custom Classification Output Layer . . . . . . . . . . . . 1-120

Classification Output Layer Template . . . . . . . . . . . . . . . . . 1-120
Name the Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-121
Completed Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-126
Include Custom Classification Output Layer in Network . . . 1-128
Define Custom Weighted Classification Layer . . . . . . . . . . . . 1-131

Classification Output Layer Template . . . . . . . . . . . . . . . . . 1-132
Name the Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-133
Completed Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-138
Check Custom Layer Validity . . . . . . . . . . . . . . . . . . . . . . . . . 1-141

Check Layer Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-141
vii
List of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-143
Generated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-145
Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-146
Long Short-Term Memory Networks . . . . . . . . . . . . . . . . . . . 1-154

LSTM Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . 1-154
Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-157
Classification and Prediction . . . . . . . . . . . . . . . . . . . . . . . . 1-158
Sequence Padding, Truncation, and Splitting . . . . . . . . . . . 1-158
Normalize Sequence Data . . . . . . . . . . . . . . . . . . . . . . . . . 1-160
Out of Memory Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-161
LSTM Layer Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 1-161
Preprocess Images for Deep Learning . . . . . . . . . . . . . . . . . . 1-166

Resize Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-166
Augment Images for Training . . . . . . . . . . . . . . . . . . . . . . . 1-167
Advanced Image Preprocessing . . . . . . . . . . . . . . . . . . . . . 1-168
Develop Custom Mini-Batch Datastore . . . . . . . . . . . . . . . . . 1-170

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-170
Implement MiniBatchable Datastore . . . . . . . . . . . . . . . . . . 1-171
Add Support for Shuffling . . . . . . . . . . . . . . . . . . . . . . . . . . 1-175
Add Support for Parallel and Multi-GPU Training . . . . . . . . 1-176
Add Support for Background Dispatch . . . . . . . . . . . . . . . . 1-177
Validate Custom Mini-Batch Datastore . . . . . . . . . . . . . . . . 1-179
Define Custom Mini-Batch Datastore For Super-Resolution

Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-180
Deep Network Designer

2
Transfer Learning with Deep Network Designer . . . . . . . . . . . . 2-2
Choose a Pretrained Network . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Import Network into Deep Network Designer . . . . . . . . . . . . . 2-3
Edit Network for Transfer Learning . . . . . . . . . . . . . . . . . . . . 2-4
Check Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Export Network for Training . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Train Network Exported from Deep Network Designer . . . . . 2-11
viii Contents
Build Networks with Deep Network Designer . . . . . . . . . . . . . 2-16
Open the App and Import Networks . . . . . . . . . . . . . . . . . . . 2-16
Create and Edit Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
Check Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Export Network for Training . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
Deep Learning in the Cloud

3
Scale Up Deep Learning in Parallel and in the Cloud . . . . . . . . 3-2
Deep Learning on Multiple GPUs . . . . . . . . . . . . . . . . . . . . . . 3-2
Deep Learning in the Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Advanced Support for Fast Multi-Node GPU
Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Deep Learning with MATLAB on Multiple GPUs . . . . . . . . . . . . 3-7

Select Particular GPUs to Use for Training . . . . . . . . . . . . . . . 3-7
Train Network in the Cloud Using Built-in Parallel Support . . . 3-8
Neural Network Design Book
Neural Network Objects, Data, and Training Styles

4
Workflow for Neural Network Design . . . . . . . . . . . . . . . . . . . . 4-2
Four Levels of Neural Network Design . . . . . . . . . . . . . . . . . . . 4-4
Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

Simple Neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
Transfer Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Neuron with Vector Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
ix
Neural Network Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
One Layer of Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
Multiple Layers of Neurons . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
Input and Output Processing Functions . . . . . . . . . . . . . . . . 4-15
Create Neural Network Object . . . . . . . . . . . . . . . . . . . . . . . . . 4-17
Configure Neural Network Inputs and Outputs . . . . . . . . . . . 4-21
Understanding Deep Learning Toolbox Data Structures . . . . 4-23

Simulation with Concurrent Inputs in a Static Network . . . . . 4-23
Simulation with Sequential Inputs in a Dynamic Network . . . 4-24
Simulation with Concurrent Inputs in a Dynamic Network . . 4-26
Neural Network Training Concepts . . . . . . . . . . . . . . . . . . . . . 4-28

Incremental Training with adapt . . . . . . . . . . . . . . . . . . . . . . 4-28
Batch Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31
Training Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-34
Multilayer Shallow Neural Networks and

Backpropagation Training
5
Multilayer Shallow Neural Networks and Backpropagation
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
Multilayer Shallow Neural Network Architecture . . . . . . . . . . . 5-4

Neuron Model (logsig, tansig, purelin) . . . . . . . . . . . . . . . . . . 5-4
Feedforward Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
Prepare Data for Multilayer Shallow Neural Networks . . . . . . 5-8
Choose Neural Network Input-Output Processing

Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
Representing Unknown or Don't-Care Targets . . . . . . . . . . . 5-11
Divide Data for Optimal Neural Network Training . . . . . . . . . 5-12
x Contents
Create, Configure, and Initialize Multilayer Shallow Neural
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
Other Related Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Initializing Weights (init) . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Train and Apply Multilayer Shallow Neural Networks . . . . . . 5-17

Training Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18
Training Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-20
Use the Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-22
Analyze Shallow Neural Network Performance After

Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24
Improving Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-29
Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-30
Dynamic Neural Networks

6
Introduction to Dynamic Neural Networks . . . . . . . . . . . . . . . . 6-2
How Dynamic Neural Networks Work . . . . . . . . . . . . . . . . . . . . 6-3

Feedforward and Recurrent Neural Networks . . . . . . . . . . . . . 6-3
Applications of Dynamic Networks . . . . . . . . . . . . . . . . . . . . 6-10
Dynamic Network Structures . . . . . . . . . . . . . . . . . . . . . . . . 6-10
Dynamic Network Training . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
Design Time Series Time-Delay Neural Networks . . . . . . . . . 6-14

Prepare Input and Layer Delay States . . . . . . . . . . . . . . . . . . 6-18
Design Time Series Distributed Delay Neural Networks . . . . 6-20
Design Time Series NARX Feedback Neural Networks . . . . . 6-23

Multiple External Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Design Layer-Recurrent Neural Networks . . . . . . . . . . . . . . . 6-31
Create Reference Model Controller with MATLAB Script . . . 6-34
Multiple Sequences with Dynamic Neural Networks . . . . . . . 6-41
xi
Neural Network Time-Series Utilities . . . . . . . . . . . . . . . . . . . 6-42
Train Neural Networks with Error Weights . . . . . . . . . . . . . . . 6-44
Normalize Errors of Multiple Outputs . . . . . . . . . . . . . . . . . . . 6-47
Multistep Neural Network Prediction . . . . . . . . . . . . . . . . . . . 6-52

Set Up in Open-Loop Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 6-52
Multistep Closed-Loop Prediction From Initial Conditions . . . 6-53
Multistep Closed-Loop Prediction Following Known
Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-53
Following Closed-Loop Simulation with Open-Loop
Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-54
Control Systems
7
Introduction to Neural Network Control Systems . . . . . . . . . . 7-2
Design Neural Network Predictive Controller in Simulink . . . 7-4

System Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Predictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
Use the Neural Network Predictive Controller Block . . . . . . . . 7-6
Design NARMA-L2 Neural Controller in Simulink . . . . . . . . . 7-14

Identification of the NARMA-L2 Model . . . . . . . . . . . . . . . . . 7-14
NARMA-L2 Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
Use the NARMA-L2 Controller Block . . . . . . . . . . . . . . . . . . 7-18
Design Model-Reference Neural Controller in Simulink . . . . 7-23

Use the Model Reference Controller Block . . . . . . . . . . . . . . 7-24
Import-Export Neural Network Simulink Control Systems . . 7-31

Import and Export Networks . . . . . . . . . . . . . . . . . . . . . . . . 7-31
Import and Export Training Data . . . . . . . . . . . . . . . . . . . . . 7-35
xii Contents
Radial Basis Neural Networks
8
Introduction to Radial Basis Neural Networks . . . . . . . . . . . . . 8-2
Important Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . 8-2
Radial Basis Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3

Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
Exact Design (newrbe) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
More Efficient Design (newrb) . . . . . . . . . . . . . . . . . . . . . . . . 8-7
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
Probabilistic Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9

Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9
Design (newpnn) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-10
Generalized Regression Neural Networks . . . . . . . . . . . . . . . . 8-12

Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12
Design (newgrnn) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-14
Self-Organizing and Learning Vector Quantization

Networks
9
Introduction to Self-Organizing and LVQ . . . . . . . . . . . . . . . . . 9-2
Important Self-Organizing and LVQ Functions . . . . . . . . . . . . . 9-2
Cluster with a Competitive Neural Network . . . . . . . . . . . . . . . 9-3

Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
Create a Competitive Neural Network . . . . . . . . . . . . . . . . . . 9-4
Kohonen Learning Rule (learnk) . . . . . . . . . . . . . . . . . . . . . . . 9-5
Bias Learning Rule (learncon) . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
Graphical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Cluster with Self-Organizing Map Neural Network . . . . . . . . . 9-9

Topologies (gridtop, hextop, randtop) . . . . . . . . . . . . . . . . . . 9-11
Distance Functions (dist, linkdist, mandist, boxdist) . . . . . . . 9-14
xiii
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
Create a Self-Organizing Map Neural Network
(selforgmap) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
Training (learnsomb) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
Learning Vector Quantization (LVQ) Neural Networks . . . . . 9-34

Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-34
Creating an LVQ Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-35
LVQ1 Learning Rule (learnlv1) . . . . . . . . . . . . . . . . . . . . . . . 9-38
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-39
Supplemental LVQ2.1 Learning Rule (learnlv2) . . . . . . . . . . . 9-41
Adaptive Filters and Adaptive Training

10
Adaptive Neural Network Filters . . . . . . . . . . . . . . . . . . . . . . . 10-2
Adaptive Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Linear Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3
Adaptive Linear Network Architecture . . . . . . . . . . . . . . . . . 10-4
Least Mean Square Error . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
LMS Algorithm (learnwh) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
Adaptive Filtering (adapt) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
Advanced Topics
11
Neural Networks with Parallel and GPU Computing . . . . . . . 11-2
Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Modes of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Distributed Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Single GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-5
Distributed GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Parallel Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-10
Parallel Availability, Fallbacks, and Feedback . . . . . . . . . . . 11-10
xiv Contents
Optimize Neural Network Training Speed and Memory . . . . 11-12
Memory Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Fast Elliot Sigmoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Choose a Multilayer Neural Network Training Function . . . 11-16

SIN Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-17
PARITY Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-19
ENGINE Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-22
CANCER Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-24
CHOLESTEROL Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
DIABETES Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-28
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
Improve Shallow Neural Network Generalization and Avoid

Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-32
Retraining Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 11-34
Multiple Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 11-35
Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-36
Index Data Division (divideind) . . . . . . . . . . . . . . . . . . . . . . 11-37
Random Data Division (dividerand) . . . . . . . . . . . . . . . . . . . 11-37
Block Data Division (divideblock) . . . . . . . . . . . . . . . . . . . . 11-37
Interleaved Data Division (divideint) . . . . . . . . . . . . . . . . . . 11-38
Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-38
Summary and Discussion of Early Stopping and
Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-41
Posttraining Analysis (regression) . . . . . . . . . . . . . . . . . . . . 11-43
Edit Shallow Neural Network Properties . . . . . . . . . . . . . . . . 11-46

Custom Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-46
Network Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-47
Network Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-57
Custom Neural Network Helper Functions . . . . . . . . . . . . . . 11-60
Automatically Save Checkpoints During Neural Network

Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-61
Deploy Trained Neural Network Functions . . . . . . . . . . . . . . 11-63

Deployment Functions and Tools for Trained Networks . . . . 11-63
Generate Neural Network Functions for Application
Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-64
Generate Simulink Diagrams . . . . . . . . . . . . . . . . . . . . . . . 11-67
xv
Deploy Training of Neural Networks . . . . . . . . . . . . . . . . . . . 11-68
Historical Neural Networks

12
Historical Neural Networks Overview . . . . . . . . . . . . . . . . . . . 12-2
Perceptron Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3

Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
Perceptron Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5
Create a Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
Perceptron Learning Rule (learnp) . . . . . . . . . . . . . . . . . . . . 12-8
Training (train) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-10
Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . 12-15
Linear Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-18

Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-18
Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-19
Least Mean Square Error . . . . . . . . . . . . . . . . . . . . . . . . . . 12-22
Linear System Design (newlind) . . . . . . . . . . . . . . . . . . . . . 12-23
Linear Networks with Delays . . . . . . . . . . . . . . . . . . . . . . . 12-24
LMS Algorithm (learnwh) . . . . . . . . . . . . . . . . . . . . . . . . . . 12-26
Linear Classification (train) . . . . . . . . . . . . . . . . . . . . . . . . 12-28
Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . 12-30
Neural Network Object Reference

13
Neural Network Object Properties . . . . . . . . . . . . . . . . . . . . . . 13-2
General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Subobject Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-6
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
Weight and Bias Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-12
Neural Network Subobject Properties . . . . . . . . . . . . . . . . . . 13-14

Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
xvi Contents
Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-22
Biases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-24
Input Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-25
Layer Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-27
Bibliography
14
Deep Learning Toolbox Bibliography . . . . . . . . . . . . . . . . . . . . 14-2
Mathematical Notation
A
Mathematics and Code Equivalents . . . . . . . . . . . . . . . . . . . . . . A-2
Mathematics Notation to MATLAB Notation . . . . . . . . . . . . . . A-2
Figure Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Neural Network Blocks for the Simulink Environment

B
Neural Network Simulink Block Library . . . . . . . . . . . . . . . . . . B-2
Transfer Function Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Net Input Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Weight Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Processing Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
Deploy Neural Network Simulink Diagrams . . . . . . . . . . . . . . . B-5

Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
Suggested Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7
Generate Functions and Objects . . . . . . . . . . . . . . . . . . . . . . . B-8
xvii
Code Notes
C
Deep Learning Toolbox Data Conventions . . . . . . . . . . . . . . . . . C-2
Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
xviii Contents
1
Deep Networks
• “Deep Learning in MATLAB” on page 1-2

• “Try Deep Learning in 10 Lines of MATLAB Code” on page 1-10
• “Deep Learning with Big Data on GPUs and in Parallel” on page 1-13
• “Construct Deep Network Using Autoencoders” on page 1-18
• “Pretrained Convolutional Neural Networks” on page 1-21
• “Learn About Convolutional Neural Networks” on page 1-29
• “List of Deep Learning Layers” on page 1-33
• “Specify Layers of Convolutional Neural Network” on page 1-40
• “Set Up Parameters and Train Convolutional Neural Network” on page 1-55
• “Deep Learning Tips and Tricks” on page 1-60
• “Resume Training from Checkpoint Network” on page 1-71
• “Define Custom Deep Learning Layers” on page 1-78
• “Define a Custom Deep Learning Layer with Learnable Parameters” on page 1-95
• “Define a Custom Regression Output Layer” on page 1-109
• “Define a Custom Classification Output Layer” on page 1-120
• “Define Custom Weighted Classification Layer” on page 1-131
• “Check Custom Layer Validity” on page 1-141
• “Long Short-Term Memory Networks” on page 1-154
• “Preprocess Images for Deep Learning” on page 1-166
• “Develop Custom Mini-Batch Datastore” on page 1-170
• “Define Custom Mini-Batch Datastore For Super-Resolution Networks” on page 1-180
1 Deep Networks
Deep Learning in MATLAB
In this section...
“What Is Deep Learning?” on page 1-2
“Try Deep Learning in 10 Lines of MATLAB Code” on page 1-5
“Start Deep Learning Faster Using Transfer Learning” on page 1-7
“Train Classifiers Using Features Extracted from Pretrained Networks” on page 1-8
“Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on the Cloud” on page 1-
8
What Is Deep Learning?

Deep learning is a branch of machine learning that teaches computers to do what comes
naturally to humans: learn from experience. Machine learning algorithms use
computational methods to “learn” information directly from data without relying on a
predetermined equation as a model. Deep learning is especially suited for image
recognition, which is important for solving problems such as facial recognition, motion
detection, and many advanced driver assistance technologies such as autonomous
driving, lane detection, pedestrian detection, and autonomous parking.
Deep Learning Toolbox provides simple MATLAB commands for creating and
interconnecting the layers of a deep neural network. Examples and pretrained networks
make it easy to use MATLAB for deep learning, even without knowledge of advanced
computer vision algorithms or neural networks.
For a free hands-on introduction to practical deep learning methods, see Deep Learning
Onramp.
What Do You Want to Do? Learn More

Perform transfer learning to fine-tune a “Start Deep Learning Faster Using Transfer
network with your data Learning” on page 1-7
Tip Fine-tuning a pretrained network to

learn a new task is typically much faster
and easier than training a new network.
1-2
What Do You Want to Do? Learn More

Classify images with pretrained networks “Pretrained Convolutional Neural
Networks” on page 1-21
Create a new deep neural network for “Create Simple Deep Learning Network for
classification or regression Classification”
“Train Convolutional Neural Network for

Regression”
Resize, rotate, or preprocess images for “Preprocess Images for Deep Learning” on
training or prediction page 1-166
Label your image data automatically based “Train Network for Image Classification”
on folder names, or interactively using an
app Image Labeler
Create deep learning networks for “Sequence Classification Using Deep
sequence and time series data. Learning”
“Time Series Forecasting Using Deep

Learning”
Classify each pixel of an image (for “Semantic Segmentation Basics” (Computer
example, road, car, pedestrian) Vision System Toolbox)
Detect and recognize objects in images “Deep Learning, Semantic Segmentation,
and Detection” (Computer Vision System
Toolbox)
Classify text data “Classify Text Data Using Deep Learning”
Classify audio data for speech recognition “Speech Command Recognition Using Deep
Learning”
Visualize what features networks have “Deep Dream Images Using AlexNet”
learned
“Visualize Activations of a Convolutional
Neural Network”
Train on CPU, GPU, multiple GPUs, in “Deep Learning with Big Data on GPUs and
parallel on your desktop or on clusters in in Parallel” on page 1-13
the cloud, and work with data sets too large
to fit in memory
1-3
1 Deep Networks
To learn more about deep learning application areas, including automated driving, see
“Deep Learning Applications”.
To choose whether to use a pretrained network or create a new deep network, consider
the scenarios in this table.
Use a Pretrained Network Create a New Deep

for Transfer Learning Network
Training Data Hundreds to thousands of Thousands to millions of
labeled images (small) labeled images
Computation Moderate computation (GPU Compute intensive (requires
optional) GPU for speed)
Training Time Seconds to minutes Days to weeks for real
problems
Model Accuracy Good, depends on the High, but can overfit to
pretrained model small data sets
For more information, see “Choose Network Architecture” on page 1-60.
Deep learning uses neural networks to learn useful representations of features directly
from data. Neural networks combine multiple nonlinear processing layers, using simple
elements operating in parallel and inspired by biological nervous systems. Deep learning
models can achieve state-of-the-art accuracy in object classification, sometimes exceeding
human-level performance.
You train models using a large set of labeled data and neural network architectures that
contain many layers, usually including some convolutional layers. Training these models
is computationally intensive and you can usually accelerate training by using a high
performance GPU. This diagram shows how convolutional neural networks combine layers
that automatically learn features from many images to classify new images.
1-4
Many deep learning applications use image files, and sometimes millions of image files. To
access many image files for deep learning efficiently, MATLAB provides the
imageDatastore function. Use this function to:
• Automatically read batches of images for faster processing in machine learning and
computer vision applications
• Import data from image collections that are too large to fit in memory
• Label your image data automatically based on folder names
Try Deep Learning in 10 Lines of MATLAB Code

This example shows how to use deep learning to identify objects on a live webcam using
only 10 lines of MATLAB code. Try the example to see how simple it is to get started with
deep learning in MATLAB.
1 Run these commands to get the downloads if needed, connect to the webcam, and get
a pretrained neural network.
camera = webcam; % Connect to the camera
net = alexnet; % Load the neural network
The webcam and alexnet functions provide a link to help you download the free add-
ons using Add-On Explorer. Alternatively, see Deep Learning Toolbox Model for
AlexNet Network and MATLAB Support Package for USB Webcams.
You can use alexnet to classify images. AlexNet is a pretrained convolutional neural
network (CNN) that has been trained on more than a million images and can classify
images into 1000 object categories (for example, keyboard, mouse, coffee mug,
pencil, and many animals).
2 To show and classify live images, run the following code. Point the webcam at an
object and the neural network reports what class of object it thinks the webcam is
showing. It keeps classifying images until you press Ctrl+C. The code resizes the
image for the network using imresize.
while true
im = snapshot(camera); % Take a picture
image(im); % Show the picture
im = imresize(im,[227 227]); % Resize the picture for alexnet
label = classify(net,im); % Classify the picture
title(char(label)); % Show the class label
drawnow
end
1-5
1 Deep Networks
In this example, the network correctly classifies a coffee mug. Experiment with
objects in your surroundings to see how accurate the network is.
To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.
To get the code to extend this example to show the probability scores of classes, see
“Classify Webcam Images Using Deep Learning”.
For next steps in deep learning, you can use the pretrained network for other tasks. Solve
new classification problems on your image data with transfer learning or feature
extraction. For examples, see “Start Deep Learning Faster Using Transfer Learning” on
page 1-7 and “Train Classifiers Using Features Extracted from Pretrained Networks”
on page 1-8. To try other pretrained networks, see “Pretrained Convolutional Neural
Networks” on page 1-21.
1-6
Start Deep Learning Faster Using Transfer Learning

Transfer learning is commonly used in deep learning applications. You can take a
pretrained network and use it as a starting point to learn a new task. Fine-tuning a
network with transfer learning is much faster and easier than training from scratch. You
can quickly make the network learn a new task using a smaller number of training
images. The advantage of transfer learning is that the pretrained network has already
learned a rich set of features that can be applied to a wide range of other similar tasks.
For example, if you take a network trained on thousands or millions of images, you can
retrain it for new object detection using only hundreds of images. You can effectively fine-
tune a pretrained network with much smaller data sets than the original training data. If
you have a very large dataset, then transfer learning might not be faster than training a
new network.
Transfer learning enables you to:
• Transfer the learned features of a pretrained network to a new problem

• Transfer learning is faster and easier than training a new network
• Reduce training time and dataset size
• Perform deep learning without needing to learn how to create a whole new network
For an interactive example, see “Transfer Learning with Deep Network Designer” on page
2-2.
For programmatic examples, see “Get Started with Transfer Learning”, “Transfer
Learning Using AlexNet”, and “Train Deep Learning Network to Classify New Images”.
1-7
1 Deep Networks
Train Classifiers Using Features Extracted from Pretrained

Networks
Feature extraction allows you to use the power of pretrained networks without investing
time and effort into training. Feature extraction can be the fastest way to use deep
learning. You extract learned features from a pretrained network, and use those features
to train a classifier, for example, a support vector machine (SVM — requires Statistics and
Machine Learning Toolbox™). For example, if an SVM trained using alexnet can achieve
>90% accuracy on your training and validation set, then fine-tuning with transfer
learning might not be worth the effort to gain some extra accuracy. If you perform fine-
tuning on a small dataset, then you also risk overfitting. If the SVM cannot achieve good
enough accuracy for your application, then fine-tuning is worth the effort to seek higher
accuracy.
For an example, see “Feature Extraction Using AlexNet”.
Deep Learning with Big Data on CPUs, GPUs, in Parallel, and

on the Cloud
Neural networks are inherently parallel algorithms. You can take advantage of this
parallelism by using Parallel Computing Toolbox™ to distribute training across multicore
CPUs, graphical processing units (GPUs), and clusters of computers with multiple CPUs
and GPUs.
Training deep networks is extremely computationally intensive and you can usually
accelerate training by using a high performance GPU. If you do not have a suitable GPU,
you can train on one or more CPU cores instead. You can train a convolutional neural
network on a single GPU or CPU, or on multiple GPUs or CPU cores, or in parallel on a
cluster. Using GPU or parallel options requires Parallel Computing Toolbox.
You do not need multiple computers to solve problems using data sets too large to fit in
memory. You can use the imageDatastore function to work with batches of data without
needing a cluster of machines. However, if you have a cluster available, it can be helpful
to take your code to the data repository rather than moving large amounts of data around.
To learn more about deep learning hardware and memory settings, see “Deep Learning
with Big Data on GPUs and in Parallel” on page 1-13.
1-8
See Also
See Also
Related Examples
• “Classify Webcam Images Using Deep Learning”
• “Transfer Learning with Deep Network Designer” on page 2-2
• “Train Deep Learning Network to Classify New Images”
• “Create Simple Deep Learning Network for Classification”
• “Deep Learning with Big Data on GPUs and in Parallel” on page 1-13
• “Deep Learning, Semantic Segmentation, and Detection” (Computer Vision System
Toolbox)
• “Classify Text Data Using Deep Learning”
• “Deep Learning Tips and Tricks” on page 1-60
1-9
1 Deep Networks

This example shows how to use deep learning to identify objects on a live webcam using
only 10 lines of MATLAB code. Try the example to see how simple it is to get started with
deep learning in MATLAB.
1 Run these commands to get the downloads if needed, connect to the webcam, and get
a pretrained neural network.
camera = webcam; % Connect to the camera

net = alexnet; % Load the neural network
If you need to install the webcam and alexnet add-ons, a message from each
function appears with a link to help you download the free add-ons using Add-On
Explorer. Alternatively, see Deep Learning Toolbox Model for AlexNet Network and
MATLAB Support Package for USB Webcams.
After you install Deep Learning Toolbox Model for AlexNet Network, you can use it to
classify images. AlexNet is a pretrained convolutional neural network (CNN) that has
been trained on more than a million images and can classify images into 1000 object
categories (for example, keyboard, mouse, coffee mug, pencil, and many animals).
2 Run the following code to show and classify live images. Point the webcam at an
object and the neural network reports what class of object it thinks the webcam is
showing. It will keep classifying images until you press Ctrl+C. The code resizes the
image for the network using imresize.
while true
im = snapshot(camera); % Take a picture
image(im); % Show the picture
im = imresize(im,[227 227]); % Resize the picture for alexnet
label = classify(net,im); % Classify the picture
title(char(label)); % Show the class label
drawnow
end
In this example, the network correctly classifies a coffee mug. Experiment with
objects in your surroundings to see how accurate the network is.
1-10
To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.
To get the code to extend this example to show the probability scores of classes, see
“Classify Webcam Images Using Deep Learning”.
For next steps in deep learning, you can use the pretrained network for other tasks. Solve
new classification problems on your image data with transfer learning or feature
extraction. For examples, see “Start Deep Learning Faster Using Transfer Learning” on
page 1-7 and “Train Classifiers Using Features Extracted from Pretrained Networks” on
page 1-8. To try other pretrained networks, see “Pretrained Convolutional Neural
1-11
1 Deep Networks
See Also
Related Examples
• “Classify Webcam Images Using Deep Learning”
• “Get Started with Transfer Learning”
1-12
Deep Learning with Big Data on GPUs and in Parallel

Training deep networks is computationally intensive; however, neural networks are
inherently parallel algorithms. You can usually accelerate training of convolutional neural
networks by distributing training in parallel across multicore CPUs, high-performance
GPUs, and clusters with multiple CPUs and GPUs. Using GPU or parallel options requires
Parallel Computing Toolbox.
Tip GPU support is automatic if you have Parallel Computing Toolbox. By default, the
trainNetwork function uses a GPU if available.
If you have access to a machine with multiple GPUs, then simply specify the training
option 'ExecutionEnvironment','multi-gpu'.
You do not need multiple computers to solve problems using data sets too large to fit in
memory. You can use the augmentedImageDatastore function to work with batches of
data without needing a cluster of machines. For an example, see “Train Network with
Augmented Images”. However, if you have a cluster available, it can be helpful to take
your code to the data repository rather than moving large amounts of data around.
Deep Learning Hardware Recommendations Required Products

and Memory
Considerations
Data too large to fit in To import data from image MATLAB
memory collections that are too large
to fit in memory, use the Deep Learning Toolbox
augmentedImageDatasto
re function. This function is
designed to read batches of
images for faster processing
in machine learning and
computer vision
applications.
1-13
1 Deep Networks

and Memory
Considerations
CPU If you do not have a suitable MATLAB
GPU, then you can train on a
CPU instead. By default, the Deep Learning Toolbox
trainNetwork function
uses the CPU if no GPU is
available.
GPU By default, the MATLAB
trainNetwork function
uses a GPU if available. Deep Learning Toolbox
®
Requires a CUDA enabled
NVIDIA® GPU with compute Parallel Computing Toolbox
capability 3.0 or higher.
Check your GPU using
gpuDevice. Specify the
execution environment
using the
trainingOptions
function.
Parallel on your local Take advantage of multiple MATLAB
machine using multiple workers by specifying the
GPUs or CPU cores execution environment with Deep Learning Toolbox
the trainingOptions
function. If you have more Parallel Computing Toolbox
than one GPU on your
machine, specify 'multi-
gpu'. Otherwise, specify
'parallel'.
1-14

and Memory
Considerations
Parallel on a cluster or in Scale up to use workers on MATLAB
the cloud clusters or in the cloud to
accelerate your deep Deep Learning Toolbox
learning computations. Use
trainingOptions and Parallel Computing Toolbox
specify 'parallel' to use
MATLAB Distributed
a compute cluster. For more
Computing Server™
information, see “Deep
Learning in the Cloud” on
page 1-16.
Tip To learn more, see “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-
2.
All functions for deep learning training, prediction, and validation in Deep Learning
Toolbox perform computations using single-precision, floating-point arithmetic. Functions
for deep learning include trainNetwork, predict, classify, and activations. The
software uses single-precision arithmetic when you train networks using both CPUs and
GPUs.
Because single-precision and double-precision performance of GPUs can differ

substantially, it is important to know in which precision computations are performed. If
you only use a GPU for deep learning, then single-precision performance is one of the
most important characteristics of a GPU. If you also use a GPU for other computations
using Parallel Computing Toolbox, then high double-precision performance is important.
This is because many functions in MATLAB use double-precision arithmetic by default.
For more information, see “Improve Performance Using Single Precision Calculations”
(Parallel Computing Toolbox).
Training with Multiple GPUs

MATLAB supports training a single network using multiple GPUs in parallel. This can be
achieved using multiple GPUs on your local machine, or on a cluster or cloud with
workers with GPUs. To speed up training using multiple GPUs, try increasing the mini-
batch size and learning rate.
1-15
1 Deep Networks
• Enable multi-GPU training on your local machine by setting the

“'ExecutionEnvironment'” option to 'multi-gpu' with the trainingOptions
function.
• On a cluster or cloud, set the “'ExecutionEnvironment'” option to 'parallel'
with the trainingOptions function.
Convolutional neural networks are typically trained iteratively using batches of images.
This is done because the whole dataset is too large to fit into GPU memory. For optimum
performance, you can experiment with the MiniBatchSize option that you specify with
the trainingOptions function.
The optimal batch size depends on your exact network, dataset, and GPU hardware. When
training with multiple GPUs, each image batch is distributed between the GPUs. This
effectively increases the total GPU memory available, allowing larger batch sizes.
Because it improves the significance of each batch, you can increase the learning rate. A
good general guideline is to increase the learning rate proportionally to the increase in
batch size. Depending on your application, a larger batch size and learning rate can speed
up training without a decrease in accuracy, up to some limit.
Using multiple GPUs can speed up training significantly. To decide if you expect multi-
GPU training to deliver a performance gain, consider the following factors:
• How long is the iteration on each GPU? If each GPU iteration is short, then the added
overhead of communication between GPUs can dominate. Try increasing the
computation per iteration by using a larger batch size.
• Are all the GPUs on a single machine? Communication between GPUs on different
machines introduces a significant communication delay. You can mitigate this if you
have suitable hardware. For more information, see “Advanced Support for Fast Multi-
Node GPU Communication” on page 3-5.
To learn more, see “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2
and “Select Particular GPUs to Use for Training” on page 3-7.
Deep Learning in the Cloud

If you do not have a suitable GPU available for faster training of a convolutional neural
network, you can try your deep learning applications with multiple high-performance
GPUs in the cloud, such as on Amazon® Elastic Compute Cloud (Amazon EC2®). MATLAB
Deep Learning Toolbox provides examples that show you how to perform deep learning in
the cloud using Amazon EC2 with P2 or P3 machine instances and data stored in the
cloud.
1-16
See Also
You can accelerate training by using multiple GPUs on a single machine or in a cluster of
machines with multiple GPUs. Train a single network using multiple GPUs, or train
multiple models at once on the same data.
For more information on the complete cloud workflow, see “Deep Learning in Parallel and
in the Cloud”.
Fetch and Preprocess Data in Background

When training a network in parallel, you can fetch and preprocess data in the
background. To perform data dispatch in the background, enable background dispatch in
the mini-batch datastore used by trainNetwork. You can use a built-in mini-batch
datastore, such as augmentedImageDatastore, denoisingImageDatastore, or
pixelLabelImageDatastore. You can also use a custom mini-batch datastore with
background dispatch enabled. For more information on creating custom mini-batch
datastores, see “Develop Custom Mini-Batch Datastore” on page 1-170.
To enable background dispatch, set the DispatchInBackground property of the

datastore to true.
You can fine-tune the training computation and data dispatch loads between workers by
specifying the 'WorkerLoad' name-value pair argument of trainingOptions. For
advanced options, you can try modifying the number of workers of the parallel pool. For
more information, see “Specify Your Parallel Preferences” (Parallel Computing Toolbox)
See Also
trainNetwork | trainingOptions
See Also
Related Examples
• “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2
1-17
1 Deep Networks
Construct Deep Network Using Autoencoders

Load the sample data.
[X,T] = wine_dataset;
Train an autoencoder with a hidden layer of size 10 and a linear transfer function for the
decoder. Set the L2 weight regularizer to 0.001, sparsity regularizer to 4 and sparsity
proportion to 0.05.
hiddenSize = 10;
autoenc1 = trainAutoencoder(X,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin');
Extract the features in the hidden layer.
features1 = encode(autoenc1,X);
Train a second autoencoder using the features from the first autoencoder. Do not scale
the data.
hiddenSize = 10;
autoenc2 = trainAutoencoder(features1,hiddenSize,...
'L2WeightRegularization',0.001,...
'SparsityRegularization',4,...
'SparsityProportion',0.05,...
'DecoderTransferFunction','purelin',...
'ScaleData',false);
Extract the features in the hidden layer.
features2 = encode(autoenc2,features1);
Train a softmax layer for classification using the features, features2, from the second
autoencoder, autoenc2.
softnet = trainSoftmaxLayer(features2,T,'LossFunction','crossentropy');
Stack the encoders and the softmax layer to form a deep network.
deepnet = stack(autoenc1,autoenc2,softnet);
1-18
Construct Deep Network Using Autoencoders
Train the deep network on the wine data.
deepnet = train(deepnet,X,T);
Estimate the wine types using the deep network, deepnet.
wine_type = deepnet(X);
Plot the confusion matrix.
plotconfusion(T,wine_type);
1-19
1 Deep Networks
1-20
Pretrained Convolutional Neural Networks
In this section...
“Load Pretrained Networks” on page 1-22
“Compare Pretrained Networks” on page 1-23
“Feature Extraction” on page 1-25
“Transfer Learning” on page 1-25
“Import and Export Networks” on page 1-26
You can take a pretrained image classification network that has already learned to extract
powerful and informative features from natural images and use it as a starting point to
learn a new task. The pretrained networks are trained on more than a million images and
can classify images into 1000 object categories, such as keyboard, coffee mug, pencil, and
many animals. The training images are a subset of the ImageNet database [1], which is
used in ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [2]. Using a
pretrained network with transfer learning is typically much faster and easier than
training a network from scratch.
You can use previously trained networks for the following tasks:
Purpose Description
Classification Apply pretrained networks directly to
classification problems. To classify a new
image, use classify. For an example
showing how to use a pretrained network
for classification, see “Classify Image Using
GoogLeNet”.
Feature Extraction Use a pretrained network as a feature
extractor by using the layer activations as
features. You can use these activations as
features to train another machine learning
model, such as a support vector machine
(SVM). For more information, see “Feature
Extraction” on page 1-25. For an example,
see “Feature Extraction Using AlexNet”.
1-21
1 Deep Networks
Purpose Description
Transfer Learning Take layers from a network trained on a
large data set and fine-tune on a new data
set. For more information, see “Transfer
Learning” on page 1-25. For a simple
example, see “Get Started with Transfer
Learning”. To try more pretrained
networks, see “Train Deep Learning
Network to Classify New Images”.
Load Pretrained Networks

Use functions such as googlenet to get links to download pretrained networks from the
Add-On Explorer. For a list of all currently available downloads, see MathWorks Deep
Learning Toolbox Team. The following table lists the available pretrained networks and
some of their properties. The network depth is defined as the largest number of
sequential convolutional or fully connected layers on a path from the input layer to the
output layer.
Network Depth Size Parameters Image Input

(Millions) Size
alexnet 8 227 MB 61.0 227-by-227
vgg16 16 515 MB 138 224-by-224
vgg19 19 535 MB 144 224-by-224
squeezenet 18 4.6 MB 1.24 227-by-227
googlenet 22 27 MB 7.0 224-by-224
inceptionv3 48 89 MB 23.9 299-by-299
densenet201 201 77 MB 20.0 224-by-224
resnet18 18 44 MB 11.7 224-by-224
resnet50 50 96 MB 25.6 224-by-224
resnet101 101 167 MB 44.6 224-by-224
inceptionres 164 209 MB 55.9 299-by-299
netv2
1-22
Compare Pretrained Networks

Pretrained networks have different characteristics that matter when choosing a network
to apply to your problem. The most important characteristics are network accuracy,
speed, and size. Choosing a network is generally a tradeoff between these characteristics.
Tip To get started with transfer learning, try choosing one of the faster networks, such as
SqueezeNet or GoogLeNet. You can then iterate quickly and try out different settings
such as data preprocessing steps and training options. Once you have a feeling of which
settings work well, try a more accurate network such as Inception-v3 or a ResNet and see
if that improves your results.
Use the plot below to compare the ImageNet validation accuracy with the time required
to make a prediction using the network. A good network has a high accuracy and is fast.
The plot displays the classification accuracy versus the prediction time when using a
modern GPU (an NVIDIA TITAN Xp) and a mini-batch size of 64. The prediction time is
measured relative to the fastest network. The area of each marker is proportional to the
size of the network on disk.
A network is Pareto efficient if there is no other network that is better on all the metrics
being compared, in this case accuracy and prediction time. The set of all Pareto efficient
networks is called the Pareto frontier. The Pareto frontier contains all the networks that
are not worse than another network on both metrics. The plot connects the networks that
are on the Pareto frontier in the plane of accuracy and prediction time. All networks
except AlexNet, VGG-16, VGG-19, and DenseNet-201 are on the Pareto frontier.
Note The plot below only shows an indication of the relative speeds of the different
networks. The exact prediction and training iteration times depend on the hardware and
mini-batch size that you use.
1-23
1 Deep Networks
The classification accuracy on the ImageNet validation set is the most common way to
measure the accuracy of networks trained on ImageNet. Networks that are accurate on
ImageNet are also often accurate when you apply them to other natural image data sets
using transfer learning or feature extraction. This generalization is possible because the
networks have learned to extract powerful and informative features from natural images
that generalize to other similar data sets. However, high accuracy on ImageNet does not
always transfer directly to other tasks, so it is a good idea to try multiple networks.
If you want to perform prediction using constrained hardware or distribute networks over
the Internet, then also consider the size of the network on disk and in memory.
Network Accuracy
There are multiple ways to calculate the classification accuracy on the ImageNet
validation set and different sources use different methods. Sometimes an ensemble of
1-24
multiple models is used and sometimes each image is evaluated multiple times using
multiple crops. Sometimes the top-5 accuracy instead of the standard (top-1) accuracy is
quoted. Because of these differences, it is often not possible to directly compare the
accuracies from different sources. The accuracies of pretrained networks in Deep
Learning Toolbox are standard (top-1) accuracies using a single model and single central
image crop.
Feature Extraction
Feature extraction is an easy and fast way to use the power of deep learning without
investing time and effort into training a full network. Because it only requires a single
pass over the training images, it is especially useful if you do not have a GPU. You extract
learned image features using a pretrained network, and then use those features to train a
classifier, such as a support vector machine using fitcsvm.
Try feature extraction when your new data set is very small. Since you only train a simple
classifier on the extracted features, training is fast. It is also unlikely that fine-tuning
deeper layers of the network improves the accuracy since there is little data to learn
from.
• If your data is very similar to the original data, then the more specific features
extracted deeper in the network are likely to be useful for the new task.
• If your data is very different from the original data, then the features extracted deeper
in the network might be less useful for your task. Try training the final classifier on
more general features extracted from an earlier network layer. If the new data set is
large, then you can also try training a network from scratch.
ResNets are often the best feature extractors [4], independently of their ImageNet
accuracies. For an example showing how to use a pretrained network for feature
extraction, see “Feature Extraction Using AlexNet”.
Transfer Learning
You can fine-tune deeper layers in the network by training the network on your new data
set with the pretrained network as a starting point. Fine-tuning a network with transfer
learning is often faster and easier than constructing and training a new network. The
network has already learned a rich set of image features, but when you fine-tune the
network it can learn features specific to your new data set. If you have a very large data
set, then transfer learning might not be faster than training from scratch.
1-25
1 Deep Networks
Tip Fine-tuning a network often gives the highest accuracy. For very small data sets
(fewer than about 20 images per class), try feature extraction.
Fine-tuning a network is slower and requires more effort than simple feature extraction,
but since the network can learn to extract a different set of features, the final network is
often more accurate. Fine-tuning usually works better than feature extraction as long as
the new data set is not very small, because then the network has data to learn new
features from. For examples showing how to perform transfer learning, see “Transfer
Learning with Deep Network Designer” on page 2-2 and “Train Deep Learning
Network to Classify New Images”.
Import and Export Networks

You can import networks and network architectures from TensorFlow®-Keras, Caffe, and
the ONNX™ (Open Neural Network Exchange) model format. You can also export trained
networks to the ONNX model format.
Import from Keras
Import pretrained networks from TensorFlow-Keras by using importKerasNetwork. You

can import the network and weights either from the same HDF5 (.h5) file or separate
HDF5 and JSON (.json) files. For more information, see importKerasNetwork.
Import network architectures from TensorFlow-Keras by using importKerasLayers. You

can import the network architecture, either with or without weights. You can import the
network architecture and weights either from the same HDF5 (.h5) file or separate HDF5
and JSON (.json) files. For more information, see importKerasLayers.
1-26
See Also
Import from Caffe
Import pretrained networks from Caffe by using the importCaffeNetwork function.

There are many pretrained networks available in Caffe Model Zoo [3]. Download the
desired .prototxt and .caffemodel files and use importCaffeNetwork to import the
pretrained network into MATLAB. For more information, see importCaffeNetwork.
You can import network architectures of Caffe networks. Download the

desired .prototxt file and use importCaffeLayers to import the network layers into
MATLAB. For more information, see importCaffeLayers.
Export to and Import from ONNX
Export trained networks to the ONNX model format by using the exportONNXNetwork
function. You can then import the ONNX model to other deep learning frameworks, such
as TensorFlow, that support ONXX model import. For more information, see
exportONNXNetwork.
Import pretrained networks from ONNX using importONNXNetwork and import network
architectures with or without weights using importONNXLayers.
References
[1] ImageNet. http://www.image-net.org
[2] Russakovsky, O., Deng, J., Su, H., et al. “ImageNet Large Scale Visual Recognition
Challenge.” International Journal of Computer Vision (IJCV). Vol 115, Issue 3,
2015, pp. 211–252
[3] Caffe Model Zoo. http://caffe.berkeleyvision.org/model_zoo.html
[4] Kornblith, Simon, Jonathon Shlens, and Quoc V. Le. "Do Better ImageNet Models
Transfer Better?." arXiv preprint arXiv:1805.08974 (2018).
See Also
alexnet | densenet201 | exportONNXNetwork | googlenet | importCaffeLayers |
importCaffeNetwork | importKerasLayers | importKerasNetwork |
importONNXLayers | importONNXNetwork | inceptionresnetv2 | inceptionv3 |
resnet101 | resnet18 | resnet50 | squeezenet | vgg16 | vgg19
1-27
1 Deep Networks
Related Examples
• “Transfer Learning Using AlexNet”
• “Feature Extraction Using AlexNet”
• “Classify Image Using GoogLeNet”
• “Visualize Features of a Convolutional Neural Network”
• “Visualize Activations of a Convolutional Neural Network”
• “Deep Dream Images Using AlexNet”
1-28
Learn About Convolutional Neural Networks
Learn About Convolutional Neural Networks

Convolutional neural networks (ConvNets) are widely used tools for deep learning. They
are specifically suitable for images as inputs, although they are also used for other
applications such as text, signals, and other continuous responses. They differ from other
types of neural networks in a few ways:
Convolutional neural networks are inspired from the biological structure of a visual
cortex, which contains arrangements of simple and complex cells [1]. These cells are
found to activate based on the subregions of a visual field. These subregions are called
receptive fields. Inspired from the findings of this study, the neurons in a convolutional
layer connect to the subregions of the layers before that layer instead of being fully-
connected as in other types of neural networks. The neurons are unresponsive to the
areas outside of these subregions in the image.
These subregions might overlap, hence the neurons of a ConvNet produce spatially-
correlated outcomes, whereas in other types of neural networks, the neurons do not share
any connections and produce independent outcomes.
In addition, in a neural network with fully-connected neurons, the number of parameters

(weights) can increase quickly as the size of the input increases. A convolutional neural
network reduces the number of parameters with the reduced number of connections,
shared weights, and downsampling.
A ConvNet consists of multiple layers, such as convolutional layers, max-pooling or

average-pooling layers, and fully-connected layers.
1-29
1 Deep Networks
The neurons in each layer of a ConvNet are arranged in a 3-D manner, transforming a 3-D
input to a 3-D output. For example, for an image input, the first layer (input layer) holds
the images as 3-D inputs, with the dimensions being height, width, and the color channels
of the image. The neurons in the first convolutional layer connect to the regions of these
images and transform them into a 3-D output. The hidden units (neurons) in each layer
learn nonlinear combinations of the original inputs, which is called feature extraction [2].
These learned features, also known as activations, from one layer become the inputs for
the next layer. Finally, the learned features become the inputs to the classifier or the
regression function at the end of the network.
The architecture of a ConvNet can vary depending on the types and numbers of layers
included. The types and number of layers included depends on the particular application
or data. For example, if you have categorical responses, you must have a classification
function and a classification layer, whereas if your response is continuous, you must have
a regression layer at the end of the network. A smaller network with only one or two
convolutional layers might be sufficient to learn a small number of gray scale image data.
On the other hand, for more complex data with millions of colored images, you might
need a more complicated network with multiple convolutional and fully connected layers.
You can concatenate the layers of a convolutional neural network in MATLAB in the
following way:
1-30
See Also
layers = [imageInputLayer([28 28 1])

convolution2dLayer(5,20)
reluLayer
maxPooling2dLayer(2,'Stride',2)
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
After defining the layers of your network, you must specify the training options using the
trainingOptions function. For example,
options = trainingOptions('sgdm');
Then, you can train the network with your training data using the trainNetwork
function. The data, layers, and training options become the inputs to the training
function. For example,
convnet = trainNetwork(data,layers,options);
For detailed discussion of layers of a ConvNet, see “Specify Layers of Convolutional

Neural Network” on page 1-40. For setting up training parameters, see “Set Up
Parameters and Train Convolutional Neural Network” on page 1-55.
References
[1] Hubel, H. D. and Wiesel, T. N. '' Receptive Fields of Single neurones in the Cat’s
Striate Cortex.'' Journal of Physiology. Vol 148, pp. 574-591, 1959.
[2] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge,

Massachusetts: The MIT Press, 2012.
See Also
More About
1-31
1 Deep Networks
• “Get Started with Transfer Learning”

1-32
List of Deep Learning Layers

This page provides a list of deep learning layers in MATLAB.
To learn how to create networks from layers for different tasks, see the following
examples.
Task Learn More

Create deep learning networks for image “Create Simple Deep Learning Network for
classification or regression. Classification”
“Train Convolutional Neural Network for

Regression”
“Train Residual Network for Image

Classification”
Create deep learning networks for “Sequence Classification Using Deep
sequence and time series data. Learning”
“Time Series Forecasting Using Deep

Learning”
Create deep learning network for audio “Speech Command Recognition Using Deep
data. Learning”
Create deep learning network for text data. “Classify Text Data Using Deep Learning”
“Generate Text Using Deep Learning”
Layer Functions
Use the following functions to create different layer types. Alternatively, you can import
layers from Caffe and Keras, or you can define your own custom layers. To import layers
from Caffe and Keras, use importCaffeLayers and importKerasLayers respectively.
To learn how to define your own custom layers, see “Define Custom Deep Learning
Layers” on page 1-78.
1-33
1 Deep Networks
Input Layers
Function Description
An image input layer inputs images to a
imageInputLayer network and applies data normalization.
A sequence input layer inputs sequence
sequenceInputLayer data to a network.
An ROI input layer inputs images to a Fast
roiInputLayer (Computer Vision R-CNN object detection network.
System Toolbox™)
Convolution and Fully Connected Layers
A 2-D convolutional layer applies sliding
convolution2dLayer convolutional filters to the input.
A transposed 2-D convolution layer
transposedConv2dLayer upsamples feature maps.
A fully connected layer multiplies the input
fullyConnectedLayer by a weight matrix and then adds a bias
vector.
Sequence Layers
A sequence input layer inputs sequence
sequenceInputLayer data to a network.
An LSTM layer learns long-term
lstmLayer dependencies between time steps in time
series and sequence data.
1-34
A bidirectional LSTM (BiLSTM) layer learns
bilstmLayer bidirectional long-term dependencies
between time steps of time series or
sequence data. These dependencies can be
useful for when you want the network to
learn from the complete time series at each
time step.
A word embedding layer maps word indices
wordEmbeddingLayer (Text to vectors.
Analytics Toolbox™)
Activation Layers
A ReLU layer performs a threshold
reluLayer operation to each element of the input,
where any value less than zero is set to
zero.
A leaky ReLU layer performs a threshold
leakyReluLayer operation, where any input value less than
zero is multiplied by a fixed scalar.
A clipped ReLU layer performs a threshold
clippedReluLayer operation, where any input value less than
zero is set to zero and any value above the
clipping ceiling is set to that clipping
ceiling.
A PReLU layer performs a threshold
preluLayer on page 1-95 (Custom operation, where for each channel, any
layer example) input value less than zero is multiplied by a
scalar learned at training time.
1-35
1 Deep Networks
Normalization, Dropout, and Cropping Layers
A batch normalization layer normalizes
batchNormalizationLayer each input channel across a mini-batch. To
speed up training of convolutional neural
networks and reduce the sensitivity to
network initialization, use batch
normalization layers between convolutional
layers and nonlinearities, such as ReLU
layers.
A channel-wise local response (cross-
channel) normalization layer carries out
crossChannelNormalizationLayer channel-wise normalization.
A dropout layer randomly sets input
dropoutLayer elements to zero with a given probability.
A 2-D crop layer applies 2-D cropping to the
crop2dLayer (Computer Vision input.
System Toolbox)
Pooling and Unpooling Layers
An average pooling layer performs down-
averagePooling2dLayer sampling by dividing the input into
rectangular pooling regions and computing
the average values of each region.
A max pooling layer performs down-
maxPooling2dLayer sampling by dividing the input into
rectangular pooling regions, and computing
the maximum of each region.
A max unpooling layer unpools the output
maxUnpooling2dLayer of a max pooling layer.
1-36
Combination Layers
An addition layer adds inputs from multiple
additionLayer neural network layers element-wise.
A depth concatenation layer takes inputs
depthConcatenationLayer that have the same height and width and
concatenates them along the third
dimension (the channel dimension).
Object Detection Layers
An ROI input layer inputs images to a Fast
roiInputLayer (Computer Vision R-CNN object detection network.
System Toolbox)
A ROI max pooling layer outputs fixed size
roiMaxPooling2dLayer (Computer feature maps for every rectangular ROI
Vision System Toolbox) within the input feature map. Use this layer
to create a Fast or Faster R-CNN object
detection network.
A region proposal layer outputs bounding
regionProposalLayer (Computer boxes around potential objects in an image
Vision System Toolbox) as part of the region proposal network
(RPN) within Faster R-CNN.
A region proposal network (RPN) softmax
rpnSoftmaxLayer (Computer Vision layer applies a softmax activation function
System Toolbox) to the input. Use this layer to create a
Faster R-CNN object detection network.
A region proposal network (RPN)
rpnClassificationLayer classification layer classifies image regions
(Computer Vision System Toolbox) as either object or background by using a
cross entropy loss function. Use this layer
to create a Faster R-CNN object detection
network.
1-37
1 Deep Networks
A box regression layer refines bounding box
rcnnBoxRegressionLayer locations by using a smooth L1 loss
(Computer Vision System Toolbox) function. Use this layer to create a Fast or
Output Layers
A softmax layer applies a softmax function
softmaxLayer to the input.
A classification layer computes the cross
classificationLayer entropy loss for multi-class classification
problems with mutually exclusive classes.
A regression layer computes the half-mean-
regressionLayer squared-error loss for regression problems.
A pixel classification layer provides a
pixelClassificationLayer categorical label for each image pixel.
(Computer Vision System Toolbox)
A region proposal network (RPN) softmax
rpnSoftmaxLayer (Computer Vision layer applies a softmax activation function
System Toolbox) to the input. Use this layer to create a
A region proposal network (RPN)
rpnClassificationLayer classification layer classifies image regions
(Computer Vision System Toolbox) as either object or background by using a
cross entropy loss function. Use this layer
to create a Faster R-CNN object detection
network.
A box regression layer refines bounding box
rcnnBoxRegressionLayer locations by using a smooth L1 loss
(Computer Vision System Toolbox) function. Use this layer to create a Fast or
1-38
See Also
A weighted classification layer computes
weightedClassificationLayer on the weighted cross entropy loss for
page 1-131 (Custom layer example) classification problems.
A Dice pixel classification layer computes
dicePixelClassificationLayer the Dice loss for semantic segmentation
(Custom layer example) problems.
A classification SSE layer computes the
sseClassificationLayer on page sum of squares error loss for classification
1-120 (Custom layer example) problems.
A regression MAE layer computes the mean
maeRegressionLayer on page 1-109 absolute error loss for regression problems.
(Custom layer example)
See Also
More About
• “Sequence Classification Using Deep Learning”
1-39
1 Deep Networks
Specify Layers of Convolutional Neural Network

In this section...
“Image Input Layer” on page 1-41
“Convolutional Layer” on page 1-41
“Batch Normalization Layer” on page 1-46
“ReLU Layer” on page 1-47
“Cross Channel Normalization (Local Response Normalization) Layer” on page 1-48
“Max and Average Pooling Layers” on page 1-48
“Dropout Layer” on page 1-49
“Fully Connected Layer” on page 1-49
“Output Layers” on page 1-50
The first step of creating and training a new convolutional neural network (ConvNet) is to
define the network architecture. This topic explains the details of ConvNet layers, and the
order they appear in a ConvNet. For a complete list of deep learning layers and how to
create them, see “List of Deep Learning Layers” on page 1-33. To learn about LSTM
networks for sequence classification and regression, see “Long Short-Term Memory
Networks” on page 1-154. To learn how to create your own custom layers, see “Define
Custom Deep Learning Layers” on page 1-78.
The network architecture can vary depending on the types and numbers of layers
included. The types and number of layers included depends on the particular application
or data. For example, if you have categorical responses, you must have a softmax layer
and a classification layer, whereas if your response is continuous, you must have a
regression layer at the end of the network. A smaller network with only one or two
convolutional layers might be sufficient to learn on a small number of grayscale image
data. On the other hand, for more complex data with millions of colored images, you
might need a more complicated network with multiple convolutional and fully connected
layers.
To specify the architecture of a deep network with all layers connected sequentially,
create an array of layers directly. For example, to create a deep network which classifies
28-by-28 grayscale images into 10 classes, specify the layer array
layers = [
imageInputLayer([28 28 1])
1-40
convolution2dLayer(3,16,'Padding',1)
batchNormalizationLayer
reluLayer
convolution2dLayer(3,32,'Padding',1)
reluLayer
softmaxLayer
layers is an array of Layer objects. You can then use layers as an input to the training
function trainNetwork.
To specify the architecture of a neural network with all layers connected sequentially,
create an array of layers directly. To specify the architecture of a network where layers
can have multiple inputs or outputs, use a LayerGraph object.
Image Input Layer

Create an image input layer using imageInputLayer.
An image input layer inputs images to a network and applies data normalization.
Specify the image size using the inputSize argument. The size of an image corresponds
to the height, width, and the number of color channels of that image. For example, for a
grayscale image, the number of channels is 1, and for a color image it is 3.
Convolutional Layer
A 2-D convolutional layer applies sliding convolutional filters to the input. Create a 2-D
convolutional layer using convolution2dLayer.
The convolutional layer consists of various components.1
Filters and Stride
A convolutional layer consists of neurons that connect to subregions of the input images
or the outputs of the previous layer. The layer learns the features localized by these
regions while scanning through an image. When creating a layer using the
1. Image credit: Convolution arithmetic (License)
1-41
1 Deep Networks
convolution2dLayer function, you can specify the size of these regions using the
filterSize input argument.
For each region, the trainNetwork function computes a dot product of the weights and
the input, and then adds a bias term. A set of weights that is applied to a region in the
image is called a filter. The filter moves along the input image vertically and horizontally,
repeating the same computation for each region. In other words, the filter convolves the
input.
This image shows a 3-by-3 filter scanning through the input. The lower map represents
the input and the upper map represents the output.
The step size with which the filter moves is called a stride. You can specify the step size
with the Stride name-value pair argument. The local regions that the neurons connect
to can overlap depending on the filterSize and 'Stride' values.
This image shows a 3-by-3 filter scanning through the input with a stride of 2. The lower
map represents the input and the upper map represents the output.
1-42
The number of weights in a filter is h * w * c, where h is the height, and w is the width of
the filter, respectively, and c is the number of channels in the input. For example, if the
input is a color image, the number of color channels is 3. The number of filters
determines the number of channels in the output of a convolutional layer. Specify the
number of filters using the numFilters argument with the convolution2dLayer
function.
Dilated Convolution
A dilated convolution is a convolution in which the filters are expanded by spaces inserted
between the elements of the filter. Specify the dilation factor using the
'DilationFactor' property.
Use dilated convolutions to increase the receptive field (the area of the input which the
layer can see) of the layer without increasing the number of parameters or computation.
The layer expands the filters by inserting zeros between each filter element. The dilation
factor determines the step size for sampling the input or equivalently the upsampling
factor of the filter. It corresponds to an effective filter size of (Filter Size – 1) .* Dilation
Factor + 1. For example, a 3-by-3 filter with the dilation factor [2 2] is equivalent to a 5-
by-5 filter with zeros between the elements.
1-43
1 Deep Networks
This image shows a 3-by-3 filter dilated by a factor of two scanning through the input. The
lower map represents the input and the upper map represents the output.
Feature Maps
As a filter moves along the input, it uses the same set of weights and the same bias for the
convolution, forming a feature map. Each feature map is the result of a convolution using
a different set of weights and a different bias. Hence, the number of feature maps is equal
to the number of filters. The total number of parameters in a convolutional layer is
((h*w*c + 1)*Number of Filters), where 1 is the bias.
Zero Padding
You can also apply zero padding to input image borders vertically and horizontally using
the 'Padding' name-value pair argument. Padding is rows or columns of zeros added to
1-44
the borders of an image input. By adjusting the padding, you can control the output size
of the layer.
This image shows a 3-by-3 filter scanning through the input with padding of size 1. The
lower map represents the input and the upper map represents the output.
Output Size
The output height and width of a convolutional layer is (Input Size – ((Filter Size –
1)*Dilation Factor + 1) + 2*Padding)/Stride + 1. This value must be an integer for the
whole image to be fully covered. If the combination of these parameters does not lead the
1-45
1 Deep Networks
image to be fully covered, the software by default ignores the remaining part of the image
along the right and bottom edges in the convolution.
Number of Neurons
The product of the output height and width gives the total number of neurons in a feature
map, say Map Size. The total number of neurons (output size) in a convolutional layer is
Map Size*Number of Filters.
For example, suppose that the input image is a 32-by-32-by-3 color image. For a
convolutional layer with eight filters and a filter size of 5-by-5, the number of weights per
filter is 5 * 5 * 3 = 75, and the total number of parameters in the layer is (75 + 1) * 8 =
608. If the stride is 2 in each direction and padding of size 2 is specified, then each
feature map is 16-by-16. This is because (32 – 5 + 2 * 2)/2 + 1 = 16.5, and some of the
outermost zero padding to the right and bottom of the image is discarded. Finally, the
total number of neurons in the layer is 16 * 16 * 8 = 2048.
Usually, the results from these neurons pass through some form of nonlinearity, such as
rectified linear units (ReLU).
Learning Parameters
You can adjust the learning rates and regularization parameters for the layer using name-
value pair arguments while defining the convolutional layer. If you choose not to specify
these parameters, then trainNetwork uses the global training parameters defined with
the trainingOptions function. For details on global and layer training options, see “Set
Up Parameters and Train Convolutional Neural Network” on page 1-55.
Number of Layers
A convolutional neural network can consist of one or multiple convolutional layers. The
number of convolutional layers depends on the amount and complexity of the data.
Batch Normalization Layer

Create a batch normalization layer using batchNormalizationLayer.
A batch normalization layer normalizes each input channel across a mini-batch. To speed
up training of convolutional neural networks and reduce the sensitivity to network
initialization, use batch normalization layers between convolutional layers and
nonlinearities, such as ReLU layers.
1-46
The layer first normalizes the activations of each channel by subtracting the mini-batch
mean and dividing by the mini-batch standard deviation. Then, the layer shifts the input
by a learnable offset β and scales it by a learnable scale factor γ. β and γ are themselves
learnable parameters that are updated during network training.
Batch normalization layers normalize the activations and gradients propagating through a
neural network, making network training an easier optimization problem. To take full
advantage of this fact, you can try increasing the learning rate. Since the optimization
problem is easier, the parameter updates can be larger and the network can learn faster.
You can also try reducing the L2 and dropout regularization. With batch normalization
layers, the activations of a specific image are not deterministic, but instead depend on
which images happen to appear in the same mini-batch. To take full advantage of this
regularizing effect, try shuffling the training data before every training epoch. To specify
how often to shuffle the data during training, use the 'Shuffle' name-value pair
argument of trainingOptions.
ReLU Layer
Create a ReLU layer using reluLayer.
A ReLU layer performs a threshold operation to each element of the input, where any
value less than zero is set to zero.
Convolutional and batch normalization layers are usually followed by a nonlinear

activation function such as a rectified linear unit (ReLU), specified by a ReLU layer. A
ReLU layer performs a threshold operation to each element, where any input value less
than zero is set to zero, that is,
Ïx, x ≥ 0
f (x) = Ì .
Ó0 , x < 0
The ReLU layer does not change the size of its input.
There are extensions of the standard ReLU layer that perform slightly different operations
and can improve performance for some applications. A leaky ReLU layer performs a
threshold operation, where any input value less than zero is multiplied by a fixed scalar.
Create a leaky ReLU layer using leakyReluLayer. A clipped ReLU layer performs a
threshold operation, where any input value less than zero is set to zero and any value
above the clipping ceiling is set to that clipping ceiling.. This clipping prevents the output
from becoming too large. Create a clipped ReLU layer using clippedReluLayer.
1-47
1 Deep Networks
Cross Channel Normalization (Local Response Normalization)

Layer
Create a cross channel normalization layer using crossChannelNormalizationLayer.
A channel-wise local response (cross-channel) normalization layer carries out channel-

wise normalization.
This layer performs a channel-wise local response normalization. It usually follows the
ReLU activation layer. This layer replaces each element with a normalized value it obtains
using the elements from a certain number of neighboring channels (elements in the
normalization window). That is, for each element in the input, trainNetwork
x
computes a normalized value ’ using
x
x
x’ = ,
b
Ê a * ss ˆ
Á K + windowChannelSize ˜
Ë ¯
where K, α, and β are the hyperparameters in the normalization, and ss is the sum of
squares of the elements in the normalization window [2]. You must specify the size of the
normalization window using the windowChannelSize argument of the
crossChannelNormalizationLayer function. You can also specify the
hyperparameters using the Alpha, Beta, and K name-value pair arguments.
The previous normalization formula is slightly different than what is presented in [2]. You
can obtain the equivalent formula by multiplying the alpha value by the
windowChannelSize.
Max and Average Pooling Layers

A max pooling layer performs down-sampling by dividing the input into rectangular
pooling regions, and computing the maximum of each region. Create a max pooling layer
using maxPooling2dLayer.
An average pooling layer performs down-sampling by dividing the input into rectangular
pooling regions and computing the average values of each region. Create an average
pooling layer using averagePooling2dLayer.
Pooling layers follow the convolutional layers for down-sampling, hence, reducing the
number of connections to the following layers. They do not perform any learning
1-48
themselves, but reduce the number of parameters to be learned in the following layers.
They also help reduce overfitting.
A max pooling layer returns the maximum values of rectangular regions of its input. The
size of the rectangular regions is determined by the poolSize argument of
maxPoolingLayer. For example, if poolSize equals [2,3], then the layer returns the
maximum value in regions of height 2 and width 3. An average pooling layer outputs the
average values of rectangular regions of its input. The size of the rectangular regions is
determined by the poolSize argument of averagePoolingLayer. For example, if
poolSize is [2,3], then the layer returns the average value of regions of height 2 and
width 3.
Pooling layers scan through the input horizontally and vertically in step sizes you can
specify using the 'Stride' name-value pair argument. If the pool size is smaller than or
equal to the stride, then the pooling regions do not overlap.
For nonoverlapping regions (Pool Size and Stride are equal), if the input to the pooling
layer is n-by-n, and the pooling region size is h-by-h, then the pooling layer down-samples
the regions by h [6]. That is, the output of a max or average pooling layer for one channel
of a convolutional layer is n/h-by-n/h. For overlapping regions, the output of a pooling
layer is (Input Size – Pool Size + 2*Padding)/Stride + 1.
Dropout Layer
Create a dropout layer using dropoutLayer.
A dropout layer randomly sets input elements to zero with a given probability.
At prediction time the output of a dropout layer is equal to its input. At training time, the
operation corresponds to temporarily dropping a randomly chosen unit and all of its
connections from the network during training. So, for each new input element,
trainNetwork randomly selects a subset of neurons, forming a different layer
architecture. These architectures use common weights, but because the learning does not
depend on specific neurons and connections, the dropout layer might help prevent
overfitting [7], [2]. Similar to max or average pooling layers, no learning takes place in
this layer.
Fully Connected Layer

Create a fully connected layer using fullyConnectedLayer.
1-49
1 Deep Networks
A fully connected layer multiplies the input by a weight matrix and then adds a bias
vector.
The convolutional (and down-sampling) layers are followed by one or more fully
connected layers.
As the name suggests, all neurons in a fully connected layer connect to all the neurons in
the previous layer. This layer combines all of the features (local information) learned by
the previous layers across the image to identify the larger patterns. For classification
problems, the last fully connected layer combines the features to classify the images. This
is the reason that the outputSize argument of the last fully connected layer of the
network is equal to the number of classes of the data set. For regression problems, the
output size must be equal to the number of response variables.
You can also adjust the learning rate and the regularization parameters for this layer
using the related name-value pair arguments when creating the fully connected layer. If
you choose not to adjust them, then trainNetwork uses the global training parameters
defined by the trainingOptions function. For details on global and layer training
options, see “Set Up Parameters and Train Convolutional Neural Network” on page 1-55.
A fully connected layer multiplies the input by a weight matrix W and then adds a bias
vector b.
If the input to the layer is a sequence (for example, in an LSTM network), then the fully
connected layer acts independently on each time step. For example, if the layer before the
fully connected layer outputs an array X of size D-by-N-by-S, then the fully connected
layer outputs an array Z of size outputSize-by-N-by-S. At time step t, the corresponding
entry of Z is
WX t + b , where X t denotes time step t of X.
Output Layers
Softmax and Classification Layers
A softmax layer applies a softmax function to the input. Create a softmax layer using
softmaxLayer.
A classification layer computes the cross entropy loss for multi-class classification
problems with mutually exclusive classes. Create a classification layer using
classificationLayer.
1-50
For classification problems, a softmax layer and then a classification layer must follow the
final fully connected layer.
The output unit activation function is the softmax function:
exp ( ar ( x ) )
yr ( x ) = ,
k
Â exp ( a j ( x ))
j =1
k
where 0 £ yr £ 1 and Â yj = 1 .
j =1
The softmax function is the output unit activation function after the last fully connected
layer for multi-class classification problems:
P ( x, q cr ) P ( cr ) exp ( ar ( x,q ) )
P ( cr x,q ) = = ,
k k
Â P ( x,q c j ) P (c j ) Â exp (a j ( x,q ))
j =1 j =1
k
where 0 £ P ( cr x,q ) £ 1 and Â P ( cj ) (
x,q = 1 . Moreover, ar = ln P ( x,q cr ) P ( cr ) , )
j =1
P ( x ,q cr ) is the conditional probability of the sample given class r, and P ( cr ) is the
class prior probability.
The softmax function is also known as the normalized exponential and can be considered
the multi-class generalization of the logistic sigmoid function [8].
For typical classification networks, the classification layer must follow the softmax layer.
In the classification layer, trainNetwork takes the values from the softmax function and
assigns each input to one of the K mutually exclusive classes using the cross entropy
function for a 1-of-K coding scheme [8]:
N K
loss = Â Âtij ln yij ,
i=1 j =1
1-51
1 Deep Networks
t
where N is the number of samples, K is the number of classes, ij is the indicator that the
y
ith sample belongs to the jth class, and ij is the output for sample i for class j, which in
this case, is the value from the softmax function. That is, it is the probability that the
network associates the ith input with class j.
Regression Layer
Create a regression layer using regressionLayer.
A regression layer computes the half-mean-squared-error loss for regression problems.

For typical regression problems, a regression layer must follow the final fully connected
layer.
The mean-squared-error is given by:
R
(ti - yi ) 2
MSE = Â R
,
i =1
where R is the number of responses, ti is the target output, and yi is the network’s
prediction for the response variable corresponding to observation i.
The loss function of the regression layer is the half-mean-squared-error:
1
R
(ti - yi )2
loss = Â
2 i =1 R
,
References
[1] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge,
Massachusetts: The MIT Press, 2012.
[2] Krizhevsky, A., I. Sutskever, and G. E. Hinton. "ImageNet Classification with Deep
Convolutional Neural Networks." Advances in Neural Information Processing
Systems. Vol 25, 2012.
1-52
See Also
[3] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel,
L.D., et al. ''Handwritten Digit Recognition with a Back-propagation Network.'' In
Advances of Neural Information Processing Systems, 1990.
[4] LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. ''Gradient-based Learning Applied to
Document Recognition.'' Proceedings of the IEEE. Vol 86, pp. 2278–2324, 1998.
[5] Nair, V. and G. E. Hinton. "Rectified linear units improve restricted boltzmann
machines." In Proc. 27th International Conference on Machine Learning, 2010.
[6] Nagi, J., F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J.

Schmidhuber, L. M. Gambardella. ''Max-Pooling Convolutional Neural Networks
for Vision-based Hand Gesture Recognition''. IEEE International Conference on
Signal and Image Processing Applications (ICSIPA2011), 2011.
[7] Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov. "Dropout: A

Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine
Learning Research. Vol. 15, pp. 1929-1958, 2014.
[8] Bishop, C. M. Pattern Recognition and Machine Learning. Springer, New York, NY,
2006.
[9] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network
training by reducing internal covariate shift." preprint, arXiv:1502.03167 (2015).
See Also
averagePooling2dLayer | batchNormalizationLayer | classificationLayer |
clippedReluLayer | convolution2dLayer | crossChannelNormalizationLayer |
dropoutLayer | fullyConnectedLayer | imageInputLayer | leakyReluLayer |
maxPooling2dLayer | regressionLayer | reluLayer | softmaxLayer |
More About
• “List of Deep Learning Layers” on page 1-33
1-53
1 Deep Networks

1-54
Set Up Parameters and Train Convolutional Neural Network
Set Up Parameters and Train Convolutional Neural

Network
In this section...
“Specify Solver and Maximum Number of Epochs” on page 1-55
“Specify and Modify Learning Rate” on page 1-56
“Specify Validation Data” on page 1-57
“Select Hardware Resource” on page 1-57
“Save Checkpoint Networks and Resume Training” on page 1-58
“Set Up Parameters in Convolutional and Fully Connected Layers” on page 1-58
“Train Your Network” on page 1-59
After you define the layers of your neural network as described in “Specify Layers of
Convolutional Neural Network” on page 1-40, the next step is to set up the training
options for the network. Use the trainingOptions function to define the global training
parameters. To train a network, use the object returned by trainingOptions as an
input argument to the trainNetwork function. For example:
options = trainingOptions('adam');
trainedNet = trainNetwork(data,layers,options);
Layers with learnable parameters also have options for adjusting the learning
parameters. For more information, see “Set Up Parameters in Convolutional and Fully
Connected Layers” on page 1-58.
Specify Solver and Maximum Number of Epochs

trainNetwork can use different variants of stochastic gradient descent to train the
network. Specify the optimization algorithm by using the solverName argument of
trainingOptions. To minimize the loss, these algorithms update the network
parameters by taking small steps in the direction of the negative gradient of the loss
function.
The 'adam' (derived from adaptive moment estimation) solver is often a good optimizer
to try first. You can also try the 'rmsprop' (root mean square propagation) and 'sgdm'
(stochastic gradient descent with momentum) optimizers and see if this improves
1-55
1 Deep Networks
training. Different solvers work better for different problems. For more information about
the different solvers, see “Stochastic Gradient Descent”.
The solvers update the parameters using a subset of the data each step. This subset is
called a mini-batch. You can specify the size of the mini-batch by using the
'MiniBatchSize' name-value pair argument of trainingOptions. Each parameter
update is called an iteration. A full pass through the entire data set is called an epoch.
You can specify the maximum number of epochs to train for by using the 'MaxEpochs'
name-value pair argument of trainingOptions. The default value is 30, but you can
choose a smaller number of epochs for small networks or for fine-tuning and transfer
learning, where most of the learning is already done.
By default, the software shuffles the data once before training. You can change this
setting by using the 'Shuffle' name-value pair argument.
Specify and Modify Learning Rate

You can specify the global learning rate by using the 'InitialLearnRate' name-value
pair argument of trainingOptions. By default, trainNetwork uses this value
throughout the entire training process. You can choose to modify the learning rate every
certain number of epochs by multiplying the learning rate with a factor. Instead of using a
small, fixed learning rate throughout the training process, you can choose a larger
learning rate in the beginning of training and gradually reduce this value during
optimization. Doing so can shorten the training time, while enabling smaller steps
towards the minimum of the loss as training progresses.
Tip If the mini-batch loss during training ever becomes NaN, then the learning rate is
likely too high. Try reducing the learning rate, for example by a factor of 3, and restarting
network training.
To gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise'

name-value pair argument. Once you choose this option, trainNetwork multiplies the
initial learning rate by a factor of 0.1 every 10 epochs. You can specify the factor by which
to reduce the initial learning rate and the number of epochs by using the
'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments,
respectively.
1-56
Set Up Parameters and Train Convolutional Neural Network
Specify Validation Data

To perform network validation during training, specify validation data using the
'ValidationData' name-value pair argument of trainingOptions. By default,
trainNetwork validates the network every 50 iterations by predicting the response of
the validation data and calculating the validation loss and accuracy (root mean squared
error for regression networks). You can change the validation frequency using the
'ValidationFrequency' name-value pair argument. If your network has layers that
behave differently during prediction than during training (for example, dropout layers),
then the validation accuracy can be higher than the training (mini-batch) accuracy. You
can also use the validation data to stop training automatically when the validation loss
stops decreasing. To turn on automatic validation stopping, use the
'ValidationPatience' name-value pair argument.
Performing validation at regular intervals during training helps you to determine if your
network is overfitting to the training data. A common problem is that the network simply
"memorizes" the training data, rather than learning general features that enable the
network to make accurate predictions for new data. To check if your network is
overfitting, compare the training loss and accuracy to the corresponding validation
metrics. If the training loss is significantly lower than the validation loss, or the training
accuracy is significantly higher than the validation accuracy, then your network is
overfitting.
To reduce overfitting, you can try adding data augmentation. Use an

augmentedImageDatastore to perform random transformations on your input images.
This helps to prevent the network from memorizing the exact position and orientation of
objects. You can also try increasing the L2 regularization using the
'L2Regularization' name-value pair argument, using batch normalization layers after
convolutional layers, and adding dropout layers.
Select Hardware Resource

If a GPU is available, then trainNetwork uses it for training, by default. Otherwise,
trainNetwork uses a CPU. Alternatively, you can specify the execution environment you
want using the 'ExecutionEnvironment' name-value pair argument. You can specify a
single CPU ('cpu'), a single GPU ('gpu'), multiple GPUs ('multi-gpu'), or a local
parallel pool or compute cluster ('parallel'). All options other than 'cpu' require
Parallel Computing Toolbox. Training on a GPU requires a CUDA enabled GPU with
compute capability 3.0 or higher.
1-57
1 Deep Networks
Save Checkpoint Networks and Resume Training

Deep Learning Toolbox enables you to save networks as .mat files after each epoch during
training. This periodic saving is especially useful when you have a large network or a
large data set, and training takes a long time. If the training is interrupted for some
reason, you can resume training from the last saved checkpoint network. If you want
trainNetwork to save checkpoint networks, then you must specify the name of the path
by using the 'CheckpointPath' name-value pair argument of trainingOptions. If
the path that you specify does not exist, then trainingOptions returns an error.
trainNetwork automatically assigns unique names to checkpoint network files. In the

example name, net_checkpoint__351__2018_04_12__18_09_52.mat, 351 is the
iteration number, 2018_04_12 is the date, and 18_09_52 is the time at which
trainNetwork saves the network. You can load a checkpoint network file by double-
clicking it or using the load command at the command line. For example:
load net_checkpoint__351__2018_04_12__18_09_52.mat
You can then resume training by using the layers of the network as an input argument to
trainNetwork. For example:
trainNetwork(XTrain,YTrain,net.Layers,options)
You must manually specify the training options and the input data, because the
checkpoint network does not contain this information. For an example, see “Resume
Training from Checkpoint Network” on page 1-71.
Set Up Parameters in Convolutional and Fully Connected

Layers
You can set the learning parameters to be different from the global values specified by
trainingOptions in layers with learnable parameters, such as convolutional and fully
connected layers. For example, to adjust the learning rate for the biases or weights, you
can specify a value for the BiasLearnRateFactor or WeightLearnRateFactor
properties of the layer, respectively. The trainNetwork function multiplies the learning
rate that you specify by using trainingOptions with these factors. Similarly, you can
also specify the L2 regularization factors for the weights and biases in these layers by
specifying the BiasL2Factor and WeightL2Factor properties, respectively.
trainNetwork then multiplies the L2 regularization factors that you specify by using
trainingOptions with these factors.
1-58
See Also
Initialize Weights in Convolutional and Fully Connected Layers
By default, the initial values of the weights of the convolutional and fully connected layers
are randomly generated from a Gaussian distribution with mean 0 and standard deviation
0.01. The initial biases are by default equal to 0. You can manually change the initial
weights and biases after you create the layers. For examples, see “Specify Initial Weights
and Biases in Convolutional Layer” and “Specify Initial Weights and Biases in Fully
Connected Layer”.
Train Your Network

After you specify the layers of your network and the training parameters, you can train
the network using the training data. The data, layers, and training options are all input
arguments of the trainNetwork function, as in this example.
layers = [imageInputLayer([28 28 1])

reluLayer
softmaxLayer
options = trainingOptions('adam');
convnet = trainNetwork(data,layers,options);
Training data can be an array, a table, or an ImageDatastore object. For more

information, see the trainNetwork function reference page.
See Also
Convolution2dLayer | FullyConnectedLayer | trainNetwork | trainingOptions
More About
1-59
1 Deep Networks
Deep Learning Tips and Tricks

This page describes various training options and techniques for improving the accuracy of
deep learning networks.
Choose Network Architecture

The appropriate network architecture depends on the task and the data available.
Consider these suggestions when deciding which architecture to use and whether to use a
pretrained network or to train from scratch.
Data Description of Task Learn More

Images Classification of natural Try different pretrained
images networks. For a list of
pretrained deep learning
networks, see “Pretrained
Convolutional Neural
To learn how to interactively

prepare a network for
transfer learning using Deep
Network Designer, see
“Transfer Learning with
Deep Network Designer” on
page 2-2.
Regression of natural Try different pretrained
images networks. For an example
showing how to convert a
pretrained classification
network into a regression
network, see “Convert
Classification Network into
Regression Network”.
1-60

Classification and For an example showing
regression of non-natural how to classify tiny images,
images (for example, tiny see “Train Residual Network
images and spectrograms for Image Classification”.
For an example showing

how to classify
spectrograms, see “Speech
Command Recognition
Using Deep Learning”.
Semantic segmentation Computer Vision System
Toolbox provides tools to
create deep learning
networks for semantic
segmentation. For more
information, see “Semantic
Segmentation Basics”
(Computer Vision System
Toolbox).
Sequences, time series, and Sequence-to-label For an example, see
signals classification “Sequence Classification
Sequence-to-sequence To learn more, see
classification and regression “Sequence-to-Sequence
Classification Using Deep
Learning” and “Sequence-
to-Sequence Regression
Time series forecasting For an example, see “Time
Series Forecasting Using
Deep Learning”.
1-61
1 Deep Networks

Text Classification and Text Analytics Toolbox
regression provides tools to create
deep learning networks for
text data. For an example,
see “Classify Text Data
Text generation For an example, see
“Generate Text Using Deep
Learning”.
Audio Audio classification and For an example, see
regression “Speech Command
Recognition Using Deep
Learning”.
Choose Training Options

The trainingOptions function provides a variety of options to train your deep learning
network.
Tip More Information

Monitor training progress To turn on the training progress plot, set
the 'Plots' option in trainingOptions
to 'training-progress'.
Use validation data To specify validation data, use the
'ValidationData' option in
trainingOptions.
Note If your validation data set is too small

and does not sufficiently represent the data,
then the reported metrics might not help
you. Using a too large validation data set
can result in slower training.
1-62
Tip More Information

For transfer learning, speed up the learning Specify higher learning rate factors for new
of new layers and slow down the learning in layers by using, for example, the
the transferred layers WeightLearnRateFactor property of
convolution2dLayer.
Decrease the initial learning rate using the

'InitialLearnRate' option of
trainingOptions.
When transfer learning, you do not need to

train for as many epochs. Decrease the
number of epochs using the 'MaxEpochs'
option in trainingOptions.
To learn how to interactively prepare a

network for transfer learning using Deep
Network Designer, see “Transfer Learning
with Deep Network Designer” on page 2-
2.
Shuffle your data every epoch To shuffle your data every epoch (one full
pass of the data), set the 'Shuffle' option
in trainingOptions to 'every-epoch'.
Note For sequence data, shuffling can have

a negative impact on the accuracy as it can
increase the amount of padding or
truncated data. If you have sequence data,
then sorting the data by sequence length
can help. To learn more, see “Sequence
Padding, Truncation, and Splitting” on page
1-158.
Try different optimizers To specify different optimizers, use the
solverName argument in
trainingOptions.
For more information, see “Set Up Parameters and Train Convolutional Neural Network”
on page 1-55.
1-63
1 Deep Networks
Improve Training Accuracy

If you notice problems during training, then consider these possible solutions.
Problem Possible Solution

NaNs or large spikes in the loss Decrease the initial learning rate using the
trainingOptions.
If decreasing the learning rate does not

help, then try using gradient clipping. To
set the gradient threshold, use the
'GradientThreshold' option in
trainingOptions.
Loss is still decreasing at the end of Train for longer by increasing the number
training of epochs using the 'MaxEpochs' option in
trainingOptions.
Loss plateaus If the loss plateaus at an unexpectedly high
value, then drop the learning rate at the
plateau. To change the learning rate
schedule, use the 'LearnRateSchedule'
If dropping the learning rate does not help,

then the model might be underfitting. Try
increasing the number of parameters or
layers. You can check if the model is
underfitting by monitoring the validation
loss.
1-64
Problem Possible Solution

Validation loss is much higher than the To prevent overfitting, try one or more of
training loss the following:
• Use data augmentation. For more

information, see “Train Network with
Augmented Images”.
• Use dropout layers. For more
information, see dropoutLayer.
• Increase the global L2 regularization
factor using the 'L2Regularization'
Loss decreases very slowly Increase the initial learning rate using the
trainingOptions.
For image data, try including batch

normalization layers in your network. For
more information, see
batchNormalizationLayer.
For more information, see “Set Up Parameters and Train Convolutional Neural Network”
on page 1-55.
Fix Errors in Training

If your network does not train at all, then consider the possible solutions.
1-65
1 Deep Networks
Error Description Possible Solution

Out-of-memory error when The available hardware is Try reducing the mini-batch
training unable to store the current size using the
mini-batch, the network 'MiniBatchSize' option
weights, and the computed of trainingOptions.
activations.
If reducing the mini-batch
size does not work, then try
using a smaller network,
reducing the number of
layers, or reducing the
number of parameters or
filters in the layers.
Custom layer errors There could be an issue with Check the validity of the
the implementation of the custom layer and find
custom layer. potential issues using
checkLayer.
If a test fails when you use

checkLayer, then the
function provides a test
diagnostic and a framework
diagnostic. The test
diagnostic highlights any
layer issues, whereas the
framework diagnostic
provides more detailed
information. To learn more
about the test diagnostics
and get suggestions for
possible solutions, see
“Diagnostics” on page 1-
146.
1-66
Error Description Possible Solution

Training throws the error Sometimes, the GPU throws Try reducing the mini-batch
'CUDA_ERROR_UNKNOWN' this error when it is being size using the
used for both compute and 'MiniBatchSize' option
display requests from the of trainingOptions.
OS.
If reducing the mini-batch
size does not work, then in
Windows®, try adjusting the
Timeout Detection and
Recovery (TDR) settings.
For example, change the
TdrDelay from 2 seconds
(default) to 4 seconds
(requires registry edit).
You can analyze your deep learning network using analyzeNetwork. The
analyzeNetwork function displays an interactive visualization of the network
architecture, detects errors and issues with the network, and provides detailed
information about the network layers. Use the network analyzer to visualize and
understand the network architecture, check that you have defined the architecture
correctly, and detect problems before training. Problems that analyzeNetwork detects
include missing or disconnected layers, mismatched or incorrect sizes of layer inputs, an
incorrect number of layer inputs, and invalid graph structures.
Prepare and Preprocess Data

You can improve the accuracy by preprocessing your data.
Weight or Balance Classes
Ideally, all classes have an equal number of observations. However, for some tasks,
classes can be imbalanced. For example, automotive datasets of street scenes tend to
have more sky, building, and road pixels than pedestrian and bicyclist pixels because the
sky, buildings, and roads cover more image area. If not handled correctly, this imbalance
can be detrimental to the learning process because the learning is biased in favor of the
dominant classes.
For semantic segmentation tasks, you can specify class weights in

pixelClassificationLayer using the ClassWeights property. For image
1-67
1 Deep Networks
classification tasks, you can use the example custom classification layer provided in
“Define Custom Weighted Classification Layer” on page 1-131.
Alternatively, you can balance the classes by doing one or more of the following:
• Add new observations from the least frequent classes.

• Remove observations from the most frequent classes.
• Group similar classes. For example, group the classes "car" and "truck" into the single
class "vehicle".
Preprocess Image Data
For more information about preprocessing image data, see “Preprocess Images for Deep
Learning” on page 1-166.
Task More Information

Resize images To use a pretrained network, you must
resize images to the input size of the
network. To resize images, use
augmentedImageDatastore. For
example, this syntax resizes images in the
image datastore imds:
auimds = augmentedImageDatastore(inputSize,imds)
Tip Use augmentedImageDatastore for

efficient preprocessing of images for deep
learning including image resizing.
Do not use the readFcn option of

imageDatastore as this option is usually
significantly slower.
Image augmentation To avoid overfitting, use image
transformation. To learn more, see “Train
Network with Augmented Images”.
1-68

Normalize regression targets Normalize the predictors before you input
them to the network. If you normalize the
responses before training, then you must
transform the predictions of the trained
network to obtain the predictions of the
original responses.
For more information, see “Train

Convolutional Neural Network for
Regression”.
Preprocess Sequence Data
For more information about working with LSTM networks, see “Long Short-Term Memory

Normalize sequence data To normalize sequence data, first calculate
the per-feature mean and standard
deviation for all the sequences. Then, for
each training observation, subtract the
mean value and divide by the standard
deviation.
To learn more, see “Normalize Sequence

Data” on page 1-160.
Reduce sequence padding and truncation To reduce the amount of padding or
discarded data when padding or truncating
sequences, try sorting your data by
sequence length.
To learn more, see “Sequence Padding,

Truncation, and Splitting” on page 1-158.
Use Available Hardware

To specify the execution environment, use the 'ExecutionEnvironment' option in
trainingOptions.
1-69
1 Deep Networks
Problem More Information

Training on CPU is slow If training is too slow on a single CPU, try
using a pretrained deep learning network
as a feature extractor and train a machine
learning model. For an example, see
“Feature Extraction Using AlexNet”.
Training LSTM on GPU is slow The CPU is better suited for training an
LSTM network using mini-batches with
short sequences. To use the CPU, set the
'ExecutionEnvironment' option in
trainingOptions to 'cpu'.
Software does not use all available GPUs If you have access to a machine with
multiple GPUs, simply set the
'ExecutionEnvironment' option in
trainingOptions to 'multi-gpu'. For
more information, see “Deep Learning on
Multiple GPUs” on page 3-2.
For more information, see “Scale Up Deep Learning in Parallel and in the Cloud” on page
3-2.
See Also
Deep Network Designer | analyzeNetwork | checkLayer | trainingOptions
More About
• “Preprocess Images for Deep Learning” on page 1-166
• “Transfer Learning with Deep Network Designer” on page 2-2
• “Convert Classification Network into Regression Network”
1-70
Resume Training from Checkpoint Network

This example shows how to save checkpoint networks while training a deep learning
network and resume training from a previously saved network.
Load Sample Data
Load the sample data as a 4-D array. digitTrain4DArrayData loads the digit training
set as 4-D array data. XTrain is a 28-by-28-by-1-by-5000 array, where 28 is the height
and 28 is the width of the images. 1 is the number of channels and 5000 is the number of
synthetic images of handwritten digits. YTrain is a categorical vector containing the
labels for each observation.
[XTrain,YTrain] = digitTrain4DArrayData;
size(XTrain)
ans = 1×4
28 28 1 5000
Display some of the images in XTrain.
figure;
perm = randperm(size(XTrain,4),20);
for i = 1:20
subplot(4,5,i);
imshow(XTrain(:,:,:,perm(i)));
end
1-71
1 Deep Networks
Define Network Architecture
Define the neural network architecture.
layers = [
convolution2dLayer(3,8,'Padding','same')
reluLayer
1-72
reluLayer
reluLayer
averagePooling2dLayer(7)
softmaxLayer
Specify Training Options and Train Network
Specify training options for stochastic gradient descent with momentum (SGDM) and
specify the path for saving the checkpoint networks.
checkpointPath = pwd;
options = trainingOptions('sgdm', ...
'InitialLearnRate',0.1, ...
'MaxEpochs',20, ...
'Verbose',false, ...
'Plots','training-progress', ...
'Shuffle','every-epoch', ...
'CheckpointPath',checkpointPath);
Train the network. trainNetwork uses a GPU if there is one available. If there is no
available GPU, then it uses CPU. trainNetwork saves one checkpoint network each
epoch and automatically assigns unique names to the checkpoint files.
net1 = trainNetwork(XTrain,YTrain,layers,options);
1-73
1 Deep Networks
Load Checkpoint Network and Resume Training
Suppose that training was interrupted and did not complete. Rather than restarting the
training from the beginning, you can load the last checkpoint network and resume
training from that point. trainNetwork saves the checkpoint files with file names on the
form net_checkpoint__195__2018_07_13__11_59_10.mat, where 195 is the
iteration number, 2018_07_13 is the date, and 11_59_10 is the time trainNetwork
saved the network. The checkpoint network has the variable name net.
Load the checkpoint network into the workspace.
load('net_checkpoint__195__2018_07_13__11_59_10.mat','net')
Specify the training options and reduce the maximum number of epochs. You can also
adjust other training options, such as the initial learning rate.
1-74
options = trainingOptions('sgdm', ...

'InitialLearnRate',0.1, ...
'MaxEpochs',15, ...
'Verbose',false, ...
'Plots','training-progress', ...
'Shuffle','every-epoch', ...
'CheckpointPath',checkpointPath);
Resume training using the layers of the checkpoint network you loaded with the new
training options. If the checkpoint network is a DAG network, then use
layerGraph(net) as the argument instead of net.Layers.
net2 = trainNetwork(XTrain,YTrain,net.Layers,options);
1-75
1 Deep Networks
See Also
Related Examples
More About
1-76
See Also

1-77
1 Deep Networks
Define Custom Deep Learning Layers
Tip This topic explains how to define custom deep learning layers for your problems. For
a list of built-in layers in Deep Learning Toolbox, see “List of Deep Learning Layers” on
page 1-33.
This topic explains the architecture of deep learning layers and how to define custom
layers to use for your problems.
Type Description
Layer Define a custom deep learning layer and
specify optional learnable parameters,
forward functions, and a backward
function.
For an example showing how to define a

custom layer with learnable parameters,
see “Define a Custom Deep Learning Layer
with Learnable Parameters” on page 1-95.
Classification Output Layer Define a custom classification output layer
and specify a loss function.

custom classification output layer and
specify a loss function, see “Define a
Custom Classification Output Layer” on
page 1-120.
Regression Output Layer Define a custom regression output layer
and specify a loss function.

custom regression output layer and specify
a loss function, see “Define a Custom
Regression Output Layer” on page 1-109.
1-78
Layer Templates
You can use the following templates to define new layers.
Intermediate Layer Template
This template outlines the structure of an intermediate layer with learnable parameters.
If the layer does not have learnable parameters, then you can omit the properties
(learnable) section. For an example showing how to define a layer with learnable
parameters, see “Define a Custom Deep Learning Layer with Learnable Parameters” on
page 1-95.
classdef myLayer < nnet.layer.Layer
properties
% (Optional) Layer properties.
% Layer properties go here.

end
properties (Learnable)
% (Optional) Layer learnable parameters.
% Layer learnable parameters go here.

end
methods
function layer = myLayer()
% (Optional) Create a myLayer.
% This function must have the same name as the layer.
% Layer constructor function goes here.

end
function Z = predict(layer, X)
% Forward input data through the layer at prediction time and
% output the result.
%
% Inputs:
% layer - Layer to forward propagate through
% X - Input data
% Output:
% Z - Output of layer forward function
% Layer forward function for prediction goes here.

end
function [Z, memory] = forward(layer, X)

% (Optional) Forward input data through the layer at training
% time and output the result and a memory value.
%
% Inputs:
1-79
1 Deep Networks

% X - Input data
% Outputs:
% memory - Memory value for backward propagation
% Layer forward function for training goes here.

end
function [dLdX, dLdW1, …, dLdWn] = backward(layer, X, Z, dLdZ, memory)

% Backward propagate the derivative of the loss function through
% the layer.
%
% Inputs:
% layer - Layer to backward propagate through
% X - Input data
% dLdZ - Gradient propagated from the deeper layer
% memory - Memory value from forward function
% Outputs:
% dLdX - Derivative of the loss with respect to the
% input data
% dLdW1, ..., dLdWn - Derivatives of the loss with respect to each
% learnable parameter
% Layer backward function goes here.

end
end
end
Classification Output Layer Template
This template outlines the structure of a classification output layer with a loss function.
For an example showing how to define a classification output layer and specify a loss
function, see “Define a Custom Classification Output Layer” on page 1-120.
classdef myClassificationLayer < nnet.layer.ClassificationLayer
properties

end
methods
function layer = myClassificationLayer()
% (Optional) Create a myClassificationLayer.

end
function loss = forwardLoss(layer, Y, T)

% Return the loss between the predictions Y and the
% training targets T.
1-80
%
% Inputs:
% layer - Output layer
% Y – Predictions made by network
% T – Training targets
%
% Output:
% loss - Loss between Y and T
% Layer forward loss function goes here.

end
function dLdY = backwardLoss(layer, Y, T)

% Backward propagate the derivative of the loss function.
%
% Inputs:
%
% Output:
% dLdY - Derivative of the loss with respect to the predictions Y
% Layer backward loss function goes here.

end
end
end
Regression Output Layer Template
This template outlines the structure of a regression output layer with a loss function. For
an example showing how to define a regression output layer and specify a loss function,
see “Define a Custom Regression Output Layer” on page 1-109.
classdef myRegressionLayer < nnet.layer.RegressionLayer
properties

end
methods
function layer = myRegressionLayer()
% (Optional) Create a myRegressionLayer.

end

%
% Inputs:
1-81
1 Deep Networks
%
% Output:

end

%
% Inputs:
%
% Output:

end
end
end
Intermediate Layer Architecture

An intermediate layer has two main components: the forward pass and the backward
pass.
During the forward pass of a network, the layer takes the output x of the previous layer,
applies a function, and then outputs (forward propagates) the result z to the next layer.
At the end of a forward pass, the network calculates the loss L between the predictions Y
and the true targets T.
During the backward pass of a network, each layer takes the derivatives of the loss with
respect to z, computes the derivatives of the loss L with respect to x, and then outputs
(backward propagates) results to the previous layer. If the layer has learnable
parameters, then the layer also computes the derivatives of the layer weights (learnable
parameters) W. The layer uses the derivatives of the weights to update the learnable
parameters.
The following figure describes the flow of data through a deep neural network and
highlights the data flow through the layer.
1-82
Intermediate Layer Properties
Declare the layer properties in the properties section of the class definition.
By default, custom intermediate layers have three properties:
• Name – Layer name, specified as a character vector or a string scalar. To include a

layer in a layer graph, you must specify a nonempty unique layer name. If you train a
series network with this layer and Name is set to '', then the software automatically
assigns a name to the layer at training time.
• Description – One-line description of the layer, specified as a character vector or a
string scalar. This description appears when the layer is displayed in a Layer array. If
you do not specify a layer description, then the software displays the layer class name.
• Type – Type of the layer, specified as a character vector or a string scalar. The value of
Type appears when the layer is displayed in a Layer array. If you do not specify a
layer type, then the software displays the layer class name.
If the layer has no other properties, then you can omit the properties section.
Learnable Parameters
Declare the layer learnable parameters in the properties (Learnable) section of the
class definition. If the layer has no learnable parameters, then you can omit the
properties (Learnable) section.
Optionally, you can specify the learning rate factor and the L2 factor of the learnable
parameters. By default, each learnable parameter has its learning rate factor and L2
factor set to 1.
For both built-in and user-defined layers, you can set and get the learn rate factors and L2
regularization factors using the following functions.
1-83
1 Deep Networks
setLearnRateFactor Set the learn rate factor of a learnable
parameter.
setL2Factor Set the L2 regularization factor of a
learnable parameter.
getLearnRateFactor Get the learn rate factor of a learnable
parameter.
getL2Factor Get the L2 regularization factor of a
learnable parameter.
To specify the learning rate factor and the L2 factor of a learnable parameter, use the
syntaxes layer = setLearnRateFactor(layer,'MyParameterName',value) and
layer = setL2Factor(layer,'MyParameterName',value), respectively.
To get the value of the learning rate factor and the L2 factor of a learnable parameter, use
the syntaxes getLearnRateFactor(layer,'MyParameterName') and
getL2Factor(layer,'MyParameterName') respectively.
For example, this syntax sets the learn rate factor of the learnable parameter Alpha to
0.1.
layer = setLearnRateFactor(layer,'Alpha',0.1);
Forward Functions
A layer uses one of two functions to perform a forward pass: predict or forward. If the
forward pass is at prediction time, then the layer uses the predict function. If the
forward pass is at training time, then the layer uses the forward function. The forward
function has an additional output argument memory, which you can use during backward
propagation.
If you do not require two different functions for prediction time and training time, then
you do not need to create the forward function. By default, the layer uses predict at
training time.
The syntax for predict is Z = predict(layer,X), where X is the input data and Z is
the output of the layer forward function.
The syntax for forward is [Z,memory] = forward(layer,X), where X is the input to

the layer forward function, Z is the output of the layer forward function, and memory is
1-84
the memory value to use in backward propagation. memory is a required output argument
and it must return a value. If the layer does not require a memory value, then return an
empty value [].
The dimensions of X depend on the output of the previous layer. Similarly, the output Z
must have the appropriate shape for the next layer.
Built-in layers output 4-D arrays with size h-by-w-by-c-by-N, except for LSTM layers and
sequence input layers, which output 3-D arrays of size D-by-N-by-S.
Fully connected, ReLU, dropout, and softmax layers also accept 3-D inputs. When these
layers get inputs of this shape, they then output 3-D arrays of size D-by-N-by-S.
These dimensions correspond to the following:
• h – Height of the output

• w – Width of the output
• c – Number of channels in the output
• N – Number of observations (mini-batch size)
• D – Feature dimension of sequence
• S – Sequence length
Backward Function
The layer uses one function for a backward pass: backward. The backward function
computes the derivatives of the loss with respect to the input data and then outputs
(backward propagates) results to the previous layer. If the layer has learnable
parameters, then backward also computes the derivatives of the layer weights (learnable
parameters). During the backward pass, the layer automatically updates the learnable
parameters using these derivatives.
To calculate the derivatives of the loss, you can use the chain rule:
∂L ∂L ∂z j
=Â
∂x j ∂z j ∂x
∂L ∂L ∂z j
=Â
∂Wi j ∂z j ∂Wi
1-85
1 Deep Networks
The syntax for backward is [dLdX,dLdW1,…,dLdWn] =

backward(layer,X,Z,dLdZ,memory). For the inputs, X is the layer input data, Z is the
output of forward, dLdZ is the gradient backward propagated from the next layer, and
memory is the memory output of forward. For the outputs, dLdX is the derivative of the
loss with respect to the layer input data, and dLdW1,…,dLdWn are the derivatives of the
loss with respect to the learnable parameters.
The values of X and Z are the same as in the forward functions. The dimensions of dLdZ
are the same as the dimensions of Z.
The dimensions and data type of dLdX are the same as the dimensions and data type of X.
The dimensions and data types of dLdW1,…,dLdWn are the same as the dimensions and
data types of W1,…,Wn, respectively, where Wi is the ith learnable parameter.
During the backward pass, the layer automatically updates the learnable parameters
using the derivatives dLdW1,…,dLdWn.
GPU Compatibility
For GPU compatibility, the layer functions must support inputs and return outputs of type
gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in
functions support gpuArray input arguments. If you call any of these functions with at
least one gpuArray input, then the function executes on the GPU and returns a
gpuArray output. For a list of functions that execute on a GPU, see “Run MATLAB
Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you
must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For
more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB”
Check Validity of Layer

If you create a custom deep learning layer, then you can use the checkLayer function to
check that the layer is valid. The function checks layers for validity, GPU compatibility,
and correctly defined gradients. To check that a layer is valid, run the following
command:
checkLayer(layer,validInputSize,'ObservationDimension',dim)
where layer is an instance of the layer, validInputSize is a vector specifying the valid
input size to the layer, and dim specifies the dimension of the observations in the layer
input data. For large input sizes, the gradient checks take longer to run. To speed up the
tests, specify a smaller valid input size.
1-86
For more information, see “Check Custom Layer Validity” on page 1-141.
Check Validity of Layer Using checkLayer
Check the layer validity of the custom layer preluLayer.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the
current folder.
Create an instance of the layer and check its validity using checkLayer. Specify the valid
input size to be the size of a single observation of typical input to the layer. The layer
expects 4-D array inputs, where the first three dimensions correspond to the height,
width, and number of channels of the previous layer output, and the fourth dimension
corresponds to the observations.
Specify the typical size of the input of an observation and set

'ObservationDimension' to 4.
layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)
Skipping GPU tests. No compatible GPU device found.
Running nnet.checklayer.TestCase
.......... .....
Done nnet.checklayer.TestCase
__________
Test Summary:
15 Passed, 0 Failed, 0 Incomplete, 6 Skipped.
Time elapsed: 66.797 seconds.
Here, the function does not detect any issues with the layer.
Include Layer in Network
You can use a custom layer in the same way as any other layer in Deep Learning Toolbox.
current folder.
Create a layer array that includes the custom layer preluLayer.
1-87
1 Deep Networks
layers = [
preluLayer(20,'prelu')
softmaxLayer
Output Layer Architecture

At the end of a forward pass at training time, an output layer takes the predictions
(outputs) y of the previous layer and calculates the loss L between these predictions and
the training targets. The output layer computes the derivatives of the loss L with respect
to the predictions y and outputs (backward propagates) results to the previous layer.
The following figure describes the flow of data through a convolutional neural network
and an output layer.
Output Layer Properties
Declare the layer properties in the properties section of the class definition.
By default, custom output layers have the following properties:

1-88
you do not specify a layer description, then the software displays "Classification
Output" or "Regression Output".
Custom classification layers also have the following property:
• Classes – Classes of the output layer, specified as a categorical vector, string array,
cell array of character vectors, or 'auto'. If Classes is 'auto', then the software
automatically sets the classes at training time. If you specify the string array or cell
array of character vectors str, then the software sets the classes of the output layer
to categorical(str,str). The default value is 'auto'.
Custom regression layers also have the following property:
• ResponseNames – Names of the responses, specified a cell array of character vectors

or a string array. At training time, the software automatically sets the response names
according to the training data. The default is {}.
Loss Functions
The output layer uses two functions to compute the loss and the derivatives:
forwardLoss and backwardLoss. The forwardLoss function computes the loss L. The
backwardLoss function computes the derivatives of the loss with respect to the
predictions.
The syntax for forwardLoss is loss = forwardLoss(layer, Y, T). The input Y

corresponds to the predictions made by the network. These predictions are the output of
the previous layer. The input T corresponds to the training targets. The output loss is the
loss between Y and T according to the specified loss function. The output loss must be
scalar.
The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T). The inputs Y

are the predictions made by the network and T are the training targets. The output dLdY
1-89
1 Deep Networks
is the derivative of the loss with respect to the predictions Y. The output dLdY must be
the same size as the layer input Y.
For classification problems, the dimensions of T depend on the type of problem.
Classification Task Dimensions of Layer Input

Image classification 4-D array of size 1-by-1-by-K-by-N, where K
is the number of classes and N is the mini-
batch size.
Sequence-to-label classification Matrix of size K-by-N, where K is the
number of classes and N is the mini-batch
size.
Sequence-to-sequence classification 3-D array of size K-by-N-by-S, where K is
the number of classes, N is the mini-batch
size, and S is the sequence length.
The size of Y depends on the output of the previous layer. To ensure that Y is the same
size as T, you must include a layer that outputs the correct size before the output layer.
For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can
include a fully connected layer of size K followed by a softmax layer before the output
layer.
For regression problems, the dimensions of T also depend on the type of problem.
Regression Task Dimensions of Layer Input

Image regression 4-D array of size 1-by-1-by-R-by-N, where R
is the number of responses and N is the
mini-batch size.
Image-to-image regression 4-D array of size h-by-w-by-c-by-N, where h,
w, and c denote the height, width, and
number of channels of the output
respectively, and N is the mini-batch size.
Sequence-to-one regression Matrix of size R-by-N, where R is the
number of responses and N is the mini-
batch size.
1-90

Sequence-to-sequence regression 3-D array of size R-by-N-by-S, where R is
the number of responses, N is the mini-
batch size, and S is the sequence length.
For example, if the network defines an image regression network with one response and
has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.
For example, for image regression with R responses, to ensure that Y is a 4-D array of the
correct size, you can include a fully connected layer of size R before the output layer.
The forwardLoss and backwardLoss functions have the following output arguments.
Output Argument Description

loss (forwardLoss only) Calculated loss between the predictions Y
and the true target T.
dLdY (backwardLoss only) Derivative of the loss with respect to the
predictions Y.
If you want to include a user-defined output layer after a built-in layer, then
backwardLoss must output dLdY with the size expected by the previous layer. Built-in
layers expect dLdY to be the same size as Y.
GPU Compatibility
1-91
1 Deep Networks
Include Custom Regression Output Layer in Network
You can use a custom output layer in the same way as any other output layer in Deep
Learning Toolbox. This section shows how to create and train a network for regression
using a custom output layer.
The example constructs a convolutional neural network architecture, trains a network,

and uses the trained network to predict angles of rotated, handwritten digits. These
predictions are useful for optical character recognition.
Define a custom mean absolute error regression layer. To create this layer, save the file
maeRegressionLayer.m in the current folder.
Load the example training data.

[XTrain,~,YTrain] = digitTrain4DArrayData;
Create a layer array and include the custom regression output layer
maeRegressionLayer.
layers = [
reluLayer
maeRegressionLayer('mae')]
layers =
6x1 Layer array with layers:
1 '' Image Input 28x28x1 images with 'zerocenter' normalization

2 '' Convolution 20 5x5 convolutions with stride [1 1] and paddi
3 '' Batch Normalization Batch normalization
4 '' ReLU ReLU
5 '' Fully Connected 1 fully connected layer
6 'mae' Regression Output Mean absolute error
Set the training options and train the network.

net = trainNetwork(XTrain,YTrain,layers,options);
Training on single CPU.

Initializing image normalization.
1-92
|======================================================================================
| Epoch | Iteration | Time Elapsed | Mini-batch | Mini-batch | Base Learning
| | | (hh:mm:ss) | RMSE | Loss | Rate
|======================================================================================
| 1 | 1 | 00:00:00 | 25.52 | 22.0 | 0.010
| 2 | 50 | 00:00:04 | 12.67 | 10.2 | 0.010
| 3 | 100 | 00:00:08 | 12.23 | 9.9 | 0.010
| 4 | 150 | 00:00:12 | 11.56 | 8.9 | 0.010
| 6 | 200 | 00:00:17 | 11.72 | 8.7 | 0.010
| 7 | 250 | 00:00:21 | 11.63 | 7.8 | 0.010
| 8 | 300 | 00:00:25 | 11.09 | 8.3 | 0.010
| 9 | 350 | 00:00:30 | 9.48 | 6.9 | 0.010
| 11 | 400 | 00:00:34 | 9.86 | 7.4 | 0.010
| 12 | 450 | 00:00:38 | 8.14 | 6.0 | 0.010
| 13 | 500 | 00:00:43 | 8.46 | 6.6 | 0.010
| 15 | 550 | 00:00:47 | 7.76 | 5.1 | 0.010
| 16 | 600 | 00:00:51 | 10.24 | 7.8 | 0.010
| 17 | 650 | 00:00:56 | 8.24 | 6.1 | 0.010
| 18 | 700 | 00:01:00 | 7.93 | 5.9 | 0.010
| 20 | 750 | 00:01:04 | 7.94 | 5.6 | 0.010
| 21 | 800 | 00:01:09 | 7.51 | 5.2 | 0.010
| 22 | 850 | 00:01:13 | 7.94 | 6.4 | 0.010
| 24 | 900 | 00:01:18 | 7.16 | 5.3 | 0.010
| 25 | 950 | 00:01:22 | 8.71 | 6.7 | 0.010
| 26 | 1000 | 00:01:26 | 9.56 | 8.0 | 0.010
| 27 | 1050 | 00:01:30 | 7.65 | 5.8 | 0.010
| 29 | 1100 | 00:01:34 | 5.88 | 4.3 | 0.010
| 30 | 1150 | 00:01:38 | 7.19 | 5.4 | 0.010
| 30 | 1170 | 00:01:40 | 7.73 | 6.0 | 0.010
|======================================================================================
Evaluate the network performance by calculating the prediction error between the
predicted and actual angles of rotation.
[XTest,~,YTest] = digitTest4DArrayData;
YPred = predict(net,XTest);
predictionError = YTest - YPred;
Calculate the number of predictions within an acceptable error margin from the true
angles. Set the threshold to 10 degrees and calculate the percentage of predictions within
this threshold.
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
1-93
1 Deep Networks
numTestImages = size(XTest,4);
accuracy = numCorrect/numTestImages
accuracy = 0.7840
See Also
assembleNetwork | checkLayer | getL2Factor | getLearnRateFactor |
setL2Factor | setLearnRateFactor
More About
1-94
Define a Custom Deep Learning Layer with Learnable Parameters
Define a Custom Deep Learning Layer with Learnable

Parameters
If Deep Learning Toolbox does not provide the layer you require for your classification or
regression problem, then you can define your own custom layer using this example as a
guide. This example shows how to create a PReLU layer, which is a layer with a learnable
parameter and use it in a convolutional neural network.
To define a custom deep learning layer, you can use the template provided in this
example, which takes you through the following steps:
1 Name the layer – Give the layer a name so it can be used in MATLAB.
2 Declare the layer properties – Specify the properties of the layer and which
parameters are learned during training.
3 Create a constructor function (optional) – Specify how to construct the layer and
initialize its properties. If you do not specify a constructor function, then the software
initializes the properties with [] at creation.
4 Create forward functions – Specify how data passes forward through the layer
(forward propagation) at prediction time and at training time.
5 Create a backward function – Specify the derivatives of the loss with respect to the
input data and the learnable parameters (backward propagation).
A PReLU layer performs a threshold operation, where for each channel, any input value
less than zero is multiplied by a scalar learned at training time.[1] For values less than
a
zero, a PReLU layer applies scaling coefficients i to each channel of the input. These
coefficients form a learnable parameter, which the layer learns during training.
This figure from [1] compares the ReLU and PReLU layer functions.
1-95
1 Deep Networks
Layer with Learnable Parameters Template

Copy the layer with learnable parameters template into a new file in MATLAB. This
template outlines the structure of a layer with learnable parameters and includes the
functions that define the layer behavior.
classdef myLayer < nnet.layer.Layer
properties

end
% (Optional) Layer learnable parameters.
% Layer learnable parameters go here.

end
methods
function layer = myLayer()
% (Optional) Create a myLayer.
% This function must have the same name as the layer.

end
% Forward input data through the layer at prediction time and
% output the result.
1-96
%
% Inputs:
% X - Input data
% Output:
% Layer forward function for prediction goes here.

end
function [Z, memory] = forward(layer, X)

% (Optional) Forward input data through the layer at training
% time and output the result and a memory value.
%
% Inputs:
% X - Input data
% Outputs:
% memory - Memory value for backward propagation
% Layer forward function for training goes here.

end
function [dLdX, dLdW1, …, dLdWn] = backward(layer, X, Z, dLdZ, memory)

% Backward propagate the derivative of the loss function through
% the layer.
%
% Inputs:
% X - Input data
% memory - Memory value from forward function
% Outputs:
% input data
% dLdW1, ..., dLdWn - Derivatives of the loss with respect to each
% learnable parameter
% Layer backward function goes here.

end
end
end
Name the Layer

First, give the layer a name. In the first line of the class file, replace the existing name
myLayer with preluLayer.
classdef preluLayer < nnet.layer.Layer
...
end
1-97
1 Deep Networks
Next, rename the myLayer constructor function (the first function in the methods
section) so that it has the same name as the layer.
methods
function layer = preluLayer()
...
end
...
end
Save the Layer
Save the layer class file in a new file named preluLayer.m. The file name must match
the layer name. To use the layer, you must save the file in the current folder or in a folder
on the MATLAB path.
Declare Properties and Learnable Parameters

Declare the layer properties in the properties section and declare learnable
parameters by listing them in the properties (Learnable) section.
By default, custom intermediate layers have three properties:

you do not specify a layer description, then the software displays the layer class name.
A PReLU layer does not require any additional properties, so you can remove the
properties section.
1-98
A PReLU layer has only one learnable parameter, the scaling coefficient a. Declare this
learnable parameter in the properties (Learnable) section and call the parameter
Alpha.
% Layer learnable parameters
% Scaling coefficient
Alpha
end
Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify
any variables required to create the layer as inputs to the constructor function.
The PReLU layer constructor function requires only one input, the number of channels of
the expected input data. This input specifies the size of the learnable parameter Alpha.
Specify two input arguments named numChannels and name in the preluLayer
function. Add a comment to the top of the function that explains the syntax of the
function.
function layer = preluLayer(numChannels, name)

% layer = preluLayer(numChannels) creates a PReLU layer with
% numChannels channels and specifies the layer name.
...
end
Initialize Layer Properties
Initialize the layer properties, including learnable parameters in the constructor function.
Replace the comment % Layer constructor function goes here with code that
initializes the layer properties.
Set the Name property to the input argument name.
% Set layer name.

layer.Name = name;
Give the layer a one-line description by setting the Description property of the layer.
Set the description to describe the type of layer and its size.
1-99
1 Deep Networks
% Set layer description.

layer.Description = "PReLU with " + numChannels + " channels";
For a PReLU layer, when the input values are negative, the layer multiplies each channel
of the input by the corresponding channel of Alpha. Initialize the learnable parameter
Alpha to be a random vector of size 1-by-1-by-numChannels. With the third dimension
specified as size numChannels, the layer can use element-wise multiplication of the input
in the forward function. Alpha is a property of the layer object, so you must assign the
vector to layer.Alpha.
% Initialize scaling coefficient.

layer.Alpha = rand([1 1 numChannels]);
View the completed constructor function.

% layer = preluLayer(numChannels, name) creates a PReLU layer
% with numChannels channels and specifies the layer name.
% Set layer name.

layer.Name = name;


end
With this constructor function, the command preluLayer(3) creates a PReLU layer with
three channels.
Create Forward Functions

Create the layer forward functions to use at prediction time and training time.
Create a function named predict that propagates the data forward through the layer at
prediction time and outputs the result. The syntax for predict is Z = predict(layer,
X), where X is the input data and Z is the output of the layer forward function. By default,
the layer uses predict as the forward function at training time. To use a different
forward function at training time, or retain a value required for the backward function,
you must also create a function named forward.
1-100
The dimensions of X depend on the output of the previous layer. Similarly, the output Z
must have the appropriate shape for the next layer.
Built-in layers output 4-D arrays with size h-by-w-by-c-by-N, except for LSTM layers and
sequence input layers, which output 3-D arrays of size D-by-N-by-S.
Fully connected, ReLU, dropout, and softmax layers also accept 3-D inputs. When these
layers get inputs of this shape, they then output 3-D arrays of size D-by-N-by-S.
These dimensions correspond to the following:
• h – Height of the output

• w – Width of the output
• c – Number of channels in the output
• N – Number of observations (mini-batch size)
• D – Feature dimension of sequence
• S – Sequence length
The forward function propagates the data forward through the layer at training time and
also outputs a memory value. The syntax for forward is [Z, memory] =
forward(layer, X), where memory is the output memory value. You can use this value
as an input to the backward function.
The forward function of a PReLU layer is
Ï x if xi > 0
f ( xi ) = Ì i
Óa i xi if xi £ 0
where
xi is the input of the nonlinear activation f on channel i, and a i is the coefficient
controlling the slope of the negative part. The subscript i in

a i indicates that the
nonlinear activation can vary on different channels.
Implement the forward function in predict. In predict, the input X corresponds to x in
the equation. The output Z corresponds to

f (x )
i . The PReLU layer does not require
memory or a different forward function for training, so you can remove the forward
function from the class file. Add a comment to the top of the function that explains the
syntaxes of the function.
1-101
1 Deep Networks
% Z = predict(layer, X) forwards the input data X through the
% layer and outputs the result Z.
Z = max(0, X) + layer.Alpha .* min(0, X);

end
Create Backward Function

Implement the derivatives of the loss with respect to the input data and the learnable
parameters in the backward function.
The syntax for backward is [dLdX,dLdW1,…,dLdWn] =

backward(layer,X,Z,dLdZ,memory). For the inputs, X is the layer input data, Z is the
output of forward, dLdZ is the gradient backward propagated from the next layer, and
memory is the memory output of forward. For the outputs, dLdX is the derivative of the
loss with respect to the layer input data, and dLdW1,…,dLdWn are the derivatives of the
loss with respect to the learnable parameters.
The dimensions of X and Z are the same as in the forward functions. The dimensions of
dLdZ are the same as the dimensions of Z.
The dimensions and data type of dLdX are the same as the dimensions and data type of X.
The dimensions and data types of dLdW1,…,dLdWn are the same as the dimensions and
data types of W1,…,Wn respectively where Wi is the ith learnable parameter.
During the backward pass, the layer automatically updates the learnable parameters
using the derivatives dLdW1,…,dLdWn.
If you want to include a custom layer after a built-in layer in a network, then the layer
functions must accept inputs X which are the outputs of the previous layer, and backward
propagate dLdX with the same size as X. If you want to include a custom layer before a
built-in layer, then the forward functions must output arrays Z with the size expected by
the next layer. Similarly, backward must accept inputs dLdZ with the same size as Z.
The derivative of the loss with respect to the input data is
∂L ∂L ∂f ( xi )
=
∂xi ∂f ( xi ) ∂xi
1-102
where
∂L / ∂f ( x )
i is the gradient propagated from the deeper layer, and the gradient of
the activation is
∂f ( xi ) Ï 1 if xi ≥ 0
=Ì .
∂xi Óa i if x i < 0
The derivative of the loss with respect to the learnable parameters is
∂L ∂L ∂f ( xij )
=Â
∂a i j ∂f ( xij ) ∂ai
where i indexes the channels, j indexes the elements over height, width, and observations,
∂L / ∂f ( xi ) is the gradient propagated from the deeper layer, and the gradient of the
and
activation is
∂f ( xi ) Ï 0 if xi ≥ 0
=Ì .
∂a i Ó xi if xi < 0
In backward, replace the output dLdW with the output dLdAlpha. In backward, the
f ( xi ) . The input dLdZ corresponds
input X corresponds to x. The input Z corresponds to
∂L / ∂f ( xi ) . The output dLdX corresponds to ∂L / ∂xi . The output dLdAlpha
to
corresponds to
∂L / ∂a i .
Add a comment to the top of the function that explains the syntaxes of the function.
function [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)

% [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)
% backward propagates the derivative of the loss function
% through the layer.
% Inputs:
% X - Input data
1-103
1 Deep Networks
% memory - Memory value which can be used in backward

% propagation
% Outputs:
% input data
% dLdAlpha - Derivative of the loss with respect to the
% learnable parameter Alpha
dLdX = layer.Alpha .* dLdZ;

dLdX(X>0) = dLdZ(X>0);
dLdAlpha = min(0,X) .* dLdZ;
dLdAlpha = sum(sum(dLdAlpha,1),2);
% Sum over all observations in mini-batch.

dLdAlpha = sum(dLdAlpha,4);
end
Completed Layer
View the completed layer class file.
classdef preluLayer < nnet.layer.Layer
% Example custom PReLU layer.
% Layer learnable parameters
% Scaling coefficient
Alpha
end
methods
% layer = preluLayer(numChannels, name) creates a PReLU layer
% with numChannels channels and specifies the layer name.
% Set layer name.

layer.Name = name;


1-104
end
% Z = predict(layer, X) forwards the input data X through the
% layer and outputs the result Z.
Z = max(0, X) + layer.Alpha .* min(0, X);

end
function [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)

% [dLdX, dLdAlpha] = backward(layer, X, Z, dLdZ, memory)
% backward propagates the derivative of the loss function
% through the layer.
% Inputs:
% X - Input data
% memory - Memory value which can be used in backward
% propagation
% Outputs:
% input data
% dLdAlpha - Derivative of the loss with respect to the
% learnable parameter Alpha
dLdX = layer.Alpha .* dLdZ;

dLdX(X>0) = dLdZ(X>0);
dLdAlpha = min(0,X) .* dLdZ;
dLdAlpha = sum(sum(dLdAlpha,1),2);
% Sum over all observations in mini-batch.

dLdAlpha = sum(dLdAlpha,4);
end
end
end
GPU Compatibility
1-105
1 Deep Networks
The MATLAB functions used in predict, forward, and backward all support gpuArray
inputs, so the layer is GPU compatible.
Check Validity of Layer Using checkLayer

Check the layer validity of the custom layer preluLayer.
current folder.
Create an instance of the layer and check its validity using checkLayer. Specify the valid
input size to be the size of a single observation of typical input to the layer. The layer
expects 4-D array inputs, where the first three dimensions correspond to the height,
width, and number of channels of the previous layer output, and the fourth dimension
corresponds to the observations.
Specify the typical size of the input of an observation and set

'ObservationDimension' to 4.
layer = preluLayer(20,'prelu');
checkLayer(layer,validInputSize,'ObservationDimension',4)
Running nnet.checklayer.TestCase
.......... .....
Done nnet.checklayer.TestCase
__________
Test Summary:
Here, the function does not detect any issues with the layer.
1-106
Include Custom Layer in Network

You can use a custom layer in the same way as any other layer in Deep Learning Toolbox.
This section shows how to create and train a network for digit classification using the
PReLU layer you created earlier.
current folder. Create a layer array including the custom layer preluLayer.
layers = [
preluLayer(20,'prelu')
softmaxLayer
options = trainingOptions('adam','MaxEpochs',10);

|======================================================================================
| | | (hh:mm:ss) | Accuracy | Loss | Rate
|======================================================================================
| 1 | 1 | 00:00:00 | 10.94% | 2.6157 | 0.001
| 2 | 50 | 00:00:07 | 77.34% | 0.7741 | 0.001
| 3 | 100 | 00:00:15 | 91.41% | 0.3383 | 0.001
| 4 | 150 | 00:00:23 | 95.31% | 0.2010 | 0.001
| 6 | 200 | 00:00:31 | 96.88% | 0.1289 | 0.001
| 7 | 250 | 00:00:38 | 99.22% | 0.0762 | 0.001
| 8 | 300 | 00:00:46 | 100.00% | 0.0625 | 0.001
| 9 | 350 | 00:00:53 | 100.00% | 0.0369 | 0.001
| 10 | 390 | 00:00:59 | 100.00% | 0.0318 | 0.001
|======================================================================================
1-107
1 Deep Networks
Evaluate the network performance by predicting on new data and calculating the
accuracy.
[XTest,YTest] = digitTest4DArrayData;
YPred = classify(net,XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
accuracy = 0.9436
References
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into
rectifiers: Surpassing human-level performance on ImageNet classification." In
Proceedings of the IEEE international conference on computer vision, pp.
1026-1034. 2015.
See Also
assembleNetwork | checkLayer
More About
1-108
Define a Custom Regression Output Layer
Tip To create a regression output layer with mean squared error loss, use
regressionLayer. If you want to use a different loss function for your regression
problems, then you can define a custom regression output layer using this example as a
guide.
This example shows how to create a custom regression output layer with the mean
absolute error (MAE) loss.
To define a custom regression output layer, you can use the template provided in this
2 Declare the layer properties – Specify the properties of the layer.
3 Create a constructor function – Specify how to construct the layer and initialize its
properties. If you do not specify a constructor function, then the software initializes
the properties with '' at creation.
4 Create a forward loss function – Specify the loss between the predictions and the
training targets.
5 Create a backward loss function – Specify the derivative of the loss with respect to
the predictions.
A regression MAE layer computes the mean absolute error loss for regression problems.
MAE loss is an error measure between two continuous random variables. For predictions
Y and training targets T, the MAE loss between Y and T is given by
1 N
Ê1 R ˆ
L= Â Á Â Yni - Tni ˜,
N n =1 Ë R i =1 ¯
where N is the number of observations and R is the number of responses.
Regression Output Layer Template

Copy the regression output layer template into a new file in MATLAB. This template
outlines the structure of a regression output layer and includes the functions that define
the layer behavior.
1-109
1 Deep Networks
classdef myRegressionLayer < nnet.layer.RegressionLayer
properties

end
methods
function layer = myRegressionLayer()
% (Optional) Create a myRegressionLayer.

end

%
% Inputs:
%
% Output:

end

%
% Inputs:
%
% Output:

end
end
end
Name the Layer

myRegressionLayer with maeRegressionLayer.
classdef maeRegressionLayer < nnet.layer.RegressionLayer
...
end
Next, rename the myRegressionLayer constructor function (the first function in the
methods section) so that it has the same name as the layer.
1-110
methods
function layer = maeRegressionLayer()
...
end
...
end
Save the Layer
Save the layer class file in a new file named maeRegressionLayer.m. The file name
must match the layer name. To use the layer, you must save the file in the current folder
or in a folder on the MATLAB path.
Declare Layer Properties

Declare the layer properties in the properties section.

1-111
1 Deep Networks

The layer does not require any additional properties, so you can remove the properties
section.

To initialize the Name property at creation, specify the input argument name. Add a
comment to the top of the function that explains the syntax of the function.
function layer = maeRegressionLayer(name)
% layer = maeRegressionLayer(name) creates a
% mean-absolute-error regression layer and specifies the layer
% name.
...
end
Set the Name property to the input argument name. Set the description to describe the
type of layer and its size.

% name.
% Set layer name.

layer.Name = name;

layer.Description = 'Mean absolute error';
end
1-112
Create Forward Loss Function

Create a function named forwardLoss that returns the MAE loss between the
predictions made by the network and the training targets. The syntax for forwardLoss is
loss = forwardLoss(layer, Y, T), where Y is the output of the previous layer and
T contains the training targets.
For regression problems, the dimensions of T also depend on the type of problem.

Image regression 4-D array of size 1-by-1-by-R-by-N, where R
is the number of responses and N is the
mini-batch size.
Image-to-image regression 4-D array of size h-by-w-by-c-by-N, where h,
w, and c denote the height, width, and
number of channels of the output
respectively, and N is the mini-batch size.
Sequence-to-one regression Matrix of size R-by-N, where R is the
number of responses and N is the mini-
batch size.
Sequence-to-sequence regression 3-D array of size R-by-N-by-S, where R is
the number of responses, N is the mini-
batch size, and S is the sequence length.
For example, if the network defines an image regression network with one response and
has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.
For example, for image regression with R responses, to ensure that Y is a 4-D array of the
correct size, you can include a fully connected layer of size R before the output layer.
A regression MAE layer computes the mean absolute error loss for regression problems.
MAE loss is an error measure between two continuous random variables. For predictions
Y and training targets T, the MAE loss between Y and T is given by
1 N
Ê1 R ˆ
L= Â Á Â Yni - Tni ˜,
N n =1 Ë R i =1 ¯
1-113
1 Deep Networks
where N is the number of observations and R is the number of responses.
The inputs Y and T correspond to Y and T in the equation, respectively. The output loss
corresponds to L. To ensure that loss is scalar, output the mean loss over the mini-batch.
Add a comment to the top of the function that explains the syntaxes of the function.
% loss = forwardLoss(layer, Y, T) returns the MAE loss between
% the predictions Y and the training targets T.
% Calculate MAE.
R = size(Y,3);
meanAbsoluteError = sum(abs(Y-T),3)/R;
% Take mean over mini-batch.

N = size(Y,4);
loss = sum(meanAbsoluteError)/N;
end
Create Backward Loss Function

Create the backward loss function.
Create a function named backwardLoss that returns the derivatives of the MAE loss
with respect to the predictions Y. The syntax for backwardLoss is loss =
backwardLoss(layer, Y, T), where Y is the output of the previous layer and T
contains the training targets.
The dimensions of Y and T are the same as the inputs in forwardLoss.
The derivative of the MAE loss with respect to the predictions Y is given by
∂L 1
= sign(Yi - Ti ),
∂Yi NR
where N is the number of observations and R is the number of responses. Add a comment
to the top of the function that explains the syntaxes of the function.
% Returns the derivatives of the MAE loss with respect to the predictions Y
R = size(Y,3);
1-114
N = size(Y,4);
dLdY = sign(Y-T)/(N*R);
end
Completed Layer
View the completed regression output layer class file.
classdef maeRegressionLayer < nnet.layer.RegressionLayer
% Example custom regression layer with mean-absolute-error loss.
methods
% name.
% Set layer name.

layer.Name = name;

layer.Description = 'Mean absolute error';
end

% loss = forwardLoss(layer, Y, T) returns the MAE loss between
% Calculate MAE.
R = size(Y,3);
meanAbsoluteError = sum(abs(Y-T),3)/R;

N = size(Y,4);
loss = sum(meanAbsoluteError)/N;
end

% Returns the derivatives of the MAE loss with respect to the predictions Y
R = size(Y,3);
N = size(Y,4);
dLdY = sign(Y-T)/(N*R);
end
1-115
1 Deep Networks
end
end
GPU Compatibility
The MATLAB functions used in forwardLoss, and backwardLoss in

maeRegressionLayer all support gpuArray inputs, so the layer is GPU compatible.
Check Output Layer Validity

Check the layer validity of the custom classification output layer maeRegressionLayer.
Define a custom mean absolute error regression layer. To create this layer, save the file
maeRegressionLayer.m in the current folder. Create an instance of the layer.
layer = maeRegressionLayer('mae');
Check the layer is valid using checkLayer. Specify the valid input size to be the size of a
single observation of typical input to the layer. The layer expects a 1-by-1-by-R-by-N array
inputs, where R is the number of responses, and N is the number of observations in the
mini-batch.
checkLayer(layer,validInputSize,'ObservationDimension',4);
Running nnet.checklayer.OutputLayerTestCase
.......... ...
Done nnet.checklayer.OutputLayerTestCase
__________
1-116
Test Summary:
The test summary reports the number of passed, failed, incomplete, and skipped tests.
Include Custom Regression Output Layer in Network

Learning Toolbox. This section shows how to create and train a network for regression
using the custom output layer you created earlier.
The example constructs a convolutional neural network architecture, trains a network,

and uses the trained network to predict angles of rotated, handwritten digits. These
predictions are useful for optical character recognition.
[trainImages,~,trainAngles] = digitTrain4DArrayData;
Create a layer array including the regression output layer maeRegressionLayer.
layers = [
reluLayer
maeRegressionLayer('mae')]
layers =

2 '' Convolution 20 5x5 convolutions with stride [1 1] and paddi
4 '' ReLU ReLU
6 'mae' Regression Output Mean absolute error
1-117
1 Deep Networks
net = trainNetwork(trainImages,trainAngles,layers,options);

|======================================================================================
| | | (hh:mm:ss) | RMSE | Loss | Rate
|======================================================================================
| 1 | 1 | 00:00:00 | 25.52 | 22.0 | 0.010
| 2 | 50 | 00:00:05 | 12.67 | 10.2 | 0.010
| 3 | 100 | 00:00:10 | 12.23 | 9.9 | 0.010
| 4 | 150 | 00:00:14 | 11.56 | 8.9 | 0.010
| 6 | 200 | 00:00:19 | 11.72 | 8.7 | 0.010
| 7 | 250 | 00:00:23 | 11.63 | 7.8 | 0.010
| 8 | 300 | 00:00:28 | 11.09 | 8.3 | 0.010
| 9 | 350 | 00:00:32 | 9.48 | 6.9 | 0.010
| 11 | 400 | 00:00:37 | 9.86 | 7.4 | 0.010
| 12 | 450 | 00:00:42 | 8.14 | 6.0 | 0.010
| 13 | 500 | 00:00:46 | 8.46 | 6.6 | 0.010
| 15 | 550 | 00:00:51 | 7.76 | 5.1 | 0.010
| 16 | 600 | 00:00:56 | 10.24 | 7.8 | 0.010
| 17 | 650 | 00:01:01 | 8.24 | 6.1 | 0.010
| 18 | 700 | 00:01:06 | 7.93 | 5.9 | 0.010
| 20 | 750 | 00:01:12 | 7.94 | 5.6 | 0.010
| 21 | 800 | 00:01:17 | 7.51 | 5.2 | 0.010
| 22 | 850 | 00:01:21 | 7.94 | 6.4 | 0.010
| 24 | 900 | 00:01:26 | 7.16 | 5.3 | 0.010
| 25 | 950 | 00:01:31 | 8.71 | 6.7 | 0.010
| 26 | 1000 | 00:01:36 | 9.56 | 8.0 | 0.010
| 27 | 1050 | 00:01:40 | 7.65 | 5.8 | 0.010
| 29 | 1100 | 00:01:45 | 5.88 | 4.3 | 0.010
| 30 | 1150 | 00:01:50 | 7.19 | 5.4 | 0.010
| 30 | 1170 | 00:01:51 | 7.73 | 6.0 | 0.010
|======================================================================================
Evaluate the network performance by calculating the prediction error between the
predicted and actual angles of rotation.
[testImages,~,testAngles] = digitTest4DArrayData;
predictedTestAngles = predict(net,testImages);
predictionError = testAngles - predictedTestAngles;
1-118
See Also
Calculate the number of predictions within an acceptable error margin from the true
angles. Set the threshold to be 10 degrees and calculate the percentage of predictions
within this threshold.
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(testImages,4);
accuracy = numCorrect/numTestImages
accuracy = 0.7840
See Also
assembleNetwork | checkLayer | regressionLayer
More About
1-119
1 Deep Networks
Define a Custom Classification Output Layer
Tip To construct a classification output layer with cross entropy loss for k mutually
exclusive classes, use classificationLayer. If you want to use a different loss
function for your classification problems, then you can define a custom classification
output layer using this example as a guide.
This example shows how to define a custom classification output layer with the sum of
squares error (SSE) loss and use it in a convolutional neural network.
To define a custom classification output layer, you can use the template provided in this
training targets.
the predictions.
A classification SSE layer computes the sum of squares error loss for classification
problems. SSE is an error measure between two continuous random variables. For
predictions Y and training targets T, the SSE loss between Y and T is given by
N K
1
L=
N
ÂÂ(Y ni - Tni ) 2 ,
n =1 i =1
where N is the number of observations and K is the number of classes.

Copy the classification output layer template into a new file in MATLAB. This template
outlines the structure of a classification output layer and includes the functions that
define the layer behavior.
1-120
properties

end
methods

end

%
% Inputs:
%
% Output:

end

%
% Inputs:
%
% Output:

end
end
end
Name the Layer

myClassificationLayer with sseClassificationLayer.
1-121
1 Deep Networks
classdef sseClassificationLayer < nnet.layer.ClassificationLayer

...
end
Next, rename the myClassificationLayer constructor function (the first function in

the methods section) so that it has the same name as the layer.
methods
function layer = sseClassificationLayer()
...
end
...
end
Save the Layer
Save the layer class file in a new file named sseClassificationLayer.m. The file
name must match the layer name. To use the layer, you must save the file in the current
folder or in a folder on the MATLAB path.
Declare Layer Properties

Declare the layer properties in the properties section.

1-122

In this example, the layer does not require any additional properties, so you can remove
the properties section.

Specify the input argument name to assign to the Name property at creation. Add a
comment to the top of the function that explains the syntax of the function.
function layer = sseClassificationLayer(name)
% layer = sseClassificationLayer(name) creates a sum of squares
% error classification layer and specifies the layer name.
...
end
Set the Name property to the input argument name.
1-123
1 Deep Networks
% Set layer name.

layer.Name = name;

layer.Description = 'Sum of squares error';
end
Create Forward Loss Function

Create a function named forwardLoss that returns the SSE loss between the predictions
made by the network and the training targets. The syntax for forwardLoss is loss =
forwardLoss(layer, Y, T), where Y is the output of the previous layer and T
represents the training targets.
For classification problems, the dimensions of T depend on the type of problem.
Classification Task Dimensions of Layer Input

Image classification 4-D array of size 1-by-1-by-K-by-N, where K
is the number of classes and N is the mini-
batch size.
Sequence-to-label classification Matrix of size K-by-N, where K is the
number of classes and N is the mini-batch
size.
Sequence-to-sequence classification 3-D array of size K-by-N-by-S, where K is
the number of classes, N is the mini-batch
size, and S is the sequence length.
For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can
include a fully connected layer of size K followed by a softmax layer before the output
layer.
A classification SSE layer computes the sum of squares error loss for classification
problems. SSE is an error measure between two continuous random variables. For
predictions Y and training targets T, the SSE loss between Y and T is given by
1-124
N K
1
L=
N
ÂÂ(Y ni - Tni ) 2 ,
n =1 i =1
where N is the number of observations and K is the number of classes.
The inputs Y and T correspond to Y and T in the equation, respectively. The output loss
corresponds to L. Add a comment to the top of the function that explains the syntaxes of
the function.

% loss = forwardLoss(layer, Y, T) returns the SSE loss between
% Calculate sum of squares.

sumSquares = sum((Y-T).^2);

N = size(Y,4);
loss = sum(sumSquares)/N;
end
Create Backward Loss Function

Create the backward loss function.
Create a function named backwardLoss that returns the derivatives of the SSE loss with
respect to the predictions Y. The syntax for backwardLoss is loss =
backwardLoss(layer, Y, T), where Y is the output of the previous layer and T
represents the training targets.
The dimensions of Y and T are the same as the inputs in forwardLoss.
The derivative of the SSE loss with respect to the predictions Y is given by
dL 2
= (Yi - Ti )
d Yi N
where N is the number of observations. Add a comment to the top of the function that
explains the syntaxes of the function.
1-125
1 Deep Networks

% dLdY = backwardLoss(layer, Y, T) returns the derivatives of
% the SSE loss with respect to the predictions Y.
N = size(Y,4);
dLdY = 2*(Y-T)/N;
end
Completed Layer
View the completed classification output layer class file.
classdef sseClassificationLayer < nnet.layer.ClassificationLayer
% Example custom classification layer with sum of squares error loss.
methods
% Set layer name.

layer.Name = name;

layer.Description = 'Sum of squares error';
end

% loss = forwardLoss(layer, Y, T) returns the SSE loss between
% Calculate sum of squares.

sumSquares = sum((Y-T).^2);

N = size(Y,4);
loss = sum(sumSquares)/N;
end

% dLdY = backwardLoss(layer, Y, T) returns the derivatives of
% the SSE loss with respect to the predictions Y.
N = size(Y,4);
1-126
dLdY = 2*(Y-T)/N;
end
end
end
GPU Compatibility
The MATLAB functions used in forwardLoss, and backwardLoss all support gpuArray
inputs, so the layer is GPU compatible.
Check Output Layer Validity

Check the layer validity of the custom classification output layer
sseClassificationLayer.
Define a custom sum-of-squares error classification layer. To create this layer, save the file
sseClassificationLayer.m in the current folder. Create an instance of the layer.
layer = sseClassificationLayer('sse');
Check the layer is valid using checkLayer. Specify the valid input size to be the size of a
single observation of typical input to the layer. The layer expects a 1-by-1-by-K-by-N array
inputs, where K is the number of classes, and N is the number of observations in the mini-
batch.
checkLayer(layer,validInputSize,'ObservationDimension',4);
Running nnet.checklayer.OutputLayerTestCase
.......... ...
1-127
1 Deep Networks
Done nnet.checklayer.OutputLayerTestCase
__________
Test Summary:
The test summary reports the number of passed, failed, incomplete, and skipped tests.
Include Custom Classification Output Layer in Network

Learning Toolbox. This section shows how to create and train a network for classification
using the custom classification output layer that you created earlier.

Define a custom sum-of-squares error classification layer. To create this layer, save the file
sseClassificationLayer.m in the current folder. Create an instance of the layer.
Create a layer array including the custom classification output layer
sseClassificationLayer.
layers = [
reluLayer
softmaxLayer
sseClassificationLayer('sse')]
layers =

2 '' Convolution 20 5x5 convolutions with stride [1 1] and pad
4 '' ReLU ReLU
6 '' Softmax softmax
7 'sse' Classification Output Sum of squares error
1-128

|======================================================================================
| | | (hh:mm:ss) | Accuracy | Loss | Rate
|======================================================================================
| 1 | 1 | 00:00:00 | 14.84% | 0.8972 | 0.010
| 2 | 50 | 00:00:05 | 75.00% | 0.3203 | 0.010
| 3 | 100 | 00:00:10 | 92.97% | 0.1297 | 0.010
| 4 | 150 | 00:00:15 | 93.75% | 0.0868 | 0.010
| 6 | 200 | 00:00:20 | 96.88% | 0.0600 | 0.010
| 7 | 250 | 00:00:25 | 97.66% | 0.0494 | 0.010
| 8 | 300 | 00:00:30 | 100.00% | 0.0084 | 0.010
| 9 | 350 | 00:00:35 | 100.00% | 0.0139 | 0.010
| 11 | 400 | 00:00:39 | 99.22% | 0.0190 | 0.010
| 12 | 450 | 00:00:44 | 100.00% | 0.0059 | 0.010
| 13 | 500 | 00:00:49 | 99.22% | 0.0130 | 0.010
| 15 | 550 | 00:00:54 | 100.00% | 0.0046 | 0.010
| 16 | 600 | 00:00:59 | 99.22% | 0.0132 | 0.010
| 17 | 650 | 00:01:04 | 100.00% | 0.0032 | 0.010
| 18 | 700 | 00:01:09 | 99.22% | 0.0136 | 0.010
| 20 | 750 | 00:01:14 | 99.22% | 0.0131 | 0.010
| 21 | 800 | 00:01:19 | 99.22% | 0.0101 | 0.010
| 22 | 850 | 00:01:24 | 100.00% | 0.0019 | 0.010
| 24 | 900 | 00:01:29 | 100.00% | 0.0017 | 0.010
| 25 | 950 | 00:01:34 | 100.00% | 0.0016 | 0.010
| 26 | 1000 | 00:01:39 | 100.00% | 0.0008 | 0.010
| 27 | 1050 | 00:01:44 | 100.00% | 0.0010 | 0.010
| 29 | 1100 | 00:01:48 | 100.00% | 0.0012 | 0.010
| 30 | 1150 | 00:01:54 | 100.00% | 0.0010 | 0.010
| 30 | 1170 | 00:01:56 | 100.00% | 0.0009 | 0.010
|======================================================================================
Evaluate the network performance by making predictions on new data and calculating the
accuracy.
[XTest,YTest] = digitTest4DArrayData;
YPred = classify(net, XTest);
accuracy = mean(YTest == YPred)
1-129
1 Deep Networks
accuracy = 0.9856
See Also
assembleNetwork | checkLayer | classificationLayer
More About
1-130
Define Custom Weighted Classification Layer
Define Custom Weighted Classification Layer
Tip To construct a classification output layer with cross entropy loss for k mutually
exclusive classes, use classificationLayer. If you want to use a different loss
function for your classification problems, then you can define a custom classification
output layer using this example as a guide.
This example shows how to define and create a custom weighted classification output
layer with weighted cross entropy loss. Use a weighted classification layer for
classification problems with an imbalanced distribution of classes. For an example
showing how to use a weighted classification layer in a network, see “Speech Command
Recognition Using Deep Learning”.
To define a custom classification output layer, you can use the template provided in this
training targets.
the predictions.
A weighted classification layer computes the weighted cross entropy loss for classification
problems. Weighted cross entropy is an error measure between two continuous random
variables. For prediction scores Y and training targets T, the weighted cross entropy loss
between Y and T is given by
N K
1
L=-
N
Â Âw T i ni log(Yni ),
n =1 i =1
where N is the number of observations, K is the number of classes, and w is a vector of

weights for each class.
1-131
1 Deep Networks

Copy the classification output layer template into a new file in MATLAB. This template
outlines the structure of a classification output layer and includes the functions that
define the layer behavior.
properties

end
methods

end

%
% Inputs:
%
% Output:

end

%
% Inputs:
%
% Output:

end
end
end
1-132

Nnet - Ug 1 150 PDF

Uploaded by

Copyright:

Available Formats

Nnet - Ug 1 150 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Nnet - Ug 1 150 PDF

Uploaded by

Copyright:

Available Formats

Deep Learning Toolbox™

Mark Hudson Beale

Latest news: www.mathworks.com

Sales and services: www.mathworks.com/sales_and_services

User community: www.mathworks.com/matlabcentral

Technical support: www.mathworks.com/support/contact_us

The MathWorks, Inc.

Try Deep Learning in 10 Lines of MATLAB Code . . . . . . . . . . 1-10

Deep Learning with Big Data on GPUs and in Parallel . . . . . 1-13

Construct Deep Network Using Autoencoders . . . . . . . . . . . . 1-18

Pretrained Convolutional Neural Networks . . . . . . . . . . . . . . 1-21

Learn About Convolutional Neural Networks . . . . . . . . . . . . . 1-29

List of Deep Learning Layers . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33

Specify Layers of Convolutional Neural Network . . . . . . . . . . 1-40

Set Up Parameters and Train Convolutional Neural

Deep Learning Tips and Tricks . . . . . . . . . . . . . . . . . . . . . . . . . 1-60

Resume Training from Checkpoint Network . . . . . . . . . . . . . . 1-71

Define Custom Deep Learning Layers . . . . . . . . . . . . . . . . . . . 1-78

Define a Custom Deep Learning Layer with Learnable

Define a Custom Regression Output Layer . . . . . . . . . . . . . . 1-109

Define a Custom Classification Output Layer . . . . . . . . . . . . 1-120

Define Custom Weighted Classification Layer . . . . . . . . . . . . 1-131

Check Custom Layer Validity . . . . . . . . . . . . . . . . . . . . . . . . . 1-141

Long Short-Term Memory Networks . . . . . . . . . . . . . . . . . . . 1-154

Preprocess Images for Deep Learning . . . . . . . . . . . . . . . . . . 1-166

Develop Custom Mini-Batch Datastore . . . . . . . . . . . . . . . . . 1-170

Define Custom Mini-Batch Datastore For Super-Resolution

Deep Network Designer

Deep Learning in the Cloud

Deep Learning with MATLAB on Multiple GPUs . . . . . . . . . . . . 3-7

Neural Network Design Book

Neural Network Objects, Data, and Training Styles

Four Levels of Neural Network Design . . . . . . . . . . . . . . . . . . . 4-4

Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

Create Neural Network Object . . . . . . . . . . . . . . . . . . . . . . . . . 4-17

Configure Neural Network Inputs and Outputs . . . . . . . . . . . 4-21

Understanding Deep Learning Toolbox Data Structures . . . . 4-23

Neural Network Training Concepts . . . . . . . . . . . . . . . . . . . . . 4-28

Multilayer Shallow Neural Networks and

Multilayer Shallow Neural Network Architecture . . . . . . . . . . . 5-4

Prepare Data for Multilayer Shallow Neural Networks . . . . . . 5-8

Choose Neural Network Input-Output Processing

Divide Data for Optimal Neural Network Training . . . . . . . . . 5-12

Train and Apply Multilayer Shallow Neural Networks . . . . . . 5-17

Analyze Shallow Neural Network Performance After

Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-30

Dynamic Neural Networks

How Dynamic Neural Networks Work . . . . . . . . . . . . . . . . . . . . 6-3

Design Time Series Time-Delay Neural Networks . . . . . . . . . 6-14

Design Time Series Distributed Delay Neural Networks . . . . 6-20

Design Time Series NARX Feedback Neural Networks . . . . . 6-23

Design Layer-Recurrent Neural Networks . . . . . . . . . . . . . . . 6-31

Create Reference Model Controller with MATLAB Script . . . 6-34

Multiple Sequences with Dynamic Neural Networks . . . . . . . 6-41

Train Neural Networks with Error Weights . . . . . . . . . . . . . . . 6-44

Normalize Errors of Multiple Outputs . . . . . . . . . . . . . . . . . . . 6-47

Multistep Neural Network Prediction . . . . . . . . . . . . . . . . . . . 6-52

Design Neural Network Predictive Controller in Simulink . . . 7-4

Design NARMA-L2 Neural Controller in Simulink . . . . . . . . . 7-14