
Heuristics to improve performance of the Backpropagation algorithm [1]

i) Use the sequential mode of learning. In this mode, the weights and biases are updated after the
presentation of each pattern within a training epoch: as soon as a pattern fails to produce the desired
output, the parameters are changed. Compare this with the batch mode, where the entire set of training
instances is shown to the network and the parameters are updated only once per epoch.
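
As a rough illustration, the sketch below contrasts the two update schedules for a single linear neuron trained with squared error; the toy data, the learning rate c, and all variable names are illustrative and not part of the original text:

import numpy as np

# Minimal sketch of sequential (per-pattern) vs. batch updates for one
# linear neuron with squared error. Data and names are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))      # 8 training patterns, 3 inputs each
d = rng.standard_normal(8)           # desired responses
c = 0.05                             # learning rate

w_seq = np.zeros(3)                  # sequential mode: update after every pattern
for x, target in zip(X, d):
    e = target - w_seq @ x           # error for this pattern
    w_seq += c * e * x               # immediate parameter change

w_bat = np.zeros(3)                  # batch mode: one update per epoch
grad = np.zeros(3)
for x, target in zip(X, d):
    grad += (target - w_bat @ x) * x # accumulate, do not update yet
w_bat += c * grad                    # single update after the whole epoch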

ii) Perturb the order of presentation of the input patterns to the network. This randomization tends to
make the search through the parameter space stochastic, which can help the algorithm avoid local minima.
The combination of the sequential mode of learning with a randomized order of presentation is sometimes
referred to as ‘stochastic gradient descent’ or ‘SGD’ [2].
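
A minimal sketch of this shuffling, reusing the hypothetical toy setup from the previous sketch:

import numpy as np

# Sketch: re-randomize the order of presentation at every epoch and
# combine it with per-pattern (sequential) updates, i.e. SGD.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
d = rng.standard_normal(8)
w = np.zeros(3)
c = 0.05

for epoch in range(10):
    order = rng.permutation(len(X))  # fresh random order each epoch
    for i in order:
        e = d[i] - w @ X[i]
        w += c * e * X[i]            # sequential update in shuffled order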

iii) Use a momentum factor. The momentum term adds a fraction of the previous weight update to the
current one. This creates a ‘momentum’ effect that makes it possible to escape a shallow local minimum.
Momentum is applied as follows:

w_ij(t+1) = w_ij(t) + Δw_ij(t) + β·Δw_ij(t−1) ; where t indexes the current pattern (or epoch), t−1 the previous one, and β is the momentum factor.
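
A sketch of this update, mirroring the formula above on the same hypothetical toy problem (β and the data are illustrative):

import numpy as np

# Sketch of w(t+1) = w(t) + Δw(t) + β·Δw(t−1): a fraction β of the
# previous weight change is added to the current gradient-based change.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
d = rng.standard_normal(8)
w = np.zeros(3)
c, beta = 0.05, 0.9                  # learning rate and momentum factor
prev_delta = np.zeros(3)             # Δw from the previous pattern

for x, target in zip(X, d):
    e = target - w @ x
    delta = c * e * x                # current change Δw(t)
    w += delta + beta * prev_delta   # add the momentum term β·Δw(t−1)
    prev_delta = delta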

iv) Use an antisymmetric output function. For faster convergence of the network parameters, use an
antisymmetric sigmoid such as the tangent sigmoid (hyperbolic tangent). Such a function satisfies:
𝜎(−𝑥) = −𝜎(𝑥)
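
For example, the hyperbolic tangent is antisymmetric, as is the scaled form a·tanh(b·v) with a = 1.7159 and b = 2/3 that is commonly quoted from Haykin's text (the text above rounds a to 1.72). A quick check:

import numpy as np

# The hyperbolic tangent is antisymmetric: tanh(-x) == -tanh(x).
# The scaled form a*tanh(b*v) keeps this property.
def tanh_sigmoid(v, a=1.7159, b=2.0 / 3.0):
    return a * np.tanh(b * v)

v = np.linspace(-3.0, 3.0, 7)
assert np.allclose(tanh_sigmoid(-v), -tanh_sigmoid(v))   # antisymmetry holds
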
v) Choose target values within the range of the sigmoid activation. This step keeps the backpropagation
algorithm from driving the free parameters toward infinity in pursuit of targets that the activation can only
reach asymptotically. For example, with the hyperbolic tangent output function, the desired response of the
jth node in the output layer can be chosen as follows:

d_j = a − ε ; where a is the limiting value of the hyperbolic tangent function and can be set at 1.72, and ε
at 0.72. This setting conveniently makes d_j = 1 (and, for the negative limit, −a + ε = −1).

vi) Normalize the input patterns. Each input variable should be preprocessed so that its mean, averaged
over the entire training set, is close to zero, or at least small compared with its standard deviation. Moreover,
the input variables should be decorrelated and then scaled so that their covariances are approximately equal.
This preprocessing helps the parameters converge faster.
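
One common way to realise these three steps (mean removal, decorrelation via PCA, covariance equalisation) is sketched below on hypothetical correlated data:

import numpy as np

# Sketch of the three preprocessing steps on correlated toy data:
# 1) remove the mean, 2) decorrelate, 3) equalise the covariances.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))  # correlated inputs

X = X - X.mean(axis=0)                   # 1) zero mean for each input variable
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X = X @ eigvecs                          # 2) decorrelate (rotate onto principal axes)
X = X / np.sqrt(eigvals + 1e-12)         # 3) scale so the variances are all equal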

vii) Parameter initialization should be done carefully. Avoid very large or very small values when
initializing the parameters. The weights and biases should be initialized from a uniform distribution with
zero mean and variance equal to the reciprocal of the number of synaptic connections ‘m’ of a neuron, i.e.,

σ_w² = 1/m
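
A sketch of such an initializer: a uniform distribution on [−limit, +limit] has variance limit²/3, so limit = √(3/m) gives the required variance 1/m (the function name and sizes below are illustrative):

import numpy as np

# Sketch: draw weights from a uniform distribution with zero mean and
# variance 1/m, where m is the neuron's number of synaptic connections.
def init_weights(m, n_neurons, rng=np.random.default_rng(0)):
    limit = np.sqrt(3.0 / m)             # uniform on [-limit, limit] -> variance limit**2/3 = 1/m
    return rng.uniform(-limit, limit, size=(m, n_neurons))

W = init_weights(m=64, n_neurons=10)
print(W.var())                           # close to 1/64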

viii) Learning rate. The learning rate ‘c’ should vary across layers and across the neurons within a layer.
It should be set smaller in the last layers, where the local error gradients tend to be larger, and larger in the
front layers. Within a layer, the learning rate of a neuron should be inversely proportional to the square root
of the number of weights (synaptic connections) made to that neuron.
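
A sketch of the fan-in rule, i.e. a rate inversely proportional to the square root of the number of incoming weights (the layer-by-layer front/back adjustment would be applied on top of this; the base rate and fan-in values are illustrative):

import numpy as np

# Sketch: give each neuron (here, each layer, assuming equal fan-in
# within a layer) a learning rate proportional to 1/sqrt(fan-in).
def layer_learning_rates(fan_ins, base_rate=0.1):
    return [base_rate / np.sqrt(m) for m in fan_ins]

fan_ins = [784, 256, 64]                 # fan-in of the neurons in each layer
print(layer_learning_rates(fan_ins))     # smaller rates where the fan-in is larger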

[1] Adapted from Simon Haykin, Neural Networks: A Comprehensive Foundation, 2nd Ed.
[2] A common variant, ‘mini-batch’ gradient descent, provides a compromise between stochastic and full-batch gradient descent.