Activation Function: Deep Neural Networks


Activation function

Deep Neural Networks


Deep Neural Networks

A deep neural network is a feedforward network.

It has more than one nonlinear hidden layer.

There are no loops in the network.


General mapping process from input x to
output y(x)

y(x) = F(x)
“Deep” refers to the number of hidden layers.
Biological neuron & mathematical neuron
Activation Functions

The activation function of a node defines the output of that node given an
input or set of inputs.
In biologically inspired networks, the activation function is usually an abstraction representing the rate of action potential firing in the cell.
These functions should be nonlinear to encode complex patterns of the data.
The activation functions used in deep neural networks include multi-state activations, Sigmoid, Tanh and ReLU.
Multi-state activation functions[1]

Multi-state activation functions (MSAFs) perform extra classification based on the 2-state Logistic function.

MSAFs have the potential to alter the parameter distribution of DNN models, improving model performance and reducing model size.

A Logistic function is frequently used as an activation function in DNNs. It is monotonically increasing and nonlinear, with domain (-∞, +∞) and range (0, 1), and its expression is

f(x) = 1 / (1 + e^(-x))
contd...
Adding a series of Logistic functions together yields a multi-state (multi-logistic) activation function.
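For illustration (not part of the original slides), a minimal NumPy sketch of the Logistic function and of a multi-state activation built by summing shifted Logistic functions; the shift values are arbitrary examples.

```python
import numpy as np

def logistic(x):
    """Standard Logistic (sigmoid): monotonic, range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def multi_state(x, shifts=(0.0, 2.0)):
    """Multi-state activation: a sum of shifted Logistic functions.
    With two shifts the range becomes (0, 2), giving extra 'states'."""
    return sum(logistic(x - s) for s in shifts)

x = np.linspace(-6.0, 6.0, 7)
print(logistic(x))      # values in (0, 1)
print(multi_state(x))   # values in (0, 2)
```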
Different activation functions in DNNs

❖ Identity Function
❖ Step Function
❖ Logistic Function (Sigmoid Function)
❖ TanH
❖ ArcTan
❖ Rectified Linear Unit (ReLU)
❖ SoftPlus
❖ Leaky Rectified Linear Unit (LReLU)
❖ Parametric Rectified Linear Unit (PReLU)
❖ Randomized Leaky Rectified Linear Unit (RReLU)
❖ Exponential Linear Unit (ELU)
❖ S-shaped Rectified Linear Activation Unit (SReLU)
❖ Adaptive Piecewise Linear (APL)
❖ SoftExponential
➢Identity Function

f(x) = x
Derivative of f(x):
f ′(x) = 1
Range: (-∞, ∞)
It is also called the linear activation function.
➢Binary Step Function:

f(x) = 0 for x < 0, 1 for x ≥ 0

Derivative of f(x):
f ′(x) = 0 for x ≠ 0, undefined for x = 0
Range: {0, 1}
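A small illustrative sketch (added here, not from the slides) of the identity and binary step functions defined above:

```python
import numpy as np

def identity(x):
    """Identity (linear) activation: f(x) = x, f'(x) = 1."""
    return x

def binary_step(x):
    """Binary step: 0 for x < 0, 1 for x >= 0."""
    return np.where(x < 0, 0.0, 1.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(identity(x))     # [-2.  -0.5  0.   0.5  2. ]
print(binary_step(x))  # [0. 0. 1. 1. 1.]
```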
➢Sigmoidal Function (Sigmoid):

f(x) = 1 / (1 + e^(-x))
Derivative of f(x):
f ′(x) = f(x)(1 − f(x))
Range: (0, 1)
It normalizes a real-valued input into the range between 0 and 1.
➢TanH Function:

f(x) = tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))

Derivative of f(x):
f ′(x) = 1 − tanh²(x)
Range: (-1, 1)
The tanh(x) function is a rescaled version of the sigmoid, and its output range is [−1, 1] instead of [0, 1].
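An illustrative NumPy sketch of the sigmoid and tanh functions and their derivatives as given above:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes x into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    """Derivative of tanh: 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), sigmoid_prime(x))
print(np.tanh(x), tanh_prime(x))   # tanh output lies in (-1, 1)
```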
➢Rectified Linear Unit:

f(x) = 0 for x < 0, x for x ≥ 0

Derivative of f(x):
f ′(x) = 0 for x < 0, 1 for x ≥ 0
Range: [0, ∞)
One way ReLUs improve neural networks is by speeding up training: the gradient is constant for all positive inputs, so it does not vanish the way it does for saturating functions.
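A minimal sketch of ReLU and the derivative used in practice:

```python
import numpy as np

def relu(x):
    """ReLU: 0 for x < 0, x for x >= 0."""
    return np.maximum(0.0, x)

def relu_prime(x):
    """Derivative used in practice: 0 for x < 0, 1 for x >= 0."""
    return np.where(x < 0, 0.0, 1.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))        # [0. 0. 0. 1. 3.]
print(relu_prime(x))  # constant 1 on the positive side -> no vanishing gradient
```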
➢Soft Plus:

f(x) = ln(1 + e^x)
Derivative of f(x):
f ′(x) = 1 / (1 + e^(-x))
Range: (0, ∞)
The softplus function can be approximated by the max function (or hard max), i.e. max(0, x + N(0, 1)).
➢Exponential Linear Unit:

f(x) = x for x ≥ 0, α(e^x − 1) for x < 0
Derivative of f(x):
f ′(x) = 1 for x ≥ 0, α·e^x for x < 0
Range: (-α, ∞)
The exponential linear unit (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies.
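An illustrative sketch of softplus and ELU as defined above (α = 1 is an arbitrary example value):

```python
import numpy as np

def softplus(x):
    """Softplus: ln(1 + e^x), a smooth approximation of ReLU."""
    return np.log1p(np.exp(x))

def elu(x, alpha=1.0):
    """ELU: x for x >= 0, alpha * (e^x - 1) for x < 0; range (-alpha, inf)."""
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(softplus(x))        # always positive, approaches x for large x
print(elu(x, alpha=1.0))  # negative values saturate at -alpha
```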
Application
Deep Neural Network Activation Functions
Activation Function

1. Logistic Function
2. SoftMax Function
3. Rectified Linear Unit (ReLU)
Logistic Function

f(x) = L / (1 + e^(-k(x − x0)))
Where,
e → natural logarithm base (Euler's number)
x0 → x-value of the sigmoid's midpoint
L → curve's maximum value
k → steepness of the curve.
Logistic Function Curve
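A short sketch (added for illustration) of the general Logistic curve with the parameters L, k and x0 listed above; the parameter values below are arbitrary examples.

```python
import numpy as np

def general_logistic(x, L=1.0, k=1.0, x0=0.0):
    """General Logistic curve: L / (1 + e^(-k * (x - x0))).
    L  = curve's maximum value
    k  = steepness of the curve
    x0 = x-value of the sigmoid's midpoint"""
    return L / (1.0 + np.exp(-k * (x - x0)))

x = np.linspace(-5.0, 5.0, 5)
print(general_logistic(x, L=2.0, k=1.5, x0=0.5))  # values approach L for large x
```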
Application
• Various fields where the LOGISTIC function can be used:
1. Biomathematics
2. Chemistry
3. Economics
4. Geoscience
5. Probability
6. Sociology
7. Linguistics
8. Statistics
9. Ecology
10. Medicine
Ecology
• Population growth
1. The logistic equation is a common model for population growth.
2. The rate of reproduction is proportional to both the existing population and the amount of available resources.
3. Equation: dP/dt = r·P·(1 − P/K)

where,
P → population size
r → growth rate
t → time
K → carrying capacity (maximum sustainable population)
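A minimal sketch (not from the slides) that integrates the logistic growth equation dP/dt = rP(1 − P/K) with Euler steps; the population values and rates are illustrative only.

```python
def logistic_growth(p0, r, K, t_max, dt=0.1):
    """Euler integration of dP/dt = r * P * (1 - P / K)."""
    p, t = p0, 0.0
    history = [(t, p)]
    while t < t_max:
        p += r * p * (1.0 - p / K) * dt
        t += dt
        history.append((t, p))
    return history

# Illustrative values: the population grows quickly at first,
# then levels off near the carrying capacity K.
for t, p in logistic_growth(p0=10.0, r=0.5, K=1000.0, t_max=30.0, dt=1.0)[::10]:
    print(f"t={t:5.1f}  P={p:8.1f}")
```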
Medicine

The logistic differential equation is used to model the growth of tumors.
ReLU
Rectified Linear Unit
ReLU Function

f(x) = max(0, x)
• Where,
x → input to a neuron
- This function is also known as the ramp function.
- It is analogous to half-wave rectification.
Variants of ReLU
• Noisy ReLU
f(x) = max(0, x + N(0, σ))
- Used in restricted Boltzmann machines for computer vision tasks.
• Leaky ReLU
f(x) = x for x > 0, 0.01x otherwise
- It allows a small, non-zero gradient when the unit is not active.
• Exponential Linear Unit (ELU)
f(x) = x for x > 0, a(e^x − 1) otherwise
where a is a hyper-parameter to be tuned and a ≥ 0 is a constraint.
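An illustrative NumPy sketch of the three ReLU variants above; the noise scale σ and leak slope 0.01 are common defaults, used here as assumptions.

```python
import numpy as np

def noisy_relu(x, sigma=1.0, rng=np.random.default_rng(0)):
    """Noisy ReLU: max(0, x + Gaussian noise)."""
    return np.maximum(0.0, x + rng.normal(0.0, sigma, size=np.shape(x)))

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: small non-zero gradient (slope) when the unit is not active."""
    return np.where(x > 0, x, slope * x)

def elu(x, a=1.0):
    """ELU: x for x > 0, a * (e^x - 1) otherwise (a >= 0)."""
    return np.where(x > 0, x, a * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(noisy_relu(x), leaky_relu(x), elu(x), sep="\n")
```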


Application
• Computer Vision
– Object Detection
– Image Classification

• Speech Recognition
SoftMax Function

• It is a generalization of the Logistic function:

softmax(z)_j = e^(z_j) / Σ_k e^(z_k)

• It squashes a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that sum to 1.
• Range: (0, 1)
• In probability theory, its output is used to represent a categorical distribution.
Application
• Multinomial Logistic Regression
• Multiclass Linear Discriminant Analysis
• Naïve Bayes Classifier
• Artificial Neural Network
• Example:
Input = [ 1, 2, 3, 4, 1, 2, 3 ]
Softmax = [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175 ]
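A small sketch that reproduces the example above with a standard softmax implementation:

```python
import numpy as np

def softmax(z):
    """Softmax: exp(z_j) / sum_k exp(z_k); outputs lie in (0, 1) and sum to 1."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0])
print(np.round(softmax(z), 3))  # [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
print(softmax(z).sum())         # ≈ 1.0
```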
SoftMax Normalisation

• SoftMax normalization is a way of reducing the influence of extreme values or outliers in the data without removing them from the dataset.
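The slides give no formula for SoftMax normalization; one common formulation (an assumption here) passes each value's z-score through the Logistic function, so outliers are squashed into (0, 1) rather than dominating the scale:

```python
import numpy as np

def softmax_normalize(x):
    """Map each value's z-score through the Logistic function.
    Outliers end up near 0 or 1 instead of stretching the scale."""
    z = (x - x.mean()) / x.std()
    return 1.0 / (1.0 + np.exp(-z))

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # 100 is an outlier
print(np.round(softmax_normalize(data), 3))     # all values stay in (0, 1)
```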
Performance and analysis

Achieving low error rates.

Enhancing prediction performance.

Using different mathematical activation functions instead of a conventional DNN whose neurons all use a single activation function.
Performance and analysis of DNN [2]

Data set used: Dow Jones Industrial Average
Number of data used: 206 training data and 34 testing data
Combination of activation functions used: B, U, H and their combinations

Fitness function (mean squared error, MSE) used:

MSE = (1 / (K·L)) · Σ_i Σ_j (o_ij − t_ij)²

where o_ij is the predicted output and t_ij is the correct output of the j-th output neuron for the i-th input data, for K data and L output neurons.
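A minimal sketch of this MSE fitness function; the prediction and target arrays are made-up examples.

```python
import numpy as np

def mse_fitness(o, t):
    """Mean squared error over K data points and L output neurons.
    o[i, j] = predicted output, t[i, j] = correct output."""
    K, L = o.shape
    return np.sum((o - t) ** 2) / (K * L)

o = np.array([[0.9, 0.1], [0.2, 0.8]])   # predictions (K=2 data, L=2 outputs)
t = np.array([[1.0, 0.0], [0.0, 1.0]])   # targets
print(mse_fitness(o, t))                  # ≈ 0.025
```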
DNN with one activation function

DNN using the same activation function (B, U or H) in every layer.
DNN with two activation functions

● DNN using an activation function combination (B-U, B-H, U-B, U-H, H-U, H-B).

● For example, in “B-U” all the neurons in the odd layers (1st, 3rd, etc.) use the bipolar sigmoid function,

● and all the neurons in the even layers (2nd, 4th, etc.) use the unipolar sigmoid function.
DNN with three activation functions

● DNN using an activation function combination (B-H-U, B-U-H, H-B-U, H-U-B, U-B-H, U-H-B).
● “B-H-U” in Table VII means:
● all the neurons in the 1st, 4th, ... layers use B,
● all the neurons in the 2nd, 5th, ... layers use H, and
● all the neurons in the 3rd, 6th, ... layers use U.
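An illustrative sketch of assigning activation functions to layers cyclically, as in the U-B-H pattern; the slides do not define the function H, so tanh is used below purely as a stand-in, and the weights are random toy values.

```python
import numpy as np

def unipolar_sigmoid(x):          # "U": range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):           # "B": range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def stand_in_H(x):                # "H" is not defined in the slides; tanh is a stand-in
    return np.tanh(x)

def forward(x, weights, pattern):
    """Forward pass where layer k uses pattern[k % len(pattern)] as its activation,
    e.g. pattern = (U, B, H) reproduces the 'U-B-H' layer assignment."""
    for k, W in enumerate(weights):
        act = pattern[k % len(pattern)]
        x = act(W @ x)
    return x

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 4)) for _ in range(6)]          # 6 toy layers
pattern = (unipolar_sigmoid, bipolar_sigmoid, stand_in_H)      # U-B-H
print(forward(rng.normal(size=4), weights, pattern))
```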
Conclusion

Among the 15 DNNs, the one using U-B-H is the best, with the minimum average testing MSE (0.0351) for the Dow Jones Industrial Average data.

Thus, a DNN using different activation functions can perform better than one using a single activation function.
Trainable Activation Functions [3]

Data set used: MNIST handwritten digit database
Number of data used: 60,000 training images and 10,000 test images
Activation function used: trainable (power-series) activation functions

Softmax is used to obtain the probability of class s for input feature vector x_t, where L is the number of hidden layers and N(L) is the number of output units.
Taylor series of activation functions

The Taylor series of a function f(x) that is infinitely differentiable at a number a is the power series

f(x) = Σ_{n=0..∞} f⁽ⁿ⁾(a) · (x − a)ⁿ / n!

The nonlinear activation function of the i-th unit in the l-th layer, σ_i^(l)(x), can be represented as a power series:

σ_i^(l)(x) ≈ Σ_{n=0..N} a_{i,n}^(l) · xⁿ

where N is the approximation degree and the a_{i,n}^(l) are coefficients to be retrained.
Proposed activation function
Retrain activation function

The coefficients A can be retrained within the error back-propagation framework, using the same cost function C.
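A minimal PyTorch sketch (an illustration, not the paper's exact implementation) of a power-series activation whose coefficients are ordinary trainable parameters updated by back-propagation; using one instance per layer corresponds to layer-level coefficient sharing.

```python
import torch
import torch.nn as nn

class PowerSeriesActivation(nn.Module):
    """sigma(x) = sum_{n=0..N} a_n * x^n with trainable coefficients a_n.
    One instance per layer gives layer-level sharing; a single shared
    instance for the whole network gives model-level sharing."""
    def __init__(self, degree=3):
        super().__init__()
        # Initialise near the identity: a_1 = 1, all other coefficients 0.
        coeffs = torch.zeros(degree + 1)
        coeffs[1] = 1.0
        self.coeffs = nn.Parameter(coeffs)

    def forward(self, x):
        powers = torch.stack([x ** n for n in range(len(self.coeffs))], dim=-1)
        return powers @ self.coeffs

# Toy usage: the coefficients receive gradients like any other weight.
net = nn.Sequential(nn.Linear(8, 16), PowerSeriesActivation(degree=3),
                    nn.Linear(16, 2))
out = net(torch.randn(4, 8))
out.sum().backward()
print(net[1].coeffs.grad)   # non-zero: coefficients are retrained by backprop
```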
Error rate by activation function:
● Model-level sharing: all units in the model share the same coefficients.

● Layer-level sharing: all units in the same layer share the same coefficients.
References

[1] A Combination of Multi-State Activation Functions, Mean-Normalisation and Singular Value Decomposition for Learning Deep Neural Networks.

[2] Genetic Deep Neural Networks Using Different Activation Functions for Financial Data Mining. © 2015 IEEE. Luna M. Zhang, Soft Tech Consulting, Inc., Chantilly, USA.

[3] Deep Neural Network Using Trainable Activation Functions. © 2016 IEEE. Hoon Chung, Sung Joo Lee and Jeon Gue Park, Electronics and Telecommunications Research Institute, Daejeon, Korea.
Thank you!
