Activation Function: Deep Neural Networks


Activation function

Deep Neural Networks


Deep Neural Networks

A deep neural network is a feedforward network.

It has more than one nonlinear hidden layer.

There are no loops in the network.


General mapping process from input x to
output y(x)

y(x) = F(x)
“Deep” refers to the number of hidden layers.
Biological neuron & mathematical neuron
Activation Functions

The activation function of a node defines the output of that node given an
input or set of inputs.
In biologically inspired networks, the activation function is usually an abstraction representing the rate of action potential firing in the cell.
These functions should be nonlinear to encode complex patterns of the data.
The activation functions used in deep neural networks include multi-state activations, Sigmoid, Tanh and ReLU.
Multi-state activation functions[1]

Multi-state activation functions (MSAFs) perform extra classification based on the 2-state Logistic function.

MSAFs have the potential to alter the parameter distribution of DNN models, improving model performance and reducing model size.

A Logistic function is frequently used as an activation function in DNNs. It is monotonically increasing and nonlinear, with domain (-∞, +∞) and range (0, 1), and its expression is

f(x) = 1 / (1 + e^(-x))
contd...
Adding a series of Logistic functions together yields a multi-state (multi-logistic) activation function.
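For illustration (not part of the original slides), a minimal NumPy sketch of the Logistic function and of a multi-state activation built by summing shifted Logistic functions; the shift values are arbitrary examples.

```python
import numpy as np

def logistic(x):
    """Standard Logistic (sigmoid): monotonic, range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def multi_state(x, shifts=(0.0, 2.0)):
    """Multi-state activation: a sum of shifted Logistic functions.
    With two shifts the range becomes (0, 2), giving extra 'states'."""
    return sum(logistic(x - s) for s in shifts)

x = np.linspace(-6.0, 6.0, 7)
print(logistic(x))      # values in (0, 1)
print(multi_state(x))   # values in (0, 2)
```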
Different activation functions in DNNs

❖ Identity Function
❖ Step Function
❖ Logistic Function (Sigmoid Function)
❖ TanH
❖ ArcTan
❖ Rectified Linear Unit (ReLU)
❖ SoftPlus
❖ Leaky Rectified Linear Unit (LReLU)
❖ Parametric Rectified Linear Unit (PReLU)
❖ Randomized Leaky Rectified Linear Unit (RReLU)
❖ Exponential Linear Unit (ELU)
❖ S-shaped Rectified Linear Activation Unit (SReLU)
❖ Adaptive Piecewise Linear (APL)
❖ SoftExponential
➢Identity Function

f(x) = x
Derivative of f(x):
f ′(x) = 1
Range: (-∞, ∞)
It is also called the linear activation function.
➢Binary Step Function:

f(x) = 0 for x < 0, 1 for x ≥ 0

Derivative of f(x):
f ′(x) = 0 for x ≠ 0, undefined for x = 0
Range: {0, 1}
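A small illustrative sketch (added here, not from the slides) of the identity and binary step functions defined above:

```python
import numpy as np

def identity(x):
    """Identity (linear) activation: f(x) = x, f'(x) = 1."""
    return x

def binary_step(x):
    """Binary step: 0 for x < 0, 1 for x >= 0."""
    return np.where(x < 0, 0.0, 1.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(identity(x))     # [-2.  -0.5  0.   0.5  2. ]
print(binary_step(x))  # [0. 0. 1. 1. 1.]
```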
➢Sigmoidal Function (Sigmoid):

f(x) = 1 / (1 + e^(-x))
Derivative of f(x):
f ′(x) = f(x)(1 − f(x))
Range: (0, 1)
It normalizes a real-valued input into the range between 0 and 1.
➢TanH Function:

f(x) = tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))

Derivative of f(x):
f ′(x) = 1 − tanh²(x)
Range: (-1, 1)
The tanh(x) function is a rescaled version of the sigmoid, and its output range is [−1, 1] instead of [0, 1].
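An illustrative NumPy sketch of the sigmoid and tanh functions and their derivatives as given above:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes x into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    """Derivative of tanh: 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), sigmoid_prime(x))
print(np.tanh(x), tanh_prime(x))   # tanh output lies in (-1, 1)
```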
➢Rectified Linear Unit:

f(x) = 0 for x < 0, x for x ≥ 0

Derivative of f(x):
f ′(x) = 0 for x < 0, 1 for x ≥ 0
Range: [0, ∞)
One way ReLUs improve neural networks is by speeding up training: the gradient is constant for all positive inputs, so it does not vanish the way it does for saturating functions.
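A minimal sketch of ReLU and the derivative used in practice:

```python
import numpy as np

def relu(x):
    """ReLU: 0 for x < 0, x for x >= 0."""
    return np.maximum(0.0, x)

def relu_prime(x):
    """Derivative used in practice: 0 for x < 0, 1 for x >= 0."""
    return np.where(x < 0, 0.0, 1.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))        # [0. 0. 0. 1. 3.]
print(relu_prime(x))  # constant 1 on the positive side -> no vanishing gradient
```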
➢Soft Plus:

f(x) = ln(1 + e^x)
Derivative of f(x):
f ′(x) = 1 / (1 + e^(-x))
Range: (0, ∞)
The softplus function can be approximated by the max function (or hard max), i.e. max(0, x + N(0, 1)).
➢Exponential Linear Unit:

f(x) = x for x ≥ 0, α(e^x − 1) for x < 0
Derivative of f(x):
f ′(x) = 1 for x ≥ 0, α·e^x for x < 0
Range: (-α, ∞)
The exponential linear unit (ELU) speeds up learning in deep neural networks and leads to higher classification accuracies.
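An illustrative sketch of softplus and ELU as defined above (α = 1 is an arbitrary example value):

```python
import numpy as np

def softplus(x):
    """Softplus: ln(1 + e^x), a smooth approximation of ReLU."""
    return np.log1p(np.exp(x))

def elu(x, alpha=1.0):
    """ELU: x for x >= 0, alpha * (e^x - 1) for x < 0; range (-alpha, inf)."""
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(softplus(x))        # always positive, approaches x for large x
print(elu(x, alpha=1.0))  # negative values saturate at -alpha
```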
Application
Deep Neural Network Activation Functions
Activation Function

1. Logistic Function
2. SoftMax Function
3. Rectified Linear Unit (ReLU)
Logistic Function

f(x) = L / (1 + e^(-k(x − x0)))
Where,
e → natural logarithm base (Euler's number)
x0 → x-value of the sigmoid's midpoint
L → curve's maximum value
k → steepness of the curve.
Logistic Function Curve
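A short sketch (added for illustration) of the general Logistic curve with the parameters L, k and x0 listed above; the parameter values below are arbitrary examples.

```python
import numpy as np

def general_logistic(x, L=1.0, k=1.0, x0=0.0):
    """General Logistic curve: L / (1 + e^(-k * (x - x0))).
    L  = curve's maximum value
    k  = steepness of the curve
    x0 = x-value of the sigmoid's midpoint"""
    return L / (1.0 + np.exp(-k * (x - x0)))

x = np.linspace(-5.0, 5.0, 5)
print(general_logistic(x, L=2.0, k=1.5, x0=0.5))  # values approach L for large x
```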
Application
• Various fields where the LOGISTIC function can be used:
1. Biomathematics
2. Chemistry
3. Economics
4. Geoscience
5. Probability
6. Sociology
7. Linguistics
8. Statistics
9. Ecology
10. Medicine
Ecology
• Population growth
1. The logistic equation is a common model for population growth.
2. The rate of reproduction is proportional to both the existing population and the amount of available resources.
3. Equation: dP/dt = r·P·(1 − P/K)

where,
P → population size
r → growth rate
t → time
K → carrying capacity (maximum sustainable population)
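A minimal sketch (not from the slides) that integrates the logistic growth equation dP/dt = rP(1 − P/K) with Euler steps; the population values and rates are illustrative only.

```python
def logistic_growth(p0, r, K, t_max, dt=0.1):
    """Euler integration of dP/dt = r * P * (1 - P / K)."""
    p, t = p0, 0.0
    history = [(t, p)]
    while t < t_max:
        p += r * p * (1.0 - p / K) * dt
        t += dt
        history.append((t, p))
    return history

# Illustrative values: the population grows quickly at first,
# then levels off near the carrying capacity K.
for t, p in logistic_growth(p0=10.0, r=0.5, K=1000.0, t_max=30.0, dt=1.0)[::10]:
    print(f"t={t:5.1f}  P={p:8.1f}")
```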
Medicine

The logistic differential equation is used to model the growth of tumors.
ReLU
Rectified Linear Unit
ReLU Function

f(x) = max(0, x)
• Where,
x → input to a neuron
- This function is also known as the ramp function.
- It is analogous to half-wave rectification.
Variants of ReLU
• Noisy ReLU
f(x) = max(0, x + N(0, σ))
- Used in restricted Boltzmann machines for computer vision tasks.
• Leaky ReLU
f(x) = x for x > 0, 0.01x otherwise
- It allows a small, non-zero gradient when the unit is not active.
• Exponential Linear Unit (ELU)
f(x) = x for x > 0, a(e^x − 1) otherwise
where a is a hyper-parameter to be tuned and a ≥ 0 is a constraint.
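An illustrative NumPy sketch of the three ReLU variants above; the noise scale σ and leak slope 0.01 are common defaults, used here as assumptions.

```python
import numpy as np

def noisy_relu(x, sigma=1.0, rng=np.random.default_rng(0)):
    """Noisy ReLU: max(0, x + Gaussian noise)."""
    return np.maximum(0.0, x + rng.normal(0.0, sigma, size=np.shape(x)))

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: small non-zero gradient (slope) when the unit is not active."""
    return np.where(x > 0, x, slope * x)

def elu(x, a=1.0):
    """ELU: x for x > 0, a * (e^x - 1) otherwise (a >= 0)."""
    return np.where(x > 0, x, a * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(noisy_relu(x), leaky_relu(x), elu(x), sep="\n")
```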


Application
• Computer Vision
– Object Detection
– Image Classification

• Speech Recognition
SoftMax Function

• It is a generalization of the Logistic function:

softmax(z)_j = e^(z_j) / Σ_k e^(z_k)

• It squashes a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that sum to 1.
• Range: (0, 1)
• In probability theory, its output is used to represent a categorical distribution.
Application
• Multinomial Logistic Regression
• Multiclass Linear Discriminant Analysis
• Naïve Bayes Classifier
• Artificial Neural Network
• Example:
Input = [ 1, 2, 3, 4, 1, 2, 3 ]
Softmax = [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175 ]
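A small sketch that reproduces the example above with a standard softmax implementation:

```python
import numpy as np

def softmax(z):
    """Softmax: exp(z_j) / sum_k exp(z_k); outputs lie in (0, 1) and sum to 1."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0])
print(np.round(softmax(z), 3))  # [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
print(softmax(z).sum())         # ≈ 1.0
```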
SoftMax Normalisation

• SoftMax normalization is a way of reducing the influence of extreme values or outliers in the data without removing them from the dataset.
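The slides give no formula for SoftMax normalization; one common formulation (an assumption here) passes each value's z-score through the Logistic function, so outliers are squashed into (0, 1) rather than dominating the scale:

```python
import numpy as np

def softmax_normalize(x):
    """Map each value's z-score through the Logistic function.
    Outliers end up near 0 or 1 instead of stretching the scale."""
    z = (x - x.mean()) / x.std()
    return 1.0 / (1.0 + np.exp(-z))

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # 100 is an outlier
print(np.round(softmax_normalize(data), 3))     # all values stay in (0, 1)
```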
Performance and analysis

Achieving low error rates.

Enhancing prediction performance.

Using different mathematical activation functions instead of a conventional DNN whose neurons all use a single activation function.
Performance and analysis of DNN [2]

Data set used: Dow Jones Industrial Average
Number of data used: 206 training data and 34 testing data
Combination of activation functions used: B, U, H and their combinations

Fitness function (mean squared error, MSE) used:

MSE = (1 / (K·L)) · Σ_i Σ_j (o_ij − t_ij)²

where o_ij is the predicted output and t_ij is the correct output of the j-th output neuron for the i-th input data, for K data and L output neurons.
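A minimal sketch of this MSE fitness function; the prediction and target arrays are made-up examples.

```python
import numpy as np

def mse_fitness(o, t):
    """Mean squared error over K data points and L output neurons.
    o[i, j] = predicted output, t[i, j] = correct output."""
    K, L = o.shape
    return np.sum((o - t) ** 2) / (K * L)

o = np.array([[0.9, 0.1], [0.2, 0.8]])   # predictions (K=2 data, L=2 outputs)
t = np.array([[1.0, 0.0], [0.0, 1.0]])   # targets
print(mse_fitness(o, t))                  # ≈ 0.025
```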
DNN with one activation function

DNN using the same activation function (B, U or H) in every layer.
DNN with two activation functions

● DNN using an activation function combination (B-U, B-H, U-B, U-H, H-U, H-B).

● For example, in “B-U” all the neurons in the odd layers (1st, 3rd, etc.) use the bipolar sigmoid function,

● and all the neurons in the even layers (2nd, 4th, etc.) use the unipolar sigmoid function.
DNN with three activation functions

● DNN using an activation function combination (B-H-U, B-U-H, H-B-U, H-U-B, U-B-H, U-H-B).
● “B-H-U” in Table VII means:
● all the neurons in the 1st, 4th, ... layers use B,
● all the neurons in the 2nd, 5th, ... layers use H, and
● all the neurons in the 3rd, 6th, ... layers use U.
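An illustrative sketch of assigning activation functions to layers cyclically, as in the U-B-H pattern; the slides do not define the function H, so tanh is used below purely as a stand-in, and the weights are random toy values.

```python
import numpy as np

def unipolar_sigmoid(x):          # "U": range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):           # "B": range (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def stand_in_H(x):                # "H" is not defined in the slides; tanh is a stand-in
    return np.tanh(x)

def forward(x, weights, pattern):
    """Forward pass where layer k uses pattern[k % len(pattern)] as its activation,
    e.g. pattern = (U, B, H) reproduces the 'U-B-H' layer assignment."""
    for k, W in enumerate(weights):
        act = pattern[k % len(pattern)]
        x = act(W @ x)
    return x

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 4)) for _ in range(6)]          # 6 toy layers
pattern = (unipolar_sigmoid, bipolar_sigmoid, stand_in_H)      # U-B-H
print(forward(rng.normal(size=4), weights, pattern))
```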
Conclusion

Among the 15 DNNs, the one using U-B-H is the best, with the minimum average testing MSE (0.0351) for the Dow Jones Industrial Average data.

Thus, a DNN using different activation functions can perform better than one using a single activation function.
Trainable Activation Functions [3]

Data set used: MNIST handwritten digit database
Number of data used: 60,000 training images and 10,000 test images
Activation function used: trainable (power-series) activation functions

Softmax is used to obtain the probability of class s for input feature vector x_t, where L is the number of hidden layers and N(L) is the number of output units.
Taylor series of activation functions

The Taylor series of a function f(x) that is infinitely differentiable at a number a is the power series

f(x) = Σ_{n=0..∞} f⁽ⁿ⁾(a) · (x − a)ⁿ / n!

The nonlinear activation function of the i-th unit in the l-th layer, σ_i^(l)(x), can be represented as a power series:

σ_i^(l)(x) ≈ Σ_{n=0..N} a_{i,n}^(l) · xⁿ

where N is the approximation degree and the a_{i,n}^(l) are coefficients to be retrained.
Proposed activation function
Retrain activation function

The coefficients A can be retrained within the error back-propagation framework, using the same cost function C.
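A minimal PyTorch sketch (an illustration, not the paper's exact implementation) of a power-series activation whose coefficients are ordinary trainable parameters updated by back-propagation; using one instance per layer corresponds to layer-level coefficient sharing.

```python
import torch
import torch.nn as nn

class PowerSeriesActivation(nn.Module):
    """sigma(x) = sum_{n=0..N} a_n * x^n with trainable coefficients a_n.
    One instance per layer gives layer-level sharing; a single shared
    instance for the whole network gives model-level sharing."""
    def __init__(self, degree=3):
        super().__init__()
        # Initialise near the identity: a_1 = 1, all other coefficients 0.
        coeffs = torch.zeros(degree + 1)
        coeffs[1] = 1.0
        self.coeffs = nn.Parameter(coeffs)

    def forward(self, x):
        powers = torch.stack([x ** n for n in range(len(self.coeffs))], dim=-1)
        return powers @ self.coeffs

# Toy usage: the coefficients receive gradients like any other weight.
net = nn.Sequential(nn.Linear(8, 16), PowerSeriesActivation(degree=3),
                    nn.Linear(16, 2))
out = net(torch.randn(4, 8))
out.sum().backward()
print(net[1].coeffs.grad)   # non-zero: coefficients are retrained by backprop
```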
Error rate by activation function:
● Model-level sharing: all units in the model share the same coefficients.

● Layer-level sharing: all units in the same layer share the same coefficients.
References

[1] A Combination of Multi-State Activation Functions, Mean-Normalisation and Singular Value Decomposition for Learning Deep Neural Networks.

[2] Genetic Deep Neural Networks Using Different Activation Functions for Financial Data Mining. © 2015 IEEE. Luna M. Zhang, Soft Tech Consulting, Inc., Chantilly, USA.

[3] Deep Neural Network Using Trainable Activation Functions. © 2016 IEEE. Hoon Chung, Sung Joo Lee and Jeon Gue Park, Electronics and Telecommunications Research Institute, Daejeon, Korea.
Thank you!
