
Artificial Neural Networks (ANN)

1-Introduction

An ANN models the relationship between a set of input signals and output signals.

It uses a network of artificial neurons, also referred to as nodes, to solve learning problems.

They are versatile learners since they can be used for nearly any learning task: classification, numeric
prediction, and unsupervised pattern recognition.

2-Structure:

A directed network diagram defines the relationship between the input signals (x variables) and the output signals (y variables). Each dendrite (sensor) is weighted (w values) according to its importance; the inputs are summed and then processed through an activation function f.

A typical artificial neuron with n input dendrites can be represented by:

y(x) = f( Σᵢ wᵢ·xᵢ )

where the sum runs over the n input signals xᵢ, each weighted by wᵢ, and f is the activation function applied to the weighted sum.
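
As a minimal sketch of this computation (the input and weight values below are made-up for illustration, not from the text):

```python
import numpy as np

def neuron(x, w, f):
    """Weighted sum of the inputs, passed through activation function f."""
    return f(np.dot(w, x))

# Example: 3 inputs with a unit step activation (threshold at 0).
x = np.array([0.5, -1.0, 2.0])   # input signals
w = np.array([0.8, 0.2, 0.4])    # connection weights
step = lambda z: 1 if z >= 0 else 0

print(neuron(x, w, step))        # 1, since 0.4 - 0.2 + 0.8 = 1.0 >= 0
```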

Each neural network is defined as follows:

• Activation function: transforms a neuron's net input into an output signal.
• Network topology (architecture): describes the number of neurons and the number of layers.
• Training algorithm: sets the connection weights.

3-Activation Functions:

An activation function transforms a neuron's combined input signals into a single output signal.

The simplest version sums the total input signals and determines whether the sum meets a firing threshold: if so, the neuron passes the signal on; otherwise it does not. This is known as the threshold activation function (also called the unit step activation function): an output is produced only once a specified input threshold is attained.

The threshold activation is rarely used in ANNs; instead, activation functions are chosen for their desirable mathematical characteristics and their ability to model the relationships in the data.

The most common one is the sigmoid activation function, σ(x) = 1 / (1 + e⁻ˣ), where the output is no longer binary but ranges continuously from 0 to 1.

It is popular largely because it is differentiable, a property used later when determining the optimal weights.
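
A minimal sketch of the sigmoid and its derivative (the derivative is what the training step in section 5 relies on):

```python
import numpy as np

def sigmoid(z):
    """Squashes any real input into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """The sigmoid's derivative, expressed through the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))    # 0.5 -- the midpoint of the output range
print(sigmoid(-5.0))   # ~0.007 -- large negative inputs saturate toward 0
print(sigmoid(5.0))    # ~0.993 -- large positive inputs saturate toward 1
```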

Because the sigmoid saturates at its high and low ends, all the inputs should be standardized or normalized, squeezing the input values into a small range. This also helps the network train faster.
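
A minimal sketch of one common choice, min-max normalization (an assumed example, not prescribed by the text):

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each feature (column) of X into the [0, 1] range."""
    mins = X.min(axis=0)
    maxs = X.max(axis=0)
    return (X - mins) / (maxs - mins)

X = np.array([[100.0, 0.2],
              [200.0, 0.8],
              [150.0, 0.5]])
print(min_max_normalize(X))   # every value now lies in [0, 1]
```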

4-Network Topology:

A network's capacity to learn is related to its topology (its patterns and structures).

There are 3 main characteristics:

• The number of layers.
• Whether the information can travel backward or not.
• The number of nodes within each layer.

Topology determines the complexity of tasks the network can learn: larger and more complex networks can identify more subtle patterns and more complex decision boundaries.

a) The number of layers:


Nodes are distinguished based on their position in the network. Input nodes receive unprocessed signals directly from the input data; these signals are processed and then received by the output nodes, which use their own activation function to generate a final prediction.

When the input and output nodes are arranged in a single set of layers, we call it a single-layer network, which can be used for basic classification of patterns that are linearly separable.

A multilayer network adds 1 or more hidden layers that process signals from the input nodes prior to the output node; networks with many such layers are the basis of deep learning.
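
A minimal sketch of a forward pass through a multilayer network with one hidden layer (layer sizes and weights are made-up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 3, 4, 1

W1 = rng.normal(size=(n_hidden, n_inputs))   # input -> hidden weights
W2 = rng.normal(size=(n_outputs, n_hidden))  # hidden -> output weights

x = np.array([0.2, 0.7, 0.1])     # one (normalized) input example
hidden = sigmoid(W1 @ x)          # hidden layer processes the input signals
output = sigmoid(W2 @ hidden)     # output node generates the prediction
print(output)
```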

b) The direction of information travel:

If information travels continuously in 1 direction, from input to output, we call it a feedforward network.

A recurrent network (feedback network) allows signals to travel in both directions using loops.
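
A minimal sketch of the recurrent idea (an assumed illustration, not from the text): the hidden state loops back as part of the next step's input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[0.5, -0.3]])   # weights on the current input
U = np.array([[0.8]])         # weight on the looped-back state

h = np.zeros(1)               # the fed-back state starts at zero
for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = sigmoid(W @ x + U @ h)    # the signal loops back into the next step
    print(h)
```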

c) The number of nodes in each layer:


The number of input nodes is predetermined by the number of features in the input data; similarly, the number of output nodes is determined by the number of outcomes to be modeled. The number of nodes in the hidden layers is left for the user to set.

The appropriate number also depends on the amount of training data, the amount of noise in the data, and the complexity of the learning task.

5-Training NN with Backpropagation:

The topology itself does not learn anything. As input data is processed, the connections (weights) are either strengthened or weakened.

Backpropagation iterates through many cycles of 2 processes; each iteration is called an epoch. The weights are set randomly at the start, since the model has no prior knowledge, and the algorithm keeps cycling through the 2 processes until a stopping criterion is reached.

Each epoch consists of 2 processes:

1. A forward phase in which the neurons are activated in sequence from the input layer to the output layer, applying each neuron's weights and activation function along the way, until an output signal is produced.
2. A backward phase in which the output signal is compared with the target value in the training data. The difference between them is an error that is propagated backward to modify the connection weights.

The network uses the information sent backward to reduce the total error. So, by how much should each weight be changed?

We use the gradient descent technique. Backpropagation uses the derivative of each neuron's activation function to identify the gradient in the direction of each incoming weight. The gradient suggests how steeply the error will be reduced or increased for a change in the weight. The algorithm will attempt to change the weights in the direction that leads to the greatest reduction in error, by an amount known as the learning rate.

The greater this rate is, the faster the algorithm will attempt to descend the gradients, which reduces training time.
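
A minimal sketch of gradient descent on a single sigmoid neuron (a simplified stand-in for full multilayer backpropagation; the data, targets, and learning-rate value below are made-up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs; the constant third column acts as a bias input.
X = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 1.0, 0.0])            # target values (an AND-like task)
w = np.random.default_rng(1).normal(size=3)   # random initial weights
learning_rate = 0.5

for epoch in range(2000):
    y = sigmoid(X @ w)                  # forward phase: compute the outputs
    error = y - t                       # backward phase: compare with targets
    # For a squared-error cost, the gradient with respect to each weight
    # uses the sigmoid's derivative y * (1 - y).
    gradient = X.T @ (error * y * (1.0 - y))
    w -= learning_rate * gradient       # descend the gradient

print(np.round(sigmoid(X @ w), 2))      # outputs have moved toward the targets
```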

Fine-tuning: making modifications to the weights while keeping the same architecture.

Dimension: the number of neurons in the hidden layer.

6-Batch Training:

• Divide the data into n batches (boxes).
• Train the model on the 1st batch.
• Use the weights resulting from the previous step on the 2nd batch, and so on through all n batches (see the sketch below).
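
A minimal sketch of this batch loop, assuming a hypothetical train_one_batch helper that stands in for one training pass (it is not a real library function):

```python
import numpy as np

def batch_train(X, t, w, n_batches, train_one_batch):
    """Split the data into n batches and train on them in sequence,
    carrying the weights from each batch into the next."""
    for X_batch, t_batch in zip(np.array_split(X, n_batches),
                                np.array_split(t, n_batches)):
        # The weights from the previous step are the starting point here.
        w = train_one_batch(X_batch, t_batch, w)
    return w
```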

Disadvantages of ANN:

It is a black box method, requires a lot of data, and may be subject to overfitting.

7-Word Embedding:

Embeddings are built by using methods such as CBOW, GloVe, …

For CBOW (Continuous Bag of Words), we use ReLU as the activation function.

Example: from the context words "to" and "eat", CBOW predicts the target word "sandwich".

There is also Skip-gram, which uses the target word to estimate its surrounding context words.
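
A minimal sketch of the CBOW idea with the ReLU hidden layer mentioned above (the vocabulary, sizes, and weights are made-up; real embeddings are trained on large corpora):

```python
import numpy as np

vocab = ["to", "eat", "a", "sandwich"]
V, D = len(vocab), 3              # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, D))    # input embeddings (one row per word)
W_out = rng.normal(size=(D, V))   # projection back onto the vocabulary

def cbow_predict(context_words):
    """Average the context embeddings, apply ReLU, score every word."""
    h = np.mean([W_in[vocab.index(w)] for w in context_words], axis=0)
    h = np.maximum(h, 0.0)                        # ReLU activation
    scores = h @ W_out
    probs = np.exp(scores) / np.exp(scores).sum() # softmax over the vocabulary
    return vocab[int(np.argmax(probs))]

# The weights are untrained, so the prediction here is arbitrary;
# training would push it toward the true target word ("sandwich").
print(cbow_predict(["to", "eat"]))
```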
