Neural Networks

Aroob Amjad Farrukh


Intelligent Systems, Higher School of
Technology and Experimental Sciences,
Jaume I University
Castellón, Spain
Abstract— Neural networks, or artificial neural networks, are machine learning algorithms that evolved from simulations of the human brain. Thanks to their fast learning and training capacity, they are well suited to classification problems and similar tasks, and they can solve not only linear but also nonlinear problems.

I. INTRODUCTION

Neural networks, also known as simulated neural networks (SNNs) or artificial neural networks (ANNs), are a subset of AI and are at the core of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way biological neurons signal one another.

ANNs are composed of many layers of nodes: an input layer, one or more hidden layers and an output layer. Each neuron connects to others through an associated weight and threshold, and a node activates when its value rises above the threshold.

Neural networks depend on training data to learn and to improve their accuracy over time. Once the learning algorithms reach good accuracy, they become powerful tools for artificial intelligence and computer science [1].

Neural networks can use different types of learning algorithms: supervised learning, where the process happens under the supervision of a teacher; unsupervised learning, where the learning process is carried out independently; semi-supervised learning, a mix of both types [2]; and reinforcement learning, where learning happens through a sequence of decisions [3].

The way a neural network works is through the weights associated with its nodes. Each node is represented by a linear regression model, with input data, weights, a threshold or bias, and an output function, which is the weighted sum of the inputs plus the bias [1]. The function of the entire neural network is simply the computation of the outputs of all the neurons [2].
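As a minimal sketch of that node computation (a hypothetical example in Python with NumPy, not code from the original sources):

```python
import numpy as np

def neuron_output(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias,
    passed through a step activation that fires above the threshold."""
    z = np.dot(w, x) + b           # weighted sum of inputs plus bias
    return 1 if z > 0 else 0       # node activates above the threshold

# Example: a neuron with two inputs
x = np.array([1.0, 0.0])           # input data
w = np.array([0.6, 0.6])           # weights
b = -0.5                           # bias (acts as a negative threshold)
print(neuron_output(x, w, b))      # prints 1, since 0.6 - 0.5 > 0
```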
In ANNs, the perceptron is a convenient model of a biological neuron, and it was the first algorithm with the ability to learn in supervised machine learning. The main reason behind the design of the perceptron model was to take visual inputs and organize them into classes. The latter task, also known as the classification problem, is one of the important elements of machine learning.

The perceptron model is divided into two variants: the single-layer and the multi-layer model. A single-layer perceptron is a feed-forward model that relies on a transfer function with a bias. In basic words, input values are fed to the perceptron; the model computes outputs from them, and if the outputs match the required values, the model is considered trained.

Multi-layer perceptrons (MLPs), also known as backpropagation networks, have a similar structure but execute in two stages: a forward stage and a backward stage. In the first stage, activations propagate from the input layer to the output layer; in the second stage, the error between the obtained value and the requested value is propagated backwards from the output layer to adjust the weights and bias values. In basic terms, an MLP is a network of artificial neurons arranged in several layers, in which a non-linear activation function, such as the sigmoid, TanH or ReLU, is used [4].
The remaining sections of the paper are as follows: Section 2 explains the work done in the laboratory sessions on single-layer and multi-layer perceptrons. Section 3 discusses the experimental results obtained with the classification problems and the logic (AND and OR) exercises. Section 4 concludes the topic and the experiments.

II. WORK WITH PERCEPTRONS IN LABORATORY SESSIONS

The laboratory sessions are divided into two blocks: the first block is an introduction to perceptrons and the second one is about multilayer perceptrons (MLPs).
A. Introduction to perceptrons

The perceptron is a supervised learning algorithm for binary classification. Binary classifiers are functions that decide whether an input, represented by a vector of numbers, belongs to a specific class or not. Moreover, perceptrons are linear classifiers: they make predictions based on a linear predictor function that combines a set of weights with the feature vector [5].

The first session consists of three examples: AND and OR linear separation models, and a demonstration of another linear separation problem.

The first example is about learning the AND function with a perceptron. To start with, we are given the truth table of the AND function, and the exercise consists of separating its outputs linearly with a single perceptron.

In order to get the desired result, the complete separation of the samples of each class (0s and 1s) by a boundary, the following steps are taken: loading the data (the truth table) into an array; plotting the data for better visualization, with red dots for 0s and black dots for 1s; building the model by creating the perceptron object; and finally training the model until we get complete separation. The last step is repeated as many times as needed until the model converges.
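A minimal sketch of these steps with scikit-learn's `Perceptron` class (the exact arguments used in the laboratory code may differ):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Load the truth table of the AND function into arrays
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([0, 0, 0, 1])                      # AND outputs

# Build the model by creating the perceptron object, then train it
model = Perceptron()
model.fit(X, y)

print(model.predict(X))   # [0 0 0 1] once the model has converged
print(model.score(X, y))  # 1.0 means complete separation of the classes
```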
The second example, the OR exercise, is done the same way as the previous model (the AND function), with the same steps but differing in the truth table, which only changes the step of loading the data into an array.

The last exercise of the first block is an example of a different problem that uses linear separability to demonstrate that perceptrons can not only model logical functions: they are linear classifiers that work well for any linearly separable problem.

In this case, instead of using a simple linear array, we use a matrix x with as many rows as dots and two columns, and a vector y with as many elements as dots. The value of y[i] is 0 for red dots and 1 for black dots. The requirements are to have between 1 and 7 x values, 5 to 9 red dots, values of y restricted to 0s and 1s, and, lastly, a set of red dots that is linearly separable from the set of black dots. The next step is to plot the data to represent it graphically, and afterwards to build the model by creating the perceptron object. Lastly, the result is obtained by training on the set until the model converges. This can be done by manually repeating the process until we get the desired result, or by modifying the code, for example with a while loop that stops when the classes are completely separated; the condition to break the loop is that the target equals the prediction.
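That stopping condition could be coded as follows (a sketch; the dots below are hypothetical, and `partial_fit` is one plausible way to train one pass at a time):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Hypothetical linearly separable dots: y[i] is 0 (red) or 1 (black)
x = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = Perceptron()

# Repeat the training process until the model converges
while True:
    model.partial_fit(x, y, classes=np.unique(y))  # one training pass
    if np.array_equal(model.predict(x), y):        # target == prediction
        break                                      # classes fully separated
```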
B. Multilayer perceptrons

"A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable." [5]

The second block contains four different exercises: the XOR problem, a classification example, handwritten digits recognition, and the German Traffic Sign Recognition Benchmark.

The XOR problem is slightly different from the other two examples (AND and OR): it is not linearly separable, so an MLP can outperform the perceptron and solve it. We start by loading the truth table into an array and then building the model using the backpropagation technique with a stochastic gradient descent solver. The MLP object is created with one hidden layer of 5 neurons and a maximum of 4000 iterations; the rest of the arguments are set to their default values [6]. In the next step, we train the model using the fit function, which automatically iterates until convergence or until the maximum number of iterations is reached. Lastly, we plot the data to represent the decision boundary.

Furthermore, we can check the convergence and non-convergence percentages of the network and the loss curve. Non-convergence can happen because the network gets stuck in a local minimum, showing an incorrect result.
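A sketch of that setup with scikit-learn's `MLPClassifier` (one hidden layer of 5 neurons, SGD solver, up to 4000 iterations; remaining arguments left at their defaults):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Truth table of the XOR function (not linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Backpropagation with a stochastic gradient descent solver
mlp = MLPClassifier(hidden_layer_sizes=(5,), solver='sgd', max_iter=4000)
mlp.fit(X, y)               # iterates until convergence or max_iter

print(mlp.predict(X))       # [0 1 1 0] when the network converges
print(mlp.loss_curve_[-1])  # final value of the loss curve
```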
The second example is another type of classification. In this one we are no longer limited to linear separations, as we were when working with single perceptrons, but can also work with linearly non-separable problems.

A classification task is done in three stages: preprocessing, which means getting the raw data, loading it, scaling it and splitting it into training and testing sets; then training and selecting a predictive model; and finally testing the model on the unseen test set to estimate the generalization error [7].
The stages of this experiment are: loading the data through random sample generators, which build artificial datasets of controlled size and complexity; evaluating the network performance; and scaling the large amount of data with the utility class StandardScaler, which computes the mean and standard deviation on the training set so that the same transformation can later be reapplied to the testing set. Later on, we build the model with one hidden layer, 5 neurons and 4000 maximum iterations, and train the network. The last step is to plot the decision boundary as a contour plot.
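These preprocessing stages might look like the following sketch (assuming scikit-learn's `make_blobs` sample generator; the sessions may use a different generator or parameters):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

# Artificial dataset of controlled size and complexity
X, y = make_blobs(n_samples=250, centers=2, random_state=0)

# Random split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training set, reapply it to the testing set
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Same architecture as before: one hidden layer of 5 neurons
mlp = MLPClassifier(hidden_layer_sizes=(5,), max_iter=4000)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))  # fraction of correct classifications
```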
Handwritten digits recognition is another example, which demonstrates how scikit-learn can be used to solve the problem [6]. The dataset selected for this experiment consists of 1797 images of digits, each one made of 8×8 pixels, stored in the dataset's images attribute. Each image has a target, which represents the digit it shows. Next, we turn the data into a matrix by flattening each image. After that, cross-validation takes place, where we evaluate the network performance: to avoid the overfitting issue, we hold out part of the available data as a test set by doing a random split into training and test sets. Later on, we scale the data through standardization to a normal distribution, Gaussian with zero mean and unit variance. And lastly, as in the other experiments, we build the model with the same arguments and train the network.
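A sketch of the loading, flattening and scaling steps (using scikit-learn's bundled digits dataset, which matches the 1797 8×8 images described):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
print(digits.images.shape)   # (1797, 8, 8): 1797 images of 8x8 pixels

# Flatten each 8x8 image into a 64-element row of the data matrix
X = digits.images.reshape(len(digits.images), -1)
y = digits.target            # the digit each image represents

# Hold out part of the data as a test set to avoid overfitting
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize to a Gaussian with zero mean and unit variance
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```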
The final exercise of the second block consists of classification with the German Traffic Sign Recognition Benchmark [8]. This real-world problem has a greater difficulty level than the previous exercises, because a number of important issues need to be taken into consideration, such as illumination conditions, the direction the sign faces, the state of the paint on the signs, and so on. Neural networks are used to implement this classification exercise because they have proven to be good classifiers, with a high success rate on several object recognition problems.

The most important task here is dealing with the data, as it consists of more than 40 classes and 50,000 images: a large database and a complex training set archive. In the first step, we get the dataset, which is organized as directories with images and annotations containing meta-information about each image and its class. Then the image processing takes place, where we record important information about the images, such as their size, border percentage and other properties. The pixels of the image will be the inputs of the neural network; in the processing function, the default image size is set to 20×20 pixels.

To summarize, the image processing consists of cropping the image, scaling it to a fixed resolution, scaling the red channel, adjusting the contrast and normalizing the pixel values to the interval [-1, +1].
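The resolution and normalization steps could be sketched as follows (a hypothetical helper, assuming images loaded with Pillow and the 20×20 default size mentioned above; the cropping, red-channel and contrast steps of the actual lab code are omitted):

```python
import numpy as np
from PIL import Image

def preprocess_sign(path, size=(20, 20)):
    """Scale a traffic sign image to a fixed resolution and map its
    pixel values to the interval [-1, +1] as inputs for the network."""
    img = Image.open(path).resize(size)        # fixed 20x20 resolution
    pixels = np.asarray(img, dtype=np.float64)
    pixels = pixels / 255.0 * 2.0 - 1.0        # [0, 255] -> [-1, +1]
    return pixels.reshape(-1)                  # one network input per pixel
```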
The next steps are the same as in the previous exercises: build the dataset and the model, with 10 classes of 15 tracks each, and train the network. The result is saved in a pkl file for further analysis, which is discussed in the next section.
III. EXPERIMENTAL RESULTS

The main purpose of the single-layer perceptron problems is to show linear separability.

In the first example, the AND problem shows a perfect boundary between red and black dots (e.g. Fig. 1), representing the complete distinction of the samples of each class (0s and 1s).

Fig. 1 AND plot showing the blue line as a boundary between red and black dots, linear classification.

Fig. 2 shows the convergence of an OR problem, with a similar result that differs only in the number of red and black dots.

Fig. 2 OR plot showing the blue line as a boundary between red and black dots, linear classification.

The third exercise is about linearly separable problems and shows that the perceptron also works well on these, not being limited to logical exercises. In this case the dataset is much bigger than the previous ones and it is a bit more difficult to separate the classes (red and black dots). Visually it may not seem that they have been separated, but we can confirm it by checking the values of the target and prediction variables: if they match, the model has converged, as shown in Fig. 3.

Fig. 3 Complete sample separation of the classes (0s and 1s) through the blue boundary line.

Multilayer perceptrons have a greater complexity level, so different kinds of analysis steps are needed to show their results, such as curves, matrices and other calculations.
The XOR problem is solved with MLPs. The visual representation of the solution is different from the AND and OR problems: when the network converges (e.g. Fig. 4), all the dots are clearly separated into different areas. However, in some situations convergence does not take place and the algorithm stops, showing an abnormal separation (e.g. Fig. 5).

Fig. 4 XOR plot showing the blue zone for red dots and brown zones for black dots as a separation of each class.

Fig. 5 Failed XOR plot, network stuck in a local minimum.

Multilayer perceptrons have the advantage of solving non-linear problems, as experimented in the second classification problem. The visual results are different, as shown in Fig. 6. Moreover, the percentage of correct classifications on the test data is also calculated, which can be around 80-90%, along with the number of iterations: 178 iterations for 90%. The test is also done at a much bigger scale, with a dataset of 25000 samples, giving 70% with 54 iterations (e.g. Fig. 7), far fewer iterations than with the small dataset.

Fig. 6 Decision boundary with 250 samples.

Fig. 7 Decision boundary with 25000 samples.
The next two problems build a classification report for measuring the accuracy of the classification through the precision, recall and f1-score metrics. In addition, the confusion matrix, the loss curve (e.g. Fig. 9) and samples of predictions and targets are also shown for better comprehension.

Fig. 9 Loss curve of the handwritten digits experiment.

In the handwritten digits recognition problem, a prediction succeeds when the target and the prediction match, and fails in the opposite case (e.g. Fig. 8). The number of iterations during training is around 1450.

Fig. 8 Handwritten digits results, comparing the target and the prediction.

For better result measurement, we can take 100 different test sets extracted from the dataset, compute the score of the same network on each one and store the values in a list, finally obtaining a mean of 0.94 and a standard deviation of 0.0056.
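That repeated evaluation might be coded as follows (a sketch following the procedure described above; the network and arguments mirror the earlier exercises):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

# Train the network once on a fixed split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(5,), max_iter=4000)
mlp.fit(X_train, y_train)

# Score the same trained network on 100 different test sets
scores = []
for seed in range(100):
    _, X_s, _, y_s = train_test_split(X, y, random_state=seed)
    scores.append(mlp.score(X_s, y_s))
print(np.mean(scores), np.std(scores))  # mean and standard deviation

# Precision, recall and f1-score for each digit class
print(classification_report(y_test, mlp.predict(X_test)))
```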
The results of the last problem, the German Traffic Sign Recognition Benchmark, are similar to the previous exercise, but the loss curve reaches smaller values (e.g. Fig. 10).

Fig. 10 Loss curve of the German Traffic Sign Recognition experiment.
IV. CONCLUSIONS

Neural networks are typically hard to configure and slow to train. However, once trained, they are exceptionally fast in application. They are by and large designed as models to tackle mathematical, computational and engineering problems, since they draw on a great deal of knowledge from mathematics, neurobiology and computer science [9].

As seen in the previous experiments, depending on the type of classification problem, perceptrons can be used with a single layer (the AND and OR logic functions) or with multiple layers (XOR, digits recognition, German traffic signs), the latter making it less complicated to work with data at a large scale.

REFERENCES
[1] IBM Cloud Education (17 August 2020). What are Neural Networks? https://www.ibm.com/cloud/learn/neural-networks
[2] TutorialsPoint. Learn Artificial Neural Network. https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_supervised_learning.htm
[3] Błażej Osiński and Konrad Budek (5 July 2018). What is reinforcement learning? Big Data Science. https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/
[4] Neelam Tyagi (27 January 2020). Understanding the Perceptron Model in a Neural Network. Medium. https://medium.com/analytics-steps/understanding-the-perceptron-model-in-a-neural-network-2b3737ed70a2
[5] Chris Nicholson. A Beginner's Guide to Neural Networks and Deep Learning. Pathmind. https://wiki.pathmind.com/neural-network
[6] An introduction to machine learning with scikit-learn. Scikit-learn. https://scikit-learn.org/stable/tutorial/basic/tutorial.html#introduction
[7] Python Machine Learning, 1st Edition (23 September 2015). https://github.com/rasbt/python-machine-learning-book
[8] German Traffic Sign Recognition Benchmark (GTSRB). https://benchmark.ini.rub.de/?section=gtsrb&subsection=news
[9] Rinu Gour (22 April 2019). Neural Network Algorithms. https://towardsdatascience.com/neural-network-algorithms-learn-how-to-train-ann-736dab9e6299