
Genetic Algorithm for CNN Architecture Optimization

Khalid Elghazi, Hassan Ramchoun, and Tawfik Masrour

1 National School of Arts and Crafts, Moulay Ismail University, Meknes
2 National School of Business and Management, Moulay Ismail University, Meknes

kh.elghazi@edu.umi.ac.ma, h.ramchoun@umi.ac.ma, t.masrour@ensam.umi.ac.ma

Abstract. The Convolutional Neural Network (CNN) is a well-known type of deep feed-forward network that has achieved significant success in the area of computer vision. Despite this success, the choice of the optimal architecture for a given problem remains a challenging issue, and the performance of a CNN is significantly impacted by a few of its hyper-parameters. In this paper, the Genetic Algorithm (GA) is used to explore the architecture design space of convolutional neural networks. Most existing studies focus only on the CNN hyper-parameters of filter number and filter size; the proposed algorithm extends existing research in this field by also including the type of pooling. The MNIST dataset was used to evaluate our proposed method, and the simulations show the ability of the GA to select the optimal CNN hyper-parameters and generate an optimal CNN architecture.

Keywords: Convolutional Neural Network · Genetic Algorithm · CNN hyper-parameters · CNN architecture optimization.

1 Introduction

Convolutional neural networks (CNNs) have provided state-of-the-art results in a wide range of applications such as object classification, object detection, and segmentation. One of the key factors behind this success is the creation of several CNN architectures, such as GoogLeNet [19], ResNet [6], DenseNet [7], etc. All of these architectures were developed in the context of object classification, and the majority of them were manually created by experts in the field. As a result, some researchers have become interested in automating the creation of CNN architectures. This line of work includes automatically designing CNN architectures [18], Genetic CNN [21], Hierarchical Evolution [11], and the efficient architecture search method [4]. The accuracy of these networks is strongly linked to their structure and design: some researchers have shown that selecting suitable values for the hyper-parameters [9] significantly improves the performance of CNNs. The list of hyper-parameters to optimize in the basic case comprises the depth of the CNN (the number of layers), the number and size of filters in each convolution layer, and the type of pooling in the subsampling layer. In [13], the authors aimed to optimize the number of filters and their sizes for each layer using the large Caltech-256 dataset. Furthermore, LEIC [15] used a GA to improve CNN architectures by optimizing not only the number of kernels in each layer but also the depth simultaneously. However, the crossover operator is discarded in the main part of LEIC, and without a crossover operator, GAs typically require a sizable population to explore the search space. On the other hand, evoCNN [17] extended LEIC by using a new gene encoding strategy and a specific crossover operator; additionally, a different representation strategy was introduced to efficiently initialize the connection weights. Note that the number of possible network structures grows exponentially, so it is not practical to list all the candidates and choose the best one. For this reason, we can formulate this issue as an optimization problem over a large search space and apply genetic algorithms [1] to explore a sufficient number of architectures.
In this study, we propose a strategy based on a genetic algorithm that seeks to explore the search space and discover the optimal CNN hyper-parameters for a given classification task. To do this, we suggest an encoding technique that represents each network structure as a fixed-length chromosome. Next, we introduce the common genetic operators, namely selection, crossover, and mutation, which aim to improve population diversity and prevent premature convergence to local optima. Finally, we determine the quality of each individual by its accuracy on the MNIST dataset.
The rest of this work is organized as follows. First, an overview of CNNs is presented in Section 2. Next, we describe the process of designing network architectures using the genetic algorithm in Section 3. In Section 4, experimental results are shown. Finally, conclusions are given in Section 5.

2 CNNs—an overview

In the domain of image recognition, the CNN is the most popular and frequently used algorithm. A key benefit of the CNN is its ability to detect significant features without human intervention. A frequently used CNN structure differs from the Multi-Layer Perceptron (MLP) by combining numerous locally connected layers that seek to extract features, while the final layers are fully connected layers that seek to classify the image. Fig. 1 shows an illustration of a CNN architecture for image classification.

Convolution Layers: A convolution layer is the main component of a CNN architecture and contains a set of learnable filters, also known as "kernels". The objective of this layer is to extract image features. At the end of each convolution layer, we obtain a matrix known as the feature map [5] after sliding the filter across the input, as expressed in (1):

Y^l = K^l \otimes X^{l-1}    (1)

Fig. 1. Example of CNN architecture for image classification.

where K^l denotes the filter weights of convolution layer l, \otimes denotes the convolution operator, and X^{l-1} is the output of the previous layer l-1 (i.e., the input of layer l).
Three factors determine the size of the feature map:

(A) Depth: the number of filters employed during the convolution process.
(B) Stride: the number of pixels by which the filter matrix slides across the input matrix.
(C) Zero-padding: a method that keeps the original input size by adding zeros around the edges of the input matrix.
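
As a concrete illustration, the sketch below builds a single convolution layer exposing these three factors. It assumes the Keras API purely for illustration; the paper does not state which framework was used, and the parameter values here are arbitrary examples.

```python
# Illustrative sketch only (framework and values assumed, not taken from the paper).
from tensorflow.keras import layers

conv = layers.Conv2D(
    filters=8,            # depth: number of filters, i.e. feature maps produced
    kernel_size=(5, 5),   # spatial size of each filter
    strides=(1, 1),       # stride: how many pixels the filter slides per step
    padding="same",       # zero-padding: "same" preserves the input size, "valid" does not
)
```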

Activation Function: After the convolution is finished, nonlinear features are extracted by applying an activation function to all values in the output of the convolution. The literature offers many activation functions, such as the ReLU [14], defined as the maximum between zero and the input value, f(y) = max(0, y), as well as the tanh and sigmoid functions.
The output of layer l is determined by its activation function and is described as follows:

S^l = F(Y^l)    (2)

Subsampling or Pooling Layers: A pooling layer, also called a subsampling or downsampling layer, is one of the fundamental components of a CNN architecture. It aims to reduce the spatial size of the feature maps while retaining the most important information. The two most commonly used pooling methods are average pooling and max pooling.
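
The two pooling variants can be written as follows. This is a minimal sketch assuming the Keras API (an assumption, as above); it is relevant here because the pooling type is one of the hyper-parameters optimized later in Section 3.

```python
# Illustrative sketch: the two pooling operations mentioned above (Keras API assumed).
from tensorflow.keras import layers

max_pool  = layers.MaxPooling2D(pool_size=(2, 2))      # keeps the maximum of each 2x2 window
mean_pool = layers.AveragePooling2D(pool_size=(2, 2))  # keeps the average of each 2x2 window
```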

Fully Connected Layers: After the convolution and pooling stages, the feature maps obtained in the last layer are flattened into a one-dimensional vector, which is the input of the fully connected layers. These layers follow the basic structure of an MLP: each neuron is connected to all neurons of the previous and next layers. The output size of the last fully connected layer should match the number of classes in the problem. Finally, the softmax activation function can be used to compute the posterior classification probabilities.
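
A minimal sketch of such a classification head is given below, again assuming the Keras API; the hidden layer size is an arbitrary example and is not specified by the paper.

```python
# Illustrative sketch of the classification head (Keras API and hidden size assumed).
from tensorflow.keras import layers

head = [
    layers.Flatten(),                        # feature maps -> one-dimensional vector
    layers.Dense(120, activation="relu"),    # fully connected layer (size chosen arbitrarily here)
    layers.Dense(10, activation="softmax"),  # one output per class (10 classes for MNIST)
]
```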

3 Proposed approach

This section presents a genetic algorithm that selects the optimal CNN hyper-parameters and thereby designs an optimal CNN model on the MNIST dataset [10]. First, we introduce a mechanism for representing individuals, comprising their fixed and optimized parts. Following that, genetic operations such as selection, mutation, and crossover are developed to traverse the search space and identify the optimal solutions.
In the proposed framework, each chromosome is a candidate solution representing a CNN architecture. The classification accuracy is chosen as the fitness function of a chromosome. In this case, the objective of the genetic algorithm is to compute the hyper-parameter values that give the least amount of error and consequently the highest classification accuracy. Algorithm 1 provides a summary of the proposed algorithm.
Algorithm 1: General framework of the proposed algorithm
Input: Training data Dtr, testing data Dts, number of epochs Np, maximum number of generations G, population size N, crossover probability Pc, mutation probability Pm.
Output: The discovered optimal CNN architecture with its recognition accuracy (fitness).
1. P0 ← randomly generate an initial population of N chromosomes and compute their fitness;
2. g ← 0;  /* initialize a generation counter */
3. while g < G do
4.   Selection: binary tournament selection produces a new generation from the previous one;
5.   Crossover: for every pair of parent chromosomes, apply crossover with probability Pc;
6.   Mutation: for every crossover offspring, apply mutation with probability Pm;
7.   Evaluation: evaluate the fitness of each new individual;
8.   g ← g + 1;
Return: the best individual.
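
The overall loop of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch: the helper names (random_chromosome, tournament_selection, crossover, mutate) are hypothetical and correspond to the operators detailed in the following subsections, fitness_fn stands for the evaluation of Section 3.6, and elitism details are simplified.

```python
# Illustrative sketch of Algorithm 1 (helper functions are hypothetical; see the sketches
# in the following subsections). Elitism and bookkeeping details are simplified.
import random

def genetic_algorithm(fitness_fn, G=10, N=10, Pc=0.7, Pm=0.3):   # defaults from Table 1
    population = [random_chromosome() for _ in range(N)]         # Section 3.2
    fitness = [fitness_fn(ind) for ind in population]            # Section 3.6
    for g in range(G):
        parents = tournament_selection(population, fitness, N)   # Section 3.3
        offspring = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            if random.random() < Pc:
                c1, c2 = crossover(p1, p2)                       # Section 3.4
            else:
                c1, c2 = p1[:], p2[:]
            offspring += [c1, c2]
        offspring = [mutate(c) if random.random() < Pm else c for c in offspring]  # Section 3.5
        population = offspring
        fitness = [fitness_fn(ind) for ind in population]
    best = max(range(len(population)), key=lambda i: fitness[i])
    return population[best], fitness[best]
```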

3.1 Chromosome Encoding

In this work, each chromosome represents a candidate solution to the problem. First, we fix the depth of our CNN architecture at two blocks. The first block contains one convolution layer and one pooling layer; it is followed by the second block, which contains two convolution layers and one pooling layer. The inputs of the genetic algorithm are therefore 2 × 3 + 2 variables to be optimized through the GA process. The 2 × 3 variables are the pairs formed by the number of filters L and the filter size K of each of the three convolution layers. The remaining 2 values are the codes of the two pooling layers, which indicate the pooling type P. Conceptually, our proposed encoding extends the existing encoding [8] by adding the type of pooling. The max pooling type is represented by the number 0, and the mean pooling type is represented by the number 1. Figure 2 illustrates an example of a chromosome.

Fig. 2. Example of chromosome encoding.
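
To make the encoding concrete, the sketch below writes a chromosome as a fixed-length Python list and draws a random one within the value ranges of Table 1. The list layout [FN1, FS1, P1, FN2, FS2, FN3, FS3, P2] is an assumed reading of Figure 2 and of the encoding strings reported in Table 3 (e.g. 10-3-mean-16-6-15-3-mean), not a layout stated explicitly in the text.

```python
# Assumed fixed-length chromosome layout: [FN1, FS1, P1, FN2, FS2, FN3, FS3, P2],
# where FN = filter number, FS = filter size, and P = pooling type (0 = max, 1 = mean).
import random

def random_chromosome():
    conv = lambda: [random.randint(6, 16), random.randint(3, 6)]  # FN in {6..16}, FS in {3..6} (Table 1)
    pool = lambda: [random.randint(0, 1)]                         # 0 = max pooling, 1 = mean pooling
    # block 1: one convolution + one pooling; block 2: two convolutions + one pooling
    return conv() + pool() + conv() + conv() + pool()

# Example: [10, 3, 1, 16, 6, 15, 3, 1] corresponds to the string 10-3-mean-16-6-15-3-mean.
```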

3.2 Population Initialization

As described in Section 2, a CNN is made up of convolution layers, pooling layers, and fully connected layers. However, the fully connected layers are dropped in our suggested encoding method; as a result, only the convolution and pooling layers are used to build a CNN in the proposed encoding. As previously stated, the parameters of a convolution layer are the number of filters, their respective sizes, the stride size, and the type of convolution operation. In the proposed approach, we use the same settings for the stride sizes and the convolution operations: the stride sizes are set to 1 × 1, and only one type of convolution operation is employed. The parameters encoded for a convolution layer are therefore the number and the sizes of the filters. Additionally, the pooling layers used in the proposed encoding strategy have a 2 × 2 kernel size and a 1 × 1 spatial stride. Consequently, the only parameter optimized for a pooling layer is the pooling type.

3.3 Selection

We use a selection mechanism at the start of each generation. Prior to the t-th generation, a fitness value is attached to the n-th individual P_{t-1,n}; this fitness is defined as the accuracy obtained in the previous generation or during initialization.
In our case, we use binary tournament selection to identify which individuals survive: the best individual has the greatest chance of being chosen, whereas the worst individual will always be removed.

Algorithm 2: Population Initialization
Input: The population size N
Output: The initialized population P0 (architectures)
1. P0 ← ∅;
2. while |P0| < N do
3.   L ← 2;
4.   Ind ← create a list containing L blocks;
5.   foreach block in Ind do
6.     if block.type = 1 then
7.       FN ← uniformly generate an integer from {6, ..., 16};  /* filter number */
8.       FS ← uniformly generate an integer from {3, ..., 6};   /* filter size */
9.       Ind ← CONV1(FN, FS);
10.      p ← uniformly generate an integer from {0, 1};          /* randomly choose max or mean */
11.      if p = 0 then Ind ← POOL(max) else Ind ← POOL(mean);
12.    else
13.      FN1, FN2 ← uniformly generate two integers from {6, ..., 16};  /* filter numbers */
14.      FS1, FS2 ← uniformly generate two integers from {3, ..., 6};   /* filter sizes */
15.      Ind ← CONV1(FN1, FS1);
16.      Ind ← CONV2(FN2, FS2);
17.      p ← uniformly generate an integer from {0, 1};
18.      if p = 0 then Ind ← POOL(max) else Ind ← POOL(mean);
19.  P0 ← P0 ∪ {Ind};
Return: P0

The next population P_{t+1} is then composed of these selected individuals. Additionally, the best individual is identified and we verify whether it has been placed into the next population P_{t+1}; if not, it replaces the worst individual in P_{t+1}, which guarantees that the best architecture of each generation is retained. Moreover, the number of individuals N does not change, and each individual of the previous generation may be chosen more than once.
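
A minimal sketch of this binary tournament selection with the elitism rule just described is shown below (plain Python; the exact tie-breaking and copying details are assumptions).

```python
# Sketch of binary tournament selection with elitism, as described above (details assumed).
import random

def tournament_selection(population, fitness, N):
    chosen = []
    for _ in range(N):
        i, j = random.sample(range(len(population)), 2)  # two distinct random competitors
        chosen.append(i if fitness[i] >= fitness[j] else j)
    best = max(range(len(population)), key=lambda k: fitness[k])
    if best not in chosen:                               # elitism: keep the best individual
        worst = min(chosen, key=lambda k: fitness[k])    # ... by replacing the worst selected one
        chosen[chosen.index(worst)] = best
    return [population[k][:] for k in chosen]            # return copies of the selected individuals
```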

3.4 Crossover

The crossover operation works on two individuals at a time. It is a fundamental GA process that creates new offspring carrying part of each parent's genetic structure. In the proposed approach, the crossover is closest to the crossover of [18]: each parent is divided into two parts at a random position, and the two parts from the two parents are exchanged to create two children. Figure 3 shows two examples of the crossover.

Fig. 3. Examples of the two kinds of crossover.
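
A single-point crossover of this kind can be sketched as follows (plain Python; the choice of the cut-point range is an assumption).

```python
# Sketch of the single-point crossover described above: split both parents at a random
# position and exchange the tails to produce two children.
import random

def crossover(parent1, parent2):
    point = random.randint(1, len(parent1) - 1)    # cut point strictly inside the chromosome
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```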

3.5 Mutation

To increase the population's diversity (i.e., to obtain various network architectures) and its ability to escape local optima, we apply the mutation operator to the offspring created by the crossover mechanism.
In the proposed algorithm, the mutation is a one-step operation: the value of the gene at a randomly selected position of the chromosome is replaced by a random number chosen from its corresponding range, as shown in Figure 4.

Fig. 4. Examples of mutation.
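
The corresponding operator can be sketched as follows; the per-gene ranges reuse the search space of Table 1 and the chromosome layout assumed in the sketch of Section 3.1.

```python
# Sketch of the mutation described above: the gene at a random position is replaced by a
# new random value drawn from that gene's own range (ranges taken from Table 1).
import random

GENE_RANGES = [(6, 16), (3, 6), (0, 1),            # block 1: FN1, FS1, pooling type
               (6, 16), (3, 6), (6, 16), (3, 6),   # block 2: FN2, FS2, FN3, FS3
               (0, 1)]                             #          pooling type

def mutate(chromosome):
    pos = random.randrange(len(chromosome))
    low, high = GENE_RANGES[pos]
    mutated = chromosome[:]                        # work on a copy
    mutated[pos] = random.randint(low, high)
    return mutated
```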

3.6 Evaluation

Algorithm 3: Fitness Evaluation
Input: The individual (chromosome) ind, the number of epochs Np, the training data Dtr, the testing data (fitness evaluation data) Dts.
Output: The fitness and its corresponding individual.
1. CNN ← using the information encoded in ind, create a CNN with a classifier;
2. fits ← 0;
3. t ← 0;
4. while t < Np do
5.   CNN ← train the CNN on Dtr;
6.   t ← t + 1;
7. end
8. fits ← calculate the classification accuracy on Dts;
Return: ind, fits.
Algorithm 3 provides the details for assessing the fitness of an individual. After decoding the CNN from the chromosome, we add a classifier suited to the given image classification dataset: a softmax classifier is appended [2], and the given image dataset determines its number of classes. Each output of a convolution is followed by a rectifier activation function. Moreover, the Stochastic Gradient Descent (SGD) algorithm [3] is used to train the CNN on the training data Dtr. When the training phase is over, the individual's fitness is set to its classification accuracy on the data set Dts.
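
Putting Sections 3.1 and 3.6 together, the fitness evaluation can be sketched as below. The Keras API, the (28, 28, 1) input shape, the sparse categorical cross-entropy loss, and the decode order of the chromosome are assumptions made for illustration; only the SGD optimizer, the ReLU activations, the softmax classifier, and the epoch, batch-size, and learning-rate values of Table 1 come from the paper.

```python
# Sketch of Algorithm 3 (framework, input shape, and loss are assumptions; optimizer,
# activations, and Table 1 settings follow the paper).
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(ch):
    fn1, fs1, p1, fn2, fs2, fn3, fs3, p2 = ch
    pool = lambda p: layers.MaxPooling2D(2) if p == 0 else layers.AveragePooling2D(2)
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(fn1, fs1, padding="same", activation="relu"), pool(p1),   # block 1
        layers.Conv2D(fn2, fs2, padding="same", activation="relu"),             # block 2
        layers.Conv2D(fn3, fs3, padding="same", activation="relu"), pool(p2),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),   # softmax classifier over the 10 MNIST classes
    ])

def evaluate(ch, x_train, y_train, x_test, y_test, epochs=2):
    model = build_cnn(ch)
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=epochs, batch_size=128, verbose=0)   # training on Dtr
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)                 # fitness = accuracy on Dts
    return accuracy
```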

4 Experiments

Due to its high computational resource requirements, the proposed genetic algorithm cannot be directly tested on large-scale datasets like ILSVRC2012 [16]. Instead, we evaluate the proposed algorithm on the smaller MNIST dataset [10]. The parameter settings for the proposed algorithm are shown in Table 1.

Table 1. Parameter settings

Categories       Parameters              Value ranges
Search space     Filter number           6, 7, 8, ..., 16
                 Filter size             3, 4, 5, 6
                 Pooling                 max, mean
SGD              Epochs                  2
                 Batch size              128
                 Learning rate           0.01
Genetic process  Population size         10
                 Max generation number   10
                 Crossover probability   0.7
                 Mutation probability    0.3

Table 3 shows a summary of the results obtained by our proposed approach. We observe that all the important statistics about classification accuracy grow from generation to generation. Additionally, we provide the best network structure of each generation in the same table. Fig. 5 illustrates the evolutionary trajectory of the proposed method on the MNIST dataset; the maximal and average classification accuracies of each generation are linked by a dashed line and a solid line, respectively.
In order to prove the effectiveness of the proposed algorithm and compare it with other methods, we adopted the same procedure used in [12]: we increase the number of epochs to 50 and use a batch size of 64 to retrain the best model. Furthermore, we add the fully connected part of LeNet-5 [10] and keep the same hyper-parameters related to the back-propagation.
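
As a rough illustration of this retraining step, the sketch below reuses the (assumed) build_cnn function from the fitness-evaluation sketch and replaces its single softmax layer by a LeNet-5-style fully connected part; the exact head attached by the authors and the [0, 1] pixel scaling are assumptions.

```python
# Sketch of retraining the best discovered architecture (assumptions: Keras API, the
# build_cnn function from the previous sketch, [0, 1] scaling of the pixels, and a
# LeNet-5-style fully connected part with layer sizes 120 and 84).
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

best = [11, 4, 1, 8, 3, 14, 3, 1]   # best encoding of generation 09 in Table 3: 11-4-mean-8-3-14-3-mean
model = build_cnn(best)
model.pop()                                       # drop the single softmax layer used during the search
model.add(layers.Dense(120, activation="relu"))   # LeNet-5-style fully connected part
model.add(layers.Dense(84, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=50, batch_size=64, verbose=0)   # retraining settings from the text
_, test_accuracy = model.evaluate(x_test, y_test, verbose=0)       # testing error = 1 - test_accuracy
```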
We compare our results to five different methods. The first three were created by LeCun in 1998, namely LeNet-1, LeNet-4, and LeNet-5; these CNN architectures are dedicated to the problem of digit recognition. The last two are evoCNN [17] and IPPSO [20], which are population-based algorithms and are the most similar to our proposed algorithm.

Table 2. Comparison with existing models on MNIST

Model           Testing error %
LeNet-1 [10]    1.7
LeNet-4 [10]    1.1
LeNet-5 [10]    0.95
evoCNN [17]     1.18
IPPSO [20]      1.13
Our approach    0.73

Based on the results shown in Table 2, our method achieves the lowest testing error among the compared models on the MNIST dataset.

Table 3. Classification accuracy on the MNIST testing set

Gen Max Acc Min Acc Avg Acc STD Best Network Encoding
00 0.9685 0.9197 0.9573 0.01320 10-3-mean-16-6-15-3-mean
01 0.9715 0.9538 0.9629 0.00531 7-6-max-8-3-13-4-mean
02 0.9715 0.9538 0.9633 0.00477 7-6-max-8-3-13-4-mean
03 0.9715 0.9577 0.9666 0.00380 7-6-max-8-3-13-4-mean
04 0.9728 0.9660 0.9701 0.00263 7-6-mean-8-3-14-3-mean
05 0.9750 0.9728 0.9732 0.00091 9-4-mean-8-3-14-3-mean
06 0.9768 0.9728 0.9738 0.00163 10-3-mean-8-3-14-3-mean
07 0.9768 0.9718 0.9745 0.00199 10-3-mean-8-3-14-3-mean
08 0.9770 0.9768 0.9768 0.00009 10-3-mean-8-3-7-5-mean
09 0.9814 0.9727 0.9769 0.00194 11-4-mean-8-3-14-3-mean

[Plot: classification accuracy (0.92 to 0.98) versus number of generations (0 to 10) on MNIST, showing the best and average results of each generation.]

Fig. 5. Evolution of the maximal and average classification accuracy over the generations. The bars indicate the maximal and minimal accuracies in the corresponding generation.

5 Conclusion

This study formulated the choice of CNN hyper-parameters as an optimization problem over a large search space and used the Genetic Algorithm to solve it. First, we suggested a way to encode each network structure; then mutation and crossover operations were used to navigate the search space. In addition, binary tournament selection was used to determine which individuals survive, and an efficient procedure was used to assess fitness.
Our results demonstrate that the proposed method can discover an optimal CNN architecture for the MNIST dataset. The comparison of the obtained results with results from the literature has shown that our developments give satisfactory results.

References

1. Beheshti, Z., Shamsuddin, S.M.H.: A review of population-based meta-heuristic algorithms. Int. J. Adv. Soft Comput. Appl 5(1), 1–35 (2013)
2. Bishop, C.M., Nasrabadi, N.M.: Pattern recognition and machine learning, vol. 4.
Springer (2006)
3. Bottou, L.: Stochastic gradient descent tricks. In: Neural networks: Tricks of the
trade, pp. 421–436. Springer (2012)
4. Cai, H., Chen, T., Zhang, W., Yu, Y., Wang, J.: Efficient architecture search by
network transformation. In: Proceedings of the AAAI Conference on Artificial In-
telligence. vol. 32 (2018)
5. Dumoulin, V., Visin, F.: A guide to convolution arithmetic for deep learning. arXiv
preprint arXiv:1603.07285 (2016)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition.
pp. 770–778 (2016)
7. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected
convolutional networks. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. pp. 4700–4708 (2017)
8. Johnson, F., Valderrama, A., Valle, C., Crawford, B., Soto, R., Ñanculef, R.: Au-
tomating configuration of convolutional neural network hyperparameters using ge-
netic algorithm. IEEE Access 8, 156139–156152 (2020)
9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. Communications of the ACM 60(6), 84–90 (2017)
10. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
11. Liu, H., Simonyan, K., Vinyals, O., Fernando, C., Kavukcuoglu, K.: Hierarchical
representations for efficient architecture search. arXiv preprint arXiv:1711.00436
(2017)
12. Liu, H., Simonyan, K., Yang, Y.: Darts: Differentiable architecture search. arXiv
preprint arXiv:1806.09055 (2018)
13. Loussaief, S., Abdelkrim, A.: Convolutional neural network hyper-parameters opti-
mization based on genetic algorithms. International Journal of Advanced Computer
Science and Applications 9(10) (2018)
14. Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann ma-
chines. In: Icml (2010)
15. Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y.L., Tan, J., Le, Q.V., Ku-
rakin, A.: Large-scale evolution of image classifiers. In: International Conference
on Machine Learning. pp. 2902–2911. PMLR (2017)

16. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z.,
Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recog-
nition challenge. International journal of computer vision 115(3), 211–252 (2015)
17. Sun, Y., Xue, B., Zhang, M., Yen, G.G.: Evolving deep convolutional neural net-
works for image classification. IEEE Transactions on Evolutionary Computation
24(2), 394–407 (2019)
18. Sun, Y., Xue, B., Zhang, M., Yen, G.G., Lv, J.: Automatically designing cnn ar-
chitectures using the genetic algorithm for image classification. IEEE transactions
on cybernetics 50(9), 3840–3854 (2020)
19. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings
of the IEEE conference on computer vision and pattern recognition. pp. 1–9 (2015)
20. Wang, B., Sun, Y., Xue, B., Zhang, M.: Evolving deep convolutional neural net-
works by variable-length particle swarm optimization for image classification. In:
2018 IEEE Congress on Evolutionary Computation (CEC). pp. 1–8. IEEE (2018)
21. Xie, L., Yuille, A.: Genetic cnn. In: Proceedings of the IEEE international confer-
ence on computer vision. pp. 1379–1388 (2017)
