1 Introduction
of filters for each convolution layer, and the type of pooling in the subsampling layer. In [13], the authors aimed to optimize the number of filters and their sizes for each layer using the large Caltech-256 dataset. Furthermore, LEIC [15] used the GA to improve CNN architectures by simultaneously optimizing both the number of kernels in each layer and the depth. However, the crossover operator is discarded in the main part of LEIC, and without a crossover operator, GAs typically require a sizable population to explore the search space. On the other hand, evoCNN [17] extended LEIC with a new gene encoding strategy and a specific crossover operator. Additionally, a different representation strategy was introduced to initialize connection weights efficiently. Note that the number of possible network structures grows exponentially, so it is not practical to enumerate all candidates and choose the best one. For this reason, we can cast this issue as an optimization problem over a large search space and apply genetic algorithms [1] to explore a sufficient number of architectures.
In this study, we propose a strategy for using a genetic algorithm that seeks to explore the search space and discover the optimal CNN hyper-parameters for a given classification task. To this end, we suggest an encoding technique that encodes each network structure as a fixed-length string. Next, we introduce several common genetic operators, such as selection, crossover, and mutation, which aim to improve population diversity and prevent premature convergence to local optima. Finally, we measure the quality of each individual by its accuracy on the MNIST dataset.
The rest of this work is organized as follows. First, an overview of CNNs is presented in Section 2. Next, we describe the process of designing network architectures using the genetic algorithm in Section 3. In Section 4, experimental results are shown. Finally, conclusions are given in Section 5.
2 CNNs—an overview
In the domain of image recognition, the CNN is the most popular and most frequently used algorithm. A vital benefit of the CNN is its ability to detect significant features without human intervention. A typical CNN structure differs from the Multi-Layer Perceptron (MLP) in combining numerous locally connected layers that seek to recognize features, while the final layers are fully connected layers that seek to classify images. Fig. 1 shows an illustration of a CNN architecture for image classification.
Formally, the feature map produced by convolution layer $l$ is

$Y^{l} = K^{l} \otimes X^{l-1}$ (1)
where $K^{l}$ denotes the filter weights of convolution layer $l$, $\otimes$ denotes the convolution operator, and $X^{l-1}$ is the output of the previous layer $l-1$.
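As an illustration of Eq. (1), here is a minimal single-channel sketch in NumPy. Like most deep learning frameworks, it actually computes a cross-correlation; stride 1 and no padding are assumed.

```python
import numpy as np

def conv2d(x, k):
    """Single-channel version of Eq. (1): Y = K (*) X, stride 1, no padding."""
    h, w = x.shape
    kh, kw = k.shape
    y = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            # inner product of the filter with the current receptive field
            y[i, j] = np.sum(k * x[i:i + kh, j:j + kw])
    return y

x = np.random.rand(28, 28)   # an MNIST-sized input X^{l-1}
k = np.random.rand(3, 3)     # a 3x3 filter K^l
print(conv2d(x, k).shape)    # feature map Y^l: (26, 26)
```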
There are three factors that determine the feature map's size (a sketch of how they combine follows the list):

(A) Depth: the number of filters employed during the convolution process.
(B) Stride: the number of pixels by which the filter matrix slides across the input matrix.
(C) Zero-padding: a method that enables us to keep the original input size by adding zeros around the edges of the input matrix.
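These factors combine in the standard closed form O = (W − K + 2P)/S + 1 for the spatial output size; this formula is a well-known result we assume here, as the text does not state it explicitly.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution: (W - K + 2P) // S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(28, 5))              # 24: size shrinks without padding
print(conv_output_size(28, 5, padding=2))   # 28: zero-padding keeps the size
print(conv_output_size(28, 5, stride=2))    # 12: stride 2 halves the size
```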
The fully connected layer follows the basic principle of the MLP: inside this layer, each neuron is connected to all neurons of the previous layer and the next layer. The output size of the last fully connected layer should match the number of classes in the problem. Finally, the softmax activation function can be used to compute posterior classification probabilities.
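For illustration, the posterior probabilities can be computed with a numerically stable softmax; this is a minimal sketch, not tied to any particular framework.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of class logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.random.randn(10)   # one logit per MNIST class
probs = softmax(logits)        # posterior probabilities, summing to 1
print(probs.sum())             # 1.0
```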
3 Proposed approach
This section presents a genetic algorithm to select the optimal CNN hyper-parameters, allowing an optimal CNN model to be designed on the MNIST dataset [10]. First, we introduce a mechanism for representing individuals, including their fixed and optimized parts. Following that, some genetic operations, such as selection, mutation, and crossover, are developed to traverse the search space and identify the optimal solutions.
In the proposed framework, each chromosome is a candidate solution representing a CNN architecture. The classification accuracy is chosen as the fitness function of a chromosome. In this case, the objective of the genetic algorithm is to find the hyper-parameter values that give the least error and consequently the highest classification accuracy. Algorithm 1 provides a summary of the proposed algorithm.
Algorithm 1: General framework of the proposed algorithm
Input : Training data Dtr, testing data Dts, number of epochs Np, max. number of generations G, population size N, crossover probability Pc, mutation probability Pm.
Output: The discovered optimal CNN architecture with its recognition accuracy (fitness).
1 P0 ← Randomly generate an initial population of N chromosomes and compute their fitness;
2 g ← 0; /* initialize a generation counter */
3 while g < G do
4   Selection: apply binary tournament selection to the previous generation to produce a new generation;
5   Crossover: for every pair of parent chromosomes, apply crossover with probability Pc;
6   Mutation: apply mutation to every crossover offspring with probability Pm;
7   Evaluation: evaluate the fitness of each new individual;
8   g ← g + 1;
Return: the best individual.
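For concreteness, here is a compact Python skeleton of this framework. The callables initialize, fitness, select, crossover, and mutate stand in for the operators described in the following subsections; an even population size N is assumed in this sketch.

```python
import random

def genetic_algorithm(G, N, Pc, Pm, initialize, fitness,
                      select, crossover, mutate):
    """Skeleton of Algorithm 1: evolve a population of N CNN encodings."""
    population = initialize(N)                      # random initial population P0
    scores = [fitness(ind) for ind in population]   # accuracy on the test data
    for _ in range(G):                              # g = 0 .. G-1
        parents = select(population, scores)        # binary tournament selection
        offspring = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            if random.random() < Pc:                # crossover with probability Pc
                p1, p2 = crossover(p1, p2)
            offspring.append(mutate(p1, Pm))        # mutation with probability Pm
            offspring.append(mutate(p2, Pm))
        population = offspring
        scores = [fitness(ind) for ind in population]
    best = max(range(N), key=lambda i: scores[i])
    return population[best], scores[best]           # best individual + fitness
```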
In the first block, we have one convolution layer and one pooling layer, followed by the second block, which contains two convolution layers and one pooling layer. The genetic algorithm therefore optimizes 2 × 3 + 2 variables. The 2 × 3 variables are the pairs formed, for each of the three convolution layers, by the number of filters L and the filter size K used in that layer. The remaining 2 values are the codes of the two pooling layers, which refer to the pooling type P. Conceptually, our proposed encoding extends the existing encoding [8] by adding the type of pooling. The max pooling type is represented by the number 0, and the mean pooling type by the number 1. Figure 2 illustrates an example of a chromosome.
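Under this encoding, a chromosome is a fixed-length list of eight genes. The following sketch shows the representation and the L-K-P notation used for the best networks in the experiments; the helper decode is ours, for illustration only.

```python
# Fixed-length chromosome: [L1, K1, P1, L2, K2, L3, K3, P2], where L is the
# number of filters, K the filter size, and P the pooling type (0=max, 1=mean).
POOL = {0: "max", 1: "mean"}

def decode(chromosome):
    """Render a chromosome in the L-K-P notation used for the best networks."""
    l1, k1, p1, l2, k2, l3, k3, p2 = chromosome
    return f"{l1}-{k1}-{POOL[p1]}-{l2}-{k2}-{l3}-{k3}-{POOL[p2]}"

print(decode([10, 3, 1, 16, 6, 15, 3, 1]))  # -> 10-3-mean-16-6-15-3-mean
```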
3.3 Selection
15 else
16   FN1, FN2 ← Uniformly generate two integers from ⟦6..16⟧  /* filter numbers */
17   FS1, FS2 ← Uniformly generate two integers from ⟦3..6⟧  /* filter sizes */
18   Ind ← CONV1(FN1, FS1)
19   Ind ← CONV2(FN2, FS2)
20   p ← Uniformly generate an integer from ⟦0..1⟧
21   if p = 0 then
22     Ind ← POOL(max)
23   else
24     Ind ← POOL(mean)
25   P0 ← P0 ∪ Ind
Return: P0
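A Python sketch of this initialization for the full eight-gene chromosome follows. The ranges ⟦6..16⟧ and ⟦3..6⟧ are taken from the excerpt above; reusing them for the first block's convolution layer is an assumption of this sketch.

```python
import random

def random_individual():
    """Uniformly sample one chromosome [L1, K1, P1, L2, K2, L3, K3, P2]."""
    ind = []
    # block 1: one convolution layer followed by one pooling layer
    ind += [random.randint(6, 16), random.randint(3, 6)]   # filters, size
    ind.append(random.randint(0, 1))                       # 0 = max, 1 = mean
    # block 2: two convolution layers followed by one pooling layer
    for _ in range(2):
        ind += [random.randint(6, 16), random.randint(3, 6)]
    ind.append(random.randint(0, 1))
    return ind

def initialize(N):
    """Generate the initial population P0 of N random chromosomes."""
    return [random_individual() for _ in range(N)]
```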
being chosen, whereas the worst individual will always be removed; the next population Pt+1 is then comprised of these selected individuals. Additionally, the best individual is tracked to verify whether it has been placed into the next population Pt+1. If not, it replaces the worst individual in Pt+1, so that the best architecture of each generation is preserved. Moreover, the population size N does not change, and each individual of the previous generation may be chosen more than once.
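A minimal Python sketch of this binary tournament selection with the elitism step just described; the deterministic choice of the tournament winner is an assumption of the sketch.

```python
import random

def tournament_selection(population, scores):
    """Binary tournament with elitism; the population size N is preserved."""
    n = len(population)
    pool = list(zip(population, scores))       # pair individuals with fitness
    selected = []
    for _ in range(n):
        a, b = random.sample(pool, 2)          # two random competitors
        selected.append(max(a, b, key=lambda ind: ind[1]))
    # elitism: if the best individual was not kept, it replaces the worst
    best = max(pool, key=lambda ind: ind[1])
    if best not in selected:
        worst = min(range(n), key=lambda i: selected[i][1])
        selected[worst] = best
    return [ind for ind, _ in selected]
```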
3.4 Crossover
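As stated in Algorithm 1, crossover is applied to each pair of parents with probability Pc. As an illustration only, and not a reproduction of the paper's exact operator, here is a one-point crossover on the fixed-length encoding.

```python
import random

def crossover(p1, p2):
    """One-point crossover on two fixed-length chromosomes."""
    point = random.randint(1, len(p1) - 1)     # cut point between two genes
    return p1[:point] + p2[point:], p2[:point] + p1[point:]
```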
3.5 Mutation
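Mutation is applied with probability Pm (Algorithm 1). As one plausible instantiation, the following sketch resamples each gene uniformly within its initialization range; the per-gene scheme is an assumption, not the paper's stated operator.

```python
import random

# Per-gene ranges, matching the initialization: filter numbers in [6, 16],
# filter sizes in [3, 6], pooling codes in {0, 1}.
GENE_RANGES = [(6, 16), (3, 6), (0, 1),
               (6, 16), (3, 6), (6, 16), (3, 6), (0, 1)]

def mutate(ind, Pm):
    """Resample each gene uniformly within its range with probability Pm."""
    return [random.randint(lo, hi) if random.random() < Pm else gene
            for gene, (lo, hi) in zip(ind, GENE_RANGES)]
```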
3.6 Evaluation
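As stated above, the fitness of an individual is its classification accuracy. Here is a hedged Keras sketch of such an evaluation, where the model is decoded from the chromosome encoding described earlier; the optimizer, loss, activations, and padding are assumptions, not the paper's exact training setup.

```python
from tensorflow.keras import layers, models

POOL_LAYER = {0: layers.MaxPooling2D, 1: layers.AveragePooling2D}

def build_model(c):
    """Decode a chromosome [L1, K1, P1, L2, K2, L3, K3, P2] into a CNN."""
    l1, k1, p1, l2, k2, l3, k3, p2 = c
    return models.Sequential([
        layers.Input(shape=(28, 28, 1)),               # MNIST input
        layers.Conv2D(l1, k1, padding="same", activation="relu"),
        POOL_LAYER[p1](),                              # block-1 pooling
        layers.Conv2D(l2, k2, padding="same", activation="relu"),
        layers.Conv2D(l3, k3, padding="same", activation="relu"),
        POOL_LAYER[p2](),                              # block-2 pooling
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),        # one unit per class
    ])

def fitness(c, x_tr, y_tr, x_ts, y_ts, Np):
    """Train the decoded CNN for Np epochs; fitness = test accuracy."""
    model = build_model(c)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_tr, y_tr, epochs=Np, verbose=0)
    return model.evaluate(x_ts, y_ts, verbose=0)[1]
```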
4 Experiments
Due to its high computational resource requirements, the proposed genetic algorithm cannot be directly tested on large-scale datasets such as ILSVRC2012 [16]. Instead, we evaluate the proposed algorithm on the small MNIST dataset [10]. The parameter settings for the proposed algorithm are shown in Table 1.
Based on the results shown in Table 2, our method proves effective on the MNIST dataset.
Gen Max Acc Min Acc Avg Acc STD Best Network Encoding
00 0.9685 0.9197 0.9573 0.01320 10-3-mean-16-6-15-3-mean
01 0.9715 0.9538 0.9629 0.00531 7-6-max-8-3-13-4-mean
02 0.9715 0.9538 0.9633 0.00477 7-6-max-8-3-13-4-mean
03 0.9715 0.9577 0.9666 0.00380 7-6-max-8-3-13-4-mean
04 0.9728 0.9660 0.9701 0.00263 7-6-mean-8-3-14-3-mean
05 0.9750 0.9728 0.9732 0.00091 9-4-mean-8-3-14-3-mean
06 0.9768 0.9728 0.9738 0.00163 10-3-mean-8-3-14-3-mean
07 0.9768 0.9718 0.9745 0.00199 10-3-mean-8-3-14-3-mean
08 0.9770 0.9768 0.9768 0.00009 10-3-mean-8-3-7-5-mean
09 0.9814 0.9727 0.9769 0.00194 11-4-mean-8-3-14-3-mean
[Figure: classification accuracy on MNIST (0.92–0.98) versus the number of generations (0–10), plotting the best and average results.]
Fig. 5. Evolution of the maximal and average classification accuracy over the generations. The bars indicate the maximal and minimal accuracies in the corresponding generation.
5 Conclusion
Mutation and crossover operations are used to navigate the search space. In addition, binary tournament selection is used to determine which individuals survive, and an efficient way to assess fitness is employed.
Our results demonstrate that the proposed method can discover an optimal CNN architecture for the MNIST dataset. The comparison of the obtained results with results from the literature shows that our approach gives satisfactory results.
References
16. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3), 211–252 (2015)
17. Sun, Y., Xue, B., Zhang, M., Yen, G.G.: Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation 24(2), 394–407 (2019)
18. Sun, Y., Xue, B., Zhang, M., Yen, G.G., Lv, J.: Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Transactions on Cybernetics 50(9), 3840–3854 (2020)
19. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–9 (2015)
20. Wang, B., Sun, Y., Xue, B., Zhang, M.: Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification. In: 2018 IEEE Congress on Evolutionary Computation (CEC). pp. 1–8. IEEE (2018)
21. Xie, L., Yuille, A.: Genetic CNN. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1379–1388 (2017)