EfficientNet-B0


The popularity of using CNNs for image analysis has grown over the past few years, largely due to the impressive results that have been obtained. However, CNNs require a large amount of training data before they can begin to work. Training CNNs on millions of images has been used to solve this problem, but it is time-consuming and requires a huge amount of data. Using a pre-trained model avoids the need for a large amount of data and makes better use of processing resources. The most common problem is that the available dataset is too small to be useful. There may be instances in which overfitting is an issue, and data augmentation is not always enough to solve it. This study proposes a transfer learning method that uses EfficientNetB0 as the backbone of an emotion recognition system that focuses on salient face areas.

The Proposed Approach

In this paper, we suggest that the EfficientNetB0 convolutional neural network model be used as the basis for transfer learning. Real-world photographs of facial expressions were used to test the suggested approach. While creating a data generator for the images, a few settings are applied: rescale by 1/255, split off 20 percent of the data for validation, rotate by up to 5 degrees, shift width and height by up to 20 percent, and zoom by up to 20 percent. After being cleaned up, the dataset was divided into three sections: training, validation, and testing. A target picture size of 128x128 and a batch size of 64 have been used for all three sets. Seven label types are selected (angry, disgusted, fearful, happy, neutral, sad, and surprised). Our training set has 22,968 photos, the validation set contains 5,741, and the test set contains 7,178. The EfficientNetB0 model with ImageNet weights was then loaded using transfer learning. The model is compiled with the Adam optimizer and the categorical cross-entropy loss function, and trained for 10, 50, and 100 epochs. When training is complete, the saved weights are used for testing. A Haar cascade frontal-face classifier is used with OpenCV to detect faces and determine the emotional state from the seven categories.
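A minimal sketch of the pipeline just described, using Keras/TensorFlow; the directory layout and variable names are illustrative assumptions, not the authors' exact code:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data generator with the stated settings: rescale by 1/255, 20 percent
# validation split, rotation up to 5 degrees, width/height shift 20 percent,
# zoom 20 percent.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    validation_split=0.2,
    rotation_range=5,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
)

# Target image size 128x128 and batch size 64 for all sets.
train_gen = datagen.flow_from_directory(
    "fer2013/train", target_size=(128, 128), batch_size=64,
    class_mode="categorical", subset="training")
val_gen = datagen.flow_from_directory(
    "fer2013/train", target_size=(128, 128), batch_size=64,
    class_mode="categorical", subset="validation")

# EfficientNetB0 backbone with ImageNet weights and a new 7-way head.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(128, 128, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(7, activation="softmax")(x)
model = tf.keras.Model(base.input, out)

# Adam optimizer and categorical cross-entropy loss, as described above.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10)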

a) EfficientNet-B0 Model

One can think of EfficientNet as a family of convolutional neural networks. Despite its complexity, it still proves to be more efficient than most of its predecessors. In the EfficientNet model family, there are eight models from B0 through B7, each specifying a model with increased accuracy and more parameters. EfficientNet-B0 was trained on the ImageNet collection, which contains more than one million photos. The network is able to identify over 1000 object categories, such as keyboards, mice, and many animals. As a result, the network has learned rich feature representations for a broad variety of pictures. The network accepts images with a resolution of 224 by 224 pixels [1].

1) EfficientNet Architecture

 In this section, we will discuss the EfficientNet-B0 architecture in detail.

 In B0, which is a mobile-sized architecture, there are about 5.3M trainable parameters [1].

 Let's have a look at table 1 first to see how the new architecture appears.

Table 1: EfficientNet-B0 network [1]

Stage | Operator               | Resolution | Channels | Layers
1     | Conv3x3                | 224 x 224  | 32       | 1
2     | MBConv1, k3x3          | 112 x 112  | 16       | 1
3     | MBConv6, k3x3          | 112 x 112  | 24       | 2
4     | MBConv6, k5x5          | 56 x 56    | 40       | 2
5     | MBConv6, k3x3          | 28 x 28    | 80       | 3
6     | MBConv6, k5x5          | 14 x 14    | 112      | 3
7     | MBConv6, k5x5          | 14 x 14    | 192      | 4
8     | MBConv6, k3x3          | 7 x 7      | 320      | 1
9     | Conv1x1 & Pooling & FC | 7 x 7      | 1280     | 1

In the architecture, 7 inverted residual blocks are used, each configured differently. Additionally, these blocks use squeeze-and-excitation blocks along with swish activation.

Swish Activation

Swish is the product of a linear and a sigmoid activation.

Swish(x) = x * sigmoid(x)
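As a minimal illustration (TensorFlow also ships this activation as tf.nn.swish):

import tensorflow as tf

def swish(x):
    # Swish(x) = x * sigmoid(x): the linear input times a sigmoid gate
    return x * tf.math.sigmoid(x)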

Inverted Residual Block

In MobileNet's residual block, convolution is based on depthwise separable convolution, which employs depthwise convolution first and pointwise convolution after that. The result is a reduction in the number of trainable parameters. In an inverted residual block, the skip connections link the narrow layers, while the wider layers sit between the skip connections.
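A minimal Keras sketch of the depthwise separable part; the channel counts are illustrative assumptions, not values from the paper:

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(56, 56, 40))
# Depthwise convolution: one 3x3 filter per input channel.
x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)
# Pointwise convolution: 1x1 filters mix the channels.
x = layers.Conv2D(filters=80, kernel_size=1)(x)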

Squeeze and Excitation Block

When generating an output feature map, a CNN weights its channels equally. Rather than treating each channel equally, the squeeze-and-excitation (SE) block gives each channel its own weight. The SE block produces an output of shape (1 x 1 x channels) that characterizes the weight for each channel, and the remarkable part is that the neural network can learn these weights by itself like any other parameters, which is a big advantage.
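A minimal Keras sketch of an SE block along these lines; the helper name and the se_ratio default are assumptions:

import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, se_ratio=0.25):
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)       # squeeze to one value per channel
    s = layers.Reshape((1, 1, channels))(s)      # shape (1 x 1 x channels)
    s = layers.Conv2D(int(channels * se_ratio), 1, activation="swish")(s)
    s = layers.Conv2D(channels, 1, activation="sigmoid")(s)  # learned channel weights
    return layers.Multiply()([x, s])             # excitation: re-weight each channel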

EfficientNet’s MBConv Block

First, the MBConv block requires an input and, second, the MBConv block's arguments. The final layer yields the output. Attributes such as the input and output channels, the expansion and squeeze ratios, and so on are all part of a block argument that is used inside an MBConv block. As described for the inverted residual block, we build our layer and widen it (the connected blocks are narrow and the inner blocks are wider; here we make the layer wider simply by increasing the number of channels). After the expansion, we perform depthwise convolution with the kernel size specified in the block parameter. During this phase, we also extract global features using global average pooling and squeeze the channel count using the SE ratio. Finally, a convolution is used to create the output channels named in the block argument after we finish the SE block.
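A minimal Keras sketch of an MBConv block along these lines, reusing the se_block helper sketched above; the default expansion ratio and other settings are illustrative assumptions, not the exact library implementation:

import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, out_channels, expand_ratio=6, kernel_size=3, strides=1):
    in_channels = x.shape[-1]
    h = x
    # Expansion: widen the inner layer by increasing the channel count.
    if expand_ratio != 1:
        h = layers.Conv2D(in_channels * expand_ratio, 1, use_bias=False)(h)
        h = layers.BatchNormalization()(h)
        h = layers.Activation("swish")(h)
    # Depthwise convolution with the kernel size from the block arguments.
    h = layers.DepthwiseConv2D(kernel_size, strides=strides, padding="same",
                               use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    # Squeeze-and-excitation on the expanded features.
    h = se_block(h)
    # Projection back to the output channels named in the block argument.
    h = layers.Conv2D(out_channels, 1, use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    # Skip connection links the narrow layers when shapes match.
    if strides == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h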

b) Transfer Learning
Using transfer learning, a model learned for one task may be used for another task that is similar. Deep learning systems with many parameters are resource-intensive and expensive to compute. Because of the amount of data used to train these networks, they are not overfitted. As a result, it is common for researchers to spend a significant amount of time training a state-of-the-art model. The idea of transfer learning was born out of the thought that a state-of-the-art model is trained using a large amount of resources, and thus the benefits of such investments should be realized many times over. The best part is that transfer learning lets you reuse all or part of the model, so we avoid training the entire model from scratch. In particular, transfer learning saves time and improves results. For example, a model trained to identify automobiles may then be used to recognize trucks.

In transfer learning, a feature space and its marginal probability distribution are usually referred to as the domain. A task has a label space and an objective function that needs to be optimized for a certain domain. An ideal source domain is one that has sufficient data samples, many labels, and possibly high quality (such as a lab environment) [2]. Data from a target domain, on the other hand, may have fewer samples, fewer labels, or no labels at all, making it more likely to be noisy. Transfer learning methods use knowledge from the source domain to enhance learning of the target task.
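A minimal Keras sketch of this freeze-and-fine-tune idea, reusing the base and model names from the earlier pipeline sketch (the fine-tuning learning rate is an illustrative assumption):

# Freeze the ImageNet-pretrained backbone; only the new head is trained.
base.trainable = False
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10)

# Optionally unfreeze and fine-tune the whole network at a small learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10)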

c) Loss Function: Categorical Cross-Entropy

For categorical classification, the cross-entropy loss contributed by training data point 𝒊, (𝒙𝒊, 𝒚𝒊), is simply the negative log-likelihood (NLL):

L(𝒙𝒊, 𝒚𝒊) = −log p(𝒚𝒊 | 𝒙𝒊)

since the ground truth probability is one for the correct label 𝒚𝒊 and zero for every other label; of the full cross-entropy sum over classes, only the term for the true class survives.
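As a small worked example (illustrative numbers), if the model assigns probabilities (0.1, 0.7, 0.2) to three classes and the true label is class 1, only the true class contributes to the loss:

import numpy as np

probs = np.array([0.1, 0.7, 0.2])   # predicted class probabilities for x_i
y_true = 1                          # index of the correct label y_i
loss = -np.log(probs[y_true])       # negative log-likelihood of the true class
print(loss)                         # ~0.357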

d) Adam Optimizer (Adaptive Moment Estimation)

It is an algorithm for gradient descent optimization. When dealing with large problems that involve a lot of data or parameters, this approach is very effective. It requires less memory and is efficient. Intuitively, it combines the momentum-based gradient descent algorithm with the RMSProp algorithm. Mathematical aspect of the Adam optimizer: taking the equations used in the above two methods, we get:

m_t = β1 · m_(t−1) + (1 − β1) · g_t
v_t = β2 · v_(t−1) + (1 − β2) · g_t²

where g_t is the gradient at time step t. m_t and v_t are estimates of the 1st moment (the mean) and the 2nd moment (the uncentered variance) of the gradients respectively, hence the method's name. As m_t and v_t are initialized as vectors of 0's, the authors of Adam observed that they are biased towards 0, especially during the initial time steps, and especially when the decay rates are small (such that β1 and β2 are close to 1).
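A minimal NumPy sketch of a single Adam update, including the bias corrections that compensate for the zero initialization of m and v; the hyperparameter values are the commonly used defaults and are assumptions here:

import numpy as np

def adam_step(theta, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # 1st moment (mean) estimate
    v = b2 * v + (1 - b2) * g ** 2       # 2nd moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)            # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return theta, m, v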

Proposed Algorithm

Step 1: Load the FER2013 dataset.

Step 2: Preprocess the images by defining an image data generator.

Step 3: Load the training, validation, and test sets using a target image size of 128x128 and a batch size of 64.

Step 4: Select the class labels for 7 categories: {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy", 4: "Neutral", 5: "Sad", 6: "Surprised"}.

Step 5: Divide the images into training, testing, and validation sets.

Step 6: Apply the EfficientNetB0 model to train the network.

Step 7: Load the ImageNet weights.

Step 8: Use transfer learning to leverage a related predictive modeling problem with an abundance of data.

Step 9: Compile the model using the Adam optimizer.

Step 10: Calculate the loss using categorical cross-entropy.

Step 11: Train the model for 10, 50, and 100 epochs to detect the facial emotions.

Step 12: Test the model using OpenCV with the Haar cascade frontal-face classifier to detect facial features and predict the emotion from the seven categories (see the sketch below).

Step 13: Obtain accurate expression detection results.
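A minimal sketch of Steps 12 and 13 with OpenCV; the model file name and the test image path are assumptions:

import cv2
import numpy as np
import tensorflow as tf

labels = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]
model = tf.keras.models.load_model("emotion_model.h5")   # trained weights
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("test.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
    face = cv2.resize(frame[y:y + h, x:x + w], (128, 128)) / 255.0
    pred = model.predict(face[np.newaxis, ...])
    print(labels[int(np.argmax(pred))])                  # predicted emotion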

[1] M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," Google Research, Brain Team, 2019.

[2] K. Feng and T. Chaspari, "A Review of Generalizable Transfer Learning in Automatic Emotion Recognition," Frontiers in Computer Science, vol. 2, 2020.
