RegNet
Abstract—The ResNet and its variants have achieved remarkable successes in various computer vision tasks. Despite its success in making gradients flow through building blocks, the simple shortcut connection mechanism limits the ability of re-exploring new, potentially complementary features due to the additive function. To address this issue, in this paper we propose to introduce a regulator module as a memory mechanism to extract complementary features, which are further fed to the ResNet. In particular, the regulator module is composed of convolutional RNNs (e.g., Convolutional LSTMs or Convolutional GRUs), which are shown to be good at extracting spatio-temporal information. We name the new regulated networks RegNet. The regulator module can be easily implemented and appended to any ResNet architecture. We also apply the regulator module to improve the Squeeze-and-Excitation ResNet, to show the generalization ability of our method. Experimental results on three image classification datasets have demonstrated the promising performance of the proposed architecture compared with the standard ResNet, SE-ResNet, and other state-of-the-art architectures.

Index Terms—Residual Networks, Convolutional Recurrent Neural Networks, Convolutional Neural Networks

Jing Xu and Zenglin Xu are with the School of Science and Technology, Harbin Institute of Technology, Shenzhen, Shenzhen 510085, Guangdong, China. Yu Pan and Xinglin Pan are with the SMILE Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610031, China. Steven Hoi is with the School of Information Systems (SIS), Singapore Management University, Singapore. Zhang Yi is with the Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China. Zenglin Xu is the corresponding author (e-mail: zenglin@gmail.com).

I. INTRODUCTION

Convolutional neural networks (CNNs) have achieved abundant breakthroughs in a number of computer vision tasks [1]. Since AlexNet [2] won the ImageNet competition in 2012, various new architectures have been proposed, including VGGNet [3], GoogLeNet [4], ResNet [5], DenseNet [6], and the recent NASNet [7].

Among these deep architectures, ResNet and its variants [8]–[11] have attracted significant attention with outstanding performance in both low-level and high-level vision tasks. The remarkable success of ResNets is mainly due to the shortcut connection mechanism, which makes the training of deeper networks possible: gradients can flow directly through the building blocks, so the gradient vanishing problem is avoided to some extent. However, the shortcut connection mechanism makes each block focus on learning its respective residual output, where inner-block information communication is somewhat ignored and some reusable information learned from previous blocks tends to be forgotten in later blocks. To illustrate this point, we visualize the output (residual) feature maps learned by consecutive blocks of ResNet in Fig. 1(a). It can be seen that, due to the summation operation among blocks, the adjacent outputs $O_t$, $O_{t+1}$, and $O_{t+2}$ look very similar to each other, which indicates that little new information is learned through consecutive blocks.

A potential solution to the above problems is to capture the spatio-temporal dependency between building blocks while constraining the growth in the number of parameters. To this end, we introduce a new regulator mechanism in parallel to the shortcuts in ResNets for controlling the necessary memory information passed to the next building block. In detail, we adopt Convolutional RNNs ("ConvRNNs") [12] as the regulator to encode the spatio-temporal memory. We name the new architecture RNN-Regulated Residual Networks, or "RegNet" for short. As shown in Fig. 1(a), at the $i$-th building block, a recurrent unit in the convolutional RNN takes the feature from the current building block as the input (denoted by $I_i$), and then encodes both the input and the serial information to generate the hidden state (denoted by $H_i$); the hidden state is concatenated with the input for reuse in the next convolution operation (leading to the output feature $O_i$), and is also transported to the next recurrent unit. To better understand the role of the regulator, we visualize the feature maps, as shown in Fig. 1(a). We can see that the $H_i$ generated by the ConvRNN complements the input features $I_i$. After convolving the concatenation of $H_i$ and $I_i$, the proposed model obtains more meaningful features with richer edge information in $O_i$ than ResNet does. To quantitatively evaluate the information contained in the feature maps, we test their classification ability on test data (by adding an average pooling layer and a final fully connected layer to the $O_i$ of the last three blocks). As shown in Fig. 1(b), the new architecture achieves higher prediction accuracy, which indicates the effectiveness of the ConvRNN regulator.

Thanks to the parallel structure of the regulator module, the RNN-based regulator is easy to implement and applicable to other ResNet-based structures, such as SE-ResNet [11], Wide ResNet [8], Inception-ResNet [9], ResNeXt [10], the Dual Path Network (DPN) [13], and so on. Without loss of generality, as another instance to demonstrate the effectiveness of the proposed regulator, we also apply the ConvRNN module to improve the Squeeze-and-Excitation ResNet (shortened to "SE-RegNet").
Fig. 1. (a) Visualization of feature maps in ResNet [5] and RegNet. We visualize the output feature maps $O_i$ of the $i$-th building blocks, $i \in \{t, t+1, t+2\}$. In RegNets, $I_i$ denotes the input feature maps and $H_i$ denotes the hidden state generated by the ConvRNN at step $i$. By applying convolution operations over the concatenation of $I_i$ with $H_i$, we obtain the regulated outputs (denoted by $O_i$) of the $i$-th building block. (b) Prediction on test data based on the output feature maps of consecutive building blocks. At test time, we add an average pooling layer and a final fully connected layer to the outputs of the last three building blocks ($i \in \{7, 8, 9\}$) in ResNet-20 and RegNet-20 to obtain the classification results. It can be seen that the output of each block, aided with the memory information, results in higher classification accuracy.

[Fig. 1(b) data: test accuracy (%) from the outputs of blocks 7, 8, and 9 — ResNet-20: 55.6, 67.9, 91.6; RegNet-20: 60.2, 71.7, 92.7.]
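The probing head used for Fig. 1(b) is simple enough to sketch. The snippet below is a minimal PyTorch illustration of attaching an average pooling layer and a final fully connected layer to an intermediate block output $O_i$; the class name and the `channels`/`num_classes` arguments are our own illustrative choices, not code from the paper.

```python
import torch.nn as nn

class BlockProbe(nn.Module):
    """Sketch of the Fig. 1(b) probe: global average pooling plus a
    final fully connected layer on an intermediate block output O_i.
    All names and default sizes here are illustrative assumptions."""

    def __init__(self, channels: int, num_classes: int = 10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # average over H x W
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, o_i):
        # o_i: (batch, channels, H, W) feature map from block i.
        return self.fc(self.pool(o_i).flatten(1))
```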
For evaluation, we apply our model to the task of image classification on three highly competitive benchmark datasets, including CIFAR-10, CIFAR-100, and ImageNet. In comparison with ResNet and SE-ResNet, our experimental results demonstrate that the proposed architecture can significantly improve the classification accuracy on all the datasets. We further show that the regulator can reduce the required depth of ResNets while reaching the same level of accuracy.

II. RELATED WORK

Deep neural networks have achieved empirical breakthroughs in machine learning. However, training networks of sufficient depth is a very tricky problem. Shortcut connections have been proposed to address this optimization difficulty to some extent [5], [14]. Via the shortcut, information can flow across layers without attenuation. A pioneering work is the Highway Network [14], which implements the shortcut connections by using a gating mechanism. In addition, ResNet [5] explicitly requires the building blocks to fit a residual mapping, which is assumed to be easier to optimize.

Due to the powerful capabilities of ResNets in dealing with vision tasks, a number of variants have been proposed, including WRN [8], Inception-ResNet [9], ResNeXt [10], WResNet [15], and so on. ResNet and ResNet-based models have achieved impressive, record-breaking performance in many challenging tasks. In object detection, 50- and 101-layer ResNets are usually used as the basic feature extractors in many models, such as Faster R-CNN [16], RetinaNet [17], and Mask R-CNN [18]. The most recent models for image super-resolution, such as SRResNet [19], EDSR, and MDSR [20], are all based on ResNets with small modifications. Meanwhile, in [21], the ResNet is introduced to remove rain streaks and obtains state-of-the-art performance.

Despite the success in many applications, ResNets still suffer from the depth issue [22]. DenseNet, proposed by [6], concatenates the input features with the output features through densely connected paths in order to encourage the network to reuse all the feature maps of previous layers. Obviously, not all feature maps need to be reused in future layers, and consequently the densely connected network also leads to some redundancy with extra computational costs. Recently, the Dual Path Network [13] and the Mixed Link Network [23] have been proposed as trade-offs between ResNets and DenseNets. In addition, some module-based architectures have been proposed to improve the performance of the original ResNet. SENet [11] proposes a lightweight module to obtain channel-wise attention over intermediate feature maps. CBAM [24] and BAM [25] design modules to infer attention maps along both the channel and spatial dimensions. Despite their success, those modules regulate the intermediate feature maps based on attention information learned from the intermediate features themselves, so the full utilization of the historical spatio-temporal information of previous features still remains an open problem.

On the other hand, convolutional RNNs (ConvRNNs for short), such as ConvLSTM [12] and ConvGRU [26], have been used to capture spatio-temporal information in a number of applications, such as rain removal [27], video super-resolution [28], video compression [29], and video object detection and segmentation [30], [31]. Most of those works embed ConvRNNs into models to capture the dependency information in a sequence of images. In order to regulate the information flow of ResNet, we propose to leverage ConvRNNs as a separate module aiming to extract spatio-temporal information as a complement to the original feature maps of ResNets.

III. OUR MODEL

In this section, we first revisit the background of ResNets and two advanced ConvRNNs: ConvLSTM and ConvGRU. Then we present the proposed RegNet architectures.

A. ResNet

The degradation problem, which makes the traditional network hard to converge, is exposed when the architecture goes deeper.
Fig. 2. 2(a) shows the original underlying mapping, while 2(b) shows the residual mapping in ResNet [5].
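As context for Fig. 2(b), the standard (non-bottleneck) residual block of [5] can be sketched as follows. This is our own minimal PyTorch rendering of the well-known design, not code from the paper; the layer sizes are illustrative.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of the residual mapping in Fig. 2(b): the block learns a
    residual F(x) and outputs ReLU(F(x) + x). The 3x3/BN layout follows
    the common non-bottleneck design in [5]; sizes are illustrative."""

    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Shortcut connection: add the identity x to the residual F(x).
        return self.relu(self.f(x) + x)
```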
Fig. 3. The RegNet module is shown in 3(a); the bottleneck RegNet block is shown in 3(b). $T$ denotes the number of building blocks as well as the total number of time steps of the ConvRNN.

B. ConvRNN and its Variants

The RNN and its classical variants, the LSTM and the GRU, have achieved great success in the field of sequence processing. To tackle spatio-temporal problems, we adopt the basic ConvRNN and its variants, the ConvLSTM and the ConvGRU, which are obtained from the vanilla RNNs by replacing their fully connected operators with convolutional operators. Furthermore, to reduce the computational overhead, we carefully design the convolutional operation in the ConvRNNs.
In our implementation, the ConvRNN can be formulated as

$$\mathbf{H}^{t} = \tanh\!\left({}^{2N}\mathbf{W}_{h}^{N} * \left[\mathbf{X}^{t}, \mathbf{H}^{t-1}\right] + \mathbf{b}_{h}\right), \qquad (1)$$

where $\mathbf{X}^{t}$ is the input 3D feature map, $\mathbf{H}^{t-1}$ is the hidden state obtained from the earlier output of the ConvRNN, and $\mathbf{H}^{t}$ is the output 3D feature map at this step. Both the input $\mathbf{X}^{t}$ and the output $\mathbf{H}^{t}$ of the ConvRNN have $N$ channels. Additionally, ${}^{2N}\mathbf{W}^{N} * \mathbf{X}$ denotes a convolution operation between weights $\mathbf{W}$ and input $\mathbf{X}$ with $2N$ input channels and $N$ output channels.
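Before the lightweight modification described below, Eq. (1) amounts to a single convolution over the concatenated pair $[\mathbf{X}^{t}, \mathbf{H}^{t-1}]$. A minimal PyTorch sketch, with illustrative names and a $3 \times 3$ kernel assumed:

```python
import torch
import torch.nn as nn

class ConvRNNCell(nn.Module):
    """Sketch of the vanilla ConvRNN cell of Eq. (1):
    H^t = tanh(W_h * [X^t, H^{t-1}] + b_h).
    Class and argument names are illustrative assumptions."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # [X^t, H^{t-1}] has 2N channels; the cell outputs N channels.
        self.conv = nn.Conv2d(2 * channels, channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x, h_prev):
        xh = torch.cat([x, h_prev], dim=1)  # channel-wise concatenation
        return torch.tanh(self.conv(xh))
```

At the first block, $\mathbf{H}^{0}$ can simply be initialized to zeros.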
To make the ConvRNN more efficient, inspired by [30], [32], given an input $\mathbf{X}$ with $2N$ channels, we conduct the convolution operation in two steps (see the sketch after this list):

(1) Divide the input $\mathbf{X}$ with $2N$ channels into $N$ groups, and use grouped convolutions [33] with $1 \times 1$ kernels to process each group separately, fusing the input channels.

(2) Divide the feature map obtained by (1) into $N$ groups, and use grouped convolutions with $3 \times 3$ kernels to process each group separately, capturing the spatial information per input channel.
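The two-step convolution is easy to express with grouped convolutions. The helper below is a sketch under our own naming (`light_conv` is not a name from the paper): step (1) maps $2N \to N$ channels with $N$ groups of size two, and step (2) is a per-channel (depthwise) $3 \times 3$ convolution.

```python
import torch.nn as nn

def light_conv(n: int, kernel_size: int = 3) -> nn.Sequential:
    """Sketch of the two-step lightweight convolution replacing a
    dense 2N -> N convolution. The function and argument names are
    illustrative, not from the paper."""
    return nn.Sequential(
        # Step (1): N groups of 2 channels; 1x1 kernels fuse the two
        # channels inside each group (2N -> N channels).
        nn.Conv2d(2 * n, n, kernel_size=1, groups=n),
        # Step (2): N groups of 1 channel; 3x3 kernels capture spatial
        # information separately per channel (N -> N channels).
        nn.Conv2d(n, n, kernel_size, padding=kernel_size // 2, groups=n),
    )
```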
Directly applying the original convolutions with $3 \times 3$ kernels suffers from high computational complexity. As detailed in Table I, the new modification reduces the required computation by a factor of $18N/11$ with comparable results: a dense $3 \times 3$ convolution from $2N$ to $N$ channels costs $2N \cdot N \cdot 9 = 18N^{2}$ multiplications per output position, versus $2N + 9N = 11N$ for the two grouped steps. Similarly, all the convolutions in the ConvGRU and ConvLSTM are replaced with this lightweight modification.
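As an illustration of how the lightweight convolution slots into a recurrent cell, below is a sketch of a standard ConvLSTM cell [12] whose four gate convolutions reuse the `light_conv` helper above. The input/forget/output/candidate gating is the usual LSTM structure, and all names are our own assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Sketch of a ConvLSTM cell [12] with lightweight gate
    convolutions (via the light_conv helper above). The standard
    gating structure is assumed; names are illustrative."""

    def __init__(self, n: int):
        super().__init__()
        # One lightweight 2N -> N convolution per gate, applied to
        # the concatenation [X^t, H^{t-1}].
        self.gates = nn.ModuleList([light_conv(n) for _ in range(4)])

    def forward(self, x, state):
        h_prev, c_prev = state
        xh = torch.cat([x, h_prev], dim=1)
        i = torch.sigmoid(self.gates[0](xh))   # input gate
        f = torch.sigmoid(self.gates[1](xh))   # forget gate
        o = torch.sigmoid(self.gates[2](xh))   # output gate
        g = torch.tanh(self.gates[3](xh))      # candidate update
        c = f * c_prev + i * g                 # memory cell C^t
        h = o * torch.tanh(c)                  # hidden state H^t
        return h, c
```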
C. RNN-Regulated ResNet

To deal with the CIFAR-10/100 datasets and the ImageNet dataset, [5] proposed two kinds of ResNet building blocks: the non-bottleneck building block and the bottleneck building block. Based on these, by applying ConvRNNs as regulators, we obtain the RNN-regulated ResNet building module and the bottleneck RNN-regulated ResNet building module, respectively.

1) RNN-Regulated ResNet Module (RegNet module): An illustration of the RegNet module is shown in Fig. 3(a). Here, we choose the ConvLSTM for exposition. $\mathbf{H}^{t-1}$ denotes the earlier output from the ConvLSTM, and $\mathbf{H}^{t}$ is the output of the ConvLSTM at the $t$-th module. $\mathbf{X}_{i}^{t}$ denotes the $i$-th feature map at the $t$-th module.

The $t$-th RegNet (ConvLSTM) module can be expressed as

$$
\begin{aligned}
\mathbf{X}_{2}^{t} &= \mathrm{ReLU}\big(\mathrm{BN}\big(\mathbf{W}_{12}^{t} * \mathbf{X}_{1}^{t} + \mathbf{b}_{12}^{t}\big)\big),\\
[\mathbf{H}^{t}, \mathbf{C}^{t}] &= \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{ConvLSTM}\big(\mathbf{X}_{2}^{t}, [\mathbf{H}^{t-1}, \mathbf{C}^{t-1}]\big)\big)\big),\\
\mathbf{X}_{3}^{t} &= \mathrm{ReLU}\big(\mathrm{BN}\big(\mathbf{W}_{23}^{t} * \mathrm{Concat}[\mathbf{X}_{2}^{t}, \mathbf{H}^{t}]\big)\big),\\
\mathbf{X}_{4}^{t} &= \mathrm{BN}\big(\mathbf{W}_{34}^{t} * \mathbf{X}_{3}^{t} + \mathbf{b}_{34}^{t}\big),\\
\mathbf{X}_{1}^{t+1} &= \mathrm{ReLU}\big(\mathbf{X}_{1}^{t} + \mathbf{X}_{4}^{t}\big), \qquad (2)
\end{aligned}
$$

where $\mathbf{W}_{ij}^{t}$ denotes the convolutional kernel mapping feature map $\mathbf{X}_{i}^{t}$ to $\mathbf{X}_{j}^{t}$, and $\mathbf{b}_{ij}^{t}$ denotes the corresponding bias. Both $\mathbf{W}_{12}^{t}$ and $\mathbf{W}_{34}^{t}$ are $3 \times 3$ convolutional kernels, while $\mathbf{W}_{23}^{t}$ is a $1 \times 1$ kernel. $\mathrm{BN}(\cdot)$ indicates batch normalization, and $\mathrm{Concat}[\cdot]$ refers to the concatenation operation.

Notice that in Eq. (2) the input feature $\mathbf{X}_{2}^{t}$ and the previous ConvLSTM state $[\mathbf{H}^{t-1}, \mathbf{C}^{t-1}]$ are the inputs of the ConvLSTM in the $t$-th module. According to these inputs, the ConvLSTM automatically decides whether the information in the memory cell will be propagated to the output hidden feature map $\mathbf{H}^{t}$.
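Putting the pieces together, a non-bottleneck RegNet (ConvLSTM) module of Eq. (2) might be sketched in PyTorch as follows. We apply the BN + ReLU after the ConvLSTM to the hidden state only, which is one reading of the second line of Eq. (2); the class name, the dependency on the `ConvLSTMCell` sketched above, and the channel width `n` are our own assumptions.

```python
import torch
import torch.nn as nn

class RegNetModule(nn.Module):
    """Sketch of the t-th RegNet(ConvLSTM) module of Eq. (2).
    Uses the ConvLSTMCell sketched above; names are illustrative."""

    def __init__(self, n: int, lstm: nn.Module):
        super().__init__()
        self.w12 = nn.Conv2d(n, n, 3, padding=1)   # W_12: 3x3
        self.bn12 = nn.BatchNorm2d(n)
        self.lstm = lstm                            # shared ConvLSTM cell
        self.bn_h = nn.BatchNorm2d(n)
        self.w23 = nn.Conv2d(2 * n, n, 1)           # W_23: 1x1 on Concat
        self.bn23 = nn.BatchNorm2d(n)
        self.w34 = nn.Conv2d(n, n, 3, padding=1)    # W_34: 3x3
        self.bn34 = nn.BatchNorm2d(n)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x1, state):
        x2 = self.relu(self.bn12(self.w12(x1)))             # X_2^t
        h, c = self.lstm(x2, state)                         # ConvLSTM step
        h = self.relu(self.bn_h(h))                         # regulated H^t
        x3 = self.relu(self.bn23(self.w23(
            torch.cat([x2, h], dim=1))))                    # X_3^t
        x4 = self.bn34(self.w34(x3))                        # X_4^t
        return self.relu(x1 + x4), (h, c)                   # X_1^{t+1}, state
```

Since an RNN shares its weights across time steps, our reading of Fig. 3 is that a single ConvLSTM cell is shared by the $T$ modules of a stage, with the state $(\mathbf{H}^{t}, \mathbf{C}^{t})$ threaded from one module to the next, e.g. `cell = ConvLSTMCell(n)` followed by `blocks = [RegNetModule(n, cell) for _ in range(T)]`.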
2) Bottleneck RNN-Regulated ResNet Module (bottleneck RegNet module): The bottleneck RegNet module, which is based on the bottleneck building block, is shown in Fig. 3(b).
TABLE IV
TEST ERROR RATES ON CIFAR-10/100. WE USE CONVGRU AND CONVLSTM AS REGULATORS OF RESNET. WE LIST THE INCREASE OF PARAMETERS OF THE ARCHITECTURES AT THE RIGHT CORNER OF THE ERROR RATES.

                   C-10                                         C-100
layer   ResNet   +ConvGRU         +ConvLSTM        ResNet   +ConvGRU         +ConvLSTM
20      8.38     7.42 (+0.04M)    7.28 (+0.04M)    31.72    29.69 (+0.04M)   29.81 (+0.04M)
32      7.54     6.60 (+0.06M)    6.88 (+0.07M)    29.86    27.42 (+0.07M)   28.11 (+0.07M)
56      6.78     6.39 (+0.11M)    6.45 (+0.12M)    28.14    27.02 (+0.11M)   27.26 (+0.12M)
TABLE VI
SINGLE-CROP VALIDATION ERROR RATES ON IMAGENET AND COMPLEXITY COMPARISONS. BOTH RESNET AND REGNET ARE 50-LAYER. RESNET∗ MEANS WE REPRODUCED THE RESULT OURSELVES.

[Figure: test error (%) curves of ResNet, RegNet(ConvGRU), and RegNet(ConvLSTM), panels (a) and (b); caption not recovered.]
Table VI compares the models on the ImageNet validation set. Compared with the baseline ResNet, our RegNet-50 with 31.3M parameters and 5.12G FLOPs not only surpasses ResNet-50 but also outperforms ResNet-101 with 44.6M parameters and 7.9G FLOPs. Since the proposed regulator module is essentially a beneficial complement to the shortcut mechanism in ResNets, one can easily apply the regulator module to other ResNet-based models, such as SE-ResNet, WRN-18 [8], ResNeXt [10], the Dual Path Network (DPN) [13], etc. Due to computation resource limitations, we leave the implementation of the regulator module in these ResNet extensions as future work.

V. CONCLUSIONS

In this paper, we proposed to employ a regulator module with Convolutional RNNs to extract complementary features for improving the representation power of ResNets. Experimental results on three image-classification datasets have demonstrated the promising performance of the proposed architecture in comparison with standard ResNets and Squeeze-and-Excitation ResNets as well as other state-of-the-art architectures.

In the future, we intend to further improve the efficiency of the proposed architecture and to apply the regulator module to other ResNet-based architectures [8]–[10] to increase their capacity. Besides, we will further explore RegNets for other challenging tasks, such as object detection [16], [17], image super-resolution [19], [20], and so on.

ACKNOWLEDGMENT

This work was partially supported by the National Key Research and Development Program of China (No. 2018AAA0100204).

REFERENCES

[1] Y. LeCun, Y. Bengio et al., "Convolutional networks for images, speech, and time series," The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, 1995.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[4] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," CoRR, vol. abs/1409.4842, 2014.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015.
[6] G. Huang, Z. Liu, and K. Q. Weinberger, "Densely connected convolutional networks," CoRR, vol. abs/1608.06993, 2016.
[7] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," CoRR, vol. abs/1707.07012, 2017.
[8] S. Zagoruyko and N. Komodakis, "Wide residual networks," CoRR, vol. abs/1605.07146, 2016.
[9] C. Szegedy, S. Ioffe, and V. Vanhoucke, "Inception-v4, Inception-ResNet and the impact of residual connections on learning," CoRR, vol. abs/1602.07261, 2016.
[10] S. Xie, R. B. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," CoRR, vol. abs/1611.05431, 2016.
[11] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," CoRR, vol. abs/1709.01507, 2017.
[12] X. Shi, Z. Chen, H. Wang, D. Yeung, W. Wong, and W. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," CoRR, vol. abs/1506.04214, 2015.
[13] Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng, "Dual path networks," CoRR, vol. abs/1707.01629, 2017.
[14] R. K. Srivastava, K. Greff, and J. Schmidhuber, "Highway networks," CoRR, vol. abs/1505.00387, 2015.
[15] F. Shen, R. Gan, and G. Zeng, "Weighted residuals for very deep networks," in International Conference on Systems, 2016, pp. 936–941.
[16] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," CoRR, vol. abs/1506.01497, 2015.
[17] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," CoRR, vol. abs/1708.02002, 2017.
[18] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, "Mask R-CNN," CoRR, vol. abs/1703.06870, 2017.
[19] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," CoRR, vol. abs/1609.04802, 2016.
[20] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in CVPR Workshops, July 2017.
[21] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, "Removing rain from single images via a deep detail network," in CVPR, 2017, pp. 1715–1723.
[22] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, "Deep networks with stochastic depth," CoRR, vol. abs/1603.09382, 2016.
[23] W. Wang, X. Li, J. Yang, and T. Lu, "Mixed link networks," CoRR, vol. abs/1802.01808, 2018.
[24] S. Woo, J. Park, J. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," CoRR, vol. abs/1807.06521, 2018.
[25] J. Park, S. Woo, J. Lee, and I. S. Kweon, "BAM: Bottleneck attention module," CoRR, vol. abs/1807.06514, 2018.
[26] N. Ballas, L. Yao, C. Pal, and A. C. Courville, "Delving deeper into convolutional networks for learning video representations," CoRR, vol. abs/1511.06432, 2015.
[27] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, "Recurrent squeeze-and-excitation context aggregation net for single image deraining," CoRR, vol. abs/1807.05698, 2018.
[28] Z. Wang, P. Yi, K. Jiang, J. Jiang, Z. Han, T. Lu, and J. Ma, "Multi-memory convolutional neural network for video super-resolution," IEEE TIP, vol. 28, no. 5, pp. 2530–2544, May 2019.
[29] Y. Xu, L. Gao, K. Tian, S. Zhou, and H. Sun, "Non-local ConvLSTM for video compression artifact reduction," CoRR, vol. abs/1910.12286, 2019.
[30] M. Liu, M. Zhu, M. White, Y. Li, and D. Kalenichenko, "Looking fast and slow: Memory-guided mobile video object detection," CoRR, vol. abs/1903.10172, 2019.
[31] M. Siam, S. Valipour, M. Jägersand, and N. Ray, "Convolutional gated recurrent networks for video segmentation," CoRR, vol. abs/1611.05435, 2016.
[32] 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. IEEE Computer Society, 2018.
[33] T. Zhang, G. Qi, B. Xiao, and J. Wang, "Interleaved group convolutions for deep neural networks," CoRR, vol. abs/1707.02725, 2017.
[34] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Master's thesis, Department of Computer Science, University of Toronto, 2009.
[35] G. Lebanon and S. V. N. Vishwanathan, Eds., Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015, San Diego, California, USA, May 9-12, 2015, ser. JMLR Workshop and Conference Proceedings, vol. 38. JMLR.org, 2015.
[36] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" CoRR, vol. abs/1411.1792, 2014.