Received February 4, 2021, accepted March 24, 2021, date of publication March 29, 2021, date of current version September 14, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3069211

Exploring sMRI Biomarkers for Diagnosis of Autism Spectrum Disorders Based on Multi Class Activation Mapping Models


1 Hubei Key Laboratory of Modern Manufacturing Quality Engineering, School of Mechanical Engineering, Hubei University of Technology, Wuhan, Hubei 430068, China
2 Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis and Treatment, College of Biomedical Engineering, South-Central University for Nationalities, Wuhan, Hubei 430074, China

Corresponding author: Fengkai Ke (
This work was supported in part by the State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics under Grant T152102, in part by the Scientific and Technological Research of Education Department of Hubei Province under Grant Q20181408, in part by the Scientific Research Foundation of Science and Technology Department of Hubei Province under Grant 2018CFB190, in part by the Doctor Launching Fund of Hubei University of Technology under Grant BSQD20160004, in part by the Hubei Chenguang Talented Youth Development Foundation (HBCG) under Grant 2017109, in part by the Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis and Treatment under Grant PJS140012003, and in part by the Hubei Key Laboratory of Modern Manufacturing Quality Engineering under Grant KFJJ-2020008.

ABSTRACT Because the etiology of autism spectrum disorders is complex, autism is still diagnosed with rating scales. With the continuous development of artificial intelligence, image-aided diagnosis of brain diseases has attracted wide research attention. However, many doctors and researchers still doubt the basis on which neural networks make their diagnoses and regard the neural network as a black-box function approximator of limited interpretability. They are not sure whether the network has learned interpretable image features in the way humans do. To address this problem, three new models (2D CAM, 3D CAM and 3D Grad-CAM) are proposed for structural Magnetic Resonance Imaging (sMRI) data. Based on the heat maps of the three models, the Regions Of Interest (ROIs) in subcortical tissues are analyzed across models and between groups. The experimental results show that these models mainly distinguish the autism group from the control group according to the voxel values of these ROIs. The mean and standard deviation of the voxel values differ significantly between the autism group and the control group in regions such as the left amygdala, optic chiasm and right hippocampus. According to the medical literature, these ROIs are closely related to speech, cognition and behavior, which may partly explain why autistic patients show symptoms such as impaired verbal communication and stereotyped, repetitive behavior. The proposed visualization models can serve as a bridge that helps doctors understand the brain features learned by the neural network. The research method of this paper may provide a new way for doctors and researchers to find diagnostic biomarkers of autism, which could speed up modern medical diagnosis and the development of treatment strategies and reduce doctors' reliance on traditional trial and error.

INDEX TERMS Autism spectrum disorders, class activation mapping, sMRI, biomarker.

I. INTRODUCTION
Autism is short for Autism Spectrum Disorders (ASD), a neurodevelopmental disorder caused by abnormal brain development. Its main characteristics are difficulties in emotional, verbal and non-verbal expression and impaired social interaction; patients also show restricted interests and repetitive behaviors [1]–[4].

At present, the clinical diagnosis of autism still relies mainly on the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised (ADI-R) to make a comprehensive assessment of the patient's condition.

The associate editor coordinating the review of this manuscript and approving it for publication was Giovanni Dimauro.
R. Yang et al.: Exploring sMRI Biomarkers for Diagnosis of ASD Based on Multi CAM Models

In addition, doctors usually need information provided by the family members of autistic patients, as well as the behavioral characteristics of the patients that the doctors observe themselves. After all the information has been collected, the doctor can make the final diagnosis [5]–[7]. However, this method is not objective, and it makes it hard to distinguish autism from other conditions (such as depression, intellectual disability and language development disorders).

With the rapid development of computer technology, Computer Aided Diagnosis (CAD) technology has developed rapidly in medically advanced countries, especially in the field of medical imaging. Statistical data show that CAD plays an active role in improving diagnostic accuracy, reducing missed diagnoses and improving work efficiency [8], [9]. At present, CAD research is mostly limited to lesions of the breast and chest [10], while research on CT-based diagnosis of liver diseases and MRI-based diagnosis of brain tumors is not yet mature [11], [12]. Therefore, CAD research on breast lesions and pulmonary nodules essentially represents the highest level of CAD in medical imaging.

Since autism is a highly heterogeneous neurodevelopmental disorder, structural Magnetic Resonance Imaging (sMRI) can be used to detect brain lesions. MRI is a relatively new medical imaging technology that has been in clinical use since 1982. It images human tissue using a static magnetic field and a radio-frequency magnetic field, and it produces clear, high-contrast images without ionizing radiation or a contrast agent. It can reflect disorders and early pathological changes of human organs at the molecular level.

Using deep learning to develop CAD technology has become a trend. Deep learning has been widely used to diagnose diseases from MRI data, such as liver tumor [13], breast cancer [14], [15], brain tumor [16], [17], Alzheimer's disease [18], [19] and ADHD [20]–[22]. In this paper, we use deep learning to classify autistic patients and identify the corresponding diagnostic biomarkers from the outputs of the neural network, so as to provide an objective diagnostic basis for doctors.

II. RELATED WORK
Some research statistics show that, in the medical field, more than 70% of clinical diagnoses rely on medical image data. How to make full use of and correctly analyze these data to assist doctors in diagnosis and treatment has become an increasingly popular research direction, and the Convolutional Neural Network (CNN) has become an indispensable tool for it. The CNN is the most common classification model and consists mainly of three parts. The first part is the first layer of the network, called the input layer. The second part, from the second layer to the penultimate layer, is called the feature extraction layer. The third part is the last layer of the network, called the output layer. For image classification, the input layer is the original image; the feature extraction layer includes multiple convolution layers, pooling layers and fully connected layers, which extract the abstract features of the image; and the output layer, also called the classification layer, classifies the image according to these abstract features. Although deep learning models have demonstrated outstanding capabilities in disease prediction, they must be interpretable (so that their decisions can be clearly explained) and unbiased (showing no preference for particular behaviors) before they can be applied in more advanced artificial intelligence systems, such as large-scale surgery and large-scale control. This is the main reason why neural networks have not yet been fully adopted in practical engineering applications: they are still black-box function approximators of limited interpretability [23], [24].

Using machine learning models only for accurate diagnosis no longer meets the needs of researchers, and finding disease biomarkers for diagnosis and treatment has become a research hotspot. For example, [25] proposed a multiview feature learning method with multiatlas-based functional connectivity networks to improve the diagnosis of mild cognitive impairment; [26] identified disease subtypes by analyzing, via unsupervised and supervised machine learning, the power-envelope-based connectivity of signals reconstructed from high-density resting-state electroencephalography; and [27] designed a latent-space machine-learning algorithm tailored for resting-state electroencephalography (EEG).

In addition, interpreting deep neural networks also helps to find disease-related biomarkers. Many researchers have studied the interpretability of neural networks with techniques such as deconvolution, occlusion, attention models [28], guided back-propagation and Class Activation Mapping (CAM).

The concept of deconvolution was first proposed by Zeiler in [29], which mainly explains the relationship between the convolution layer and the deconvolution layer. Deconvolution is not actually the inverse of convolution; it is simply the transposed convolution. Zeiler processed the high-dimensional abstract features extracted by a trained CNN through unpooling, rectification, deconvolution and other operations, expanding them back to the size of the original image so that the features learned by the CNN can be observed. Occlusion is mainly derived from [30]. Its basic idea is that, for a neural network trained for image classification, we want to know whether the model locates the main target in the image or only classifies from the surrounding context. By partially occluding the original image, we can observe how the features extracted by the middle layers and the final predicted value change when the input is modified. The guided back-propagation method comes from [31]. Guided back-propagation computes the gradient of the output with respect to the input by back-propagation, but passes back only positive gradients through each ReLU, so that only the regions with a positive influence on the output are kept. Compared with the original back-propagation method, this method
can not only remove the noise, but also greatly improve the visualization of the neural network.

Although deconvolution, occlusion and guided back-propagation can reveal the sensitivity of a neural network to certain features or images, their biggest drawbacks are that they are strongly affected by noise, the highlighted regions in the original image are still not distinct, and the interpretation images produced for different classes of images differ little.

Another kind of neural network interpretation method is represented by CAM-based models, such as CAM [32], Gradient-weighted Class Activation Mapping (Grad-CAM) [33] and Grad-CAM++ [34]. These models mainly apply a mathematical transformation to the last convolution layer of the CNN. Compared with the above methods, the interpretation images they produce are clearer and more explanatory.

The CAM-based models have been successfully applied to the classification of muscular dystrophies [35], tumor diagnosis [36], EEG signal interpretation [37] and other medical research fields [38]. Their biggest advantage is that they can accurately locate abnormal features (in electrical signals, the cerebral cortex, pathological images, etc.) that are difficult for the human eye to recognize. However, most of these articles deal only with two-dimensional data and analyze only a single CAM model. MRI, in contrast, is three-dimensional image data. Both two-dimensional and three-dimensional features carry rich medical significance and need to be analyzed from different perspectives. On this basis, we extend the original CAM model into three different types of neural network model.

Besides the above models, the innovation of this manuscript lies mainly in the method used to explore autism diagnostic biomarkers. (1) In medical imaging applications, the subcortical tissue that a CAM model focuses on is generally identified by doctors or researchers; in contrast, the analysis in this paper is based on the accurate subcortical tissue segmentation provided by the FreeSurfer software. (2) Because different models focus on different features of the brain, this paper analyzes the similarities and differences of autism diagnostic biomarkers between models and between groups, so as to provide a more comprehensive analysis method.

III. CLASS ACTIVATION MAPPING
A. TRADITIONAL CAM MODELS
After the image data are processed by multiple convolution and pooling layers, the last convolution layer contains the location, color, size and other abstract features of the original image. The CAM model assigns a different weight to each feature map of the last convolution layer, so as to obtain the ROI for each class of objects.

The CAM model is based on the network in network (NIN) structure [39]. It applies Global Average Pooling (GAP) to the last convolution layer of the neural network, which not only reduces the feature dimension and prevents over-fitting but also retains the spatial information of the features. After training, each class of objects has a one-dimensional weight vector whose length equals the number of channels of the last convolution layer, and the interpretation image is obtained by weighting each feature map of the last convolution layer with this vector. If $y^c$ is the score of class $c$ output by the neural network, the output of the CAM model can be written as

$$y^c = \sum_{k=1}^{q} w_k^c \, \frac{1}{Z} \sum_{i=1}^{u} \sum_{j=1}^{v} A_{ij}^k \tag{1}$$

where $q$ is the total number of feature maps $A^k$, $w_k^c$ is the weight of class $c$ for feature map $A^k$, $u$ and $v$ are the width and height of feature map $A^k$, $Z = uv$ is the total number of pixels, and $A_{ij}^k$ is the pixel value at $(i, j)$ of the $k$-th feature map. To obtain the interpretative region of the specified class in the original image, the final heat map is computed as

$$L_{\text{CAM}}^c = \sum_{k=1}^{q} w_k^c A^k \tag{2}$$

Unlike the CAM model, Grad-CAM does not need to replace the fully connected layer of the model with GAP. Instead, it calculates the weights of the feature maps from gradients; weighting the last convolution layer of the CNN with this gradient information yields the ROI of the neural network in the original image. Grad-CAM computes each weight as the global average of the gradients:

$$\alpha_k^c = \frac{1}{Z} \sum_{i=1}^{u} \sum_{j=1}^{v} \frac{\partial y^c}{\partial A_{ij}^k} \tag{3}$$

The preliminary heat map is obtained by weighting the feature maps $A^k$ with $\alpha_k^c$. To obtain clearer results, only the positive values of this weighted sum are kept, so the final heat map is

$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left( \sum_{k=1}^{q} \alpha_k^c A^k \right) \tag{4}$$

Since sMRI data are three-dimensional, we propose three models for sMRI data, 2D CAM, 3D CAM and 3D Grad-CAM, based on the existing CAM and Grad-CAM models, as shown in Fig. 1 to Fig. 3.

As shown in Fig. 1, for 2D CAM, the original sMRI volume is sliced into $d$ 2D images, which are used as the inputs of $d$ 2D CNNs. The parameters of each network are trained independently. A 2D GAP is applied to the last convolution layer of each 2D CNN, and a 1D GAP is then applied to the resulting one-dimensional vector. Finally, the outputs of all 2D CNNs are connected to the classification output layer. As a result, for the output of 2D CAM,
(1) becomes

$$y^c = \sum_{l=1}^{d} w_l^c \sum_{k=1}^{q} w_{lk}^c \, \frac{1}{Z} \sum_{i=1}^{u} \sum_{j=1}^{v} A_{ij}^{lk} \tag{5}$$

where $l$ indexes the slices of the sMRI data, $d$ is the total number of slices, $q$ is the total number of feature maps $A^{lk}$, $w_{lk}^c$ is the weight of class $c$ for feature map $A^{lk}$, $w_l^c$ is the weight of class $c$ for the $l$-th slice, $u$ and $v$ are the width and height of feature map $A^{lk}$, $Z = uv$ is the total number of pixels, and $A_{ij}^{lk}$ is the pixel value at $(i, j)$ of the $k$-th feature map of the $l$-th slice.

Similarly, to obtain the interpretative region of the specified class in each slice image, (2) is changed for 2D CAM as follows

$$L_{\text{2DCAM}}^c = \sum_{l=1}^{d} w_l^c \sum_{k=1}^{q} w_{lk}^c A^{lk} \tag{6}$$

Then, we use (5) and (6) to get the heat map of 2D CAM.

FIGURE 1. Flow chart of 2D CAM model for sMRI data.

As shown in Fig. 2 and Fig. 3, the 2D image input is replaced by the 3D sMRI input. Similarly, 2D convolution and 2D pooling are changed into 3D convolution and 3D pooling, respectively. For 3D CAM, the 2D GAP in the last convolution layer is also changed to a 3D GAP. As a result, for the output of 3D CAM, (1) becomes

$$y^c = \sum_{k=1}^{q} w_k^c \, \frac{1}{Z} \sum_{i=1}^{u} \sum_{j=1}^{v} \sum_{m=1}^{n} A_{ijm}^k \tag{7}$$

where $u$, $v$ and $n$ are the width, height and depth of feature map $A^k$, $Z = uvn$ is the total number of voxels, and $A_{ijm}^k$ is the voxel value at $(i, j, m)$ of the $k$-th feature map. Then, we use (7) and (2) to get the heat map of 3D CAM.

For 3D Grad-CAM, (3) is changed as follows

$$\alpha_k^c = \frac{1}{Z} \sum_{i=1}^{u} \sum_{j=1}^{v} \sum_{m=1}^{n} \frac{\partial y^c}{\partial A_{ijm}^k} \tag{8}$$

and then we use (8) and (4) to get the heat map of 3D Grad-CAM.

FIGURE 2. Flow chart of 3D CAM model for sMRI data.

FIGURE 3. Flow chart of 3D Grad-CAM model for sMRI data.

Most traditional CAM models used with sMRI analyze only a few slices of the data. In contrast, 2D CAM fuses the independent models of all slices at the end of the model, and 3D CAM and 3D Grad-CAM perform a complete three-dimensional feature extraction on the sMRI data. Therefore, the experimental results of these three models are more convincing.

The data set used in this paper comes from the open-source Autism Brain Imaging Data Exchange (ABIDE), which specializes in autism research. According to a large number of references and our own experimental results, the prediction accuracy is only about 63% when the data sets of all sites are used [40]. If the CAM models were applied to these data, the diagnostic biomarkers obtained would not be credible. Therefore, we choose the optimal data set and the optimal number of samples based on the classification accuracy. The data set comes from New York University (NYU) Langone Medical Center, with a total of 175 samples. Based on the image quality and image registration quality of the NYU sample set, 58 samples were then selected, including 30 in the autism group and 28 in the control group.

To ensure the reliability and generality of the experiment, we compared the performance of CNN, Recurrent Neural Network (RNN), 2D CAM, 3D CAM and 3D Grad-CAM models in the autism classification experiment. Machine learning models are most often trained with 5-fold Cross Validation (CV) or 10-fold CV; the more experiments are run, the more the randomness of the results is reduced, which makes the conclusions more convincing. In addition, the proportion of samples with different labels in the training and test sets has a certain impact
on the accuracy, sensitivity, specificity and other metrics. Therefore, we use stratified 10-fold CV to train each model. In each experiment, the training and test samples make up 90% and 10% of the original data set, respectively, and the proportion of autistic patients to controls is kept the same in the training and test sets.

The computer used in this paper has an Intel i7-7700HQ quad-core processor with a main frequency of 2.7 GHz, 8 GB of DDR4 2400 MHz memory and an NVIDIA GeForce RTX 2080 Ti graphics card. The operating system is Ubuntu 16.04, and TensorFlow 1.5 is used to build the neural network models.

B. CLASSIFICATION AND ROI ANALYSIS
The training loss curves of the three models are shown in Fig. 4, and their performance indexes are shown in Table 1.

FIGURE 4. Training loss curves as the number of training epochs increases.

TABLE 1. Performance of different models.

In Table 1, ACC, SEN, SPE, PPV, NPV and AUC denote accuracy, sensitivity, specificity, positive predictive value, negative predictive value and area under the ROC curve; the table also reports the calculation time for a single sample. Each value in the table is the mean over the 10 cross-validation experiments.

As can be seen from Table 1, except for 2D CAM, the calculation time of the proposed 3D CAM and 3D Grad-CAM is close to that of the traditional models (CNN, RNN), because these two models only add a 3D GAP or a 3D gradient calculation at the end of the model. In 2D CAM, by contrast, each sMRI slice is processed by an independent model, and these independent models are combined at the end. Therefore, 2D CAM has far more parameters than 3D CAM and 3D Grad-CAM.

Although the proposed models (2D CAM, 3D CAM and 3D Grad-CAM) reduce the dimension of the abstract features in the last convolution layer of the original CNN or introduce gradient information, their accuracy and other indexes are close to those of the CNN. Therefore, the heat maps of the proposed models can be used to look for diagnostic biomarkers of autism, as shown in Fig. 5. The red areas in the figure represent regions that have a positive impact on the classification results.

FIGURE 5. Heat maps for (a) 2D CAM, (b) 3D CAM and (c) 3D Grad-CAM.

To identify the subcortical tissue that each neural network focuses on, we use the FreeSurfer software and the Desikan-Killiany parcellation to segment the cerebral cortex and subcortical tissues of the original sMRI data [41], [42].

For the heat map of each sample, we take 75% of the maximum value in the heat map as a threshold and use it to segment the region the model attends to most (voxels above the threshold are set to 1, all others to 0). When more than 75% of a subcortical structure is covered by 1s, that structure is considered an ROI of the experiment. The 10-fold CV method requires 10 experiments, and a structure selected as an ROI in more than 7 of them is considered particularly important for the classification of autism. After the 10 experiments, we therefore know which regions are more important and which specific indexes of these regions are related to autism. (All of the above thresholds, such as 75% and 7, are empirical values obtained from the experimental results and statistical analysis.)

According to the above rules, the ROIs for each model and group are shown in Table 2. The numbers in the table denote subcortical structures; for example, 85 stands for the optic chiasm, 53 for the right hippocampus, and so on. For the 10 experiments of each model, we carried out the following statistical analysis: (1) the common ROIs of a single group (autism group or control group) for each model (2D CAM, 3D CAM or 3D Grad-CAM), marked in green font in Table 2; (2) the common ROIs of all groups for each model (2D CAM, 3D CAM or 3D Grad-CAM), marked in blue font in Table 2; (3) the common ROIs of a single group (autism group or control group) across all models, marked in red font in Table 2.

Each CAM-based model has its own feature extraction tendency and noise. For example, some tumors may be difficult to detect in the coronal and axial planes but very prominent in the sagittal plane.
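The ROI selection rule described earlier in this subsection (a threshold at 75% of the heat-map maximum, a more-than-75% coverage test for each segmented structure, and a more-than-7-of-10 vote over the CV experiments) can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: the heat map is computed from Eq. (2) for 3D CAM or from Eqs. (8) and (4) for 3D Grad-CAM and is assumed to be already resampled to the grid of a FreeSurfer label volume whose integer IDs follow Table 2 (e.g., 18 or 85); label 0 is assumed to mark background; and one ROI set per CV experiment is assumed, because the text does not say how per-sample results are pooled within an experiment. All function names and the toy volume are hypothetical.

```python
from collections import Counter

import numpy as np


def cam_heatmap_3d(feature_maps, class_weights):
    """Eq. (2) for 3D CAM: L^c = sum_k w_k^c * A^k.

    feature_maps: shape (q, u, v, n), the q maps A^k of the last conv layer.
    class_weights: shape (q,), the weights w_k^c of one class c.
    """
    return np.tensordot(class_weights, feature_maps, axes=1)


def grad_cam_heatmap_3d(feature_maps, grads):
    """Eqs. (8) and (4): alpha_k^c is the mean of dy^c/dA^k over all
    Z = u*v*n voxels; the alpha-weighted sum of the maps goes through ReLU."""
    alpha = grads.mean(axis=(1, 2, 3))  # shape (q,)
    return np.maximum(np.tensordot(alpha, feature_maps, axes=1), 0.0)


def select_rois(heatmap, labels, thr_ratio=0.75, cover_ratio=0.75):
    """Binarize the heat map at thr_ratio * max (1 above, 0 otherwise) and
    return the structure IDs whose voxels are more than cover_ratio covered by 1s."""
    mask = heatmap > thr_ratio * heatmap.max()
    rois = set()
    for label in np.unique(labels):
        if label == 0:  # assumption: 0 marks background/unlabelled voxels
            continue
        coverage = mask[labels == label].mean()  # fraction of the structure set to 1
        if coverage > cover_ratio:
            rois.add(int(label))
    return rois


def vote_rois(rois_per_experiment, min_count=7):
    """Keep structures selected as ROI in more than min_count CV experiments."""
    counts = Counter(label for rois in rois_per_experiment for label in rois)
    return sorted(label for label, count in counts.items() if count > min_count)


# Toy data (hypothetical): structure 18 fills the first half of a 4x4x4
# label volume and structure 85 the second half.
labels = np.zeros((4, 4, 4), dtype=int)
labels[:2] = 18
labels[2:] = 85

maps = np.zeros((2, 4, 4, 4))
maps[0, :2] = 1.0  # feature map 0 responds only inside structure 18
heat = cam_heatmap_3d(maps, np.array([2.0, 0.5]))

print(select_rois(heat, labels))  # {18}
print(vote_rois([{18}] * 8 + [{85}] * 2))  # [18]
```

The two thresholds and the vote count are exposed as parameters because the text describes them as empirical values. Resampling the heat map from the resolution of the last convolution layer to the label grid (for example with `scipy.ndimage.zoom`) is omitted from this sketch.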

In order to make our experimental conclusions (or diagnostic biomarkers) more reliable and convincing, we analyzed the results in Table 2 both between models and between groups, instead of analyzing only a single group of a single model. We therefore analyze the following five situations: (1) for 2D CAM, the common ROIs of the autism and control groups; (2) for 3D CAM, the common ROIs of the autism and control groups; (3) for 3D Grad-CAM, the common ROIs of the autism and control groups; (4) for the autism group, the common ROIs of the three CAM-based models; (5) for the control group, the common ROIs of the three CAM-based models.

TABLE 2. The common ROIs for each model and group.

(1) For 2D CAM, the ROIs shared by the autism and control groups are in the second row and fourth column of Table 2. The subcortical tissues are 18 (Left Amygdala) and 85 (Optic Chiasm). In Fig. 6 (a), the dotted line marks a p-value of 0.05 for the comparison between the autism group and the control group. The 2D CAM model mainly decides whether a sample has autism from the voxel values of the left amygdala and optic chiasm. There are significant differences in the mean and minimum voxel values of the left amygdala, and in the mean and standard deviation of the voxel values of the optic chiasm. For the data of these indexes in the autism group and the control group, we fitted the corresponding probability distribution models, as shown in Fig. 6 (b). Fig. 6 (b) shows that the mean of the fitted distribution of the mean voxel value of the left amygdala and optic chiasm is significantly higher in the autism group than in the control group. Besides, the minimum voxel value of the left amygdala and the standard deviation of the voxel values of the optic chiasm are more concentrated in the autism group; that is, the variance of the corresponding fitted distribution is smaller than that of the control group.

FIGURE 6. (a) Histogram of different indexes in different parts, including volume, mean voxel value, standard deviation of voxel value, maximum voxel value and minimum voxel value. (b) Violin plots of the indexes that differ significantly between the autism group and the control group. (c) 3D schematic of ROIs in subcortical tissue of 2D CAM.

(2) For 3D CAM, the ROIs shared by the autism and control groups are in the third row and fourth column of Table 2. The subcortical tissues are 18 (Left Amygdala), 44 (Right Inferior Lateral Ventricle) and 53 (Right Hippocampus). In Fig. 7 (a), the dotted line marks a p-value of 0.05 for the comparison between the autism group and the control group. The 3D CAM model mainly decides whether a sample has autism from the voxel values of the left amygdala and right hippocampus and from the volume of the right hippocampus. There are significant differences in the mean and minimum voxel values of the left amygdala, and in the volume and mean voxel value of the right hippocampus. For the data of these indexes in the autism group and the control group, we fitted the corresponding probability distribution models, as shown in Fig. 7 (b). Fig. 7 (b) shows that the mean of the fitted distribution of the mean voxel value of the left amygdala and right hippocampus is significantly higher in the autism group than in the control group. Similarly, the volume of the right hippocampus is also higher in the autism group than in the control group. Besides, the mean voxel values of the right hippocampus in the autism group
are more concentrated; that is, the variance of the corresponding fitted distribution is smaller than that of the control group.

FIGURE 7. (a) Histogram of different indexes in different parts, including volume, mean voxel value, standard deviation of voxel value, maximum voxel value and minimum voxel value. (b) Violin plots of the indexes that differ significantly between the autism group and the control group. (c) 3D schematic of ROIs in subcortical tissue of 3D CAM.

(3) For 3D Grad-CAM, the ROIs shared by the autism and control groups are in the second row and fourth column of Table 2. The subcortical tissues are 7 (Left Cerebellum White Matter), 8 (Left Cerebellum Cortex), 15 (4th-Ventricle) and 46 (Right Cerebellum White Matter).

In Fig. 8 (a), the dotted line marks a p-value of 0.05 for the comparison between the autism group and the control group. The 3D Grad-CAM model mainly decides whether a sample has autism from the voxel values of the left cerebellum cortex and the 4th ventricle. There are significant differences in the mean and standard deviation of the voxel values of the left cerebellum cortex, and in the mean, maximum and standard deviation of the voxel values of the 4th ventricle. For the data of these indexes in the autism group and the control group, we fitted the corresponding probability distribution models, as shown in Fig. 8 (b). Fig. 8 (b) shows that the mean of the fitted distribution of the mean voxel value of the left cerebellum cortex and 4th ventricle differs significantly between the autism group and the control group. Besides, the spread of the fitted distribution of the standard deviation of the voxel values of the left cerebellum cortex and 4th ventricle also differs markedly; that is, the variance of the corresponding fitted distribution in the autism group is larger (or smaller) than that of the control group. Similarly, the standard deviation of the fitted distribution of the maximum voxel value of the 4th ventricle is significantly higher in the autism group than in the control group.

FIGURE 8. (a) Histogram of different indexes in different parts, including volume, mean voxel value, standard deviation of voxel value, maximum voxel value and minimum voxel value. (b) Violin plots of the indexes that differ significantly between the autism group and the control group. (c) 3D schematic of ROIs in subcortical tissue of 3D Grad-CAM.

(4) For the autism group, the ROIs shared by the three CAM-based models are in the second row and fourth column of Table 2. The subcortical tissues are 7 (Left Cerebellum White Matter) and 15 (4th-Ventricle). In Fig. 9 (a), the dotted line marks a p-value of 0.05 for the comparison between the autism group and the control group. The three models mainly decide whether a sample has autism from the voxel values of the 4th ventricle. There are significant differences in the mean, maximum and standard deviation of the voxel values of the 4th ventricle. For the data of these indexes in the
FIGURE 9. (a) The histogram of different indexes in different parts, including volume, mean voxel value, standard deviation of voxel value, maximum
voxel value and minimum voxel value. (b) the violin diagram of indexes that have significant differences between the autism group and the control
group. (c) 3D schematic of ROIs in subcortical tissue between three CAM-based models for autism group.

autism group and the control group, we respectively fitted the corresponding probability distribution models, as shown in Fig. 9 (b). Specifically, Fig. 9 (b) shows that the fitted distributions of the mean voxel value, the maximum voxel value, and the standard deviation of the voxel values of the 4th ventricle are all more concentrated in the autism group; that is, the variance of each fitted distribution is smaller than that of the control group.

(5) For the control group, the ROIs shared by the three CAM-based models are in the second row and fourth column of Table 2. The subcortical tissues are 18 (Left Amygdala) and 85 (Optic Chiasm). The three models mainly determine whether a sample is normal from the voxel values of these two parts. These are the same two parts identified by the 2D CAM model earlier, so their analysis and figures are the same as in case (1) and are not repeated here.

To sum up, the experimental results of the above five cases show that the models mostly distinguish autism from the control group by the voxel values of subcortical tissues. The minimum and maximum voxel values in some ROIs do show significant differences and can serve as a reference for the diagnosis of autism. However, because different sites use different data collection methods and devices, these indicators are easily disturbed by noise and may only be applicable to single-site data. For data collected from multiple sites or with different devices, the mean and standard deviation of the voxel values of these ROIs are relatively reliable. In addition, the above analysis shows that most of the ROIs are related to self-learning, communication, behavioral decision-making, and some cognitive functions. It is therefore possible that these structural brain abnormalities in autistic patients lead to their behavioral abnormalities, which could partly explain unusual symptoms such as stereotyped repetitive behavior and speech communication disorders.

V. CONCLUSION
In this paper, we propose 2D CAM, 3D CAM, and 3D Grad-CAM for sMRI data based on the existing CAM and Grad-CAM models. As mentioned above, CAM-based models can explain the classification basis of a neural network and reflect the features it extracts. Based on the heat maps of these three CAM-based models, this paper analyzes the ROIs both across models and between groups. The results show that these models diagnose autism mainly through the voxel values of subcortical tissues, such as the left amygdala, optic chiasm, and right hippocampus. Statistical analysis shows significant differences in the mean or standard deviation of the voxel values between the autism group and the control group.

However, MRI acquisition methods, acquisition equipment, and autism diagnosis methods differ across sites, and when all data sets are used, the classification accuracy of the neural network is not high. Therefore, this paper analyzes only the data set of a single site to obtain a satisfactory accuracy, which makes the diagnostic biomarkers obtained from the three CAM-based models more reliable. In future research, we will study how to analyze multi-site data with dimensionality reduction and clustering algorithms, so as to obtain more accurate diagnostic biomarkers and a more generalizable classification algorithm.

Besides, our proposed models depend only on the cross-entropy loss during training, which leads to some meaningless ROI regions in some heat maps. Recently proposed methods localize the heat regions of a model more accurately: the Guided Attention Inference Network (GAIN) [43] adds a training loss computed from the heat map (and can also use segmentation labels as extra supervision), and Score-CAM [44] weights activation maps by their contribution to the class score instead of by gradients. In addition, we will use multi-task learning to further improve the above models: while completing the original classification task, the models will also learn to segment each subcortical tissue from the sMRI segmentation labels provided by the FreeSurfer software, so that they can learn more medical features.

ADDITIONAL INFORMATION
The open-source data sets come from the ABIDE website. The usage agreement clearly states, ‘‘Consistent with the policies of the 1000 Functional Connectomes Project, data usage is unrestricted for non-commercial research purposes.’’
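The per-ROI indexes, shared-ROI selection, and group comparisons discussed in the results above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' implementation: an sMRI volume and its label map are assumed to be flattened into equal-length lists, with label numbers following the FreeSurfer convention used in Table 2 (e.g., 18 for Left Amygdala and 85 for Optic Chiasm); the helper names roi_indexes, shared_rois, permutation_p_value, and fit_normal are hypothetical; and because the paper does not state which test produced the p-values in Fig. 9 (a) or which distribution family was fitted in Fig. 9 (b), a two-sided permutation test on group means and a normal fit are used as stand-ins.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical helpers for illustration only; not the authors' code.


def roi_indexes(intensities, labels, label_id, voxel_volume=1.0):
    """Volume and voxel-value statistics of one labelled ROI.

    `intensities` and `labels` are assumed to be equal-length flattened
    volumes; `voxel_volume` is the volume of a single voxel (e.g. mm^3).
    """
    values = [v for v, lab in zip(intensities, labels) if lab == label_id]
    if len(values) < 2:
        raise ValueError("ROI must contain at least two voxels")
    return {
        "volume": len(values) * voxel_volume,
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "max": max(values),
        "min": min(values),
    }


def shared_rois(*roi_sets):
    """Labels present in the ROI set of every CAM model."""
    if not roi_sets:
        return set()
    return set.intersection(*(set(s) for s in roi_sets))


def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of group means
    (a stand-in, since the paper does not name its test)."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of subjects
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def fit_normal(samples):
    """Fit a normal model; a smaller .variance means a more concentrated group."""
    return NormalDist.from_samples(samples)
```

In use, roi_indexes would be called once per subject for a given label (for example 15, the 4th ventricle), one index such as "std" would be collected into an autism list and a control list, and the two lists would be passed to permutation_p_value and compared with the 0.05 threshold; comparing fit_normal(autism).variance with fit_normal(control).variance mirrors the "more concentrated" observation for Fig. 9 (b).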


[38] Y. Sato, Y. Takegami, T. Asamoto, Y. Ono, T. Hidetoshi, R. Goto, A. Kitamura, and S. Honda, ‘‘A computer-aided diagnosis system using artificial intelligence for hip fractures - multi-institutional joint development research,’’ 2020, arXiv:2003.12443. [Online]. Available: http://arxiv.org/abs/2003.12443
[39] M. Lin, Q. Chen, and S. Yan, ‘‘Network in network,’’ 2013, arXiv:1312.4400. [Online]. Available:
[40] C. Cameron et al., ‘‘The neuro bureau preprocessing initiative: Open sharing of preprocessed neuroimaging data and derivatives,’’ Frontiers Neuroinform., vol. 7, 2013.
[41] B. Alexander, W. Y. Loh, L. G. Matthews, A. L. Murray, C. Adamson, R. Beare, J. Chen, C. E. Kelly, P. J. Anderson, L. W. Doyle, A. J. Spittle, J. L. Y. Cheong, M. L. Seal, and D. K. Thompson, ‘‘Desikan-Killiany-Tourville atlas compatible version of M-CRIB neonatal parcellated whole brain atlas: The M-CRIB 2.0,’’ Frontiers Neurosci., vol. 13, p. 34, Feb. 2019.
[42] O. Potvin, L. Dieumegarde, and S. Duchesne, ‘‘FreeSurfer cortical normative data for adults using Desikan-Killiany-Tourville and ex vivo protocols,’’ NeuroImage, vol. 156, pp. 43–46, Aug. 2017.
[43] K. Li, Z. Wu, K.-C. Peng, J. Ernst, and Y. Fu, ‘‘Tell me where to look: Guided attention inference network,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Feb. 2018, pp. 9215–9223. [Online]. Available:
[44] H. Wang, Z. Wang, M. Du, F. Yang, Z. Zhang, S. Ding, P. Mardziel, and X. Hu, ‘‘Score-CAM: Score-weighted visual explanations for convolutional neural networks,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Apr. 2020, pp. 24–25. [Online]. Available:

HUANPING LIU was born in Hubei, China, in 1996. He received the bachelor’s degree from the Hubei University of Technology, Wuhan, China, in 2019. He is currently pursuing the master’s degree with the School of Mechanical Engineering. His current research interests include medical image processing, machine learning, and statistical analysis of the autistic brain.

MINGCHENG ZHOU was born in Anhui, China, in 1995. He received the bachelor’s degree from Chuzhou University, in 2020. He is currently pursuing the master’s degree with the School of Mechanical Engineering, Hubei University of Technology. His current research interests include medical image processing, machine learning, and statistical analysis of the autistic brain.

RUI YANG is currently pursuing the bachelor’s degree with the School of Mechanical Engineering, Hubei University of Technology, Wuhan, Hubei, China. His main research interests include medical image analysis of the autistic brain, deep learning, and statistical analysis.

FENGKAI KE received the Ph.D. degree from the School of Mechanical Science and Engineering, Huazhong University of Science and Technology (HUST), in 2016. In 2018, he worked as a Postdoctoral Researcher with the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea. He is currently working with the School of Mechanical Engineering, Hubei University of Technology, Wuhan, Hubei, China. During his doctoral and postdoctoral studies, he participated in a number of national and provincial natural science fund projects, and has published more than ten papers in authoritative journals and international conferences. His main research interests include deep learning, reinforcement learning, and medical image processing.

HUI-MIN CAO received the bachelor’s degree from the Huazhong University of Science and Technology, in 1994, and the Ph.D. degree from the School of Mechanical Science and Engineering, Huazhong University of Science and Technology, in 2006. From 1994 to 2000, he worked with the CNC Machining Center, Wuhan Intercontinental Communication Power Group Company Ltd. Since 2006, he has been a Teacher with the South-Central University for Nationalities. He has presided over General Program projects of the NSFC and participated in a number of provincial and ministerial research projects. He is currently engaged in research on biomedical sensors. His main research interests include fluorescence sensing, micro sensing, and multi-sensor technology.

VOLUME 9, 2021 124131

