Yi Ren Fung∗ , Ziqiang Guan∗ , Ritesh Kumar , Joie Yeahuay Wu and Madalina Fiterau
College of Information and Computer Sciences
University of Massachusetts Amherst
{yfung, zguan, riteshkumar, yeahuaywu, mfiterau}@cs.umass.edu
arXiv:1906.04231v1 [eess.IV] 10 Jun 2019
Table 1: Performance and methodology of some of the state-of-the-art studies on three-class Alzheimer’s Disease classification. The different
splitting schemes and subsets of the ADNI dataset used in evaluation make it hard to interpret the results meaningfully.
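To make the splitting schemes concrete, the following is a minimal sketch of the three train/test splits compared in this paper (random by scan, by visit history, by patient). It is illustrative only and is not taken from our code repository: the function names, the `(patient_id, visit_index)` record layout, the 20% test fraction, and the toy data are all assumptions.

```python
import random
from collections import defaultdict


def split_by_scan(scans, test_fraction=0.2, seed=0):
    """Random split over individual scans; a patient may land in both sets."""
    shuffled = list(scans)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_by_visit(scans):
    """Hold out each patient's latest visit; earlier visits go to training."""
    latest = {}
    for patient_id, visit in scans:
        latest[patient_id] = max(visit, latest.get(patient_id, visit))
    train = [(p, v) for p, v in scans if v < latest[p]]
    test = [(p, v) for p, v in scans if v == latest[p]]
    return train, test


def split_by_patient(scans, test_fraction=0.2, seed=0):
    """Assign whole patients to one side, so no patient is in both sets."""
    by_patient = defaultdict(list)
    for patient_id, visit in scans:
        by_patient[patient_id].append(visit)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [(p, v) for p, v in scans if p not in test_patients]
    test = [(p, v) for p, v in scans if p in test_patients]
    return train, test


def shared_patients(train, test):
    """Patients that appear on both sides of a split (potential leakage)."""
    return {p for p, _ in train} & {p for p, _ in test}


# Toy data (hypothetical): 10 patients with 4 visits each.
scans = [(f"P{p:02d}", v) for p in range(10) for v in range(4)]

train, test = split_by_patient(scans)
print(len(shared_patients(train, test)))  # 0: no patient overlap

train, test = split_by_visit(scans)
print(len(shared_patients(train, test)))  # 10: every patient is on both sides
```

Only the by-patient split guarantees that the test set contains no scans from patients seen during training. The other two schemes let a model benefit from recognizing an individual's brain.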
657 scans from 173 patients in the testing set. For consistency, we used the same number of scans in all subsets for all three splits. Our code repository is publicly available¹, and it includes the patient IDs and visits that we evaluated.

¹ https://github.com/Information-Fusion-Lab-Umass/alzheimers-cnn-study

3.2 Model Architecture and Training

We compared the performance of 2D and 3D CNN architectures. For our 2D CNN architecture, we used ResNet18 [He et al., 2016] pretrained on ImageNet, which gives the model a better starting point for extracting low-level image features.

For our 3D CNN architecture, we followed a residual network-based architectural design as well. We used the “bottleneck” configuration, where the inner convolutional layer of each residual block contains half the number of filters, and we used the “full pre-activation” layout for the residual blocks. See Figure 2 for more details.

For the 2D networks, we used a learning rate of 0.0001 and an L2 regularization constant of 0.01. For the 3D network, we used a learning rate of 0.001 and a regularization constant of 0.0001. Both networks were trained for 36 epochs, with early stopping.

3.3 Splitting Methodology and Results

Our main goal is to investigate how differently the model performs under the following three scenarios: (1) random training and testing split across the brain MRIs, (2) training and testing split by patient ID, and (3) training and testing split based on visit history across the patients.

Model  Split                    Train Acc.     Test Acc.
2D     Split by MRI randomly    99.2 ± 0.7%    83.7 ± 1.1%
2D     Split by visit history   99.0 ± 0.5%    81.2 ± 0.5%
2D     Split by patients        98.8 ± 0.6%    51.7 ± 1.2%
3D     Split by MRI randomly    95.8 ± 2.3%    84.4 ± 0.6%
3D     Split by visit history   95.8 ± 2.2%    82.9 ± 0.3%
3D     Split by patients        86.3 ± 9.5%    52.4 ± 1.8%

Table 2: Classification results of our CNN models under different train/test splitting schemes, averaged over five runs.

In Table 2, we present the three-way classification results of the models described in Section 3.2. In summary, when the training and testing sets were split by MRI scans randomly, the 2D and 3D models attained accuracy close to 84%. When the training and test sets were split by visits, in which only the latest visit of each patient was set aside for testing, accuracy was slightly lower. However, accuracy dropped to around 50% when the training and test sets were split by patient.

We note that the other state-of-the-art 2D CNN architectures we tried (DenseNet, InceptionNet, VGGNet) performed similarly to ResNet, and the choice of view in the 2D slice (coronal, axial, sagittal) did not lead to significant differences in testing accuracy as long as the slices were chosen to be close to the center of the brain. We report results on the 88th slice of the coronal view for our 2D model. Our 3D model performs slightly better but is still limited in performance, suffering from the same problem of over-fitting to information on individual patients instead of learning what generally differentiates a brain in the different stages of Alzheimer’s Disease.

                      Ground Truth
                   AD     MCI    CN
Prediction  AD     96     59     17
            MCI    62     153    90
            CN     17     67     96

Table 3: Confusion matrix of the 3D CNN experiment on the test set, with data split by patient.

3.4 Analysis

Analysis of the dataset shows that the frequency of disease stage transition for patients between any two consecutive visits is low in the ADNI dataset. There are only 152 transitions in the entire dataset, which contains 2,731 scans from 657 patients.

We believe that due to the relatively few transition points in the dataset, the models are still able to achieve accuracy in the 80% range for the splitting by visit experiments by repeating the diagnostic label of the previous visits. Out of the 152 total transitions we found across the whole dataset, only 52 of them happened for patients between the (n−1)th visit and the nth visit. This suggests that the model is able to encode the structure of a patient’s brain from the training set, in turn aiding its performance on the testing set.

Furthermore, we set up additional experiments where we trained on visits t_1, ..., t_(n−1) from all patients, but only tested on the 52 patients that had a transition from the (n−1)th to the nth visit. The classification accuracy on this experiment dropped to around 54%, which suggests that the network was repeating information about the patient’s brain structure instead of
learning to be discriminative among the different stages of Alzheimer’s Disease exhibited by a particular MRI scan.

[Figure 1 panels: CN, MCI, AD]
Figure 1: Comparison of the spatially normalized MRI scans of 4 subjects in each of the CN, MCI, and AD categories. To the human eye, distinguishing the difference across the disease stages is a difficult task.

[Figure 2 diagram labels. Network: 3x3 Conv Layer, followed by five pairs of (Residual Block, 3x3 Conv Layer), a further 3x3 Conv Layer, and a Linear Layer. Residual block: Batch Norm, ReLU, 1x1 Conv, repeated three times.]
Figure 2: The architectures of our 3D CNN model and residual blocks. With the exception of the first and last convolution layers, which have a stride of 1, all other layers have a stride of 2 for downsampling. The first convolution layer takes in a 1-channel image and produces a 32-channel output.

4 Discussion

4.1 Technical Challenges

We summarize the main challenges in working with the ADNI dataset as follows:

1. Lack of transitions in a patient’s health status between consecutive visits. There are only 152 transitions in total across the entire dataset of 2,731 images collected from patient visits. It is easy for the model to overfit and memorize the state of a patient at each visit instead of generalizing the key distinctions between the different stages of Alzheimer’s Disease.

2. Coarse-grained data labels. The data labels are coarse-grained in nature, so our classifier may become confused when trying to learn from cases where a patient’s cognitive state is borderline, such as being between MCI and AD. The confusion matrix in Table 3 demonstrates this.

3. Difficulty in distinguishing the visual difference of a brain in the different Alzheimer’s stages. Human brains are distinct by nature, and the varying quality of MRI collections from different clinical settings adds to the noise level of the data. In Figure 1, we plot the brains of subjects in CN/MCI/AD and show that the difference in anatomical structure from CN to MCI to AD is very subtle.

4. No clear baseline. Many studies evaluated the performance of their models on different subsets of the ADNI dataset, making fair comparison a tricky task. In addition, studies that use a separate testing set do not report the subjects or scans that they used in their testing set, further complicating the comparison process.

We hope to keep these challenges in mind when designing future experiments and ultimately design models that can reliably classify each brain MRI by its true stage in Alzheimer’s Disease progression and that are robust to visit number, lack of patient transitions, and minor fluctuations in scan quality.

4.2 Insights and Future Work

Many studies in Alzheimer’s disease brain MRI classification do not take into account how the data should be properly split, putting into question the ability of the proposed models to generalize to unseen data. We fill this gap by providing a detailed analysis of model performance across splitting schemes. Additionally, to our knowledge, none of the previous studies use all of the MRIs available in the ADNI dataset, nor do they present a clear explanation for this decision. To address the issue, we perform our experiments on all available data while also reporting the subjects used in the training and test split of all our experiments for reproducibility.

In the future, we would like to explore utilizing the covariate data collected from patients to aid image feature extraction. Most of the studies we have come across do not use any covariate information collected from patients. The covariates, such as patient demographics and cognitive test scores, may be helpful for the classification task since they correlate with the disease stage of the patient. One possible scenario is a multitask learning setup, where the model predicts the Mini-Mental State Examination (MMSE) and Alzheimer’s Disease Assessment Scale (ADAS) cognitive scores in addition to the labels. We think this may be helpful in training the model because the cognitive test scores can provide a finer-grained signal for the model, making the prediction more robust.
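As a rough illustration of the multitask setup proposed above, the sketch below combines a three-class cross-entropy term with a scaled regression term on the two cognitive scores. We have not implemented or evaluated this objective. The function names, the auxiliary weight of 0.5, and the normalization constants are assumptions. MMSE is scored out of 30, and the ADAS scale of 70 assumes the 11-item ADAS-Cog variant.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(class_logits, label, score_preds, score_targets,
                   score_scales=(30.0, 70.0), aux_weight=0.5):
    """Classification loss plus a weighted regression loss on (MMSE, ADAS).

    score_scales and aux_weight are hypothetical choices: each score error
    is divided by its scale so that neither score dominates the other.
    """
    ce = softmax_cross_entropy(class_logits, label)
    squared_errors = [
        ((pred - target) / scale) ** 2
        for pred, target, scale in zip(score_preds, score_targets, score_scales)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    return ce + aux_weight * mse


# Example: class order (CN, MCI, AD), true label MCI (index 1),
# predicted and true (MMSE, ADAS) scores for one scan.
loss = multitask_loss(
    class_logits=[0.2, 1.1, -0.4],
    label=1,
    score_preds=(26.0, 14.0),
    score_targets=(27.0, 12.0),
)
print(loss)
```

In a real network the two terms would be computed from separate output heads on a shared feature extractor, and the auxiliary weight would need tuning on a validation set split by patient.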
References

[Ashburner and Friston, 2005] John Ashburner and Karl J Friston. Unified segmentation. Neuroimage, 26(3):839–851, 2005.

[Association and others, 2017] Alzheimer’s Association et al. 2017 alzheimer’s disease facts and figures. Alzheimer’s & Dementia, 13(4):325–373, 2017.

[Bäckström et al., 2018] Karl Bäckström, Mahmood Nazari, Irene Yu-Hua Gu, and Asgeir Store Jakola. An efficient 3d deep convolutional network for alzheimer’s disease diagnosis using mr images. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pages 149–153. IEEE, 2018.

[Bhagwat et al., 2018] Nikhil Bhagwat, Joseph D Viviano, Aristotle N Voineskos, M Mallar Chakravarty, Alzheimer’s Disease Neuroimaging Initiative, et al. Modeling and prediction of clinical symptom trajectories in alzheimer’s disease using longitudinal data. PLoS computational biology, 14(9):e1006376, 2018.

[Bron et al., 2015] Esther E Bron, Marion Smits, Wiesje M Van Der Flier, Hugo Vrenken, Frederik Barkhof, Philip Scheltens, Janne M Papma, Rebecca ME Steketee, Carolina Méndez Orellana, Rozanna Meijboom, et al. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural mri: the caddementia challenge. NeuroImage, 111:562–579, 2015.

[Farooq et al., 2017] Ammarah Farooq, SyedMuhammad Anwar, Muhammad Awais, and Saad Rehman. A deep cnn based multi-class classification of alzheimer’s disease using mri. In 2017 IEEE International Conference on Imaging systems and techniques (IST), pages 1–6. IEEE, 2017.

[Fischl, 2012] Bruce Fischl. Freesurfer. Neuroimage, 62(2):774–781, 2012.

[Gerardin et al., 2009] Emilie Gerardin, Gaël Chételat, Marie Chupin, Rémi Cuingnet, Béatrice Desgranges, Ho-Sung Kim, Marc Niethammer, Bruno Dubois, Stéphane Lehéricy, Line Garnero, et al. Multidimensional classification of hippocampal shape features discriminates alzheimer’s disease and mild cognitive impairment from normal aging. Neuroimage, 47(4):1476–1486, 2009.

[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[Hon and Khan, 2017] Marcia Hon and Naimul Mefraz Khan. Towards alzheimer’s disease classification through transfer learning. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 1166–1169. IEEE, 2017.

[Hosseini-Asl et al., 2016] Ehsan Hosseini-Asl, Georgy Gimel’farb, and Ayman El-Baz. Alzheimer’s disease diagnostics by a deeply supervised adaptable 3d convolutional network. arXiv preprint arXiv:1607.00556, 2016.

[Jain et al., 2019] Rachna Jain, Nikita Jain, Akshay Aggarwal, and D Jude Hemanth. Convolutional neural network based alzheimer’s disease classification from magnetic resonance brain images. Cognitive Systems Research, 2019.

[Khvostikov et al., 2018] Alexander Khvostikov, Karim Aderghal, Jenny Benois-Pineau, Andrey Krylov, and Gwenaelle Catheline. 3d cnn-based classification using smri and md-dti images for alzheimer disease studies. arXiv preprint arXiv:1801.05968, 2018.

[Korolev et al., 2017] Sergey Korolev, Amir Safiullin, Mikhail Belyaev, and Yulia Dodonova. Residual and plain convolutional neural networks for 3d brain mri classification. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pages 835–838. IEEE, 2017.

[Payan and Montana, 2015] Adrien Payan and Giovanni Montana. Predicting alzheimer’s disease: a neuroimaging study with 3d convolutional neural networks. arXiv preprint arXiv:1502.02506, 2015.

[Plant et al., 2010] Claudia Plant, Stefan J Teipel, Annahita Oswald, Christian Böhm, Thomas Meindl, Janaina Mourao-Miranda, Arun W Bokde, Harald Hampel, and Michael Ewers. Automated detection of brain atrophy patterns based on mri for the prediction of alzheimer’s disease. Neuroimage, 50(1):162–174, 2010.

[Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[Szegedy et al., 2017] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[Wang et al., 2018] Shuqiang Wang, Hongfei Wang, Yanyan Shen, and Xiangyu Wang. Automatic recognition of mild cognitive impairment and alzheimers disease using ensemble based 3d densely connected convolutional networks. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 517–523. IEEE, 2018.

[Wu et al., 2018] Congling Wu, Shengwen Guo, Yanjia Hong, Benheng Xiao, Yupeng Wu, Qin Zhang, Alzheimer’s Disease Neuroimaging Initiative, et al. Discrimination and conversion prediction of mild cognitive impairment using convolutional neural networks. Quantitative Imaging in Medicine and Surgery, 8(10):992, 2018.