Sparse coding of pathology slides compared to transfer learning with deep neural networks
Abstract
Background: Histopathology images of tumor biopsies present unique challenges for applying machine learning to
the diagnosis and treatment of cancer. The pathology slides are high resolution, often exceeding 1GB, have
non-uniform dimensions, and often contain multiple tissue slices of varying sizes surrounded by large empty regions.
The locations of abnormal or cancerous cells, which may constitute a small portion of any given tissue sample, are not
annotated. Cancer image datasets are also extremely imbalanced, with most slides being associated with relatively
common cancers. Since deep representations trained on natural photographs are unlikely to be optimal for classifying
pathology slide images, which have different spectral ranges and spatial structure, we here describe an approach for
learning features and inferring representations of cancer pathology slides based on sparse coding.
Results: We show that conventional transfer learning using a state-of-the-art deep learning architecture pre-trained
on ImageNet (RESNET) and fine-tuned for a binary tumor/no-tumor classification task achieved between 85% and 86%
accuracy. However, when all layers up to the last convolutional layer in RESNET are replaced with a single feature map
inferred via sparse coding, using a dictionary optimized for sparse reconstruction of unlabeled pathology slides,
classification performance improves to over 93%, corresponding to a 54% error reduction.
Conclusions: We conclude that a feature dictionary optimized for biomedical imagery may in general support better
classification performance than does conventional transfer learning using a dictionary pre-trained on natural images.
Keywords: Cancer pathology slides, TCGA, Sparse coding, Locally Competitive Algorithm, Unsupervised learning,
Transfer learning, Deep learning
Table 1 Matched tumor/non-tumor tissue images

Tissue of origin    Tumor type                                                          Count
Adrenal gland       Pheochromocytoma and Paraganglioma                                  6
Bile duct           Cholangiocarcinoma                                                  18
Bladder             Bladder Urothelial Carcinoma                                        45
Breast              Breast Invasive Carcinoma                                           429
Colon               Colon Adenocarcinoma                                                130
Colon               Rectum Adenocarcinoma                                               27
Cervix              Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma    6
Stomach             Stomach Adenocarcinoma                                              68
Head and neck       Head and Neck Squamous Cell Carcinoma                               116
Lung                Lung Adenocarcinoma                                                 179
Lung                Lung Squamous Cell Carcinoma                                        115
Liver               Liver Hepatocellular Carcinoma                                      118
Esophagus           Esophageal Carcinoma                                                16
Pancreas            Pancreatic Adenocarcinoma                                           8
Prostate            Prostate Adenocarcinoma                                             124
Kidney              Kidney Chromophobe                                                  69
Kidney              Kidney Renal Clear Cell Carcinoma                                   214
Kidney              Kidney Renal Papillary Cell Carcinoma                               78
Sarcoma             Sarcoma                                                             4
Melanoma (skin)     Skin Cutaneous Melanoma                                             2
Thyroid             Thyroid Carcinoma                                                   114
Thymus              Thymoma                                                             4
Uterus              Uterine Corpus Endometrial Carcinoma                                54

For each tumor from a given patient, at least one slide image was labeled as cancerous ("primary tumor") and at least one image as "normal" (adjacent samples or clean margin).

Each active neuron contributes its associated feature kernel to the reconstructed image with an amplitude equal to its activation. For any particular input image, the optimal sparse representation is given by the vector of neural activations that minimizes both image reconstruction error and the number of neurons with non-zero activity. Formally, finding a sparse representation involves finding the minimum of the following cost function:

    E(\vec{I}, \phi, \vec{a}) = \min_{\{\vec{a},\,\phi\}} \frac{1}{2} \left\| \vec{I} - \phi * \vec{a} \right\|_2^2 + \lambda \left\| \vec{a} \right\|_1    (1)

In Eq. (1), \vec{I} is an image unrolled into a vector, and \phi is a dictionary of feature kernels that are convolved with the feature maps \vec{a} that constitute a sparse representation of the image. The factor \lambda is a tradeoff parameter; larger \lambda values encourage greater sparsity (fewer non-zero coefficients) at the cost of greater reconstruction error. Both the feature maps \vec{a} and the dictionary of feature kernels \phi can be determined by a variety of standard methods. Here, we solved for the feature maps using a convolutional generalization, previously described [16, 25], of the Locally Competitive Algorithm (LCA) [26], in which the feature kernels themselves are adapted according to a local Hebbian learning rule that reduces reconstruction error given a sparse representation. Dictionary learning was thus performed via Stochastic Gradient Descent (SGD). Unsupervised dictionary learning used the entire data set; this was not perceived to be problematic, as the learned features were clearly generic and both tumor and non-tumor images were promiscuously intermingled. Both dictionary learning and sparse coding were performed using PetaVision [27], an open-source neural simulation toolbox that uses MPI, OpenMP, and CUDA libraries to enable multi-node, multi-core, and/or GPU-accelerated high-performance implementations of sparse solvers derived from LCA.
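To make the inference and learning steps concrete, the following is a minimal patch-based NumPy sketch of LCA dynamics for Eq. (1) together with the Hebbian/SGD dictionary update. The actual implementation is convolutional and runs in PetaVision [27]; the function names, step size, iteration count, and unit-norm constraint below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(u, lam):
    # Transfer function linking membrane potential u to activation:
    # a = sign(u) * max(|u| - lam, 0)
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca_sparse_code(image, Phi, lam=0.1, dt=0.05, n_steps=300):
    # Minimize 0.5*||I - Phi a||^2 + lam*||a||_1 (Eq. 1, patch-based form)
    # via the neural dynamics of Rozell et al. [26].
    b = Phi.T @ image                       # feed-forward drive
    G = Phi.T @ Phi - np.eye(Phi.shape[1])  # lateral competition (self excluded)
    u = np.zeros(Phi.shape[1])              # membrane potentials
    for _ in range(n_steps):
        a = soft_threshold(u, lam)
        u += dt * (b - u - G @ a)           # leaky integration + inhibition
    return soft_threshold(u, lam)

def hebbian_dict_update(Phi, image, a, lr=1e-2):
    # Local Hebbian/SGD step: each kernel moves along (residual x activation),
    # reducing reconstruction error for the current sparse code.
    residual = image - Phi @ a
    Phi = Phi + lr * np.outer(residual, a)
    return Phi / np.maximum(np.linalg.norm(Phi, axis=0), 1e-12)  # unit-norm kernels

# Example (hypothetical sizes): code one 32x32x3 patch with 512 kernels.
# Phi = np.random.randn(3072, 512); Phi /= np.linalg.norm(Phi, axis=0)
# a = lca_sparse_code(patch.ravel(), Phi); Phi = hebbian_dict_update(Phi, patch.ravel(), a)
```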
Fig. 1 Preprocessing of TCGA pathology slides. Full-extent low-resolution images were used to determine image coordinates; full-resolution image
slices were used to generate sparse representations. Top: initial image; center: fast Fourier transform versus all-white, to determine optically dark
regions of the image; bottom: non-overlapping image slices representing a succession of darkest remaining portions of the image. Full resolution
regions of interest (ROIs; colored boxes) were extracted from the SVS file; the four darkest ROIs from each image were used for the analyses reported
here
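The ROI-selection step in Fig. 1 can be sketched as follows. This is a hypothetical simplification that scores fixed, non-overlapping grid tiles of a low-resolution thumbnail by mean darkness rather than reproducing the paper's FFT-based comparison against an all-white reference; `darkest_rois`, `tile`, and `scale` are illustrative names.

```python
import numpy as np

def darkest_rois(thumb, tile=64, n_rois=4, scale=32):
    """Pick the n_rois darkest non-overlapping tiles from a grayscale
    thumbnail and return full-resolution (row, col) corner coordinates,
    assuming the thumbnail is downsampled by `scale` in each dimension."""
    h, w = thumb.shape
    darkness = 255.0 - thumb.astype(float)      # white background scores low
    ty, tx = h // tile, w // tile
    # mean darkness of each tile in a non-overlapping grid
    grid = darkness[:ty * tile, :tx * tile].reshape(ty, tile, tx, tile).mean(axis=(1, 3))
    order = np.argsort(grid, axis=None)[::-1]   # darkest tiles first
    coords = []
    for idx in order[:n_rois]:
        r, c = divmod(int(idx), tx)
        coords.append((r * tile * scale, c * tile * scale))  # full-res corner
    return coords
```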
We treated the (non-overlapping) ROIs as distinct samples. The feature maps for each ROI were average-pooled, producing a 512-element reduced representation of each ROI. The pooled representation for each ROI was used to train a linear support vector machine (SVM) [28] as well as an MLP to discriminate between ROIs derived from tumor and non-tumor slide images.

Results

Learned dictionary of convolutional feature kernels
We trained a convolutional dictionary for sparse reconstruction of 2048 × 2048 pixel full-resolution image slices (ROIs) extracted from TCGA images (Fig. 1). Each feature kernel was replicated with a stride of 4 pixels in both the vertical and horizontal directions, resulting in a feature map of size 512 × 512. The sparsity of the feature maps is shown in Fig. 3. The set of 512 learned feature kernels can be visualized as RGB color image patches 32 × 32 pixels in extent (Fig. 4). The learned dictionary is clearly specialized for pathology images. Although some feature kernels appear rather generic, representing short edge segments, typically with a slight curvature, many feature kernels resemble specific cytological structures. In particular, since the two different stains bind differentially to distinct cellular components (i.e., nucleic acid/chromatin vs. protein/extracellular matrix), we expect feature kernels that combine spectral and structural elements to encode specific subcellular components. We hypothesize that some of the specialized feature kernels could be discriminative for tumor-related pathologies.
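As a quick sanity check on these dimensions (the ROI, kernel, and stride values come from the text; the "same"-style padding convention is an assumption):

```python
# A 2048x2048 ROI coded with 32x32 kernels replicated at stride 4 yields
# 512x512 feature maps, one per kernel (assuming "same"-style padding).
roi_size, kernel_size, stride, n_kernels = 2048, 32, 4, 512
map_size = roi_size // stride
print(map_size, n_kernels)  # -> 512 512
```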
Fig. 2 Sample region-of-interest (ROI) images. Each group of 8 small images contains ROIs derived from contemporaneous normal and tumor tissue
samples from a single patient; within each group, the top row of 4 represents normal tissue; the bottom row, tumor tissue. Groups represent the
following tumor types (left to right): row 1, adrenal, bile duct, bladder, stomach; row 2, breast, breast, colon, colon; row 3, lung, liver, pancreas,
thyroid; row 4, prostate, prostate, kidney, kidney. Some sample pairs show overt tumor signatures (e.g., tissue disorganization, densely packed nuclei
associated with rapid proliferation), but other samples lack such obvious features
Fig. 3 Distribution of feature coefficients. Histogram giving the percentage of non-zero activation coefficients for each of the 512 feature
maps (each 512 × 512), averaged over a large set of ROIs
Fig. 4 Feature dictionary. Dictionary of 512 convolutional feature kernels learned from the complete set of tumor and non-tumor image ROIs
Image reconstructions
We evaluated the effectiveness of the image abstraction by reconstructing ROI images based on the feature dictionaries and the image-specific sparse coefficients. A sample of such reconstructions is shown in Fig. 5: although there are perceptible differences in color values, the reconstruction of fine structure is remarkably accurate.
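A reconstruction of this kind can be sketched as a sum of transposed convolutions, one per feature kernel. Array shapes follow the text (512 feature maps of size 512 × 512, 32 × 32 × 3 kernels, stride 4); the boundary handling and the helper name are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def reconstruct(feature_maps, kernels, stride=4):
    """Reconstruct an ROI from sparse codes: upsample each 512x512 feature
    map to full resolution by placing coefficients at stride spacing, then
    convolve with its RGB kernel and sum the contributions.
    feature_maps: (512, 512, 512) = (kernels, height, width)
    kernels:      (512, 32, 32, 3)"""
    n, h, w = feature_maps.shape
    recon = np.zeros((h * stride, w * stride, 3))
    for k in range(n):
        up = np.zeros((h * stride, w * stride))
        up[::stride, ::stride] = feature_maps[k]   # transposed-conv upsampling
        for c in range(3):                         # per color channel
            recon[:, :, c] += fftconvolve(up, kernels[k, :, :, c], mode="same")
    return recon
```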
Discrimination between tumor/non-tumor
To test the hypothesis that sparse representations obtained using convolutional dictionaries optimized for the parsimonious representation of tumor images can be useful for classification, we used a linear support vector machine (SVM) [28] to perform binary discrimination of tumor versus non-tumor on each ROI. Input to the classifier consisted of the sparse feature maps, pooled to a 512-element vector corresponding to the average coefficient for each feature (average-pooling). By using a relatively simple linear SVM classifier, we were able to directly test the discriminative power of the sparse representations themselves without the confound of additional nonlinearities. The classification accuracy we achieved (84.23%, with chance performance of 56% due to the slight preponderance of tumor slices in the dataset) shows that our unsupervised sparse representations captured some aspects of tumorous versus non-tumorous tissue – i.e., some generic features such as (possibly) a preponderance of proliferating nuclei. We also tried max-pooling and histogramming activation coefficients but obtained poorer classification results (data not shown).
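A minimal sketch of this pooling-plus-SVM pipeline, assuming scikit-learn's LinearSVC (which wraps LIBLINEAR [28]) in place of the authors' exact tooling; the data arrays and regularization constant are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC  # scikit-learn wrapper around LIBLINEAR [28]

def pool_roi(feature_maps):
    # Average-pool a (512 kernels x 512 x 512) sparse tensor down to the
    # 512-element descriptor used as classifier input.
    return feature_maps.mean(axis=(1, 2))

# Hypothetical arrays: X holds one pooled 512-element descriptor per ROI,
# y holds labels (1 = tumor, 0 = normal).
# clf = LinearSVC(C=1.0).fit(X_train, y_train)
# print("ROI accuracy:", clf.score(X_test, y_test))
```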
Fig. 5 Image reconstructions. Samples of reconstructed images based on convolutional feature kernels and weights (coefficients). Top: original
images; bottom: reconstructions
Transfer learning based on sparse coding
As a control, we employed a state-of-the-art deep learning architecture for image classification, Residual Network (RESNET), to examine the performance of conventional transfer learning on our dataset. We started with RESNET-152 from Keras libraries built in TensorFlow, using previously learned weights [29, 30] obtained from about a million training images [31]. We retrained the final all-to-all layers from scratch on the same TCGA ROI images as used above; the convolutional layers were fine-tuned as well. The first all-to-all layer consisted of 1,000 fully-connected elements, followed by a drop-out and a softmax layer. Thus, we began with convolutional features optimized for classifying natural images but used the available training data to adapt an existing RESNET architecture for classifying cancer pathology slides. Training/test subsets were approximately in the ratio of 5/1, respectively. We obtained a classification score of 85.48% ± 0.36% on holdout test data, slightly higher than the score obtained by feeding sparse coefficients into a linear SVM classifier (84.23%).
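A modern tf.keras equivalent of this control experiment might look as follows; the authors used a community Keras port of RESNET-152 [30], so the input size, dropout rate, and optimizer below are assumptions rather than the original configuration.

```python
import tensorflow as tf

# Pre-trained convolutional stack (fine-tuned) + all-to-all head retrained
# from scratch: Dense(1000) -> Dropout -> softmax, per the text.
base = tf.keras.applications.ResNet152(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))                       # input size assumed
base.trainable = True                                # convolutional layers fine-tuned
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1000, activation="relu"),  # first all-to-all layer
    tf.keras.layers.Dropout(0.5),                    # drop-out rate assumed
    tf.keras.layers.Dense(2, activation="softmax"),  # tumor / non-tumor
])
model.compile(optimizer=tf.keras.optimizers.SGD(1e-4, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=test_ds)       # ~5/1 train/test split
```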
Next, we employed an analogous transfer learning approach using our sparse coding feature map fed directly into the all-to-all layers at the top of the RESNET architecture. These all-to-all layers consisted of a fully-connected 512-element table, a drop-out layer, and a softmax classification layer. Again, training/test subsets were approximately in the ratio of 5/1. For the transfer learning approach based on sparse coding, we obtained a classification accuracy of 93.32% ± 0.21%, approximately a 54% error reduction relative to the conventional transfer learning approach. Classification performance of the three approaches is shown in Table 2.

Table 2 Summary of classification performances

Approach              Classification score
Sparse coding, SVM    84.23%
RESNET-152            85.48% ± 0.36%
Sparse coding, MLP    93.32% ± 0.21%
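A sketch of the sparse-coding classification head described above, with layer sizes taken from the text (512-element fully-connected table, drop-out, softmax); the activation, dropout rate, and optimizer are assumptions.

```python
import tensorflow as tf

# The pooled 512-element sparse code replaces RESNET's convolutional stack;
# only the shallow all-to-all head is trained.
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(512,)),                    # pooled sparse code
    tf.keras.layers.Dense(512, activation="relu"),   # fully-connected 512-element layer
    tf.keras.layers.Dropout(0.5),                    # drop-out rate assumed
    tf.keras.layers.Dense(2, activation="softmax"),  # tumor / non-tumor
])
mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
# mlp.fit(X_train, y_train, validation_data=(X_test, y_test))
```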
Discussion
Our results suggest that optimizing a dictionary for sparse coding directly on raw unlabeled histological data, and using that dictionary to infer sparse representations of each image, can support substantially better performance than transfer learning based on features optimized for natural images [5]. An approach based on sparse coding yields features specialized for the parsimonious reconstruction of histology slides, without requiring either extensive hand-labeling or segmentation of images, and yet achieves respectable classification accuracy. The fact that features learned in an unsupervised manner can nonetheless support accurate classification might at first seem surprising. State-of-the-art deep neural networks, trained in a fully supervised manner so as to yield a maximally discriminative set of features, approach human levels of performance on a variety of benchmark image classification tasks. Features trained in an unsupervised manner for sparse reconstruction, on the other hand, are not required to be discriminative per se (e.g., between cancerous and non-cancerous tissue), but are required to enable parsimonious descriptions of the data. In the case of histology slides, it is not unreasonable that features optimized for sparse reconstruction might naturally correspond to physiologically meaningful entities, such as cell membrane, cytoplasm, nuclear material, and other subcellular structures, as such features likely enable the most parsimonious explanation of the data. Occasionally, such physiologically meaningful features will be naturally discriminative between cancerous and non-cancerous tissue even though such discrimination was not explicitly optimized for. While deep learning approaches would likely have produced superior results given enough labeled training examples, such labeled datasets can only be prepared by highly trained pathologists and are currently unavailable. Instead, we started with a deep neural network optimized for the classification of natural images, which are clearly very different from pathology slides and would be unlikely to contain features corresponding to subcellular components. Absent sufficient labeled training data, our results indicate that a hybrid approach based on unsupervised sparse coding followed by a relatively shallow but non-linear fully-supervised classifier supports the best classification performance. Finally, we attempted no systematic search of meta-parameters to optimize the classification performance supported by our hybrid approach based on sparse coding followed by an MLP with a single hidden layer. Thus, it is likely that our reported classification performance could be improved by optimizing various meta-parameters such as the patch size, number of dictionary elements, and overall sparsity [32].

Additional file

Additional file 1: Tab-delimited file containing the following fields:
1. tcga_hist_file_name (original name of image file as downloaded from Genomic Data Commons)
2. tcga_project_code
3. tumor_type (TCGA project tumor type)
4. iocd_topo_code (IOCD topographical code for tumor sample)*
5. iocd_morph_code (IOCD morphological code for tumor sample)*
6. patient_id (TCGA patient ID)
7. sample_id (TCGA sample ID)
8. sample_type (Primary Tumor, Solid Tissue Normal, or Metastatic)

* Normal samples are taken from the vicinity of tumor samples and are labelled with the same IOCD codes. (TXT 294 kb)

Acknowledgements
This work was performed under the auspices of the U.S. Department of Energy by Los Alamos National Laboratory under Contract DE-AC5206NA25396. We thank Brendt Wohlberg for help with the SPORCO library.

Funding
This work was supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. Publication costs were funded by JDACS4C; the funding body had no role in the design or conclusions of the study.

Availability of data and materials
Source data are available from TCGA (see text). ROI images (62 GB) are available on request.

About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 18, 2018: Selected Articles from the Computational Approaches for Cancer at SC17 workshop. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-18.

Authors' contributions
WF and GTK designed the study. JDC assembled and annotated TCGA datasets. GTK, WF, NTTN, and SSM wrote code, performed analysis, and wrote the paper. All authors read and approved of the final manuscript.
Published: 21 December 2018

References
1. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz Jr LA, Kinzler KW. Cancer genome landscapes. Science. 2013;339(6127):1546–58. https://doi.org/10.1126/science.1235122.
2. TCGA Research Network. http://cancergenome.nih.gov/. Accessed 2 Mar 2017.
3. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88. https://doi.org/10.1016/j.media.2017.07.005.
4. Shen D, Wu G, Suk H-I. Deep learning in medical image analysis. Annu Rev Biomed Eng. 2017;19:221–48. https://doi.org/10.1146/annurev-bioeng-071516-044442.
5. Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, Liang J. Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Trans Med Imaging. 2016;35(5):1299–312. https://doi.org/10.1109/TMI.2016.2535302.
6. Xu Y, Jia Z, Wang L-B, Ai Y, Zhang F, Lai M, Chang EI-C. Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features. BMC Bioinformatics. 2017;18(1):281. https://doi.org/10.1186/s12859-017-1685-x.
7. Khosravi P, Kazemi E, Imielinski M, Elemento O, Hajirasouliha I. Deep convolutional neural networks enable discrimination of heterogeneous digital pathology images. EBioMedicine. 2018;27:317–28. https://doi.org/10.1016/j.ebiom.2017.12.026.
8. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in Neural Information Processing Systems 25. 2012. p. 1097–1105.
9. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
10. Chang H, Zhou Y, Spellman P, Parvin B. Stacked predictive sparse coding for classification of distinct regions of tumor histopathology. Proc IEEE Int Conf Comput Vis. 2013:169–76. https://doi.org/10.1109/ICCV.2013.28.
11. Robertson S, Azizpour H, Smith K, Hartman J. Digital image analysis in breast pathology - from image processing techniques to artificial intelligence. Transl Res. 2018;194:19–35. https://doi.org/10.1016/j.trsl.2017.10.010.
12. Gheisari S, Catchpoole DR, Charlton A, Kennedy PJ. Convolutional deep belief network with feature encoding for classification of neuroblastoma histological images. J Pathol Inform. 2018;9:17.
13. Sharma H, Zerbe N, Klempert I, Hellwich O, Hufnagl P. Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Comput Med Imaging Graph. 2017;61:2–13. https://doi.org/10.1016/j.compmedimag.2017.06.001.
14. Wang D, Khosla A, Gargeya R, Irshad H, Beck AH. Deep learning for identifying metastatic breast cancer. arXiv:1606.05718v1. 2016. https://arxiv.org/abs/1606.05718.
15. Coates A, Ng AY. The importance of encoding versus training with sparse coding and vector quantization. In: Proceedings of the 28th International Conference on Machine Learning (ICML); 2011.
16. Zhang X, Kenyon G. A deconvolutional strategy for implementing large patch sizes supports improved image classification. In: Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering); 2016. p. 529–534.
17. International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). https://www.cdc.gov/nchs/icd/icd10cm.htm. Accessed 8 Mar 2017.
18. Goode A, Gilbert B, Harkes J, Jukic D, Satyanarayanan M. OpenSlide: A vendor-neutral software foundation for digital pathology. J Pathol Inform. 2013;4:27. https://doi.org/10.4103/2153-3539.119005.
19. Eaton JW, Bateman D, Hauberg S, Wehbring R. GNU Octave Version 4.2.0 Manual: a High-level Interactive Language for Numerical Computations. http://www.gnu.org/software/octave/doc/interpreter. Accessed 1 Nov 2017.
20. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Sys Man Cyber. 1979;9(1):62–6.
21. Wohlberg B. SPORCO: A Python package for standard and convolutional sparse representations. In: Proceedings of the 15th Python in Science Conference, Austin, TX, USA; 2017. p. 1–8.
22. Candès EJ, Romberg J, Tao T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inform Theory. 2006;52:489.
23. Donoho D. Compressed sensing. IEEE Trans Inform Theory. 2006;52:1289.
24. Olshausen BA, Field D. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381:607.
25. Schultz PF, Paiton DM, Lu W, Kenyon GT. Replicating kernels with a short stride allows sparse reconstructions with fewer independent kernels. arXiv:1406.4205v1. 2014. https://arxiv.org/abs/1406.4205.
26. Rozell CJ, Johnson DH, Baraniuk RG, Olshausen BA. Sparse coding via thresholding and local competition in neural circuits. Neural Comput. 2008;20:2526.
27. PetaVision. https://petavision.github.io. Accessed 29 July 2018.
28. Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J. LIBLINEAR: A library for large linear classification. J Mach Learn Res. 2008;9:1871–4.
29. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. arXiv:1512.03385v1. 2015. https://arxiv.org/abs/1512.03385.
30. Yu F. ResNet-152 in Keras. https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6. Accessed 26 Nov 2018.
31. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575v3. 2014. https://arxiv.org/abs/1409.0575.
32. Carroll J, Carlson N, Kenyon GT. Phase transitions in image denoising via sparsely coding convolutional neural networks. arXiv:1710.09875v1. 2017. https://arxiv.org/abs/1710.09875.