Article
Albumentations: Fast and Flexible
Image Augmentations
Alexander Buslaev 1, Vladimir I. Iglovikov 2, Eugene Khvedchenya 3, Alex Parinov 4,
Mikhail Druzhinin 5 and Alexandr A. Kalinin 6,7,*
1 Mapbox, Minsk 220030, Belarus; al.buslaev@gmail.com
2 Lyft Level 5, Palo Alto, CA 94304, USA; iglovikov@gmail.com
3 ODS.ai, Odessa 65000, Ukraine; ekhvedchenya@gmail.com
4 X5 Retail Group, Moscow 119049, Russia; creafz@gmail.com
5 Simicon, Saint Petersburg 195009, Russia; dipetm@gmail.com
6 Department of Computational Medicine and Bioinformatics, University of Michigan,
Ann Arbor, MI 48109, USA
7 Shenzhen Research Institute of Big Data, Shenzhen 518172, Guangdong, China
* Correspondence: akalinin@umich.edu
Received: 31 December 2019; Accepted: 20 February 2020; Published: 24 February 2020
Abstract: Data augmentation is a commonly used technique for increasing both the size and the
diversity of labeled training sets by leveraging input transformations that preserve corresponding
output labels. In computer vision, image augmentations have become a common implicit
regularization technique to combat overfitting in deep learning models and are ubiquitously used to
improve performance. While most deep learning frameworks implement basic image transformations,
the list is typically limited to some variations of flipping, rotating, scaling, and cropping. Moreover,
image processing speed varies in existing image augmentation libraries. We present Albumentations,
a fast and flexible open source library for image augmentation that offers a wide variety of image
transform operations and also serves as an easy-to-use wrapper around other augmentation libraries.
We discuss the design principles that drove the implementation of Albumentations and give an
overview of the key features and distinct capabilities. Finally, we provide examples of image
augmentations for different computer vision tasks and demonstrate that Albumentations is faster
than other commonly used image augmentation tools on most image transform operations.
1. Introduction
Modern machine learning models such as deep artificial neural networks often have a very large
number of parameters, which allows them to generalize well when trained on massive amounts
of labeled data [1]. In practice, such large labeled datasets are not always available for training,
which leads to the elevated risk of overfitting [1–3]. Data augmentation is a commonly used
technique for increasing both the size and the diversity of labeled training sets by leveraging input
transformations that preserve corresponding output labels. In computer vision, image augmentations
have become a common regularization technique to combat overfitting in deep convolutional
neural networks and are ubiquitously used to improve performance on various tasks [4–6].
Image augmentations have also been shown to improve convergence [7], generalization, and robustness
on out-of-distribution samples [8,9], and to offer overall advantages over other
regularization techniques [10].
While most popular deep learning frameworks such as TensorFlow [11], Keras [12],
and PyTorch [13] implement basic image transformations, the range is typically limited to some
variations and combinations of flipping, rotating, scaling, and cropping. Different domains, imaging
modalities, and tasks may benefit from a wide range of different degrees and combinations of various
image transformations [14–16]. The need for more sophisticated training set augmentation recipes
has typically been addressed by custom implementations of image transformations for the task at
hand using low-level libraries, for example, OpenCV [17] and Pillow [18]. However, implementing
new complex transforms and their combinations from scratch can be challenging, time-consuming,
and error-prone [19], especially in tasks with complex targets such as image segmentation, as we
discuss further in Section 2.3. The development of image augmentation-specific tools, for instance
imgaug [20], Augmentor [21], and CLoDSA [22], aimed to fill this gap. However, existing
solutions typically focus on either the variety of operations, processing speed, or flexibility of the
application programming interface (API), at the cost of the other factors. Thus, there is a need for a
flexible image augmentation tool that allows combining a wide range of image transforms and annotation types.
In this paper, we present Albumentations, an open source Python library for fast and flexible
image augmentations: https://github.com/albumentations-team/albumentations. Albumentations
efficiently implements a rich variety of image transform operations that are optimized for performance,
and does so while providing a concise, yet powerful image augmentation interface for different
computer vision tasks, including object classification, segmentation, and detection. We demonstrate
that Albumentations is faster than other popular image augmentation tools on most commonly used
image transformations, without sacrificing the variety of operations or an ability to compose them into
more complex pre-processing pipelines.
2. Background
Although there have been attempts to evaluate the individual contributions of image transform
operations to overall accuracy [42], different augmentation strategies still lead to performance
variability even on well-studied benchmarks. Most of the approaches proposed above used a fixed set
of image transforms and suggested that further improvements can likely be made by expanding the
pool of image augmentations. This is why we identified the variety of available transforms as one of
the core needs for the productive use of image augmentations.
Albumentations aims to tackle these challenges by providing a flexible and convenient Python
interface for a rich variety of augmentations for image classification, segmentation, and object detection,
based on optimized implementations of transform operations that outperform their alternatives.
3. Design Principles
Albumentations implements a design that seeks to provide a balanced approach to addressing
the existing needs. Overall, it relies on five main design principles.
3.1. Performance
In a typical deep learning hardware configuration, the CPU can become a performance bottleneck,
so the speed of individual transform operations is a top priority. Albumentations strives to deliver
the best performance on most of the commonly used augmentations by wrapping multiple low-level
image manipulation libraries and choosing the fastest underlying implementation. As a trade-off,
Albumentations has to rely on a larger number of dependencies compared to other high-level wrappers
that are based on a single library.
3.2. Variety
Since finding an optimal set of augmentations for a particular task, dataset, and domain is still
an open research topic, it is important to provide an extensive list of available operations that can be
quickly tested for the problem at hand. Therefore, Albumentations aims to implement a very diverse
range of image transforms. It includes all or almost all basic and commonly used operations such that
most research results from recent studies can be validated on an extended pool of augmentations. It also
adds some task- and domain-specific image transformations. This is where the trade-off discussed in
Section 3.1 gives Albumentations another advantage, since it can combine operations that were
previously unique to individual low-level libraries.
3.3. Conciseness
To maximize user productivity and enable quick and easy experimentation with different
augmentations and their combinations, Albumentations has to provide a concise and convenient
interface that is powerful and intuitively clear. It should provide enough control for fine-tuning
parameters of all operations, if needed, but the complexity of transform implementations should
be hidden behind the API. With that in mind, we consider augmentations for object detection
and image segmentation to be as important as for image classification. Therefore, Albumentations
automatically applies transforms to complex target annotations, such as bounding boxes, keypoints,
and segmentation masks.
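To illustrate this, the following is a minimal sketch (not taken from the paper) of a single declarative pipeline applied consistently to an image together with its mask, bounding boxes, and keypoints; the specific transforms, parameter values, and the "building" label are illustrative.

    import numpy as np
    import albumentations as A

    # One pipeline; the same sampled parameters are applied to every target.
    transform = A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5),
            A.RandomBrightnessContrast(p=0.3),
        ],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
        keypoint_params=A.KeypointParams(format="xy"),
    )

    image = np.zeros((256, 256, 3), dtype=np.uint8)
    mask = np.zeros((256, 256), dtype=np.uint8)
    bboxes = [[30, 40, 120, 150]]   # x_min, y_min, x_max, y_max (illustrative)
    labels = ["building"]           # hypothetical class label
    keypoints = [(60, 70)]

    out = transform(image=image, mask=mask, bboxes=bboxes, labels=labels, keypoints=keypoints)
    image_t, mask_t = out["image"], out["mask"]
    bboxes_t, keypoints_t = out["bboxes"], out["keypoints"]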
3.4. Flexibility
Albumentations is constantly evolving and changing, as new image transforms are being
proposed, the community requests support for new features, and the underlying implementations of
low-level image operations are being optimized (less frequently). Moreover, we have seen a quick
adoption of Albumentations in both Kaggle and research communities, as well as in commercial
companies. Contributions from these communities help Albumentations to grow, to test and validate
its features, and to define the direction of future development. Therefore, the architecture of
Albumentations should be flexible enough to quickly adapt to these changes and to enable simple
ways to contribute new transforms, parameters, tests, and tutorials.
3.5. Open Source
For the library to remain stable and robust, it is also essential for the code base to provide a clear
way to make contributions while maintaining a high quality of code. Albumentations is published
under a permissive free software license and has a contribution guide [55]. For each commit to the
Albumentations source code, continuous integration tools first perform a style check and then run
unit tests that cover the entire code base. To date, there are more than 5000 unit tests that cover
standard situations and corner cases that different transforms may encounter.
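As an illustration of the kind of property such tests check (this is a sketch, not an excerpt from the actual test suite), a unit test may verify that a transform preserves the image shape and data type and behaves as expected when applied deterministically:

    import numpy as np
    import albumentations as A

    def test_horizontal_flip_preserves_shape_and_is_involutive():
        # A deterministic horizontal flip applied twice should return the original image.
        image = np.random.randint(0, 256, (100, 80, 3), dtype=np.uint8)
        flip = A.HorizontalFlip(p=1.0)
        once = flip(image=image)["image"]
        twice = flip(image=once)["image"]
        assert once.shape == image.shape
        assert once.dtype == image.dtype
        assert np.array_equal(twice, image)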
4. Key Features
Over the past few years, multiple image augmentation libraries have been developed, including
imgaug [20], torchvision [13], Augmentor [21], CLoDSA [22], SOLT [56], and Automold [57]. Each
has its own advantages when used for a specific task/domain/dataset combination. However,
these libraries did not satisfy our requirements for a wide enough range of implemented transform
operations, performance, or support for multiple targets. Since existing tools lacked a balanced
approach, the authors of this paper had been independently developing their own custom solutions
to make augmentations execute more quickly, in order to keep GPU utilization during training close
to 100%. These custom implementations in part relied on and built upon existing libraries, combining,
extending, and modifying available image operations. At some point, these solutions were merged
together into what later became Albumentations, with the first public alpha release in June 2018.
To support users of different deep learning frameworks, Albumentations provides a convenient
Python interface that enables seamless integration with PyTorch [13], Keras [12], and TensorFlow [11].
Albumentations supports all of the most commonly used image transform operations (see Figure 1)
and some domain- or task-specific ones, for example, changes in weather conditions for autonomous
vehicle perception modeling [57]. The rest of this section is organized as follows: we list
features that make Albumentations stand out compared to other similar solutions and provide
code examples to illustrate how to use them in practice.
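As an example of such integration, the sketch below (an assumption-based illustration, not code from the paper) wraps an Albumentations pipeline inside a PyTorch Dataset; the dataset class, file-path arguments, and chosen transforms are hypothetical.

    import cv2
    import albumentations as A
    from albumentations.pytorch import ToTensorV2
    from torch.utils.data import Dataset

    train_transform = A.Compose([
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.3),
        A.Normalize(),       # ImageNet mean/std by default
        ToTensorV2(),        # convert to a PyTorch tensor as the last step
    ])

    class SegmentationDataset(Dataset):
        """Hypothetical dataset built from lists of image and mask file paths."""
        def __init__(self, image_paths, mask_paths, transform=None):
            self.image_paths = image_paths
            self.mask_paths = mask_paths
            self.transform = transform

        def __len__(self):
            return len(self.image_paths)

        def __getitem__(self, idx):
            image = cv2.cvtColor(cv2.imread(self.image_paths[idx]), cv2.COLOR_BGR2RGB)
            mask = cv2.imread(self.mask_paths[idx], cv2.IMREAD_GRAYSCALE)
            if self.transform is not None:
                augmented = self.transform(image=image, mask=mask)
                image, mask = augmented["image"], augmented["mask"]
            return image, mask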
4.2. Composition
Composition allows applying multiple augmentations to an input image sequentially or using
simple control-flow logic. In a composition, each transformation takes the output of the previous
transformation as an input. This simple, yet powerful technique enables building sophisticated
pipelines of transforms that may, in fact, be implemented in different low-level array manipulation
libraries. A composition allows describing such a complex sequence of augmentations in a simple
and clear declarative fashion. Albumentations implements a few different ways of composing image
transform operators (see Figure S2).
Let us consider a real-life example from one of the top-performing solutions from the APTOS
2019 Blindness Detection Challenge [58]:
transform = A.Compose([
    A.OneOf([
        A.ShiftScaleRotate(..., p=0.5),
        A.ElasticTransform(..., p=0.5),
        A.OpticalDistortion(..., p=0.5),
        A.GridDistortion(..., p=0.5),
        A.NoOp()
    ]),
    A.RandomSizedCrop(..., p=0.3),
    A.ISONoise(p=0.5),
    A.OneOf([
        A.RandomBrightnessContrast(..., p=0.5),
        A.RandomGamma(..., p=0.5),
        A.NoOp()
    ]),
    A.OneOf([
        A.FancyPCA(..., p=0.5),
        A.RGBShift(..., p=0.5),
        A.HueSaturationValue(..., p=0.5),
        A.ToGray(p=0.2),
        A.NoOp()
    ]),
    A.ChannelDropout(p=0.5),
    A.RandomGridShuffle(p=0.3),
    A.RandomRotate90(p=0.5),
    A.Transpose(p=0.5)
])
In this example, the augmentation pipeline first applies one of the spatial augmentations
(ShiftScaleRotate, ElasticTransform, OpticalDistortion, GridDistortion, or none). This type of
grid-based transformation is commonly used in biomedical image analysis (see the example in
Figure 2). Then, random cropping with resizing occurs with a probability of 30%, followed by a set of
color image augmentations (ISO camera noise; brightness, contrast, and gamma adjustments; color shift
augmentations; and color removal). Next, a random channel can be dropped out and/or a grid shuffle
can be applied. In the final step, the image may be randomly rotated by 90 degrees and transposed.
This complex augmentation pipeline expressed in a short snippet offers excellent flexibility in trying
out multiple augmentation strategies. A more detailed code listing for this experiment is provided in
the Supplementary Materials (Listing S11).
The main advantage of this API design is its simplicity, which allows for a concise, easily extendable
expression of complex workflows.
The internal representation of bounding boxes is normalized to the XYWH format, which allows keeping
sub-pixel coordinate precision across consecutive spatial augmentations.
Figure 3. An example of geometry-preserving transforms applied to satellite images (top row) and
ground truth binary masks (bottom row) from the Inria Aerial Image Labeling dataset [62].
Some spatial transformations, such as different types of crops, may also change the number of
bounding boxes. For example, when we crop the left part of the image, bounding boxes in the right
part should be removed. The label_fields parameter in the definition of a Compose operator keeps
track of this box-label correspondence.
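A minimal sketch of this behavior (parameter values and class identifiers are illustrative, not from the paper): boxes that fall outside the crop are dropped, and label_fields keeps the class labels in sync with the surviving boxes.

    import numpy as np
    import albumentations as A

    transform = A.Compose(
        [A.RandomCrop(height=256, width=256, p=1.0)],
        bbox_params=A.BboxParams(
            format="coco",                  # x_min, y_min, width, height
            min_visibility=0.3,             # drop boxes that are mostly cut off
            label_fields=["category_ids"],
        ),
    )

    image = np.zeros((512, 512, 3), dtype=np.uint8)
    out = transform(
        image=image,
        bboxes=[[10, 10, 50, 60], [400, 400, 80, 90]],
        category_ids=[1, 2],
    )
    # Any box removed by the crop is also removed from out["category_ids"].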
Replay mode, implemented in Albumentations, records the randomized parameters that were applied
to a particular target so that exactly the same transform can be re-applied later; it aims to improve
reproducibility and the developer experience while using the library.
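A minimal sketch of replay mode (the transforms chosen here are illustrative): the parameters sampled for the first image are recorded and then re-applied verbatim to a second image.

    import numpy as np
    import albumentations as A

    transform = A.ReplayCompose([
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
    ])

    image1 = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
    image2 = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)

    first = transform(image=image1)
    # first["replay"] records which transforms fired and with which parameters.
    second = A.ReplayCompose.replay(first["replay"], image=image2)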
Figure 5. An example of applying a custom augmentation using the A.Lambda operator to an image (left)
and a corresponding segmentation mask (right).
A more detailed code listing for this example is provided in the Supplementary Materials
(Listing S12).
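As a sketch of how a custom augmentation can be plugged in with the Lambda operator (the emboss-like kernel below is illustrative and not the transform shown in Figure 5):

    import cv2
    import numpy as np
    import albumentations as A

    def emboss_image(image, **kwargs):
        # Apply a simple emboss-like kernel to the image only.
        kernel = np.array([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]], dtype=np.float32)
        return cv2.filter2D(image, -1, kernel)

    def keep_mask(mask, **kwargs):
        # Leave the segmentation mask unchanged.
        return mask

    transform = A.Compose([A.Lambda(image=emboss_image, mask=keep_mask, p=1.0)])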
4.7. Performance
Albumentations follows the example of other popular Python machine learning packages
and tends to minimize the use of pure Python functionality under the hood due to performance
considerations. For example, vectorized functions are used as much as possible instead of loops.
Furthermore, Albumentations considers multiple options for how each operation could be realized.
For example, Albumentations tries to work with images of uint8 data type when possible for
a number of reasons. First, it allows minimizing the memory usage and fitting more values into
a SIMD register (e.g., 16 × uint8 vs. 4 × float32 values). Second, high-performance implementations
of common transform operations on uint8 data are widely available. In some cases, it is possible
for Albumentations to not perform transformations directly on the image, instead operating on the
corresponding look-up table (LUT) and then applying it to the original image.
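The LUT idea can be sketched as follows (this illustrates the general technique on uint8 data, not the library's internal implementation): instead of transforming every pixel, precompute the mapping for the 256 possible uint8 values and apply it with a single table lookup.

    import cv2
    import numpy as np

    def gamma_uint8(image: np.ndarray, gamma: float) -> np.ndarray:
        # Precompute the gamma curve for all 256 uint8 values, then apply it as a LUT.
        table = ((np.arange(256) / 255.0) ** gamma * 255.0).astype(np.uint8)
        return cv2.LUT(image, table)

    image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
    adjusted = gamma_uint8(image, gamma=0.7)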
Although there are dedicated computer vision libraries such as OpenCV [17], which are also
implemented in C++ and provide a Python interface, their implementations of image transforms are
not always the most efficient. For example, the NumPy [64] implementation of an image flip operation
used in Albumentations is faster than the OpenCV implementation. The use of the numpy.where() operator
for conditional selection, numpy.empty() for memory pre-allocation, and the inplace flag in supported
NumPy operations can result in noticeable gains in processing speed. To balance performance
and the number of underlying dependencies, we rely on a few low-level libraries that provide the fastest
implementations of almost all common image transforms, as demonstrated in Section 5.1.
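A rough sketch of the kind of micro-benchmark behind such comparisons is shown below; absolute numbers depend heavily on hardware, image size, and library versions, so this is only a template for measuring, not a reproduction of the reported results.

    import timeit
    import cv2
    import numpy as np

    image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)

    # Time 1000 horizontal flips with NumPy slicing and with OpenCV.
    numpy_time = timeit.timeit(lambda: np.ascontiguousarray(image[:, ::-1]), number=1000)
    opencv_time = timeit.timeit(lambda: cv2.flip(image, 1), number=1000)
    print(f"NumPy flip:  {numpy_time:.3f} s per 1000 calls")
    print(f"OpenCV flip: {opencv_time:.3f} s per 1000 calls")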
5. Evaluation
5.1. Benchmarks
A quantitative comparison of image transformation speed for Albumentations and other commonly
used image augmentation tools is presented in Table 1. We included the framework-agnostic image
augmentation libraries imgaug [65], Augmentor [21], and SOLT [56], as well as the augmentations
provided with the Keras [12] and PyTorch [66] frameworks. For most image operations, Albumentations
is consistently faster than all alternatives. Detailed instructions for running the benchmarks locally
are provided in the Albumentations GitHub repository: https://github.com/albumentations-team/albumentations.
Table 1. Results for running the benchmark on the first 2000 images from the ImageNet validation set
using an Intel Xeon Platinum 8168 CPU. All outputs are converted to a contiguous NumPy array with
the np.uint8 data type. The table shows how many images per second can be processed on a single
core (higher is better). A denotes results for Albumentations.
Transform A 0.4.2 Imgaug 0.3.0 Torchvision 0.4.1 Keras 2.3.1 Augmentor 0.2.6 Solt 0.1.8
HorizontalFlip 2183 1403 1757 1068 1779 1031
VerticalFlip 4217 2334 1538 4196 1541 3820
Rotate 456 368 163 32 60 116
ShiftScaleRotate 800 549 146 34 - -
Brightness 2209 1288 405 211 403 2070
Contrast 2215 1387 338 - 337 2073
BrightnessContrast 2208 740 193 - 193 1060
ShiftRGB 2214 1303 - 407 - -
ShiftHSV 468 443 61 - - 144
Gamma 2281 - 730 - - 925
Grayscale 5019 436 788 - 1451 4191
RandomCrop64 173,877 3340 43,792 - 36,869 36,178
PadToSize512 2906 - 553 - - 2711
Resize512 663 506 968 - 954 673
RandomSizedCrop64_512 2565 933 1395 - 1353 2360
Equalize 759 457 - - 684 -
5.2. Use Case: Semantic Segmentation of Satellite Imagery
We used the same metric as the official online evaluation server, namely Intersection over Union (IoU)
per image, averaged across all images. We report the best IoU achieved on the validation set for each
augmentation level, together with the data preprocessing time, below. For this study, we used a
well-known UNet-based model architecture [67] with an HRNetV2 encoder [68]. We did not use any
pre-trained weights and started training from the default initialization with a fixed random seed.
Training ran for 100 epochs using the RAdam optimizer [69] with a starting learning rate of 10^-3 and
cosine annealing to 10^-6. Models were trained on a hardware setup with four NVidia 1080Ti GPUs
with a batch size of 48 using PyTorch 1.4 [13] and NVidia Apex for mixed-precision training. Each
training run took approximately 24 h. We used the Catalyst library [70] as a high-level framework for
model training and experiment management.
Since the original images are too large to fit in GPU memory entirely, we randomly cropped a square
patch with a side length sampled from the range [384, 640] pixels from the source image and resized it
to 512 × 512 during training. Other image augmentations were performed on the resized image. This
cropping scheme was used in all of the following experiments (a code sketch of the scheme and the
Light set follows the list):
1. No augmentations: After cropping and resizing, no changes to the image were made.
2. Light augmentations: Random horizontal flips, changes of brightness, contrast, and color, and random
affine and perspective changes.
3. Medium augmentations, an extended set of augmentations in addition to the Light scenario:
Gaussian blur, sharpening, coarse dropout, removal of some buildings, and randomly
generated fog.
4. Hard augmentations, extending the Medium set with: Random rotation by 90 degrees,
image grid shuffle, elastic transformations, gamma adjustments, and contrast-limited adaptive
histogram equalization.
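A hedged sketch of the cropping scheme and the Light augmentation level described above (the exact parameter values used in the experiments are not given in the text, so those below are assumptions):

    import albumentations as A

    # Randomly crop a square patch with side length in [384, 640] and resize it to 512 x 512.
    crop = A.RandomSizedCrop(min_max_height=(384, 640), height=512, width=512, p=1.0)

    light = A.Compose([
        crop,
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.5),
        A.ShiftScaleRotate(p=0.5),   # stands in for the random affine/perspective changes
    ])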
The results of all four experiments are presented in Table 2. Without enough image
augmentations, even our average-sized model (35M parameters) showed signs of overfitting after
epoch 50, when the validation IoU stopped improving. With Medium augmentations, the same model
had a smaller gap between the train and validation IoU scores, and the best IoU was achieved towards
the end of the training process, which shows the potential for further improvement. When
training with Hard augmentations, the model achieved the overall best validation IoU and did not
overfit. The current state-of-the-art result on this dataset is 80.32 mIoU, and the model trained with Hard
augmentations has not reached this mark even on the training set, which indicates that it is still under-trained.
The overall model training time did not increase substantially, meaning that even Hard augmentations
were fast enough to preprocess each batch in time for passing it to the GPU.
Table 2. Results for different augmentation levels for the segmentation task on the Inria Aerial Image
Labeling dataset [62]. Train IoU and Valid IoU show the best metric value reached across 100 epochs of
training (higher is better). Data time and Model time indicate how long it takes to preprocess a batch of
images and then run it through the network (lower is better).
Augmentations Train IoU Valid IoU Best Epoch Data Time (sec/batch) Model Time (sec/batch)
None 84.67 73.89 45/100 0.09 0.6
Light 84.84 77.50 90/100 0.09 0.6
Medium 83.52 76.94 96/100 0.11 0.6
Hard 79.78 78.34 95/100 0.13 0.6
This use case demonstrates that having a variety of available transforms is important
for preventing overfitting and achieving the best model performance. At the same time, thanks to the
optimized operation performance in Albumentations, the augmentation pipeline can be extended even
further without slowing down the training process.
5.3. Visualization
For anyone who works with image analysis, the ability to visualize the effect of programmatic
operations applied to an image is of immense help. Qualitative visual inspection allows quickly
validating the results of a transform and catching possible bugs early, especially in the case of a complex
processing workflow. Thanks to the community, there are two tools for visualizing Albumentations
transforms (see Figure 6). The first one allows looking at the result of one specific transformation
applied to one of the predefined images, manually changing parameters to achieve the desirable result,
and extracting the exact Python code to reproduce it [71]. The second visualizes the output of a chain of
transforms applied to a predefined or uploaded image in order to validate the resulting image [72].
Besides the obvious practical utility, both tools show the involvement of the community around Albumentations.
5.4. Adoption
Although there is no easy way to measure the adoption of the library, we can look at different
metrics that may serve as useful proxies. First, for any open source library, the number of stars on
GitHub can show the interest of users. In Figure 7, we show this metric as a function of time, as well as
the number of downloads via PyPI. Second, the library was born from winning solutions to computer
vision competitions, and it is not surprising that many, if not all, top teams at Kaggle use the library in
their solutions [55]. Third, the library is gaining use in academia, as shown by recent Google Scholar
mentions, with the most common applications in biomedical and satellite image analysis [73–80].
Finally, Albumentations has joined other PyTorch-friendly tools in the PyTorch ecosystem [81].
Figure 7. Library adoption shown as: (left) the number of stars in the Albumentations GitHub
repository over time; and (right) the number of daily installations of the library using PyPI: pip
install albumentations.
References
1. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef]
2. Nowlan, S.J.; Hinton, G.E. Simplifying neural networks by soft weight-sharing. Neural Comput. 1992,
4, 473–493. [CrossRef]
3. Hawkins, D.M. The problem of overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12. [CrossRef] [PubMed]
4. Kukačka, J.; Golkov, V.; Cremers, D. Regularization for deep learning: A taxonomy. arXiv 2017,
arXiv:1710.10686.
5. Mikołajczyk, A.; Grochowski, M. Data augmentation for improving deep learning in image classification
problem. In Proceedings of the 2018 International Interdisciplinary PhD Workshop (IIPhDW),
Swinoujście, Poland, 9–12 May 2018; pp. 117–122.
6. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019,
6, 60. [CrossRef]
7. Liu, S.; Papailiopoulos, D.; Achlioptas, D. Bad Global Minima Exist and SGD Can Reach Them. arXiv 2019,
arXiv:1906.02613.
8. Bengio, Y.; Bastien, F.; Bergeron, A.; Boulanger-Lewandowski, N.; Breuel, T.; Chherawala, Y.; Cisse,
M.; Côté, M.; Erhan, D.; Eustache, J.; et al. Deep learners benefit more from out-of-distribution
examples. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics,
Ft. Lauderdale, FL, USA, 11–13 April 2011; pp. 164–172.
9. Hendrycks, D.; Mu, N.; Cubuk, E.D.; Zoph, B.; Gilmer, J.; Lakshminarayanan, B. AugMix: A Simple Data
Processing Method to Improve Robustness and Uncertainty. In Proceedings of the International Conference
on Learning Representations (ICLR), Millennium Hall, Addis Ababa, Ethiopia, 26–30 April 2020.
10. Hernández-García, A.; König, P. Further advantages of data augmentation on convolutional neural
networks. In Proceedings of the International Conference on Artificial Neural Networks, Rhodes, Greece ,
4–7 October 2018; pp. 95–103.
11. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.;
et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium
on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016;
pp. 265–283.
12. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 21 February 2020).
13. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.;
Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings
of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019;
pp. 8024–8035.
14. Ratner, A.J.; Ehrenberg, H.; Hussain, Z.; Dunnmon, J.; Ré, C. Learning to compose domain-specific
transformations for data augmentation. In Proceedings of the Advances in Neural Information Processing
Systems, Long Beach, CA, USA, 4–9 December 2017, pp. 3236–3246.
15. Lemley, J.; Bazrafkan, S.; Corcoran, P. Smart Augmentation Learning an Optimal Data Augmentation
Strategy. IEEE Access 2017, 5, 5858–5869. [CrossRef]
16. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies
from Data. arXiv 2018, arXiv:1805.09501.
17. Bradski, G. The OpenCV Library. Dr. Dobb's J. Softw. Tools 2000. Available online: https://www.drdobbs.com/open-source/the-opencv-library/184404319 (accessed on 21 February 2020).
18. Clark, A. Pillow. 2010. Available online: https://python-pillow.org/ (accessed on 21 February 2020).
19. Ince, D.C.; Hatton, L.; Graham-Cumming, J. The case for open computer programs. Nature 2012, 482, 485.
[CrossRef]
20. Jung, A.B.; Wada, K.; Crall, J.; Tanaka, S.; Graving, J.; Yadav, S.; Banerjee, J.; Vecsei, G.; Kraft, A.; Borovec, J.;
et al. Imgaug. 2019. Available online: https://github.com/aleju/imgaug (accessed on 31 December 2019).
21. Bloice, M.D.; Roth, P.M.; Holzinger, A. Biomedical image augmentation using Augmentor. Bioinformatics
2019, 35, 4522–4524. [CrossRef]
22. Casado-García, Á.; Domínguez, C.; García-Domínguez, M.; Heras, J.; Inés, A.; Mata, E.; Pascual, V. CLoDSA:
A tool for augmentation in classification, localization, detection, semantic segmentation and instance
segmentation tasks. BMC Bioinform. 2019, 20, 323. [CrossRef] [PubMed]
23. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image
database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009),
Miami, FL, USA, 20–26 June 2009; pp. 248–255.
24. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural
networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA,
3–6 December 2012; pp. 1097–1105.
25. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition.
Proc. IEEE 1998, 86, 2278–2324. [CrossRef]
26. Howard, A.G. Some improvements on deep convolutional neural network based image classification. arXiv
2013, arXiv:1312.5402.
27. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A.
Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
28. Wu, R.; Yan, S.; Shan, Y.; Dang, Q.; Sun, G. Deep image: Scaling up image recognition. arXiv 2015,
arXiv:1501.02876.
29. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.;
Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal.
2017, 42, 60–88. [CrossRef]
30. Ching, T.; Himmelstein, D.S.; Beaulieu-Jones, B.K.; Kalinin, A.A.; Do, B.T.; Way, G.P.; Ferrero, E.; Agapow,
P.M.; Zietz, M.; Hoffman, M.M.; et al. Opportunities And Obstacles For Deep Learning In Biology And
Medicine. J. R. Soc. Interface 2018, 15. [CrossRef]
31. Rakhlin, A.; Shvets, A.; Iglovikov, V.; Kalinin, A.A. Deep Convolutional Neural Networks for Breast
Cancer Histology Image Analysis. In Image Analysis and Recognition; Campilho, A., Karray, F.,
ter Haar Romeny, B., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 737–744.
32. DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017,
arXiv:1708.04552.
33. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization.
In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada,
30 April–3 May 2018.
34. Guo, H.; Mao, Y.; Zhang, R. Mixup as locally linear out-of-manifold regularization. In Proceedings of the
AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33,
pp. 3714–3722.
35. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers
with localizable features. In Proceedings of the International Conference on Computer Vision (ICCV),
Seoul, Korea, 27 October–2 November 2019.
36. Graham, B. Fractional max-pooling. arXiv 2014, arXiv:1412.6071.
37. Lee, H.; Hwang, S.J.; Shin, J. Rethinking Data Augmentation: Self-Supervision and Self-Distillation. arXiv
2019, arXiv:1910.05872.
38. He, Z.; Xie, L.; Chen, X.; Zhang, Y.; Wang, Y.; Tian, Q. Data Augmentation Revisited: Rethinking the
Distribution Gap between Clean and Augmented Data. arXiv 2019, arXiv:1909.09148.
39. Tran, T.; Pham, T.; Carneiro, G.; Palmer, L.; Reid, I. A bayesian data augmentation approach for learning
deep models. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA,
USA, 4–9 December 2017; pp. 2797–2806.
40. Lim, S.; Kim, I.; Kim, T.; Kim, C.; Kim, S. Fast AutoAugment. In Proceedings of the Advances in Neural
Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 6665–6675.
41. Ho, D.; Liang, E.; Chen, X.; Stoica, I.; Abbeel, P. Population Based Augmentation: Efficient Learning of
Augmentation Policy Schedules. In Proceedings of the International Conference on Machine Learning,
Boca Raton, FL, USA, 16–19 December 2019; pp. 2731–2741.
42. Taylor, L.; Nitschke, G. Improving deep learning with generic data augmentation. In Proceedings of the
2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018;
pp. 1542–1547.
43. Joaquin, A.G.; Łęcki, K.; Zientkiewicz, M.; et al. Fast AI Data Preprocessing with NVIDIA DALI. Available online: https://devblogs.nvidia.com/fast-ai-data-preprocessing-with-nvidia-dali/ (accessed on 31 December 2019).
44. Kalinin, A.A.; Allyn-Feuer, A.; Ade, A.; Fon, G.V.; Meixner, W.; Dilworth, D.; De Wet, J.R.; Higgins, G.A.;
Zheng, G.; Creekmore, A.; et al. 3D Cell Nuclear Morphology: Microscopy Imaging Dataset and Voxel-Based
Morphometry Classification Results. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2272–2280.
45. Parpulov, D.; Samorodov, A.; Makhov, D.; Slavnova, E.; Volchenko, N.; Iglovikov, V. Convolutional neural
network application for cells segmentation in immunocytochemical study. In Proceedings of the 2018
Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT),
Yekaterinburg, Russia, 7–8 May 2018; pp. 87–90.
46. Caicedo, J.C.; Goodman, A.; Karhohs, K.W.; Cimini, B.A.; Ackerman, J.; Haghighi, M.; Heng, C.; Becker, T.;
Doan, M.; McQuin, C.; et al. Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl.
Nat. Methods 2019, 16, 1247–1253. [CrossRef] [PubMed]
47. Kalinin, A.A.; Iglovikov, V.; Rakhlin, A.; Shvets, A. Medical Image Segmentation using Deep Neural
Networks with Pre-trained Encoders. In Deep Learning Applications; Wani, M.A., Kantardzic, M.,
Sayed Mouchaweh, M., Eds.; Springer: Singapore, 2020.
48. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B.
The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
49. Neuhold, G.; Ollmann, T.; Bulò, S.R.; Kontschieder, P. The Mapillary Vistas Dataset for Semantic
Understanding of Street Scenes. In Proceedings of the International Conf. on Computer Vision (ICCV),
Venice, Italy, 22–29 October 2017; pp. 5000–5009.
50. Iglovikov, V.; Seferbekov, S.; Buslaev, A.; Shvets, A. TernausNetV2: Fully Convolutional Network for
Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 233–237.
51. Szegedy, C.; Toshev, A.; Erhan, D. Deep neural networks for object detection. In Proceedings of the Advances
in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2553–2561.
52. Guo, Y.; Liu, Y.; Oerlemans, A.; Lao, S.; Wu, S.; Lew, M.S. Deep learning for visual understanding: A review.
Neurocomputing 2016, 187, 27–48. [CrossRef]
53. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent
advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [CrossRef]
54. Brito, J.J.; Li, J.; Moore, J.H.; Greene, C.S.; Nogoy, N.A.; Garmire, L.X.; Mangul, S. Enhancing rigor
and reproducibility by improving software availability, usability, and archival stability. arXiv 2020,
arXiv:2001.05127.
55. Albumentations. Available online: https://github.com/albumentations-team/albumentations (accessed on
31 December 2019).
56. Tiulpin, A. SOLT: Streaming over Lightweight Transformations. 2019. Available online: https://zenodo.org/record/3351977 (accessed on 21 February 2020). [CrossRef]
79. Kuzin, A.; Fattakhov, A.; Kibardin, I.; Iglovikov, V.I.; Dautov, R. Camera Model Identification Using
Convolutional Neural Networks. In Proceedings of the 2018 IEEE International Conference on Big Data
(Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 3107–3110. [CrossRef]
80. Yang, F.; Sakti, S.; Wu, Y.; Nakamura, S. A Framework for Knowing Who is Doing What in Aerial Surveillance
Videos. IEEE Access 2019, 7, 93315–93325. [CrossRef]
81. Pytorch Ecosystem. Available online: https://pytorch.org/ecosystem/ (accessed on 31 December 2019).
82. Open Data Science (ODS.ai). Available online: https://ods.ai (accessed on 31 December 2019).
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).