Abstract
We explore the impact of training with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings demonstrate that increasing pretraining data diversity enhances SSL performance, but only when the distribution distance to the downstream data is small. Notably, even when exceptionally large pretraining data diversity is achieved, for instance through web crawling or diffusion-based data generation, distribution shift remains a challenge. Our experiments are comprehensive, covering seven SSL methods on large-scale datasets such as ImageNet and YFCC100M, and amount to over 200 GPU days of compute. The code and trained models will be available at https://github.com/hammoudhasan/DiversitySSL.
H. A. Al Kader Hammoud, T. Das and F. Pizzati—Equal contribution.
A. Bibi and B. Ghanem—Equal supervision.
H. A. Al Kader Hammoud—Work done during a research visit at Oxford.
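To make the fixed-budget protocol in the abstract concrete, below is a minimal Python sketch (illustrative only, not the authors' released code; the function name fixed_budget_schedule and all hyperparameter values are hypothetical). It holds the total number of gradient steps constant while varying the number of unique pretraining samples, so a less diverse subset is simply revisited for more epochs within the same compute budget.

import random

def fixed_budget_schedule(num_unique, batch_size, total_steps):
    """Yield batches of sample indices until the step budget is spent."""
    pool = list(range(num_unique))
    steps = 0
    while steps < total_steps:
        random.shuffle(pool)  # one pass over the unique subset = one epoch
        for i in range(0, len(pool) - batch_size + 1, batch_size):
            if steps >= total_steps:
                return
            yield pool[i:i + batch_size]  # ragged tail batches are dropped
            steps += 1

# Two runs with the same compute budget but different diversity:
# with 10k unique images each sample is seen ~25x more often than with 250k.
for diversity in (10_000, 250_000):
    n_batches = sum(1 for _ in fixed_budget_schedule(diversity, 256, 1_000))
    print(diversity, n_batches)  # both consume exactly 1,000 gradient steps

Under such a schedule, differences in downstream performance between the two runs can be attributed to data diversity rather than to the amount of compute.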
Acknowledgements
This work was supported by the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI). Fabio Pizzati is funded by KAUST (Grant DFR07910). Philip H.S. Torr thanks the Royal Academy of Engineering for their support. This work is also supported by a UKRI grant, the Turing AI Fellowship (EP/W002981/1).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Al Kader Hammoud, H.A., Das, T., Pizzati, F., Torr, P.H.S., Bibi, A., Ghanem, B. (2025). On Pretraining Data Diversity for Self-Supervised Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15114. Springer, Cham. https://doi.org/10.1007/978-3-031-72992-8_4
DOI: https://doi.org/10.1007/978-3-031-72992-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72991-1
Online ISBN: 978-3-031-72992-8
eBook Packages: Computer Science, Computer Science (R0)