On Pretraining Data Diversity for Self-Supervised Learning

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15114)

Abstract

We explore the impact of pretraining with more diverse datasets, characterized by the number of unique samples, on the performance of self-supervised learning (SSL) under a fixed computational budget. Our findings demonstrate that increasing pretraining data diversity enhances SSL performance, but only when the distribution distance to the downstream data is small. Notably, even with exceptionally high pretraining data diversity, achieved through sources such as web crawling or diffusion-generated data, the distribution shift remains a challenge. Our experiments are comprehensive, covering seven SSL methods and large-scale datasets such as ImageNet and YFCC100M, and amount to over 200 GPU days of compute. The code and trained models will be available at https://github.com/hammoudhasan/DiversitySSL.
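To make the fixed-budget protocol concrete, here is a minimal sketch of how such an experiment could be set up: the number of unique pretraining samples is varied while the total number of training iterations is held constant, so lower-diversity runs simply revisit each sample more often. This is an illustrative assumption about the setup, not the authors' released code; `make_fixed_budget_loader` and its parameters are hypothetical names.

```python
# Illustrative sketch (assumed setup, not the authors' code): vary
# pretraining data diversity (number of unique samples) while holding
# the total number of training iterations, i.e. the compute, constant.
import torch
from torch.utils.data import DataLoader, Dataset, Subset

def make_fixed_budget_loader(dataset: Dataset, num_unique: int,
                             total_iterations: int, batch_size: int):
    """Subsample `num_unique` examples and return a loader together with
    the number of epochs needed to spend the full iteration budget."""
    indices = torch.randperm(len(dataset))[:num_unique].tolist()
    loader = DataLoader(Subset(dataset, indices), batch_size=batch_size,
                        shuffle=True, drop_last=True)
    iters_per_epoch = max(1, num_unique // batch_size)
    # Fewer unique samples -> fewer iterations per epoch -> more epochs,
    # so every run consumes the same total compute.
    epochs = max(1, total_iterations // iters_per_epoch)
    return loader, epochs
```

The distribution distance between pretraining and downstream data can be proxied in several ways; one common choice (the one underlying FID) is the Fréchet distance between Gaussian fits of deep features. The sketch below assumes features have already been extracted with some encoder and is only one plausible instantiation of such a measure, not necessarily the one used in the paper.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n, d) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; imaginary parts are
    # numerical noise and are discarded.
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```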

H. A. Al Kader Hammoud, T. Das and F. Pizzati—Equal contribution.

A. Bibi and B. Ghanem—Equal supervision.

H. A. Al Kader Hammoud—Work done during a research visit at Oxford.

Acknowledgements

This work was supported by the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI). Fabio Pizzati is financed by KAUST (Grant DFR07910). Philip H.S. Torr thanks the Royal Academy of Engineering for their support. This work was also supported by a UKRI Turing AI Fellowship (EP/W002981/1).

Author information

Correspondence to Hasan Abed Al Kader Hammoud.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 5446 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Al Kader Hammoud, H.A., Das, T., Pizzati, F., Torr, P.H.S., Bibi, A., Ghanem, B. (2025). On Pretraining Data Diversity for Self-Supervised Learning. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15114. Springer, Cham. https://doi.org/10.1007/978-3-031-72992-8_4

  • DOI: https://doi.org/10.1007/978-3-031-72992-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72991-1

  • Online ISBN: 978-3-031-72992-8

  • eBook Packages: Computer Science, Computer Science (R0)
