Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

Schaeffer, Rylan; Lecomte, Victor; Pai, Dhruv Bhandarkar; Carranza, Andres; Isik, Berivan; Unell, Alyssa; Khona, Mikail; Yerxa, Thomas; LeCun, Yann; Chung, SueYeon; Gromov, Andrey; Shwartz-Ziv, Ravid; Koyejo, Sanmi

arXiv:2406.09366 (cs)

[Submitted on 13 Jun 2024]

Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to improve our understanding and our utilization of MMCR. To better understand MMCR, we leverage tools from high dimensional probability to demonstrate that MMCR incentivizes alignment and uniformity of learned embeddings. We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL. To better utilize MMCR, we mathematically predict and experimentally confirm non-monotonic changes in the pretraining loss akin to double descent but with respect to atypical hyperparameters. We also discover compute scaling laws that enable predicting the pretraining loss as a function of gradient steps, batch size, embedding dimension, and number of views. We then show that MMCR, originally applied to image data, is performant on multimodal image-text data. By more deeply understanding the theoretical and empirical behavior of MMCR, our work reveals insights on improving MVSSL methods.
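
The abstract does not spell out the MMCR objective itself. As a rough illustration only, the sketch below assumes the formulation commonly attributed to the original MMCR work: embeddings are projected onto the unit hypersphere, averaged across views to give one centroid per datum, and the loss is the negative nuclear norm of the centroid matrix. The function name `mmcr_loss`, the `(N, K, D)` array layout, and the toy inputs are assumptions made for this example and are not taken from this paper.

```python
import numpy as np


def mmcr_loss(embeddings: np.ndarray) -> float:
    """Sketch of an MMCR-style loss (assumed formulation, not from this page).

    embeddings: array of shape (N, K, D) holding N data points,
    K views per data point, and D embedding dimensions.
    """
    # Project every view embedding onto the unit hypersphere.
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    z = embeddings / np.clip(norms, 1e-12, None)
    # Average over the K views: one centroid per data point, shape (N, D).
    centroids = z.mean(axis=1)
    # Nuclear norm is the sum of singular values of the centroid matrix.
    singular_values = np.linalg.svd(centroids, compute_uv=False)
    # Minimizing the loss maximizes the nuclear norm.
    return -float(singular_values.sum())


# Toy case 1: views agree per datum and centroids are mutually orthogonal.
aligned = np.stack([np.eye(4), np.eye(4)], axis=1)  # shape (4, 2, 4)

# Toy case 2: every embedding collapses onto the same direction.
collapsed = np.zeros((4, 2, 4))
collapsed[..., 0] = 1.0

loss_aligned = mmcr_loss(aligned)
loss_collapsed = mmcr_loss(collapsed)
```

Under these assumptions, the aligned and spread-out toy embeddings receive a lower loss than the collapsed ones, which is in line with the abstract's claim that MMCR incentivizes alignment and uniformity.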

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Cite as:	arXiv:2406.09366 [cs.LG]
	(or arXiv:2406.09366v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.09366

Submission history

From: Rylan Schaeffer
[v1] Thu, 13 Jun 2024 17:49:56 UTC (13,766 KB)
