A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Suh, Namjoon; Cheng, Guang

Statistics > Machine Learning

arXiv:2401.07187 (stat)

[Submitted on 14 Jan 2024 (v1), last revised 16 Sep 2024 (this version, v3)]

Abstract: In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression (and classification in Appendix B). These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer "how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data." In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm and the Mean-Field (MF) paradigm. Last but not least, we review the most recent theoretical advancements in generative models, including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in Large Language Models (LLMs), from the two perspectives reviewed previously, i.e., approximation and training dynamics.

Comments:	38 pages, 2 figures. Invited for review in Annual Review of Statistics and Its Application
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as:	arXiv:2401.07187 [stat.ML]
	(or arXiv:2401.07187v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2401.07187

Submission history

From: Namjoon Suh
[v1] Sun, 14 Jan 2024 02:30:19 UTC (70 KB)
[v2] Thu, 4 Jul 2024 04:36:06 UTC (119 KB)
[v3] Mon, 16 Sep 2024 09:57:35 UTC (142 KB)
