


Front. Electr. Electron. Eng. China
DOI 10.1007/s11460-010-0021-2

Shun-ichi AMARI

Information geometry in optimization, machine learning and statistical inference

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2010

Abstract  The present article gives an introduction to information geometry and surveys its applications in the area of machine learning, optimization and statistical inference. Information geometry is explained intuitively by using divergence functions introduced in a manifold of probability distributions and other general manifolds. They give a Riemannian structure together with a pair of dual flatness criteria. Many manifolds are dually flat. When a manifold is dually flat, a generalized Pythagorean theorem and related projection theorem are introduced. They provide useful means for various approximation and optimization problems. We apply them to alternative minimization problems, Ying-Yang machines and the belief propagation algorithm in machine learning.

Keywords  information geometry, machine learning, optimization, statistical inference, divergence, graphical model, ying-yang machine

Received January 15, 2010; accepted February 5, 2010

Shun-ichi AMARI
RIKEN Brain Science Institute, Saitama 351-0198, Japan
E-mail: amari@brain.riken.jp

1 Introduction

Information geometry [1] deals with a manifold of probability distributions from the geometrical point of view. It studies the invariant structure by using the Riemannian geometry equipped with a dual pair of affine connections. Since probability distributions are used in many problems in optimization, machine learning, vision, statistical inference, neural networks and others, information geometry provides a useful and strong tool to many areas of information sciences and engineering.

Many researchers in these fields, however, are not familiar with modern differential geometry. The present article intends to give an understandable introduction to information geometry without modern differential geometry. Since underlying manifolds in most applications are dually flat, the dually flat structure plays a fundamental role. We explain the fundamental dual structure and related dual geodesics without using the concept of affine connections and covariant derivatives.

We begin with a divergence function between two points in a manifold. When it satisfies an invariance criterion of information monotonicity, it gives a family of f-divergences [2]. When a divergence is derived from a convex function in the form of Bregman divergence [3], this gives another type of divergence, where the Kullback-Leibler divergence belongs to both of them. We derive a geometrical structure from a divergence function [4]. The Fisher information Riemannian structure is derived from an invariant divergence (f-divergence) (see Refs. [1,5]), while the dually flat structure is derived from the Bregman divergence (convex function).

The manifold of all discrete probability distributions is dually flat, where the Kullback-Leibler divergence plays a key role. We give the generalized Pythagorean theorem and projection theorem in a dually flat manifold, which plays a fundamental role in applications. Such a structure is not limited to a manifold of probability distributions, but can be extended to the manifolds of positive arrays, matrices and visual signals, and will be used in neural networks and optimization problems.

After introducing basic properties, we show three areas of applications. One is application to the alternative minimization procedures such as the expectation-maximization (EM) algorithm in statistics [6–8]. The second is an application to the Ying-Yang machine introduced and extensively studied by Xu [9–14]. The third one is application to the belief propagation algorithm of stochastic reasoning in machine learning or artificial intelligence [15–17]. There are many other applications in analysis of spiking patterns of the brain, neural networks, the boosting algorithm of machine learning, as well as a wide range of statistical inference, which we do not mention here.
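Before moving on, here is a minimal numerical illustration of the remark above that the Kullback-Leibler divergence belongs to both families: it is the f-divergence with f(t) = t log t and also the Bregman divergence generated by the negative entropy. The sketch assumes only NumPy; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def kl_direct(p, q):
    """Kullback-Leibler divergence D_KL[p : q] of two discrete distributions."""
    return np.sum(p * np.log(p / q))

def f_divergence(p, q, f):
    """f-divergence D_f[p : q] = sum_i q_i f(p_i / q_i)."""
    return np.sum(q * f(p / q))

def bregman(z, w, phi, grad_phi):
    """Bregman divergence D_phi[z : w] = phi(z) - phi(w) - <grad phi(w), z - w>."""
    return phi(z) - phi(w) - np.dot(grad_phi(w), z - w)

def neg_entropy(z):            # convex generator: sum_i (z_i log z_i - z_i)
    return np.sum(z * np.log(z) - z)

def grad_neg_entropy(z):
    return np.log(z)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

print(kl_direct(p, q))                                  # direct formula
print(f_divergence(p, q, lambda t: t * np.log(t)))      # as an f-divergence, f(t) = t log t
print(bregman(p, q, neg_entropy, grad_neg_entropy))     # as a Bregman divergence
# The three numbers agree (up to floating-point rounding).
```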

2 Divergence function and information geometry

2.1 Manifold of probability distributions and positive arrays

We introduce divergence functions in various spaces or manifolds. To begin with, we show typical examples of manifolds of probability distributions. A one-dimensional Gaussian distribution with mean μ and variance σ² is represented by its probability density function

    p(x; μ, σ) = (1/(√(2π) σ)) exp{−(x − μ)² / (2σ²)}.    (1)

It is parameterized by a two-dimensional parameter ξ = (μ, σ). Hence, when we treat all such Gaussian distributions, not a particular one, we need to consider the set S_G of all the Gaussian distributions. It forms a two-dimensional manifold

    S_G = {p(x; ξ)},    (2)

where ξ = (μ, σ) is a coordinate system of S_G. This is not the only coordinate system. It is possible to use other parameterizations or coordinate systems when we study S_G.
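To make the idea of different coordinate systems on S_G concrete, here is a minimal sketch (assuming NumPy; the function names and the choice of the exponential-family natural parameters as the alternative chart are illustrative, not taken from the paper). It evaluates the density (1) and checks that two different coordinate labels describe one and the same point of S_G.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density p(x; mu, sigma) of Eq. (1)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def to_natural(mu, sigma):
    """An alternative chart: exponential-family natural parameters (mu/sigma^2, -1/(2 sigma^2))."""
    return mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)

def from_natural(theta1, theta2):
    """Inverse map back to xi = (mu, sigma)."""
    sigma2 = -1.0 / (2.0 * theta2)
    return theta1 * sigma2, np.sqrt(sigma2)

xi = (0.5, 2.0)                 # a point of S_G in (mu, sigma) coordinates
theta = to_natural(*xi)         # the same point in natural coordinates
xi_back = from_natural(*theta)

x = np.linspace(-5.0, 5.0, 7)
print(np.allclose(gaussian_pdf(x, *xi), gaussian_pdf(x, *xi_back)))  # True:
# both coordinate labels pick out the same distribution, i.e., the same point of S_G.
```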
We show another example. Let x be a discrete random variable taking values on a finite set X = {0, 1, ..., n}. Then, a probability distribution is specified by a vector p = (p_0, p_1, ..., p_n), where

    p_i = Prob{x = i}.    (3)

We may write

    p(x; p) = Σ_i p_i δ_i(x),    (4)

where

    δ_i(x) = { 1, x = i;  0, x ≠ i }.    (5)

Since p is a probability vector, we have

    Σ_i p_i = 1,    (6)

and we assume

    p_i > 0.    (7)

The set of all the probability distributions is denoted by

    S_n = {p},    (8)

which is an n-dimensional simplex because of (6) and (7). When n = 2, S_n is a triangle (Fig. 1). S_n is an n-dimensional manifold, and ξ = (p_1, ..., p_n) is a coordinate system. There are many other coordinate systems. For example,

    θ_i = log(p_i / p_0),  i = 1, ..., n,    (9)

is an important coordinate system of S_n, as we will see later.

[Fig. 1  Manifold S_2 of discrete probability distributions]
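The change of coordinates in Eq. (9) and its inverse can be written down explicitly. A small sketch (assuming NumPy; the function names are my own) converting a point of S_n between the p-coordinates and the θ-coordinates:

```python
import numpy as np

def p_to_theta(p):
    """theta_i = log(p_i / p_0), i = 1, ..., n  (Eq. (9))."""
    return np.log(p[1:] / p[0])

def theta_to_p(theta):
    """Inverse map: p_i proportional to exp(theta_i), with theta_0 = 0."""
    unnorm = np.concatenate(([1.0], np.exp(theta)))   # exp(theta_0) = 1
    return unnorm / unnorm.sum()                      # enforce sum_i p_i = 1

p = np.array([0.2, 0.5, 0.3])              # a point of S_2 (the triangle of Fig. 1)
theta = p_to_theta(p)                      # the same point in theta-coordinates
print(theta)                               # [log 2.5, log 1.5]
print(np.allclose(theta_to_p(theta), p))   # True: the two charts agree
```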
The third example deals with positive measures, not probability measures. When we disregard the constraint Σ_i p_i = 1 of (6) in S_n, keeping p_i > 0, p is regarded as an (n + 1)-dimensional positive array, or a positive measure where x = i has measure p_i. We denote the set of positive measures or arrays by

    M_{n+1} = {z, z_i > 0; i = 0, 1, ..., n}.    (10)

This is an (n + 1)-dimensional manifold with a coordinate system z. S_n is its submanifold derived by the linear constraint Σ_i z_i = 1.

In general, we can regard any regular statistical model

    S = {p(x, ξ)}    (11)

parameterized by ξ as a manifold with a (local) coordinate system ξ. It is a space M of positive measures, when the constraint ∫ p(x, ξ) dx = 1 is discarded. We may treat any other types of manifolds and introduce dual structures in them. For example, we will consider a manifold consisting of positive-definite matrices.
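The relation between M_{n+1} and its submanifold S_n is simply normalization; a tiny sketch (assuming NumPy) of a positive array and the corresponding point of the simplex:

```python
import numpy as np

z = np.array([2.0, 5.0, 3.0])   # a point of M_3: all entries positive, but sum != 1
print(z.sum())                   # 10.0, so z itself is not in S_2

p = z / z.sum()                  # normalization carries the positive array onto the simplex
print(p, p.sum())                # [0.2 0.5 0.3] 1.0  -> a point of S_2
# S_n sits inside M_{n+1} as the slice cut out by the linear constraint sum_i z_i = 1.
```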
2.2 Divergence function and geometry

We consider a manifold S having a local coordinate system z = (z_i). A function D[z : w] between two points z and w of S is called a divergence function when it satisfies the following two properties:

1) D[z : w] ≥ 0, with equality when and only when z = w.

2) When the difference between w and z is infinitesimally small, we may write w = z + dz, and Taylor expansion gives

    D[z : z + dz] = Σ_{i,j} g_ij(z) dz_i dz_j,    (12)

where

    g_ij(z) = (∂² / ∂z_i ∂z_j) D[z : w] |_{w=z}    (13)

is a positive-definite matrix.

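As an illustration of Eq. (13), the following sketch (assuming NumPy; the finite-difference helper and step size are my own choices, not from the paper) numerically builds g_ij(z) for the Kullback-Leibler divergence extended to positive arrays and checks that it is positive definite. For this divergence one expects g_ij(z) ≈ δ_ij / z_i (cf. the Fisher information structure mentioned in the introduction).

```python
import numpy as np

def kl(z, w):
    """Extended KL divergence on positive arrays: sum_i (z_i log(z_i/w_i) - z_i + w_i)."""
    return np.sum(z * np.log(z / w) - z + w)

def metric_from_divergence(D, z, h=1e-4):
    """g_ij(z) = d^2/dz_i dz_j D[z : w] at w = z  (Eq. (13)), by central differences."""
    n = len(z)
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i], np.eye(n)[j]
            g[i, j] = (D(z + h*ei + h*ej, z) - D(z + h*ei - h*ej, z)
                       - D(z - h*ei + h*ej, z) + D(z - h*ei - h*ej, z)) / (4.0 * h * h)
    return g

z = np.array([0.2, 0.5, 0.3])
g = metric_from_divergence(kl, z)
print(np.round(g, 3))                        # approximately diag(1/z_i)
print(np.all(np.linalg.eigvalsh(g) > 0))     # True: g_ij(z) is positive definite
```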