Information Geometry in Optimization, Machine Learning and Statistical Inference
Shun-ichi AMARI
RIKEN Brain Science Institute, Saitama 351-0198, Japan
E-mail: amari@brain.riken.jp

Received January 15, 2010; accepted February 5, 2010
© Higher Education Press and Springer-Verlag Berlin Heidelberg 2010
Abstract  The present article gives an introduction to information geometry and surveys its applications in the areas of machine learning, optimization and statistical inference. Information geometry is explained intuitively by using divergence functions introduced in a manifold of probability distributions and in other general manifolds. They give a Riemannian structure together with a pair of dual flatness criteria. Many manifolds are dually flat. When a manifold is dually flat, a generalized Pythagorean theorem and a related projection theorem are introduced. They provide useful means for various approximation and optimization problems. We apply them to alternative minimization problems, Ying-Yang machines and the belief propagation algorithm in machine learning.

Keywords  information geometry, machine learning, optimization, statistical inference, divergence, graphical model, ying-yang machine
1 Introduction

Information geometry [1] deals with a manifold of probability distributions from the geometrical point of view. It studies the invariant structure of such a manifold by using Riemannian geometry equipped with a dual pair of affine connections. Since probability distributions are used in many problems in optimization, machine learning, vision, statistical inference, neural networks and other fields, information geometry provides a useful and powerful tool for many areas of information sciences and engineering.

Many researchers in these fields, however, are not familiar with modern differential geometry. The present article intends to give an understandable introduction to information geometry that does not rely on it. Since the underlying manifolds in most applications are dually flat, the dually flat structure plays a fundamental role here. We explain the fundamental dual structure and the related dual geodesics without using the concepts of affine connections and covariant derivatives.

We begin with a divergence function between two points in a manifold. When it satisfies the invariance criterion of information monotonicity, it gives the family of f-divergences [2]. When a divergence is instead derived from a convex function, in the form of a Bregman divergence [3], it gives another type of divergence; the Kullback-Leibler divergence belongs to both families. We then derive a geometrical structure from a divergence function [4]. The Fisher information Riemannian structure is derived from an invariant divergence (an f-divergence; see Refs. [1,5]), while the dually flat structure is derived from a Bregman divergence (a convex function).
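As a preview of the definitions developed in later sections, the two families take the following standard forms for discrete distributions \(p = (p_i)\) and \(q = (q_i)\):

\[
D_f(p \,\|\, q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right), \qquad f \ \text{convex}, \ f(1) = 0,
\]
\[
D_\psi(p \,\|\, q) = \psi(p) - \psi(q) - \nabla \psi(q) \cdot (p - q), \qquad \psi \ \text{convex}.
\]

Taking \(f(u) = u \log u\) in the first form, or \(\psi(p) = \sum_i p_i \log p_i\) in the second, yields the Kullback-Leibler divergence \(\mathrm{KL}(p \,\|\, q) = \sum_i p_i \log (p_i / q_i)\), which is why it belongs to both families.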
The manifold of all discrete probability distributions is dually flat, and there the Kullback-Leibler divergence plays a key role. We give the generalized Pythagorean theorem and the projection theorem in a dually flat manifold, which play a fundamental role in applications. Such a structure is not limited to manifolds of probability distributions, but can be extended to manifolds of positive arrays, matrices and visual signals, and will be used in neural networks and optimization problems.
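To indicate what is to come, in terms of the divergence D of a dually flat manifold the generalized Pythagorean theorem asserts the decomposition

\[
D(p \,\|\, r) = D(p \,\|\, q) + D(q \,\|\, r)
\]

whenever the geodesic of one type connecting p and q is orthogonal at q to the geodesic of the dual type connecting q and r; the precise statement appears below. For the Kullback-Leibler divergence this is the classical Pythagorean relation that underlies information projections.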
After introducing these basic properties, we show three areas of application. One is the application to alternative minimization procedures such as the expectation-maximization (EM) algorithm in statistics [6–8]. The second is an application to the Ying-Yang machine introduced and extensively studied by Xu [9–14]. The third is an application to the belief propagation algorithm of stochastic reasoning in machine learning and artificial intelligence [15–17]. There are many other applications, in the analysis of spiking patterns of the brain, in neural networks and in the boosting algorithm of machine learning, as well as in a wide range of statistical inference, which we do not discuss here.