Approximation of Multivariate Functions

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/2773445

Approximation of Multivariate Functions

Article · January 1995


Source: CiteSeer

CITATIONS READS
38 550

2 authors, including:

Vladimir Lin
Technion - Israel Institute of Technology
46 PUBLICATIONS   2,105 CITATIONS   

SEE PROFILE

All content following this page was uploaded by Vladimir Lin on 11 August 2014.

The user has requested enhancement of the downloaded file.


Approximation of Multivariate Functions

Vladimir Ya. Lin and Allan Pinkus

Abstract. We discuss one approach to the problem of approximating functions


of many variables which is truly multivariate in character. This approach is based
on superpositions of functions with in nite families of smooth simple functions.

x1. Introduction and Motivation


There are numerous methods of approximating functions of many variables. For
example, we have the more classic methods using Polynomials, Fourier Series, or
Tensor Products, and more modern methods using Wavelets, Radial Basis Func-
tions, Multivariate Splines, or Ridge Functions. Many of these are natural gener-
alizations of methods developed for approximating univariate functions. However
functions of many variables are fundamentally di erent from functions of one vari-
able, and approximation techniques for such problems are much less developed and
understood. We will discuss in these few pages one approach to this problem which
is truly multivariate in character.
Hilbert's 13th problem, although not formulated in the following terms, was
interpreted by some as conjecturing that not all functions of 3 variables could be
represented as superpositions (compositions) and sums of functions of two varaibles.
Surprisingly it turned that all functions could be so represented, and even more was
true. Kolmogorov and his student Arnold proved in a series of papers in the late
50's that there exist xed continuous one variable functions hij such that every
continuous function f of n variables on [0; 1]n could be represented in the form
2X
n+1 n
?X 
f (x1 ; : : : ; xn) = gi hij (xj )
i=1 j =1
where one chooses the continuous one variable functions gi .
As with any such surprising and deep result, numerous questions and results
were spawned. One question asked had to do with smoothness properties of the hij .
Assuming f is in some class of smooth functions, can one choose smooth hij ?
Di erent answers in di erent frameworks were given. One answer due to Vi-
tushkin and Henkin (separately and together) was given in the mid 60's (see [5] for
a survey of their results). Since the answer is in the negative, we only formulate it
for functions of two variables x and y in [0; 1]2.
Theorem A. For any m xed continuous functions i (x; y), i = 1; : : : ; m, and
continuously di erentiable functions i (x; y), i = 1; : : : ; m, the set of functions
m
nX ?  o
i (x; y ) gi i (x; y ) : gi continuous
i=1
is nowhere dense in the space of all functions continuous in [0; 1]2 with the topology
of uniform convergence.
\If" such a theorem were not true, i.e., if such a representation did in fact exist
with smooth i , then multivariate approximation theory might well look di erent
today. If the i and i were calculable (once and for all), then we could reduce many
multivariable problems to 1-variable problems. However, this is all idle speculation
as in the original Kolmogorov-Arnold result, the hij are not C 1 and not at all
calculable.
The idea of using superpositions (or compositions) of functions with smooth,
nice, functions is a good idea. Let us try to formulate somewhat more precisely
what we have in mind.
To make things more concrete, we will deal with the space of continuous real-
valued functions de ned on IRn, endowed with the topology of uniform convergence
on compact sets. (Note that we are not dealing with the topology of uniform
convergence on all of IRn.)
Let  be a family of continuous (smooth) functions  : IRn ! IR. Let us
consider the space
spanf g((x)) : g 2 C (IR) ;  2 g ;
where x = (x1 ; : : : ; xn) and the g are functions of one variable. Since representation
is, as we have seen, seldom possible using a nite number of smooth interior func-
tions, we will here deal with the related (but di erent) problem of approximation.
The question we will here discuss is that of density. That is when, for any
given f 2 C (IRn), any compact set K  IRn, and " > 0, we can nd gi 2 C (IR) and
i 2 , i = 1; : : : ; m (m nite but free to be chosen) such that
m
X
max f (x) ? gi (i(x)) < " :
x2K
i=1
We will look at two general classes of functions , and at the related question
of when we can also restrict the set of permissible g.
x2. Translation
Let ' be any xed C 1 function de ned on IRn . Consider
? 
M' = spanfg '(x ? a) : g 2 C (IR); a 2 IRng :
That is, we x one and only one function ', and let
 = f'( ? a)g
be the set of all translates of this function.
What can we say in this generality?
a) If ' is such that there exists c < d such that
fx : c  '(x)  dg
is a bounded nonempty set, then M' = C (IRn). (In this case, we can also x a
particular g.) Thus we have density if, for example, '(x) = kxk and the norm k  k
is smooth (or some power of the norm is smooth). This condition is not necessary.
b) A necessary condition for density is that
spanf'(x ? a) : a 2 IRng
must separate points. In other words, for x 6= y there must exist points a and b such
that '(x ? a) 6= '(y ? b). If ' is a polynomial then we do not have the separation
of points if ' is a polynomial of less than n variables. Aside from a linear change of
variables, this is the only case where the above set does not separate points. That
is, if ' is an algrebraic polynomial
spanf'(x ? a) : a 2 IRng
separates points if and only if there does not exist a non-singular n  n matrix
C such that ' is independent of (C x)n. Based on various cases which have been
studied in some detail, we would like to conjecture that if ' is a polynomial and
spanf'(x ? a) : a 2 IRng
separates points, then M' = C (IRn).
Two results permit us to easily restrict the size of the set of the approximants
without losing the density property. The rst is speci c, the second general.
c) If ' is a polynomial, then it is not necessary to consider all translates. In fact, if
A is any subset of IRn for which no non-trivial polynomial vanishes on A, then
?  ? 
spanfg '(x ? a) : g 2 C (IR); a 2 IRng = spanfg '(x ? a) : g 2 C (IR); a 2 Ag :
Thus, in case ' is a polynomial, we can, for example, consider shifts only by elements
of ZZn, or alternatively by all elements of any set with interior.
d) We can always replace all g 2 C (IR) in the above by all g 2 B where B is any
dense subset of C (IR) (in the requisite topology). For example, we can let g run
over the set of all monomials. A more interesting example is the following:
Let  2 C (IR). In the topology of uniform convergence on compact subsets of
IR, it may be shown that
C (IR) = spanf(at + b) : a; b 2 IRg
if and only if  is not a polynomial (see Leshno, Lin, Pinkus and Schocken [2]).
It follows quite easily from this result that for any set , and any  2 C (IR)
which is not a polynomial,
?  ? 
spanfg (x) : g 2 C (IR);  2 g = spanf a(x) + b : a; b 2 IR;  2 g :

Sometimes it is possible to further restrict this set in that we need not run
over all a; b 2 IR. For example, if  is what is called a sigmoidal function. That is,
 2 C (IR) and
lim (x) = 0 and xlim
x!?1 !1  (x) = 1;
then we can replace a; b 2 IR in the above by k; ` 2 ZZ (see Chui, Li [1]). It also
always suces to restrict a to any non-trivial interval.
Thus, for example, if  is sigmoidal and k  k is the Euclidean norm on IRn,
then
C (IRn) = spanf(kkx ? mk + `) : k; ` 2 ZZ; m 2 ZZng :
e) One particular choice of ' is of special interest. Consider n = 2, and '(x; y) =
x2 + y2. In other words, we are considering the space
M' = spanfg?(x ? a)2 + (y ? b)2 ) : g 2 C (IR); (a; b) 2 IR2g :
(Such functions are Radial Functions.) Instead of considering translates (centering)
by all (a; b) 2 IR2 , let us consider translates only by (a; b) 2 A, where A is some
subset of IR2 .
From (c) we know that M' = C (IR2) if no non-trivial polynomial vanishes on A.
However, this is far from necessary. Let us consider some simple sets A which are
zero sets of non-trivial polynomials.
(i) If A is only a nite number of points, then by Theorem A, M' 6= C (IR2 ).
(ii) If A is only a straight line (or any subinterval thereof), then M' 6= C (IR2 ).
However, M' is dense in the space of all continuous functions de ned on the half-
space de ned by the straight line. Thus, for example,
? 
spanfg (x ? a)2 + y2 ) : g 2 C (IR); a 2 IRg = C (D)
where D = f(x; y) : y  0g. In other words, one gets all even functions about the
real axis.
(iii) If A is a set of straight lines in IR2, then
? 
spanfg (x ? a)2 + (y ? b)2 ) : g 2 C (IR); (a; b) 2 Ag 6= C (IR2)
if and only if all these lines have a common intersection point, and the angles
between each of the lines is a rational multiple of .
(iv) If A is an ellipse or parabola, then
? 
spanfg (x ? a)2 + (y ? b)2 ) : g 2 C (IR); (a; b) 2 Ag = C (IR2) :
It would be interesting to determine necessary and sucient conditions on the
set A so that density holds.
x3. \Ridge" functions
Let us look at another simple set . Let h1(x); : : : ; hm (x) be any m xed continuous
functions from IRn to IR. Let  = spanfh1; : : : ; hmg. In other words, consider
m
X
M = spanfg?

ai hi(x) : a = (a1 ; : : : ; am ) 2 IRm ; g 2 C (IR)g :
i=1
When is M dense in C (IRn)?
a) The answer is quite simple, and since the proof is short and elegant, we give it
here.
Proposition B. M = C (IRn) if and only if H (x) = (h1(x); : : : ; hm (x)) separates
points. That is, for x; y 2 IRn, x 6= y, there exists i 2 f1; : : : ; mg such that
hi(x) 6= hi(y).
Proof: Consider the linear span of the set
m
?X k
ai hi (x)
i=1
as we vary over all a 2 IRm and k = 0; 1; 2; : : :. This is an algebra generated by
h`11 (x)  h`mm (x)
where the `i are non-negative integers. Furthermore this algebra contains the con-
stant function and separates points. Thus the density follows from the Stone{
Weierstrass Theorem.

b) Similar to a result of the previous section, assuming we have density in the above
problem, it is not necessary to run over all vectors a 2 IRm . It always suces to run
over all a 2 A, where A is any subset of IRm for which no non-trivial homogeneous
polynomial vanishes on A. In general this sucient condition on A is not necessary.
A Ridge Function, in its simplest sense, is a function of the form
f (x) = g(a  x)
P
where a 2 IRn nf0g is xed (a direction), and a  x = ni=1 ai xi . That is, m = n and
hi(x) = xi in the previous example. (We will also consider a simple generalization
thereof, namely
f (x) = g(Ax)
where A is a k  n matrix and g : IRk ! IR.)
Ridge functions are constant on the hyperplanes a  x = c and thus are partic-
ularly simple functions. Ridge functions are used, with varying degrees of success,
in di erent elds. In Partial Di erential Equations they are called Plane Waves.
(In general, linear combinations of ridge functions occur in the study of hyperbolic
p.d.e.'s with constant coecients.) In Statistics they are used in the theory of
Projection Pursuit and Projection Regression. They are used in the theory of Com-
puterized Tomography (the name Ridge Function was coined by Logan, Shepp [4] in
one of the seminal papers on tomography). In Neural Networks they are used. (More
speci cally in a model in Neural Networks concerned with Multilayer Feedforward
Neural Networks with Input, Hidden and Output layers.) Finally, approximation
theorists are interested in using Ridge Functions as a method of approximating
complicated (multivariate) functions by simple functions (linear combinations of
Ridge Functions).
There is, unfortunately, not very much known about approximating using Ridge
Functions. We mention here some of the results connected with the density problem.
c) What are necessary and sucient conditions on a set A  IRn such that
MA = spanfg(a  x) : a 2 A; g 2 C (IR)g
is dense in C (IRn)?
For n = 2 it is known that it is both necessary and sucient that A contains an
in nite number of pairwise linearly independent points. (Note that if a 2 A, then
we add nothing to MA if we adjoin a to A.) This result has been proved in the
literature at least 3{4 times. The earliest reference we have found is to an article by
Vostrecov, Kreines [6] (early 60's), where they in fact proved the following result.
Theorem C. MA = C (IRn) if and only if no non-trivial homogeneous polynomial
vanishes on A.
This result seems to have not been noticed, as very partial results were later reproved
by others.
d) Let A be a subset of all k  n real matrices, 1  k < n, xed. Set
MA = spanfg(Ax) : A 2 A; g 2 C (IRk )g :
Necessary and sucient conditions for when MA = C (IRn) are as follows (see Lin,
Pinkus [3]):
For each A 2 A, let L(A) denote the linear subspace spanned by the rows of A.
(If L(A) = L(B), then
spanfg(Ax) : g 2 C (IRk )g = spanfg(Bx) : g 2 C (IRk )g :)
Set [
L(A) = L(A) :
A2A
(If k = 1, L(A) is a line through the origin.)
Theorem D. MA is dense in C (IRn) if and only if L(A) is not contained in the
zero set of any non-trivial polynomial. (Or homogeneous polynomial, since L(A) is
homogeneous.)
Here are some facts connected with this result.
1. If L(A) is not contained inn the zero set of some non-trivial polynomial, then
not only is MA dense in C (IR ), but in fact more is true. Namely, the polynomials
are explicitly contained in the set MA. (As nite linear combinations of g(Ax), g
polynomial, A 2 A.)
2. If there exists a non-trivial polynomial p which vanishes on L(A), then we can
prove that for any  2 C01, the function = p(D) satis es
Z
g(Ax) (x) dx = 0
for all g 2 C (IRk ) and for all A 2 A. (Here p(D) = p( @x@ 1 ; : : : ; @x@n ).) Thus we
explicitly construct simple linear functionals which annihilate MA. The density
result therefore holds for any space where the Weierstrass Theorem holds, and the
above is also a linear functional, for some  2 C01.
3. Ifn A = A1 S A2 , then MA is dense in C (IRn) if and only if MAi is dense in
C (IR ) for i = 1 and/or i = 2.
4. Of course, if A contains only a nite number of points (elements), then MA is
not dense in C (IRn).
5. If k = n ? 1, then MA is dense in C (IRn) if A contains an in nite number
of pairwise \distinct" matrices A of full rank n ? 1. (Distinct here means that
L(A) 6= L(B).)
Finally we note some related result. (See Lin, Pinkus [3].)
e) What is MA in general?
MA = spanfg(Ax) : A 2 A; g 2 C (IRk )g
= spanfq(x) : q polynomial; p(D)q = 0 for every
polynomial which vanishes on L(A)g:
This is related to the theory of Polynomial Ideals. (The set of polynomials which
vanish on L(A) is a Polynomial Ideal (of a fairly simple form).) As such we could
actually replace \p(D)q = 0 for every ....." by a nite number of such p.
f) \Variable" Directions.
Assume we are given a xed positive integer r. For A = fA1; : : : ; Ar g, where the
Ai are some k  n matrices, we have that MA is not dense in C (IRn). What if we
keep r xed, but vary the \directions" A1; : : : ; Ar (as well as the gi)? That is, what
if we approximate from the highly nonlinear set
r
X
f gi(Ai x) : gi 2 C (IRk ); Ai k  n matrixg ?
i=1

Unfortunately, one can prove that density does not hold for any r (on any compact
set with interior).

Acknowledgments. This research was supported by the fund for the promotion
of research at the Technion.
References
1. Chui, C. K. and Xin Li, Approximation by ridge functions and neural networks
with one hidden layer, J. Approx. Theory 70 (1992), 131{141.
2. Leshno, L., V. Ya. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward net-
works with a non-polynomial activation function can approximate any function,
Neural Networks, to appear.
3. Lin, V. Ya. and A. Pinkus, Fundamentality of Ridge Functions, J. Approx.
Theory, to appear.
4. Logan, B. F. and L. A. Shepp, Optimal reconstruction of a function from its
projections, Duke Math. J. 42 (1975), 645{659.
5. Vitushkin, A. G. and G. M. Henkin, Linear superpositions of functions, Uspehi
Mat. Nauk 22 (1967), 77{124 = Russian Math. Surveys 22 (1967), 77{125.
6. Vostrecov, B. A. and M. A. Kreines, Approximation of continuous functions by
superpositions of plane waves, Dokl. Akad. Nauk SSSR 140 (1961), 1237{1240
= Soviet Math. Dokl. 2 (1961), 1326{1329.
Vladimir Ya. Lin
Department of Mathematics
Technion, I. I. T.
Haifa, 32000
Israel
Allan Pinkus
Department of Mathematics
Technion, I. I. T.
Haifa, 32000
Israel

View publication stats

You might also like