Approximation of Multivariate Functions
Approximation of Multivariate Functions
Approximation of Multivariate Functions
net/publication/2773445
CITATIONS READS
38 550
2 authors, including:
Vladimir Lin
Technion - Israel Institute of Technology
46 PUBLICATIONS 2,105 CITATIONS
SEE PROFILE
All content following this page was uploaded by Vladimir Lin on 11 August 2014.
Sometimes it is possible to further restrict this set in that we need not run
over all a; b 2 IR. For example, if is what is called a sigmoidal function. That is,
2 C (IR) and
lim (x) = 0 and xlim
x!?1 !1 (x) = 1;
then we can replace a; b 2 IR in the above by k; ` 2 ZZ (see Chui, Li [1]). It also
always suces to restrict a to any non-trivial interval.
Thus, for example, if is sigmoidal and k k is the Euclidean norm on IRn,
then
C (IRn) = spanf(kkx ? mk + `) : k; ` 2 ZZ; m 2 ZZng :
e) One particular choice of ' is of special interest. Consider n = 2, and '(x; y) =
x2 + y2. In other words, we are considering the space
M' = spanfg?(x ? a)2 + (y ? b)2 ) : g 2 C (IR); (a; b) 2 IR2g :
(Such functions are Radial Functions.) Instead of considering translates (centering)
by all (a; b) 2 IR2 , let us consider translates only by (a; b) 2 A, where A is some
subset of IR2 .
From (c) we know that M' = C (IR2) if no non-trivial polynomial vanishes on A.
However, this is far from necessary. Let us consider some simple sets A which are
zero sets of non-trivial polynomials.
(i) If A is only a nite number of points, then by Theorem A, M' 6= C (IR2 ).
(ii) If A is only a straight line (or any subinterval thereof), then M' 6= C (IR2 ).
However, M' is dense in the space of all continuous functions dened on the half-
space dened by the straight line. Thus, for example,
?
spanfg (x ? a)2 + y2 ) : g 2 C (IR); a 2 IRg = C (D)
where D = f(x; y) : y 0g. In other words, one gets all even functions about the
real axis.
(iii) If A is a set of straight lines in IR2, then
?
spanfg (x ? a)2 + (y ? b)2 ) : g 2 C (IR); (a; b) 2 Ag 6= C (IR2)
if and only if all these lines have a common intersection point, and the angles
between each of the lines is a rational multiple of .
(iv) If A is an ellipse or parabola, then
?
spanfg (x ? a)2 + (y ? b)2 ) : g 2 C (IR); (a; b) 2 Ag = C (IR2) :
It would be interesting to determine necessary and sucient conditions on the
set A so that density holds.
x3. \Ridge" functions
Let us look at another simple set . Let h1(x); : : : ; hm (x) be any m xed continuous
functions from IRn to IR. Let = spanfh1; : : : ; hmg. In other words, consider
m
X
M = spanfg?
ai hi(x) : a = (a1 ; : : : ; am ) 2 IRm ; g 2 C (IR)g :
i=1
When is M dense in C (IRn)?
a) The answer is quite simple, and since the proof is short and elegant, we give it
here.
Proposition B. M = C (IRn) if and only if H (x) = (h1(x); : : : ; hm (x)) separates
points. That is, for x; y 2 IRn, x 6= y, there exists i 2 f1; : : : ; mg such that
hi(x) 6= hi(y).
Proof: Consider the linear span of the set
m
?X k
ai hi (x)
i=1
as we vary over all a 2 IRm and k = 0; 1; 2; : : :. This is an algebra generated by
h`11 (x) h`mm (x)
where the `i are non-negative integers. Furthermore this algebra contains the con-
stant function and separates points. Thus the density follows from the Stone{
Weierstrass Theorem.
b) Similar to a result of the previous section, assuming we have density in the above
problem, it is not necessary to run over all vectors a 2 IRm . It always suces to run
over all a 2 A, where A is any subset of IRm for which no non-trivial homogeneous
polynomial vanishes on A. In general this sucient condition on A is not necessary.
A Ridge Function, in its simplest sense, is a function of the form
f (x) = g(a x)
P
where a 2 IRn nf0g is xed (a direction), and a x = ni=1 ai xi . That is, m = n and
hi(x) = xi in the previous example. (We will also consider a simple generalization
thereof, namely
f (x) = g(Ax)
where A is a k n matrix and g : IRk ! IR.)
Ridge functions are constant on the hyperplanes a x = c and thus are partic-
ularly simple functions. Ridge functions are used, with varying degrees of success,
in dierent elds. In Partial Dierential Equations they are called Plane Waves.
(In general, linear combinations of ridge functions occur in the study of hyperbolic
p.d.e.'s with constant coecients.) In Statistics they are used in the theory of
Projection Pursuit and Projection Regression. They are used in the theory of Com-
puterized Tomography (the name Ridge Function was coined by Logan, Shepp [4] in
one of the seminal papers on tomography). In Neural Networks they are used. (More
specically in a model in Neural Networks concerned with Multilayer Feedforward
Neural Networks with Input, Hidden and Output layers.) Finally, approximation
theorists are interested in using Ridge Functions as a method of approximating
complicated (multivariate) functions by simple functions (linear combinations of
Ridge Functions).
There is, unfortunately, not very much known about approximating using Ridge
Functions. We mention here some of the results connected with the density problem.
c) What are necessary and sucient conditions on a set A IRn such that
MA = spanfg(a x) : a 2 A; g 2 C (IR)g
is dense in C (IRn)?
For n = 2 it is known that it is both necessary and sucient that A contains an
innite number of pairwise linearly independent points. (Note that if a 2 A, then
we add nothing to MA if we adjoin a to A.) This result has been proved in the
literature at least 3{4 times. The earliest reference we have found is to an article by
Vostrecov, Kreines [6] (early 60's), where they in fact proved the following result.
Theorem C. MA = C (IRn) if and only if no non-trivial homogeneous polynomial
vanishes on A.
This result seems to have not been noticed, as very partial results were later reproved
by others.
d) Let A be a subset of all k n real matrices, 1 k < n, xed. Set
MA = spanfg(Ax) : A 2 A; g 2 C (IRk )g :
Necessary and sucient conditions for when MA = C (IRn) are as follows (see Lin,
Pinkus [3]):
For each A 2 A, let L(A) denote the linear subspace spanned by the rows of A.
(If L(A) = L(B), then
spanfg(Ax) : g 2 C (IRk )g = spanfg(Bx) : g 2 C (IRk )g :)
Set [
L(A) = L(A) :
A2A
(If k = 1, L(A) is a line through the origin.)
Theorem D. MA is dense in C (IRn) if and only if L(A) is not contained in the
zero set of any non-trivial polynomial. (Or homogeneous polynomial, since L(A) is
homogeneous.)
Here are some facts connected with this result.
1. If L(A) is not contained inn the zero set of some non-trivial polynomial, then
not only is MA dense in C (IR ), but in fact more is true. Namely, the polynomials
are explicitly contained in the set MA. (As nite linear combinations of g(Ax), g
polynomial, A 2 A.)
2. If there exists a non-trivial polynomial p which vanishes on L(A), then we can
prove that for any 2 C01, the function = p(D) satises
Z
g(Ax) (x) dx = 0
for all g 2 C (IRk ) and for all A 2 A. (Here p(D) = p( @x@ 1 ; : : : ; @x@n ).) Thus we
explicitly construct simple linear functionals which annihilate MA. The density
result therefore holds for any space where the Weierstrass Theorem holds, and the
above is also a linear functional, for some 2 C01.
3. Ifn A = A1 S A2 , then MA is dense in C (IRn) if and only if MAi is dense in
C (IR ) for i = 1 and/or i = 2.
4. Of course, if A contains only a nite number of points (elements), then MA is
not dense in C (IRn).
5. If k = n ? 1, then MA is dense in C (IRn) if A contains an innite number
of pairwise \distinct" matrices A of full rank n ? 1. (Distinct here means that
L(A) 6= L(B).)
Finally we note some related result. (See Lin, Pinkus [3].)
e) What is MA in general?
MA = spanfg(Ax) : A 2 A; g 2 C (IRk )g
= spanfq(x) : q polynomial; p(D)q = 0 for every
polynomial which vanishes on L(A)g:
This is related to the theory of Polynomial Ideals. (The set of polynomials which
vanish on L(A) is a Polynomial Ideal (of a fairly simple form).) As such we could
actually replace \p(D)q = 0 for every ....." by a nite number of such p.
f) \Variable" Directions.
Assume we are given a xed positive integer r. For A = fA1; : : : ; Ar g, where the
Ai are some k n matrices, we have that MA is not dense in C (IRn). What if we
keep r xed, but vary the \directions" A1; : : : ; Ar (as well as the gi)? That is, what
if we approximate from the highly nonlinear set
r
X
f gi(Ai x) : gi 2 C (IRk ); Ai k n matrixg ?
i=1
Unfortunately, one can prove that density does not hold for any r (on any compact
set with interior).
Acknowledgments. This research was supported by the fund for the promotion
of research at the Technion.
References
1. Chui, C. K. and Xin Li, Approximation by ridge functions and neural networks
with one hidden layer, J. Approx. Theory 70 (1992), 131{141.
2. Leshno, L., V. Ya. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward net-
works with a non-polynomial activation function can approximate any function,
Neural Networks, to appear.
3. Lin, V. Ya. and A. Pinkus, Fundamentality of Ridge Functions, J. Approx.
Theory, to appear.
4. Logan, B. F. and L. A. Shepp, Optimal reconstruction of a function from its
projections, Duke Math. J. 42 (1975), 645{659.
5. Vitushkin, A. G. and G. M. Henkin, Linear superpositions of functions, Uspehi
Mat. Nauk 22 (1967), 77{124 = Russian Math. Surveys 22 (1967), 77{125.
6. Vostrecov, B. A. and M. A. Kreines, Approximation of continuous functions by
superpositions of plane waves, Dokl. Akad. Nauk SSSR 140 (1961), 1237{1240
= Soviet Math. Dokl. 2 (1961), 1326{1329.
Vladimir Ya. Lin
Department of Mathematics
Technion, I. I. T.
Haifa, 32000
Israel
Allan Pinkus
Department of Mathematics
Technion, I. I. T.
Haifa, 32000
Israel