Cook_umn_0130E_23206
Cook_umn_0130E_23206
Cook_umn_0130E_23206
A Dissertation
SUBMITTED TO THE FACULTY OF THE
UNIVERSITY OF MINNESOTA
BY
Brendan Cook
Jeff Calder
May 2022
Copyright © 2022 Brendan Cook. All rights reserved.
Table of Contents
List of Figures iii
List of Tables iv
1 Overview 1
2 Introduction 3
3 Main results 6
3.1 Definition of Viscosity Solution . . . . . . . . . . . . . . . . . . . 10
3.2 Outline of Proof of Theorem 3.2 . . . . . . . . . . . . . . . . . . 10
3.3 Outline of Paper . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
9 Introduction 57
10 Poisson learning 58
10.1 Random walk interpretation . . . . . . . . . . . . . . . . . . . . . 60
10.1.1 Proofs for Section 10.1 . . . . . . . . . . . . . . . . . . . . 63
i
10.2 Variational interpretation . . . . . . . . . . . . . . . . . . . . . . 66
10.2.1 Proofs for Section 10.2 . . . . . . . . . . . . . . . . . . . . 67
10.3 Laplace learning at low label rates . . . . . . . . . . . . . . . . . 76
10.4 The Poisson MBO algorithm . . . . . . . . . . . . . . . . . . . . 77
10.4.1 Proofs for Section 10.4 . . . . . . . . . . . . . . . . . . . . 80
12 Experimental Results 84
13 Continuum limits 88
14 Conclusion 92
References 99
ii
List of Figures
1 Illustration of the sets An , Bn and A, B used in the proof outline,
and the viscosity touching property that An ⊂ A and B ⊂ Bn . . 11
2 Demonstration of spikes in Laplace learning. The graph consists
of n = 104 independent uniform random variables on [0, 1]2 and
two points are given labels of 0 and 1. Most values of the solution
u of Laplace learning are very close to 0.5. . . . . . . . . . . . . . 61
3 Accuracy of Poisson Learning for (a) different numbers of neigh-
bors k used to construct the graph and (b) unbalanced training
data. In (a) we used 5 labels per class and in (b) we used 1 label
per class for the odd numbered classes, and m = 1, 2, 3, 4, 5 labels
per class for the even numbered classes. Both figures show the
difference in accuracy compared to k = 10 and balanced training
data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
iii
List of Tables
1 MNIST: Average accuracy scores over 100 trials with standard
deviation in brackets. . . . . . . . . . . . . . . . . . . . . . . . . . 86
2 FashionMNIST: Average accuracy scores over 100 trials with
standard deviation in brackets. . . . . . . . . . . . . . . . . . . . 87
3 Cifar-10: Average accuracy scores over 100 trials with standard
deviation in brackets. . . . . . . . . . . . . . . . . . . . . . . . . . 88
iv
1 Overview
In this thesis we present two papers applying partial differential equations (PDEs)
in data science. In Part I we present the paper ”Rates of Convergence for the
Continuum Limit of Nondominated Sorting” [22] published in the SIAM Journal
of Mathematical Analysis, which builds on prior work by Calder et al. [16]
which found that the nondominated sorting process used in multi-objective
optimization has a continuum limit PDE. In this work, we prove a convergence
rate which holds with high probability, bounding the speed of convergence of
nondominated sorting to its continuum limit. This result can be applied to show
a convergence rate for the PDE-based ranking algorithm proposed by Calder et
al. in [18], which efficiently approximates nondominated sorting by solving the
continuum limit PDE numerically.
In Part II we present the paper ”Poisson Learning: Graph Based Semi-
Supervised Learning At Very Low Label Rates” [15] which introduces a new
algorithm for semi-supervised learning on graphs. Semi-supervised learning is
a subfield of machine learning that combines the paradigms of supervised and
unsupervised learning, by making use of both labeled data and unlabeled data
in a prediction task. A graph is a data structure that describes a network of
objects and the relationships between them. For example, in a social network
such as Facebook, each object represents a user, and each user is connected
to a set of other users through friendships. Other examples of graph data
include transportation networks, citation databases, websites, and many others.
When the data have a graph structure and labeled data is limited, graph-based
semi-supervised learning can leverage the geometric and topological relationships
between the unlabeled and labeled data to enhance the prediction. In graph-
based semi-supervised learning, data are given in the form of a weighted graph
with a small subset of labeled nodes, and learning algorithms make use of the
graph structure to propagate information from the labeled nodes to the unlabeled
nodes to make an accurate prediction. This paper introduces the Poisson learning
algorithm, which is a new approach to graph-based semi-supervised learning
that excels with very little labeled data.
1
Part I
2
2 Introduction
The sorting of multivariate data is an important problem in many fields of
applied science [17]. Nondominated sorting is a discrete process that is widely
applied in multiobjective optimization and can be interpreted as arranging a
finite set of points in Euclidean space into layers according to the coordinatewise
partial order. Let ≤ denote the coordinatewise partial order on Rd given by
x ≤ y ⇐⇒ xi ≤ yi for all i = 1, . . . , d.
shown in [17] that if the Xi are i.i.d. random variables on Rd+ with density ρ,
cd
then n−1/d Un → d u almost surely in L∞ (Rd ) as n → ∞ where u is the unique
nondecreasing viscosity solution of the problem
(
ux1 ux2 . . . uxd = ρ in Rd+
(2.1)
u=0 on ∂Rd+ .
3
datasets can be approximated by solving a partial differential equation numeri-
cally. This idea was developed further by Calder et al. in [18] which proposed a
fast approximate algorithm for nondominated sorting called PDE-based ranking
based on estimating ρ from the data and solving the PDE numerically. It was
shown in [18] that PDE-based ranking is considerably faster than nondominated
sorting in low dimensions while maintaining high sorting accuracy.
In this paper, we establish rates of convergence for the continuum limit of
nondominated sorting. This is an important result in applications of PDE-based
ranking [1, 40] where it is important to consider how the error scales with the
size n of the dataset. The problem has several features that complicate the proof.
The Hamiltonian H(p) = p1 . . . pd is not coercive, which is the standard property
required to prove Lipschitz regularity of viscosity solutions [4]. If one takes a dth
root of the PDE to replace the Hamiltonian with H(p) = (p1 . . . pd )1/d , we obtain
a concave H at the cost of losing local Lipschitz regularity. In particular, solutions
of (2.1) are neither semiconcave nor semiconvex in general. Furthermore, u is
not Lipschitz due to the lack of boundary smoothness and coercivity. Our proof
approximates the solution to (2.1) by the solution to the auxiliary problem
(
ux1 ux2 . . . uxd = ρ in ΩR
(2.2)
u=0 on ∂R Ω,
where
n o
ΩR = x ∈ [0, 1]d : (x1 . . . xd )1/d > R
and
n o
∂R Ω = x ∈ [0, 1]d : (x1 . . . xd )1/d = R ,
4
bound the blowup rate of the semiconvexity constant of u at the boundary. The
semiconvex regularity of u on the auxiliary domain enables us to avoid use of
a sup-convolution approximation for this direction, bolstering the convergence
rate. The proof uses a closed-form asymptotic expansion to obtain a smooth
approximate solution to (2.2) near the boundary, and computes semiconvexity
estimates for the approximation analytically. We believe this argument is new,
as the typical arguments found in the literature for proving semiconvexity near
the boundary proceed by means of vanishing viscosity [4]. We also extend the
semiconvexity estimates to the full domain with a doubling variables argument
which is new and simpler compared to the standard tripling variables approach
[4].
Our convergence rate proof is at a high level similar to the proofs of con-
vergence rates for stochastic homogenization of Hamilton-Jacobi equations in
[3], which uses Azuma’s inequality to control fluctuations and a doubling vari-
ables argument to prove convergence rates. Apart from the viscosity solution
theory, the main machinery we use is the convergence rate for the longest chain
problem proved by Bollobás and Brightwell in [7], whose proof is also based
on Azuma’s inequality. As our PDE is first-order, our approach uses the inf
convolution instead of a doubling variables argument which leads to an equivalent
but somewhat simplified argument.
As described in [11], this continuum limit result can be viewed in the context
of stochastic homogenization of Hamilton-Jacobi equations. One may interpret
Un as the discontinuous viscosity solution of
n
X
in Rd+
U
n,x1 Un,x2 . . . Un,xd = δ Xj
j=1 (2.3)
∂Rd+ .
Un = 0 on
The sense in which Un solves the PDE (2.3) is not obvious. By mollifying Un ,
one obtains a sequence Unε of approximate solutions to (2.3). It can be shown
that Unε converges pointwise to CUn as ε → 0 where the constant C depends on
the choice of mollification kernel.
Our proof techniques may also be applicable to several other related problems
in the literature. The convex peeling problem studied in [20] bears many
similarities to our problem, and similar ideas may give convergence rates for
the convex peeling problem, provided the solutions of the continuum PDE are
sufficiently smooth. The papers [58, 13] introduce numerical methods for the
5
PDE (2.1) and prove convergence rates. Our semiconvex regularity results could
be used to improve the convergence rates of the above papers to O(h) in one
direction. We also suspect the methods used in our paper could be adapted to
the directed last passage percolation problem studied in [10].
We also briefly note that nondominated sorting is equivalent to the problem
of finding the length of a longest chain (i.e. a totally ordered subset) in a partially
ordered set, which is a well-studied problem in the combinatorics and probability
literature [59, 33, 8, 25]. In particular, Un (x) is equal to the length of a longest
chain in X consisting of points less than x in the partial order.
3 Main results
We begin by introducing definitions and notation that will be used throughout
the paper. In our results and proofs, we let C denote a constant that does not
depend on any other quantity, and Ck denotes a constant dependent on the
variable k. Be advised that the precise value of constants may change from line
to line. To simplify the proofs, we model the data using a Poisson point process.
Given a nonnegative function ρ ∈ L1 (Rd ), we let Xρ denote a Poisson point
process with intensity function ρ. Hence, Xρ is a random, at most countable
subset of Rd with the property that for every Borel measurable set A ⊂ Rd , the
R
cardinality N (A) of A ∩ Xρ is a Poisson random variable with mean A ρ dx.
Given two measurable disjoint sets A, B ⊂ Rd , the random variables N (A) and
N (B) are independent. Further properties of Poisson processes can be found in
[45]. In this paper we consider a Poisson point process Xnρ where n ∈ N and
ρ ∈ C(Rd ) satisfies
and n o
∂R Ω = x ∈ [0, 1]d : (x1 . . . xd )1/d = R .
6
Let u denote the viscosity solution of (2.2). Given a finite set A ⊂ Rd , let ℓ(A)
denote the length of the longest chain in the set A. Given a domain Ω ⊂ Rd ,
the Pareto-depth function Un in Ω is defined by
d −1/d
un (x) = n Un (x) (3.2)
cd
d −1/d
For a subset S ⊂ Rd , we write un (S) to denote cd n ℓ(S ∩ Xnρ ). This
particular scaling is chosen to eliminate the constant on the right-hand side of
(3.6).
Remark 3.1. There are several results regarding the constant cd that have been
established in the literature. Hammersley showed that limn→∞ n−1/2 ℓ(Xn ∩
[0, 1]2 ) = c a.s. and conjectured that c = 2 in [33]. In subsequent works, Logan
and Shepp [50] and Vershik and Kerov [61] showed that c ≥ 2 and c ≤ 2. The
exact values of cd for d > 2 remain unknown, although Bollobás and Winkler
showed in [8] that
d2
1 1
≤ cd < e for all d ≥ 1.
d! Γ
d
d
Now we state our main convergence rate results. Let un denote the Pareto-
depth function in ΩR and let u denote the viscosity solution of (2.2).
Theorem 3.2. Given k ≥ 1 and ρ ∈ C 0,1 (Rd ) satisfying (3.1), the following
statements hold.
2
(a) Given R ∈ (0, 1], and n1/d ≥ Cd,k,ρ R−(2d −d−1)
we have
1/2 !
log2 n
−2d2 +d+1
P sup (un − u) > Cd,ρ,k R 4 n−1/4d ≤ Cd,ρ,k R−Cd n−k .
ΩR log log n
(b) Assume ρ ∈ C 2 (Rd+ ). Then there exists Cd > 0 such that for all R ∈ (0, Cd )
7
2
and n1/d ≥ Cd,k,ρ R−2d +4d−4
log(n)C we have
2/3 !
log2 n
−2d2 +d
−1/3d
P sup (u − un ) > Cd,ρ,k R 3 n ≤ Cd,ρ,k R−Cd n−k .
ΩR log log n
In the next result we state our convergence rates on Ω0 = [0, 1]d which are proved
by using u as an approximation to v and setting R equal to the optimal value
that balances the approximation error term with the convergence rate. Let
d −1/d
vn (x) = n ℓ(Xnρ ∩ [0, x]) (3.5)
cd
Theorem 3.3. Given k ≥ 1 and ρ ∈ C 0,1 (Rd ) satisfying (3.1), the following
statements hold.
Observe that the rate in (b) is sharper thanks to the sharper one-sided rate
in Theorem 3.3. We do not know for certain whether the rates in Theorem 3.2
and 3.3 are optimal, although it seems likely that they are not.
8
These results also extend to the situation when Xnρ = {Y1 , . . . , Yn } where
Y1 , . . . , Yn are i.i.d. random variables with continuous density ρ. The analogues
of Theorems 3.2 and 3.3 in this context follow from Lemma 7.3.
A key step in our proof of the sharper one-sided rate is a quantitative estimate
on the semiconvexity constant of u. As the Hamiltonian H(p) = (p1 . . . pd )1/d
is concave, the results on semiconvex viscosity solutions in [4] would lead us to
suspect that u is semiconvex. However, from an examination of the function
w(x) = d(x1 . . . xd )1/d that solves (2.1) with ρ = 1, it is evident that solutions of
(2.1) on Rd+ need not be semiconvex nor semiconcave due to the gradient singu-
larity on the coordinate axes. This motivates us to determine the rate at which
the semiconvexity constant of u on ΩR blows up as R → 0+ . For proving these
results it is convenient to raise the PDE to the 1/d power and pose the Dirichlet
problem on the more general domains ΩR,M = x ∈ [0, M ]d : (x1 . . . xd )1/d > R
with boundary conditions on ∂R,M Ω = x ∈ [0, M ]d : (x1 . . . xd )1/d = R . Let
R > 0, M ≥ 1, and let u denote the solution of
(
(ux1 ux2 . . . uxd )1/d = ρ1/d in ΩR,M
(3.6)
u=0 on ∂R,M Ω.
Our result on semiconvexity bounds the rate at which the semiconvexity constant
of u on ΩR,M blows up as R tends to 0. This result enables us to establish the
sharpened one-sided convergence rates in case (b) of Theorem 3.2 and Theorem
3.3.
Theorem 3.5. Let u denote the solution to (3.6). Then there exists a constant
Cρ > 0 such that for all R ≤ Cρ , x ∈ ΩR,M , and h ∈ Rd such that x ± h ∈ ΩR,M
we have
where
−(d−1)/d
Cd,M,ρ = Cd (1 + M ρ1/d
max ) ρ min ∥Dρ∥ ∞
L (∂R,M Ω) M 2d−1
+ ρ 1/d
max M 2d−2
.
9
3.1 Definition of Viscosity Solution
Here we briefly state for reference the definition of viscosity solution for the
first-order equation
H(Du, u, x) = 0 in Γ, (3.7)
H(Dφ(x), φ(x), x) ≤ 0.
H(Dφ(x), φ(x), x) ≥ 0.
and
Sp := x ∈ (−∞, 0]d : 1 + x · p−1 ≥ 0
(3.9)
10
(a) An ⊂ A (b) B ⊂ Bn
Figure 1: Illustration of the sets An , Bn and A, B used in the proof outline, and
the viscosity touching property that An ⊂ A and B ⊂ Bn .
by
p1 · · · pd
|Sp | = . (3.10)
dd
The sets A and B in Figure 1 show examples of orthogonal simplices. The longest
chain in an orthogonal simplex, un (Sp ), can be thought of as a cell problem
from homogenization, in the sense that it is a simpler local problem, whose
solution allows us to prove our main results. The value of p will turn out to be
proportional to the gradient Du of the continuum limit u, as in homogenization,
and the cell problem exactly describes the local behaviour of un for large n.
For simplicity, we will take the intensity ρ to be constant on Rd throughout
the rest of this section, and we denote the constant value by ρ > 0. The extension
to nonconstant intensities follows by approximating ρ from above and below by
constant intensities on the simplices Sp , which are vanishingly small as n → ∞.
It was shown in [11] that
with probability one. This is proved by reducing to the unit cell problem un (S1 )
using dilation invariance of un and the sets Sp . In particular, if Φ : Rn → Rn is
any dilation (i.e., Φx = (a1 x1 , . . . , ad xd ) for ai > 0), then we have ΦSp = SΦ−1 p
and so
ℓ(Xnρ ∩ Sp ) = ℓ(ΦXnρ ∩ ΦSp ) ∼ ℓ(Xn|Φ|−1 ρ ∩ ΦSp ).
11
We then choose Φ so that ΦSp = S1 , that is ai = pi , to obtain
This shows that the scaling limit (3.11) for a general simplex Sp follows directly
follow from one for the unit simplex S1 .
The first ingredient in our proof is a convergence rate, with high probability,
for the cell problem (3.11). In particular, in Theorem 5.6 we improve (3.11) by
showing that
un (Sp ) = dρ1/d |Sp |1/d + O n−1/2d |Sp |1/2d (3.12)
Basically by definition we have un (An ) ≈ ε (see Lemma 6.2 for a precise statement
of this). Now, if un is well approximated by a smooth function u, then we can
Taylor expand u to show that An ≈ x0 + Sp where p−1 = ε−1 Du(x0 ). In this
case we use (3.10) to obtain
p1 · · · pd εd
|Sp | = d
= d ,
d d ux1 (x0 ) · · · uxd (x0 )
and hence
ερ1/d
ε ≈ un (An ) ≈ un (Sp ) ≈ dρ1/d |Sp |1/d = . (3.13)
(ux1 (x0 ) · · · uxd (x0 ))1/d
12
viscosity property. That is, if un − φ attains its maximum at x0 , then
and so
φ(x0 ) − φ(x) ≤ un (x0 ) − un (x).
It follows that An ⊂ A, where A is the corresponding set defined for the test
function φ, given by
13
semiconvex nor semiconcave in general. We thus obtain the rates in Theorem 3.3
by approximation to the rounded off case (2.2), leading to substantially worse
rates of convergence in the presence of the corner singularity in (2.1).
While our proof techniques are at a high level similar to [3], the details
are substantially different and cannot be compared directly. We can, however,
compare the final convergence rates we obtain. In [3] the authors consider
stochastic homogenization of Hamilton-Jacobi equations of the form
x
uεt + H Duε , , ω = 0 in Rd × (0, ∞),
ε
for any δ > 0, in the setting where H is level-set convex and coercive in the
gradient. Our Hamiltonian H(p) = p1 · · · pd is level-set concave (and in fact we
can write it as H(p) = (p1 · · · pd )1/d to obtain a concave Hamiltonian), but it is
not coercive. Recalling that nondominated sorting can be viewed as a stochastic
Hamilton-Jacobi equation (2.3) with rapidly oscillating terms on the order of
ε = n−1/d we see that our rates in Theorem 3.2 yield
14
4 Maximum Principle and Lipschitz estimates
In this section we establish fundamental results regarding the PDE (2.1) that
are used throughout the paper. First we show that if u satisfies ux1 . . . uxd = ρ
on a domain Ω, then a closely related PDE is also automatically satisfied at
certain boundary points. Given M > 0, let Ω ⊂ [0, M ]d and define
(4.1)
Lemma 4.1. Given Ω ⊂ [0, M ]d , let ∂ ∗ Ω be given by (4.1) and let ρ ∈ C(Ω)
satisfy (3.1). Then the following statements hold.
d
Y
(uxi )+ ≤ ρ in Ω ∪ ∂ ∗ Ω.
i=1
d
Y
(uxi )+ ≥ ρ in Ω ∪ ∂ ∗ Ω.
i=1
Proof. To prove (b), let ψ ∈ C ∞ (Rd ) such that u − ψ has a local minimum at
x0 ∈ Ω, and show that ψxi (x0 ) ≥ 0. For y in a neighborhood of x0 we have
Hence, ψxi (x0 ) ≥ 0. Now let x0 ∈ Ω ∪ ∂ ∗ Ω and let φ ∈ C ∞ (Rd ) such that u − φ
has a local minimum at x0 . Without loss of generality, we may assume that
u − φ attains a strict global minimum at x0 . If x0 ∈ Ω, then φxi (x0 ) ≥ 0, and
15
we have
d
Y d
Y
(φxi (x0 ))+ = φxi (x0 ) ≥ ρ(x0 ).
i=1 i=1
Pd
If x0 ∈ ∂ ∗ Ω, let φε (x) = φ(x)−ε 1
i=1 M −xi , and we claim that u−φε attains its
minimum over Ω in Ω ∩ [0, M ) . To prove this, let yk ∈ [0, M )d be a minimizing
d
d
Y εk
φxj (xk ) − ≥ ρ(xk ).
j=1
(M − xk,j )2
Qd
Letting k → ∞, we have j=1 φxj (x0 ) + ≥ ρ(x0 ). To prove (a), let x0 ∈
Ω ∪ ∂ ∗ Ω, and let φ ∈ C ∞ (Rd ) such that u − φ has a local maximum at x0 . If
φxi (x0 ) ≤ 0 for some 1 ≤ i ≤ d, then we have
d
Y
0= φxj (x0 ) +
≤ ρ(x0 ).
j=1
d
Y d
Y
φxj (x0 ) = φxj (x0 ) + ≤ ρ(x0 ).
j=1 j=1
16
xk ∈ [0, M )d . Then when k is large we have xk ∈ Ω, hence
d d
Y Y εk
φxj (xk )+ ≤ φxj (xk ) + ≤ ρ(xk ).
j=1 j=1
(M − xk,j )2
Since φ is smooth, we have φxj (xk ) > 0 for k sufficiently large. Letting k → ∞,
we have
d
Y
φxj (x0 )+ ≤ ρ(x0 ).
j=1
Proposition 4.2. Given V ⊆ Rd , let ρ ∈ C(V ) satisfy (3.1). Then the following
statements hold.
Proof. To prove (a1), let x ∈ V . Then there exists φ ∈ C ∞ (Rd ) such that u − φ
has a local minimum at x. Consequently, (1+λ)u−(1+λ)φ has a local minimum
at x, so
L((1 + λ)Dφ(x)) = (1 + λ)f (x) ≥ f (x) + λ(inf f )
V
17
and the statement follows. The proofs of the other statements are very similar
and omitted here, making use of the inequalities (1 + λ)d ≥ (1 + dλ) in (a2) and
(1 − λ)d ≤ (1 − λ) in (b2).
and (
vx1 vx2 . . . vxd ≥ ρ in Ω
(4.3)
v = g2 on Γ,
respectively. Assume that v is nondecreasing in each coordinate and g1 ≤ g2 on
Γ. Then u ≤ v on Ω.
Proof. Given λ ∈ (0, 1), set vλ = (1 + λ)v, and suppose for contradiction that
supΩ (u − vλ ) > 0. Let
α 2
Φ(x, y) = u(x) − vλ (y) − |x − y| .
2
2 C
|xα − yα | ≤ . (4.4)
α
18
We cannot have x0 ∈ Γ, since u(x0 ) − vλ (x0 ) > 0 and u ≤ vλ on Γ. Hence,
x0 ∈ Ω ∪ ∂ ∗ Ω. As u − vλ ≤ 0 on Γ and (u − vλ )(x0 ) > 0, by continuity of
u − vλ there exists N > 0 such that (xn , yn ) ∈ (Ω ∪ ∂ ∗ Ω) × (Ω ∪ ∂ ∗ Ω) for n > N .
2 2
Let φ(x) = α2n |x − yn | and ψ(x) = − α2n |xn − y| . Then u − φ has a local
maximum at xn and vλ − ψ has a local minimum at yn . Setting H(p) = p1 . . . pd ,
Proposition 4.2 gives that H(Dvλ ) ≥ ρ + δ on Ω, where δ = λρmin > 0. By
∗
Lemma 4.1 we have H(Dv λ ) ≥ ρ + δ and H(Du) ≤ ρ on Ω ∪ ∂ Ω. Thus, we have
e e
and
where Z 1
1/d
J(γ) = g(γ(t))1/d [γ1′ (t) . . . γd′ (t)] dt,
0
and
Ux Ux . . . Ux = 1 g
1 2 d
in Rd+
dd (4.6)
U =0 on ∂Rd+ .
19
Theorem 4.4. Given ρ ∈ C 0,1 (Rd ) satisfying (3.1), let u denote the solution to
(3.6). Then we have
hzi
A simple calculation shows that |z − Φ(z)| ≤ h+xi ≤ Ch for z ∈ [0, 2x]d . Hence,
we have
hγi (t) hxi
|γ(t) − γ(t)| ≤ ≤ . (4.8)
h + xi h + xi
The above gives us
hγi (t) 1
|f (γ(t)) − f (γ(t))| ≤ [f ]C 0,1 (ΩR,M ) ≤ hxi [f ]C 0,1 (ΩR,M ) ≤ h[f ]C 0,1 (ΩR,M ) .
h + xi xi
(4.9)
20
We have
Z 1
1/d 1/d
J(γ) − J(γ) = f (γ(t)) [γ1′ (t) · · · γd′ (t)]
− f (γ(t)) [γ ′1 (t) · · · γ ′d (t)] dt
0
1 ′ 1/d !
γ 1 (t) . . . γ ′d (t)
Z
1/d
= f (γ(t)) − f (γ(t)) ′ [γ1′ (t) . . . γd′ (t)] dt
0 γ1 (t) · · · γd′ (t)
1/d !
Z 1
xi 1/d
= f (γ(t)) − f (γ(t)) [γ1′ (t) · · · γd′ (t)] dt.
0 xi + h
Furthermore,
1/d ! 1/d !
xi xi
f (γ(t)) − f (γ(t)) = (f (γ(t)) − f (γ(t))) + 1 − f (γ(t))
xi + h xi + h
xi
≤ (f (γ(t)) − f (γ(t))) + 1 − f (γ(t))
xi + h
h
≤ (f (γ(t)) − f (γ(t))) + f (γ(t))
xi
h
≤ h ∥f ∥C 0,1 (ΩR,M ) + ∥f ∥C 0,1 (ΩR,M )
xi
= h(1 + x−1
i ) ∥f ∥C 0,1 (ΩR,M )
We conclude that
Z 1
1/d
J(γ) − J(γ) ≤ h(1 + x−1
i ) ∥f ∥C 0,1 (ΩR,M ) [γ1′ (t) . . . γd′ (t)]
0
d
Y
≤ h(1 + x−1
i ) ∥f ∥C 0,1 (ΩR,M ) (γj (1) − γj (0))1/d
j=1
(x1 . . . xd )1/d
≤ 2h ∥f ∥C 0,1 (ΩR,M ) max .
x∈ΩR,M xi
(x1 . . . xd )1/d
max = R−(d−1) M d−1 .
x∈ΩR,M xi
21
and consequently that
1/d
lim un (S) = dρ1/d |S| a.s.
n→∞
In this section, we establish explicit rates of convergence for the length of the
longest chain in rectangles and simplices. We begin by stating a simple property
of Poisson processes, whose proof is found in [45].
Theorem 5.2. Let Xn be a Poisson point process on [0, 1]d with intensity n.
Then there exists a constant Cd such that for all n > Cd we have
d d 1/2d log n
P Un ([0, 1] ) − EUn ([0, 1] ) > Cd tn ≤ 4t2 exp(−t2 )
log log n
22
n1/2d
for all t satisfying 2 < t < log log n . Furthermore,
log3/2 n
cd n1/d ≥ EUn ([0, 1]d ) ≥ cd n1/d − Cd n1/2d
log log n
where cd is given by
Theorem 5.3. Let Xnρ be a Poisson point process on [0, 1]d where ρ > 0 is a
(ρn)1/2d
constant. Then for all n > Cd ρ−1 and all t satisfying 2 < t < log log ρn we have
log ρn
P un ([0, 1]d ) − Eun ([0, 1]d ) > Cd n−1/2d ρ1/2d t ≤ 4t2 exp(−t2 ).
log log ρn
Furthermore,
log3/2 ρn
dρ1/d ≥ Eun ([0, 1]d ) ≥ dρ1/d − Cd ρ1/2d n−1/2d .
log log ρn
Theorem 5.4. Let Xnρ denote a Poisson point process on Rd with intensity nρ
where ρ ∈ C(Rd ) satisfies (3.1). Given x, y ∈ Rd with xi < yi for i = 1, . . . , d,
let R = [x, y] := w ∈ Rd : xi ≤ wi ≤ yi for i = 1, . . . , d .
−1
(a) For all n > Cd (supR ρ)−1 |R| and t satisfying
1/2d 1
Cd < t < Cd n1/2d (sup ρ)1/2d |R|
R log log n(supR ρ) |R|
23
we have
!
1/d 1/d −1/2d 1/2d 1/2d log3/2 (n |R| (supR ρ))
P un (R) − d(sup ρ) |R| > Cd tn (sup ρ) |R|
R R log log(n |R| (supR ρ))
≤ 4t2 exp(−t2 ).
−1
(b) For all n > Cd (inf R ρ)−1 |R| and t satisfying
1/2d 1
log1/2 (n(inf ρ) |R|) < t < (inf ρ)1/2d n1/2d |R|
R R log log n(inf R ρ) |R|
we have
1/d 1/2d log(n |R| (inf R ρ))
P un (R) − d(inf ρ)1/d |R| < −Cd tn−1/2d (inf ρ)1/2d |R|
R R log log(n |R| (inf R ρ))
≤ 4t2 exp(−t2 ).
Proof. We shall prove only (a), as the proof of (b) is similar. Without loss of
generality we may take R to be the rectangle [0, y] with y ∈ Rd+ . By Lemma 5.1
there exists a Poisson process X n ⊃ Xnρ on Rd with intensity function nρ where
ρ = (supR ρ) 1R + ρ1Rd \R . Given A ⊂ Rd , let un (A) = n−1/d ℓ(A ∩ X n ) and
set Φ(x) = xy11 , . . . xydd . Then Yn := Φ(X n ) is a Poisson process with intensity
n |R| ρ and un (R) = n−1/d ℓ([0, 1]d ∩ Yn ). Let E be the event that
and
24
Assume that E holds for fixed choices of t and n. As un (R) ≤ un (R), we have
1/d 1/d
un (R) − d(sup ρ)1/d |R| ≤ un (R) − Cd (sup ρ)1/d |R|
R R
1/d
≤ |un (R) − Eun (R)| + (Eun (R) − Cd (sup ρ)1/d |R| )
R
Now we extend the preceding result to establish rates of convergence for the
longest chain in an orthogonal simplex of the form Sy,q as in (3.8). The lower
one-sided rate is easily attained taking the rectangle R ⊂ S with largest volume
and applying Theorem 5.4. To prove the upper one-sided rate, we embed S into
a finite union of rectangles and apply the union bound. The following result
verifies the existence of a suitable collection of rectangles.
Lemma 5.5. Given y ∈ Rd and q ∈ Rd+ , let Sy,q be as in (3.8). Given ε > 0,
there exists a finite collection R of rectangles covering Sy,q satisfying
1/d 1/d
Cd ε |Sy,q | ≤ |R| ≤ |Sy,q | + Cd ε |Sy,q | for all R ∈ R, (5.1)
and
and
Proof. Without loss of generality, we may take y = 0 and prove the statements
for the simplex Sq := S0,q . Letting 1 denote the ones vector, we first prove
the statements for the simplex S := x ∈ [0, ∞)d : x · 1 ≤ 1 , and then obtain
the general result via reflection and scaling. Let P = x ∈ [0, ∞)d : x · 1 = 1 .
Fix x0 ∈ P , and let R = [0, x + ε1] : x ∈ P ∩ (x0 + εZd ) . It is clear from the
definition of R that (5.2) and (5.3) hold. By the Arithmetic-Geometric Mean
Inequality, for all x ∈ P we have
d
Y 1
ε≤ (xi + ε)1/d ≤ +ε
i=1
d
25
S
and it follows that (5.1) holds. To show that S ⊂ R∈R R, let y ∈ P . Then
∗ d
there exists y ∈ (x0 + εZ ) such that |yi − ≤ ε for 1 ≤ i ≤ d. Hence, yi∗ |
∗
S
y ∈ [0, y ], and it follows that S ⊂ R∈R R. This concludes the proof for
S, and we now leverage this result to prove the statement for the simplex
−xd
Sq = x ∈ (−∞, 0]d : 1 + x · q ≥ 0 . Let Φ(x) = ( −x
q1 , . . . qd ), so Φ(S) = Sq .
1
and
Theorem 5.6. Let ρ ∈ C 0,1 (Rd ) satisfy (3.1), and given y ∈ Rd and q ∈ Rd+
1/d
let Sy,q be given by (3.8). Assume that d |Sy,q | ≤ 1. Then for all k ≥ 1 and
−1 2d
n > Cd,ρ,k |Sy,q | log(n) we have
!
1/d −1/2d 1/2d log2 n
P un (Sy,q ) − d(sup ρ) 1/d
|Sy,q | > Cd,k,ρ n |Sy,q | ≤ Cd,k n−k
Sy,q log log n
and
log2 n
1/d −1/2d 1/2d
P un (Sy,q ) − d( inf ρ) 1/d
|Sy,q | < −Cd,k,ρ n |Sy,q | ≤ Cd,k n−k .
Sy,q log log n
Proof. We present the proof of the first statement only, as the proof of the
second is similar and simpler. Without loss of generality, we may take y = 0
and prove the result for the simplex Sq = S0,q . We first prove the result for the
simplex S := x ∈ (−∞, 0]d : 1 + x · 1 ≥ 0 , and then obtain the general result
via a scaling argument. Given ε > 0, we may apply Lemma 5.5 to conclude there
exists a collection R of rectangular boxes covering S such that |R| ≤ Cd ε−(d−1) ,
26
1/d
Cd ε ≤ |R| ≤ d1 + Cd ε for each R ∈ R, and dist(y, S) ≤ ε for y ∈ R ∈ R.
Set R = R∈R R. As ρ ∈ C 0,1 (Rd ), we have supR ρ ≤ supS ρ + [ρ]C 0,1 (Rd ) ε for
S
y ∈ R. Let K = supS ρ + [ρ]C 0,1 (Rd ) ε . By Lemma 5.1 there exists a Poisson
process X n ⊃ Xnρ on Rd with intensity function nρ where ρ = K 1R + ρ1Rd \R .
d −1/d
Given A ⊂ Rd , let un (A) = cd n ℓ(A ∩ X n ). As X n ⊃ Xnρ and S ⊂ R, we
have un (S) ≤ un (S) ≤ maxR∈R un (R). Let ER be the event that
1/d
un (R) − dK 1/d |R| ≤ Cd νK 1/2d (5.4)
3/2
log (nK) −1
where ν = tn−1/2d log log(nK) . For any ε < Cd , n > Cd |R| K −1 , and 2 < t <
(nK)1/2d
log log nK , we have P(ER ) ≥ 1 − 4t2 exp(−t2 ) by Theorem 5.4. Letting E be the
event that (5.4) holds for all R ∈ R, we have P(E) ≥ 1 − Cd ε−d+1 t2 exp(−t2 )
p
by the union bound. Given k ≥ 1, set t = Ck log(n) for a constant C
chosen large enough so n1/2 n−Ck ≤ n−k , and let ε = tn−1/2d . Observe that
the hypotheses of Theorem 5.4 are satisfied when nK > Cd,k log(n)2d , so we
have P(E) ≥ 1 − Cd,k n1/2 n−Ck ≥ 1 − Cd,k n−k . If E holds, then using S ⊂ R,
1/d 1
|R| ≤ d + Cd ε, and (5.4), we have
e 1/d + Cd ν(K
un (Sq ) ≤ K e 1/2d + K
e 1/d ). (5.7)
27
e = dd |Sq | K and d |Sq |1/d ≤ 1, we have
If E holds, then using K
e 1/d + Cd ν(K
un (Sq ) ≤ K e 1/2d + K
e 1/d )
1/d 1/2d
= d |Sq | K 1/d + Cd ν |Sq | (1 + K 1/2d )
1/d 1/2d
≤ d |Sq | (sup ρ)1/d + Cd,ρ ν |Sq |
S
Using our result for S and ℓ(Xnρ ∩ Sq ) = ℓ(Yn ∩ S), we have P(E) ≥ 1 − Cd,k n−k
e −1 log(n)2d .
for n > Cd,k K
Lemma 6.1. Let ρ ∈ C 0,1 (Rd ) satisfy (3.1). Given R ∈ (0, 1), let φ ∈ C(ΩR )
be a non-negative function such that 0 < γ ≤ φxi ≤ γ holds in the viscosity sense
on ΩR for some constants γ ≥ 1 and γ ≤ 1. Given α ≥ 1, k ≥ 1, δ ∈ (0, 1),
2
log (n)
and ε ∈ (0, 1), let Rn = γ 1/2 n−1/2d ε−1/2 log log(n) . Then there exist constants
Cd ≤ 1 and Cd,k,ρ ≥ 1 such that when ε ≤ Cd min α−1 γ 2 γ −1 log(n)−2 , Rd−1 δ
and 1 ≥ λ ≥ Cd,k,ρ (Rn + γ −2 αε) the following statements hold.
Then there exists a constant Cd,ρ,k ≥ 1 such that for all n > 1 with
28
n1/d > Cd,ρ,k ε−1 γ log(n)2 we have
!
P sup (un − (1 + λ)φ) = sup (un − (1 + λ)φ) ≤ Cd,k ε−6d n−k .
ΩR ΩR+δ
Proof. First we introduce the notation used throughout the proof. Let ΩεR = ΩR ∩
d−1/2 ε3 Zd and we define a collection of simplices S = Sx,s : x ∈ ΩεR+δ , s ∈ Γε
Γε = s ∈ ε3 Zd : (4γ)−1 ε ≤ si ≤ 4γ −1 ε .
−1
and let E be the event that ES holds for all S ∈ S. For any n > Cd,k,ρ |S| log(n)2d ,
we have P(ES ) ≥ 1−Cd,k n−k by Theorem 5.6. By choice of Γε , we have (4γ)−1 ε ≤
1/d
d |S| ≤ 4γ −1 ε for S ∈ S. As we assume that n1/d > Cd,ρ,k ε−1 γ log(n)2 ,
−1
we have n > Cd,k,ρ |S| log(n)2d for all S ∈ S. As |Γε | ≤ Cd ε−3d and
|ΩεR | ≤ Cd ε−3d , we have |S| ≤ Cd ε−6d . By the union bound, we have P (E) ≥
1 − Cd,k ε−6d n−k . For the remainder of the proof we assume that E holds. Then
for each S ∈ S we have
1/2d
1/d Cd,k,ρ |S| log2 (n)
un (S) ≤ d |S| (sup ρ1/d ) +
S n1/2d log log(n)
1/d
≤ d |S| (sup ρ1/d )(1 + Cd,k,ρ Rn )
S
29
2
log (n)
where Rn = γ 1/2 ε−1/2 n−1/2d log log(n) . Let w = (1 + λ)φ, and we show that
2
w(x) ≤ w(xn ) + Dw(xn ) · (x − xn ) + (1 + λ)α |x − xn |
2
≤ w(xn ) + Dw(xn ) · (x − xn ) + 2α |x − xn | .
2
−ε − ε3 ≤ w(x) − w(xn ) ≤ Dw(xn ) · (x − xn ) + 2α |x − xn |
≤ Dw(xn ) · (x − xn ) + Cαγ −2 ε2 .
Cαε2 γ −2 +ε
Letting pi = vxi (xn ) , we have A ⊆ Sxn ,p . We show there exist y ∈ ΩεR and
q ∈ Γε such that Sxn ,p ⊆ Sy,q . Let y ∈ ΩεR such that xn ≤ y and |y − xn | ≤ ε3 .
Letting 1 denote the all ones vector, we have Sxn ,p ⊆ Sy,p+ε3 1 . We may choose
q so p + 2ε3 1 ≥ q ≥ p + ε3 1 provided that γ −1 ε ≤ pi ≤ 4γ −1 ε − 2ε3 for each i,
which holds when ε ≤ Cα−1 γ 2 and α ≥ 1. Then Sy,p+ε3 1 ⊂ Sy,q . Using that
30
1/d
(q1 . . . qd )1/d = d |Sy,q | , we have
ε ≤ un (An )
≤ un (Sy,q )
≤ (1 + Cd,k,ρ Rn )(sup ρ)1/d (q1 . . . qd )1/d
Sy,q
d
( )!1/d
1/d
Y Cαε2 γ −2 + ε
≤ (1 + Cd,k,ρ Rn )(sup ρ) + 2ε3
Sy,q i=1
wxi (xn )
d
( )!1/d
Y Cαε2 γ −2 + ε + 2γε3
≤ (1 + Cd,k,ρ Rn )(sup ρ)1/d .
Sy,q i=1
wxi (xn )
1/d
≤ (1 + Cd,k,ρ Rn )(sup ρ)1/d 1 + Cγ −2 αε .
(wx1 (xn ) · . . . wxd (xn ))
Sy,q
As we assume ρ ∈ C 0,1 (Rd ) and ρ ≥ ρmin > 0, we have ρ1/d ∈ C 0,1 (ΩR ), hence
supSy,q ρ1/d ≤ ρ(xn )1/d +Cd,ρ εγ −1 . By our assumption n1/d > Cd,k,ρ ε−1 γ log(n)2 ,
we have Rn ≤ Cd,k,ρ . As ε ≤ Cα−1 γ 2 γ −1 , we have γ −2 αε ≤ C. Hence we have
1/d
(wx1 (xn ) · . . . wxd (xn )) ≤ ρ(xn )1/d + Cd,k,ρ (Rn + γ −2 αε).
We now prove (b). Let ΩεR , Γε , and S be as in (a), and let ES be the event that
1/2d
1/d Cd,k,ρ |S| log2 n
un (S) − d(inf ρ1/d ) |S| ≥− 1/2d
(6.3)
S n log log n
−1
and let E be the event that ES holds for all S ∈ S. For any n > Cd,k,ρ |S| log(n)2d ,
we have P(ES ) ≥ 1 − Cd,k n−k by Theorem 5.6. By choice of Γε , we have
1/d
γ −1 ε ≤ d |S| ≤ 4γ −1 ε for S ∈ S. As we assume that n1/d > Cd,ρ,k ε−2 α−1 γ 2
31
and ε ≤ γ 2 γ −1 log(n)−2 , for all S ∈ S we have
−1/d
n1/d > Cd,ρ,k ε−2 α−1 γ 2 ≥ ε−1 γ log(n)2 ≥ Cd,k,ρ |S| log(n)2 .
By the union bound, we have P (E) ≥ 1 − Cd,k ε−6d n−k . For the remainder of
the proof we assume that E holds. Then for each S ∈ S we have
1/2d
1/d Cd,k,ρ |S| log2 (n) 1/d
un (S) ≥ d |S| (inf ρ1/d ) − ≥ d |S| (inf ρ1/d )(1 − Cd,k,ρ Rn )
S n1/2d log log(n) S
2
log (n)
where Rn = γ 1/2 ε−1/2 n−1/2d log log(n) . Let w = (1 − λ)φ, and we show that
sup (w − un ) ̸= sup (w − un ) .
ΩR ΩR+δ
sup (w − un ) = sup (w − un ) .
ΩR ΩR+δ
d −1/d
un (xn ) − un (x) ≤ w(xn ) − w(x) + n for all y ∈ ΩR . (6.4)
cd
32
Using (6.5), |x − xn | ≤ γ −1 ε, and n−1/d ≤ αγ −2 ε2 , we have
2
w(xn ) − w(x) ≤ Dw(xn ) · (xn − x) + α |x − xn |
≤ ε − Kαγ −2 ε2 + αγ −2 ε2
≤ ε − (K − 1)n−1/d .
2d
Choosing K so (K − 1) = cd , we have x ∈ B. Hence, Sxn ,p ⊂ B. We now
show there exist y ∈ ΩεR and q ∈ Γε such that Sy,q ⊆ Sxn ,p . Let y ∈ ΩεR such
that xn ≥ y and |y − xn | ≤ ε3 . Letting 1 denote the ones vector, we have
Sy,p−ε3 1 ⊆ Sxn ,p . We may choose q so p − ε3 1 ≥ q ≥ p − 2ε3 1 provided that
γ −1 ε + 2ε3 ≤ pi ≤ 4γ −1 ε for each i, which holds when ε ≤ 81 α−1 γ 2 γ −1 . Then
1/d
Sy,q ⊂ Sxn ,p . Using that (q1 . . . qd )1/d = d |Sy,q | , we have
ε ≥ un (Bn )
≥ un (Sy,q )
≥ (1 − Cd,k,ρ Rn )( inf ρ)1/d (q1 . . . qd )1/d
Sy,q
d
( )!1/d
ε − Cd αε2 γ −2
1/d
Y
3
≥ (1 − Cd,k,ρ Rn )( inf ρ) − 2ε
Sy,q
i=1
wxi (xn )
d
( )!1/d
1/d
Y ε − Cd αε2 γ −2 − 2γε3
≥ (1 − Cd Rn )( inf ρ) .
Sy,q
i=1
wxi (xn )
1/d
≥ (1 − Cd,k,ρ Rn )( inf ρ)1/d 1 − Cd γ −2 αε .
(wx1 (xn ) · . . . wxd (xn ))
Sy,q
As we assume ρ ∈ C 0,1 (Rd ) and ρ ≥ ρmin > 0, we have (inf Sy,q ρ1/d ) ≥ ρ(xn )1/d −
Cd,ρ εγ −1 . Hence we have
1/d
≥ ρ(xn )1/d (1 − Cd,k,ρ Rn )(1 − Cd,ρ εγ −1 ) 1 − Cd γ −2 αε
(wx1 (xn ) · . . . wxd (xn ))
≥ ρ(xn )1/d − Cd,k,ρ (Rn + γ −2 αε).
sup (w − un ) ̸= sup (w − un ) .
ΩR ΩR+δ
33
Lemma 6.2. Given x0 ∈ ΩR and ε > 0, let
Hence,
d −1/d d
un (An ) ≤ un (x0 ) − un (x) + n ≤ ε + n−1/d .
cd cd
j
It is clear that ℓ([0, yj ] ∩ S ∩ Xnρ ) ≥ j, as {yi }i=1 is a chain of length j in
[0, yj ] ∩ S ∩ Xnρ . If ℓ([0, yj ] ∩ S ∩ Xnρ ) ≥ j + 1, then there exists a chain
j+1 k
{zi }i=1 in [0, yj ] ∩ S ∩ Xnρ . Concatenating this chain with {yi }i=j+1 yields
k
a chain of length k + 1, contradicting maximality of {yi }i=1 . Now we prove
k
the main result. Let {xi }i=1 be a longest chain in [0, x0 ] ∩ ΩR ∩ Xnρ , and let
j k
j = min {i ∈ {1, . . . , k} : xi ∈ An }. Letting C1 = {xi }i=1 and C2 = {xi }i=j , we
have
34
un (x0 ) − un (xj−1 ) > ε and un (x0 ) ≥ ε we have
d −1/d
un (x0 ) − un (xj ) ≥ ε − n . (6.9)
cd
d −1/d
un (An ) ≥ un (x0 ) − un (x) + n ≥ ε.
cd
Lemma 6.3. Let 0 < δ ≤ R. Then given y ∈ ΩR+δ , there is a constant Cd > 0
such that
dist(y, ∂R Ω) ≥ Cd δRd−1 .
(x1 . . . xd )1/d
fxi (x) = .
dxi
1 −d 2
∥Df ∥L∞ (ΩR \ΩR+δ ) ≤ R (R + δ) ≤ R−d+1 .
d d
Hence, we have
2 −d+1
δ ≤ f (y) − f (x) ≤ R |y − x| .
d
35
Lemma 6.4. The following statements hold.
Proof. We give the proof of (b) only, as the proof of (a) is similar but simpler.
Given h ∈ (0, 1), let
n o
B = x ∈ [0, 2]d : R − ε ≤ (x1 . . . xd )1/d ≤ R + 2ε
and we show that R has the desired properties when h is appropriately chosen.
Let z, y ∈ ΩR \ΩR+ε with z < y and [z, y] ⊆ ΩR \ΩR+ε . Let w(x) = (x1 . . . xd )1/d .
(x1 ...xd )1/d
Then wxi = dxi and if x ∈ B and ε ≤ 12 R we have
(x1 . . . xd )1/d
Cd R ≤ ≤ Cd R−(d−1) . (6.10)
xi
Hence we have |z − y| ≤ Cd R−1 ε. Letting 1 denote the all ones vector, by (6.10)
we have w(y + h1) ≤ (R + ε) + Cd R−(d−1) h and w(z − h1) ≥ R − ε − Cd R−(d−1) h.
Then there exists a constant Cd such that [z − 2h1, y + 2h1] ⊂ B when h =
Cd Rd−1 ε. Letting h = Cd Rd−1 ε, there exist y + ∈ Bh and y − ∈ Bh such that
1/d
y + 2h1 ≥ yi+ ≥ y + h1 and z − 2h1 ≤ yi− ≤ z − h1. Letting A = |[y − , y + ]| ,
we have A ≥ Cd Rd−1 ε and
d
1X +
A≤ (y − yi− ) ≤ Cd (|z − y| + h) ≤ Cd R−1 ε.
d i=1 i
Furthermore, we have
36
Lemma 6.5. Let ρ ∈ C(Rd ) satisfy (3.1), and let un and vn be given by (3.2).
2
log (n)
and (3.5). Given k ≥ 1, ε ∈ (0, 1), and n > 0, let Rn = n−1/2d ε−1/2 log log(n) .
Then the following statements hold.
2
(a) For all n > Cd,k ρ−1
max R
−d −d
ε log(n)4d we have
!
2
P sup vn > Cd,k,ρ ε ≤ Cd,k ε−d n−k .
Ω0 \Ωε
(b) Given R ∈ (0, 1], ε ∈ (0, R], and n > Cd,k ρ−1
max R
−2d −d
ε log(n)4d we have
!
2
P sup un > Cd,k,ρ ε ≤ Cd,k ε−3d n−k .
ΩR \ΩR+ε
Proof. We will prove (b) only, as the proof of (a) is similar. Applying Lemma
6.4 (b) with α = εR, there exists a collection R of rectangles such that
1/d
supΩR \ΩR+ε un ≤ maxR∈R un (R), Cd Rd ε ≤ |E| ≤ Cd ε for E ∈ R, and
2
|R| ≤ Cd R−2d(d−1) α−2d ≤ Cd ε−3d . Given R ∈ R, let ER be the event that
log2 n
un (R) ≤ Cd (sup ρ)1/d ε + Cd k 1/2 n−1/2d ε1/2 .
R log log n
2
If ER holds, then our assumption n > Cd,k ρ−1
max R
−d −d
ε log(n)4d implies that
−1
un (R) ≤ Cd,k,ρ ε. Furthermore, we have n > Cd (supE ρ)−1 |E| for each E ∈ R
(n|E|)1/2d
p p
and k log(n) ≤ log log n|E| . Applying Theorem 5.4 with t = 2k log(n), we
have P(ER ) ≥ 1 − Cd kn−k for n > Cd . Letting E be the event that ER holds
2
for all R ∈ R, we have P(E) ≥ 1 − Cd,k ε−3d n−k by the union bound. As
supΩR \ΩR+ε un ≤ maxR∈R un (R), the result follows.
Lemma 6.6. Given R > 0 and ρ ∈ C(Rd+ ) satisfying (3.1), let u and v denote
the solutions of (2.2) and (3.4) respectively. Then the following statements hold.
sup u ≤ dρ1/d
max α.
ΩR \ΩR+α
(b) We have
37
1/d 1/d
Proof. (a) Let w(x) = dρmax (x1 . . . xd )1/d − dρmax R. Then w = 0 on ∂R Ω and
Lemma 7.1. Given an open and bound set Ω ⊂ Rd , u ∈ C 0,1 (Ω), and α > 0,
consider the inf-convolution defined by
1 2
uα (x) = inf u(y) + |x − y| .
y∈Ω 2α
38
(b) There exists a constant C > 0 such that
0,1
(d) Assume f ∈ C 0,1 (Ω) and H ∈ Cloc (Rd ). If
H(Du) ≥ f in Ω
then
where
( )
1 2
Mα (u) = x ∈ Ω : arg min u(y) + |x − y| ∩ Ω ̸= ∅ .
y∈Ω 2α
n o
1 2
(e) Let yα ∈ arg miny∈Ω u(y) + 2α |x − y| . Then there exists a constant
C > 0 such that
39
we have
and
d
Y
(uα )xi ≥ ρ − Cd,ρ α[u]C 0,1 (ΩR ) ≥ ρ − Cd,ρ R−d+1 α in Mα (u). (7.3)
i=1
Let x ∈ ΩR such that dist(x, ∂R Ω) > Cα[u]C 0,1 (ΩR ) where C is the con-
n Lemma 6.1 (e), oand we show that x ∈ Mα (u). Given yα ∈
stant given in
1 2
arg miny∈ΩR u(y) + 2α |x − y| , we have
1 2
u(yα ) + |x − yα | ≤ u(x),
2α
hence yα ≤ x. By Lemma 7.1 (e) we have |x − yα | ≤ Cα[u]C 0,1 (ΩR ) < dist(x, ∂R Ω),
hence x ∈ Mα (u). By Lemma 6.3, there exist constants Cd > 0 and Cd,ρ > 0
such that ΩR+δ ⊂ Mα (u) when δ = Cd,ρ αR−2d+2 . By Lemma 7.1 and Theorem
4.4 we have
By (7.3), we have
2
Cd,ρ R(d−1) ≤ (uα )xi ≤ Cd,ρ R−(d−1)
in the viscosity sense. Given λ > 0, let A1 denote the event that
2
Let γ = Cd,ρ R(d−1) , γ = Cd,ρ R−(d−1) , δ = Cd,ρ αR−2d+2 , and ε ∈ (0, 1). Set
log2 n
ν= log log n , Rn = n−1/2d γ 1/2 ε−1/2 ν, and λ = Cd,k,ρ (Rn + γ −2 α−1 ε + R−d+1 α).
Assume for now that supΩR (un − (1 + λ)uα ) ≥ 2ε. Then for all n > 1 with
n1/d > Cd,ρ,k R−d+1 ε−1 log(n)2 and ε satisfying
we have P(A1 ) ≥ 1 − Cd,k ε−6d n−k by Lemma 6.1. Let A2 be the event that
40
2
supΩR \ΩR+δ un ≤ Cd,k,ρ δ. By Lemma 6.5 we have P(A2 ) ≥ 1 − Cd,k δ −3d n−k
for some constant Cd . As ε ≤ δ, we have P(A1 ∩ A2 ) ≥ 1 − Cd,k ε−Cd n−k . If
A1 ∩ A2 holds, then using (7.4) and (7.2) we have
2
Using γ −2 = Cd,ρ R−2d +4d−2
, we have
2
sup (un − u) ≤ Cd,k,ρ Rn + R−2d+2 α + R−2d +4d−2 α−1 ε . (7.6)
ΩR
In the case where supΩR (un − (1 + λ)uα ) < 2ε, we cannot apply Lemma 6.1, but
obtain (7.6) by observing that
Let E denote the right-hand side of (7.6), and we select the parameters α and
ε to minimize E subject to the constraints from Lemmas 6.5 and 6.1. Let
2d2 −3d+1 (−2d2 +6d−4) √ −2d2 +9d−7
ε = νn−1/2d R 2 and α = R 2 ε=R 4 n−1/4d ν 1/2 . Then
we have
(−2d2 +d+1)
E ≤ Cd,k,ρ R 4 ν 1/2 n−1/4d .
2d2 −3d+1 2
As εn1/d = n1/2d R 2 ν, (7.9) is satisfied when n1/d ≥ Cd,ρ,k R−(2d −d−1)
log(n)C .
2 6d2 −3d−3
−3d+1
We also have (7.8), since ε < δ. As R2d α = R 4 n−1/4d ν 1/2 and
41
2d2 −3d+1 2
ε=R 2 n−1/2d ν, (7.10) is satisfied when n1/d ≥ Cd,k,ρ R−2d +9d+1 log(n)C .
2
It also follows that (7.11) holds, as Rd−1 δ = R−d+1 α ≥ R2d −3d+1 α. Since α =
−2d2 +9d−7 2
R 4 ν 1/2 n−1/4d , (7.7) is satisfied when n1/d ≥ Cd,k,ρ R−(2d −9d+11)
log(n)C .
It is straightforward to check that the most restrictive of these conditions is
2
n1/d ≥ Cd,k,ρ R−(2d −d−1)
log(n)C , hence all constraints are satisfied when this
holds. We conclude that
1/2 !
log2 n
(2d2 +d+1)
− −1/4d
P sup (un − u) > Cd,ρ,k R 4 n ≤ Cd,ρ,k RCd n−k .
ΩR log log n
(b) By Theorem 3.5, there exist constants Cρ and Cd,ρ such that u is semiconvex
on ΩR with semiconvexity constant Cd,ρ R−2d+1 when R < Cρ . Given ε > 0 and
λ > 0, let E be the event that
2
Let γ = Cd,ρ R(d−1) , γ = Cd,ρ R−(d−1) , α = Cd,ρ R−2d+1 and ε ∈ (0, 1). Set
log2 n
ν= log log n , Rn = n−1/2d γ 1/2 ε−1/2 ν, and λ = Cd,k,ρ (Rn + γ −2 R−2d+1 ε). Then
2
1/d −2d+2 −2
for all n > Cd,ρ,k R2d ε and ε satisfying (7.5), we have P(E) ≥
−6d −k
1 − Cd,k ε n by Lemma 6.1. We assume for the remainder of the proof that
E holds. By Lemma 6.6 we have
2
Letting δ = λ and using γ −2 R−2d+1 = Cd,ρ R−2d +2d−1
, we have
4d2 −5d+3
2
ν 2/3
Letting E = (Rn +R−2d +2d−1
ε), we select ε to minimize E. Let ε = R 3
n1/3d
,
2
so Rn = R−2d +2d−1
ε. Then
2 −2d2 +d
E = R−2d +2d−1
ε=R 3 n−1/3d ν 2/3 .
42
From Lemma 6.1 we have the following constraints
2 8d2 −10d+6
−2d+1
Cd,ρ,k R2d ≤ ε2 n1/d = n1/3d R 3 ν 4/3 (7.12)
2d2 −3d+1 −2 2d2 −5d+1 −2
ε ≤ Cd αR log(n) =R log(n) (7.13)
d−1
ε ≤ Cd R δ. (7.14)
2
As δ = λ ≥ R−2d +2d−1
ε, it is clear that (7.14) is satisfied. Observe that (7.12) is
2
satisfied when n 1/d
≥ R−2d +4d−4
log(n)C . Furthermore, (7.13) is satisfied when
−2d2 +10d+2
1/d C
n ≥ Cd,k,ρ R 3 log(n) for some constant C. As the most restrictive
1/d −2d2 +4d−4
of these conditions is n ≥R log(n)C , when this is satisfied we can
conclude that
2/3 !
log2 n
−2d2 +d
P sup (u − un ) > Cd,ρ,k R 3 n 1/3d
≤ Cd,ρ,k RCd n−k .
ΩR log log n
Proof of Theorem 3.3. (a) Given R > 0, let E1 be the event that
1/2
log2 n
(2d2 +d+1)
sup (un − u) ≤ Cd,ρ,k R− 4 n−1/4d . (7.15)
ΩR log log n
2
By Theorem 3.2 (a) we have P(E1 ) ≥ 1−Cd,k,ρ RCd n−k for all n1/d ≥ Cd,k,ρ R−(2d −d−1)
log(n)C .
Given any x ∈ Ω0 and longest chain C in [0, x] ∩ Xnρ , let C1 = {y ∈ C : y ∈ ΩR }
and C2 = {y ∈ C : y ∈
/ ΩR }. Then
Let E2 denote the event that supΩ0 \ΩR vn ≤ Cd,ρ R holds. By Lemma 6.5 we
have P(E2 ) ≥ 1 − Cd,k,ρ R−Cd n−k , for all n with n1/d ≥ Cd,k,ρ R−d−1 log(n)4 .
Then P(E1 ∩ E2 ) ≥ 1 − Cd,k,ρ RCd n−k , and for the remainder of the proof, we
1/2
log2 n
assume that E1 ∩ E2 holds. Let ν = log log n . Using (7.16) ,(7.15), and
43
u ≤ v on ΩR , we have for all x ∈ ΩR that
vn (x) − v(x) ≤ (vn (x) − un (x)) + (un (x) − u(x)) + (u(x) − v(x))
(2d2 +d+1)
≤ Cd,k,ρ (R + R− 4 n−1/4d ν).
Hence we have
(2d2 +d+1)
sup(vn − v) ≤ Cd,k,ρ (R + R− 4 n−1/4d ν).
Ω0
(2d2 +d+1)
Letting R = Kn−β , we select the maximum value of β such that R ≥ R− 4 n−1/4d ν
2
and n1/d ≥ Cd,k,ρ R−(2d −d−1)
log(n)C hold when n > Cd,k,ρ . These are satisfied
when dβ(2d2 + d + 5) < 1 and dβ(2d2 − d − 1) < 1, respectively. Letting
1
β= 2d3 +d2 +5d+1 , we have
2/3
log2 n
−2d2 +d
−1/3d
sup (u − un ) ≤ Cd,ρ,k R 3 n . (7.17)
ΩR log log n
By Theorem 3.2 (a) we have P(E) ≥ 1 − Cd,k,ρ RCd n−k for all n with
2
n1/d ≥ Cd,k,ρ R−2d +4d−4
log(n)C . (7.18)
For the remainder of the proof, we assume that E holds. By Lemma 6.6, we
2/3
log2 n
have supΩ0 \ΩR (v − 1ΩR u) ≤ Cd,ρ R. Let ν = log log n . Using (7.17) and
un ≤ vn , we have for x ∈ ΩR that
44
If x ∈ Ω0 \ ΩR , then we have v(x) − vn (x) ≤ supΩ0 \ΩR v ≤ Cd,ρ R. Hence, we
have
−2d2 +d
sup(v − vn ) ≤ Cd,k,ρ (R + R 3 n−1/3d ν).
Ω0
−2d2 +d−1
Letting R = Kn−β , we select β to satisfy R ≥ R 3 n−1/3d ν and (7.18) when
n ≥ Cd,ρ,k . These are satisfied when dβ(2d2 −d+3) < 1 and dβ(2d2 −4d+4) < 1,
respectively. Letting β = 2d3 −d21+3d+1 , we have
3
−d2 +3d+1)
sup (vn − v) ≤ n−1/(2d
Ω0
∞
X k k e−k
P F (Y N ) > c = P F (Y k ) > c ≤ K.
k!
k=0
n! √
By Stirling’s Formula, nn e−n ≤ e n. We conclude that P F (Y n ) > c ≤
√
Ke n.
45
8 Proof of Theorem 3.5
First we define the notion of semiconvexity.
2
u(x + h) − 2u(x) + u(x − h) ≥ −C |h| .
2
u(x + h) − 2u(x) + u(x − h) ≥ −Ku |h| for x ∈ Γ. (8.1)
Then we have
−1/d 2
u(x+h)−2u(x)+u(x−h) ≥ −(1+∥u∥L∞ (Ωε ) ) max(Ku , ρmin Kρ ) |h| for x ∈ Ω.
u(x+h)+u(x−h)
Proof. Set L(p) = (p1 . . . pd )1/d and w(x) = 2 , and we show that w
2
satisfies L(Dw) ≥ ρ 1/d
− Kρ |h| on Ω. Given x0 ∈ Ω, let φ ∈ C ∞ (Rd ) such that
w − φ has a local minimum at x0 . Without loss of generality we may assume
that w(x0 ) = φ(x0 ) and w − φ has a strict global minimum at x0 . Let
1 1 x+y α 2
Φ(x, y) = u(x) + u(y) − φ + |x − y − 2h| .
2 2 2 2
46
thermore, we have
α 2
|xα − yα − 2h| ≤ Cd . (8.3)
2
By (8.2) we have
y + h) − φ(b
lim Φ(xn , yn ) = 0 = w(b y + h).
n→∞
By construction, u−ψ1 has a local minimum at xn and u−ψ2 has a local minimum
at yn , Dψ1 (xn ) = pn − 2qn , and Dψ2 (yn ) = pn + 2qn where pn = Dφ xn +yn
2
and qn = αn (xn − yn − 2h). Since u satisfies L(Du) = ρ1/d on Ωε , we have
L(pn + 2qn ) ≥ ρ(yn )1/d and L(pn − 2qn ) ≥ ρ(xn )1/d for n > N . Using concavity
of L, we have
pn + 2qn pn − 2qn
L(pn ) = L +
2 2
1
≥ (L (pn + 2qn ) + L (pn − 2qn ))
2
1
≥ (ρ(xn )1/d + ρ(yn )1/d ).
2
47
x, yb) = (x0 + h, x0 − h), lower semicontinuity of L, and semiconvexity of
Using (b
ρ1/d , we have
xn + yn 1
L(Dφ(x0 )) ≥ lim sup L Dφ ≥ (ρ(x0 + h)1/d + ρ(x0 − h)1/d )
n→∞ 2 2
Kρ 2
≥ ρ(x0 )1/d − |h| .
2
Kρ 2 1 2 −1/d
It follows that L(Dw) ≥ ρ1/d − 2 |h| on Ω. Given θ > 0, set θ = 2 |h| max(Ku , ρmin Kρ )
and let wθ = (1 + θ)w + θ. By Proposition 4.2 we have
Kρ 2 1/d
L(Dwθ ) ≥ ρ1/d − |h| + dρmin θ ≥ ρ1/d on Ω.
2
Ku 2
Furthermore, by (8.1) we have w ≥ u − 2 |h| on Γ. By choice of θ, we have
wθ ≥ u on Γ. As ∂Ω = Γ ∪ ∂ ∗ Ω by assumption, we may apply Theorem 4.3 to
conclude wθ ≥ u on Ω. Hence for all x ∈ Ω we have
u(x + h) + u(x − h)
w(x) = ≥ u(x) − θ(1 + ∥w∥L∞ (Ω) )
2
1 2 −1/d
≥ u(x) − ∥u∥L∞ (Ωε ) |h| max(Ku , ρmin Kρ ).
2
1 − d−1
v(x) = a d ((p · x)((x1 . . . xd )1/d − (x1 . . . xd )−1/d ) − 2(p · x0 )((x1 . . . xd )1/d − 1))
2
(8.4)
and
and
u = w + v. (8.6)
48
Theorem 8.3. Given x0 ∈ ∂1,M Ω, p ∈ Rd , and a > 0, let v and w be as in
(8.4) and (8.5).
(a) We have
Xd Y
wxj vxi = p · (x − x0 ) in Ω1,M
i=1 j̸=i (8.7)
v=0 on ∂1,M Ω.
2
where |E| ≤ Cd a−1 |p| M 2 ε2 . Furthermore, u is nondecreasing in each
coordinate within B(x0 , ε) ∩ Ω1,M .
Proof. We first prove (a). Using (8.4), it is clear that v = 0 on ∂1,M Ω. Further-
more, we have
d−1 p · (x − 2x0 ) p · x
2a d x i vx i = xi pi + (x1 . . . xd )1/d + −xi pi + (x1 . . . xd )−1/d
d d
and
d
d−1 X
2a d xi vxi = 2p · (x − x0 )(x1 . . . xd )1/d .
i=1
and it follows that (8.7) is satisfied. To see that (8.8) holds, observe that
d
Y d
X Y
(wxi + vxi ) = wx1 . . . wxd + vx i wxj + E = a + p · (x − x0 ) + E
i=1 i=1 j̸=i
where
d
X X Y Y
E= vx i wxj .
ℓ=2 K⊂{1,...,d} i∈K j ∈K
/
|K|=ℓ
To bound |E| within B(x0 , ε), we establish some estimates on vxi . Using that
49
1
maxx∈Ω1,M xi = M d−1 , it is straightforward to verify that Dv(x0 ) = 0 and
d−1
Cd a− d M d |p|
vxi xj (x0 ) ≤ (8.9)
x0,i
Y Y d−1 Y Y Ca1/d
|vxi | wxj ≤ Cd (a− d M d |p| ε)ℓ x−1
0,i
x0,j
i∈K j ∈K
/ i∈K j ∈K
/
ℓ
≤ Cd a1−ℓ M d |p| ε .
2
It follows that |E| ≤ Cd a−1 |p| M 2d ε2 for ε ≤ 1
M d (1+|p|)
. To verify that u is
nondecreasing in each coordinate within B(x0 , ε) ∩ Ω1,M , observe that in B(x0 , ε)
we have
Cd a1/d M d
wxi xj (x0 ) ≤ .
x0,i
d−1
uxi (x) ≥ a1/d x−1
0,i − Cd εa
− d |p| x−1
0,i ≥ 0.
50
Proposition 8.4. Let u denote the solution to (3.6), where ρ ∈ C 2 (Ω1,M )
satisfies (3.1). Given x0 ∈ ∂1,M Ω, let u = v + w where v and w are as in
(8.4) and (8.5) with a = ρ(x0 ) and p = Dρ(x0 ). Then there exists a constant
Cd,ρ,M > 0 such that when ε ≤ Cd,ρ,M we have
sup |u − u| ≤ Cd ε3 M 3d−1 Aρ
B(x0 ,ε)∩Ω1,M
where
−2 2
Aρ = [ρ1/d ]C 0,1 (Rd ) ρmin ∥Dρ∥L∞ (∂1,M Ω) + ρ−1 2
min D ρ L∞ (∂1,M Ω)
.
2
where |E| ≤ Cd ρ−1 2d 2 2
min ∥Dρ(x0 )∥ M ε in B(x0 , ε). As ρ ∈ C (Ω1,M ), for x ∈
B(x0 , ε) ∩ Ω1,M we have
Given λ > 0, by Proposition 4.2 and (8.10) we have in B(x0 , ε) ∩ Ω1,M that
2
Letting λ = Cd ρ−2 2d 2 −1 2
min ∥Dρ∥L∞ (∂1,M Ω) M ε + ρmin D ρ L∞ (∂1,M Ω)
ε2 , we have
51
analogous application of Theorem 4.3 shows that (1−λ)u ≤ u on B(x0 , ε)∩Ω1,M ,
and we conclude that supB(x0 ,ε)∩Ω1,M |u − u| ≤ Cd M d−1 ρ1/d C 0,1 (Rd )
λ.
2 d−1
η ⊤ (D2 u(x0 ))η ≥ −Cd |η| (a− d M 2d−1 |p| + a1/d M 2d−2 ).
2 d−1
η ⊤ (D2 u(x0 ))η ≤ −Cd |η| (a− d M 2d−1 |p| + a1/d M 2d−2 ).
d
!
d−1
⊤ 2
X 2dx0,i pi + 2dx0,j pj − 2p · x0 + da a
2a d η (D u(x0 ))η = ηi ηj − δij 2
i,j=1
d2 x0,i x0,j x0,i
d d d
ηi2
2 X pi pj p · x0 a X ηi ηj X
= ηi ηj + − + −a
d i,j=1 x0,j x0,i dx0,i x0,j d i,j=1 xi xj x2
i=1 0,i
2 1 a 2
= (η · x−1 0 ) 2(η · p) − (p · x0 )(η · x −1
0 ) + (η · x−1 2 −1
0 ) − a η · x0
d d d
2 2
≥ −Cd |p| x−1 0 + |p| |x0 | x−10 + a x−1 0
d−1
η ⊤ (D2 u(x0 ))η ≥ −Cd M 2d−1 |p| + aM 2d−2 .
2a d
1
(b) Let a = 1, η = e1 , and define x0 ∈ ∂1,M Ω by x0,1 = M d−1
, x0,j = M for
52
v
where v = 2e1 − dxx0,1
j = 2, . . . , d and p = − ∥v∥ 0
. Then M 2d−1 |p| + aM 2d−2 ≤
2M 2d−1 . We have
⊤ 2 2 x0 1 − (1/d)
η (D u(x0 ))η = p · 2ei − −
dx0,1 dx0,1 x20,1
2
≤ − M d−1 ∥v∥
d
2
q
= − M d−1 (2 − d−1 )2 + (d − 1)d−1 M 2d
d
≤ −Cd M 2d−1 .
d−1
η ⊤ (D2 u(x0 ))η ≥ −Cd (a− d |p| M d−1 + a1/d M 2d−2 ).
Proof of Theorem 3.5. We will prove the result in two steps, first considering
the R = 1 case, and then proving the general case using a scaling argument.
Given M ≥ 1, let u denote the solution of
(
(ux1 ux2 . . . uxd )1/d = ρ1/d in Ω1,2M
(8.11)
u=0 on ∂1,2M Ω.
1/d 1/d
Letting w(x) = dρmax (x1 . . . xd )1/d − dρmax , we have w = 0 on ∂1,2M and
1/d
(wx1 . . . wxd )1/d = ρmax ≥ u on Ω1,2M . By Theorem 4.3 we have u ≤ w on
1/d
Ω1,2M , hence ∥u∥Ω1,2M ≤ Cd M ρmax . Given ε > 0, let h ∈ Rd with |h| = ε and
set Γε = {x ∈ Ω1,M : dist(x, ∂1,M Ω) < ε} and Uε = Γ3ε \ Γε . Given x ∈ Uε ,
there exists x0 ∈ ∂1,M Ω such that x ∈ B(x0 , 3ε). Let u be as in Proposition 8.4,
with a = ρ(x0 ) and p = Dρ(x0 ). By Proposition 8.4, there exists a constant
Cd,ρ,M > 0 such that when ε ≤ Cd,ρ,M we have
sup |u − u| ≤ Cd,ρ,M ε3 .
B(x0 ,3ε)∩Ω1,M
η ⊤ D2 u(x0 )η ≥ −Cd Ku
53
where
−(d−1)/d
Ku = ρmin ∥Dρ∥L∞ (∂1,M Ω) M 2d−1 + ρ1/d
max M
2d−2
.
2
u(x + h) − 2u(x) + u(x − h) ≥ − |h| (Cd Ku + Cd,ρ,M ε).
Hence, we have
≥ −(Cd Ku + Cd,ρ,M ε)
This holds for all x ∈ Uε , hence also for all x ∈ Uε . Letting Ω = Ω1,M \
Γε and Γ = {x ∈ Ω1,M : dist(x, ∂1,M Ω) = ε}, we have ∂Ω = Γ ∪ ∂ ∗ Ω and
y ∈ Rd : dist(y, Ω) < ε ⊂ Ω1,2M for ε ≤ M . As ρ1/d is semiconvex on Ω1,M
with semiconvexity constant D2 (ρ1/d ) L∞ (Ω1,M )
, we may apply Theorem 8.2
to conclude that for all x ∈ Ω1,M \ Γε we have
where
−1/d
Ku = max Ku , ρmin D2 (ρ1/d ) .
L∞ (Ω1,M )
As this holds for all ε < Cd,ρ,M and h ∈ Rd with |h| = ε, we conclude that for
all x ∈ Ω1,M and h ∈ Rd with x ± h ∈ Ω1,M we have
Now we prove Theorem 3.5 in full generality. Let u denote the solution of (3.6)
54
and let q(x) = u(Rx). Then q satisfies
(
qx1 qx2 . . . qxd = g in Ω1,R−1 M
q=0 on ∂1,R−1 M Ω
−(d−1)/d
where K = R−2d+3 (E1 + E2 ) with E1 = ρmin ∥Dρ∥L∞ (∂R,M Ω) M 2d−1 and
1/d
E2 = ρmax M 2d−2 . Then there exists a constant Cρ > 0 such that when R ≤ Cρ
we have
Replacing Rh′ with h and Ry with x, we conclude that for all x ∈ ΩR,M and
h ̸= 0 we have
55
Part II
56
9 Introduction
Semi-supervised learning uses both labeled and unlabeled data in learning tasks.
For problems where very few labels are available, geometric or topological
structure in unlabeled data can be used to greatly improve the performance of
classification, regression, or clustering algorithms. One of the most widely used
methods in graph-based semi-supervised learning is Laplace learning, originally
proposed in [71], which seeks a graph harmonic function that extends the
labels. Laplace learning, and variants thereof, have been widely applied in
semi-supervised learning [67, 66, 69, 2] and manifold ranking [35, 64, 63], among
many other problems.
This paper is concerned with graph-based semi-supervised learning at very
low label rates. In this setting, it has been observed that Laplace learning can
give very poor classification results [55, 28]. The poor results are often attributed
to the fact that the solutions develop localized spikes near the labeled points and
are almost constant far from the labeled points. In particular, label values are
not propagated well by the Laplacian learning approach. To address this issue,
recent work has suggested to consider p-Laplace learning [28]. Several works
have rigorously studied p-Laplace regularization with few labels [57, 14, 12], and
recent numerical results show that p > 2 is superior to Laplace learning at low
label rates [30]. The case of p = ∞ is called Lipschitz learning [48], which seeks
the absolutely minimal Lipschitz extension of the training data. Other methods
to address low label rate problems include higher order Laplacian regularization
[70] and spectral cutoffs [5].
While p-Laplace learning improves upon Laplace learning, the method is
more computationally burdensome than Laplace learning, since the optimality
conditions are nonlinear. Other recent approaches have aimed to re-weight the
graph more heavily near labels, in order to give them wider influence when
the labeling rate is very low. One way to re-weight the graph is the Weighted
Nonlocal Laplacian (WNLL) [56], which amplifies the weights of edges directly
connected to labeled nodes. The WNLL achieves better results at moderately
low label rates, but still performs poorly at very low label rates [30]. To address
this, [19] proposed the Properly Weighted Laplacian, which re-weights the graph
in a way that is well-posed at arbitrarily low label rates.
Much of the recent work on low label rate problems has focused on formulating
and implementing new learning approaches that are well-posed with few labels.
The exact nature of the degeneracy in Laplace learning, and the question of how
57
the tails of the spikes propagate label information, has not been studied and
is still poorly understood. For some problems, the performance of Laplacian
learning is good [71], while for other problems it is catastrophic, yielding very
poor classification results similar to random guessing [56, 30].
In this paper, we carefully analyze Laplace learning at very low label rates,
and we discover that nearly all of the degeneracy of Laplace learning is due to a
large constant bias in the solution of the Laplace equation that is present only at
low label rates. In order to overcome this problem we introduce a new algorithm,
we call Poisson learning, that gives very good classification performance down
to extremely low label rates. We give a random walk interpretation of Poisson
learning that shows how the method uses information from the random walkers
only before they reach the mixing time of the random walk and forget their initial
condition. We also propose a graph-cut enhancement of Poisson learning, called
Poisson MBO, that can incorporate knowledge about class sizes, and further
improves the class-label accuracy.
The rest of the paper is organized as follows. In Section 10 we first briefly
introduce Poisson learning, and then provide a detailed motivation for the
algorithm and a theoretical analysis from both the variational and random walk
perspectives. The Poisson MBO approach is presented in Section 10.4. In Section
11 we present the step-by-step algorithms and discuss implementation details for
the Poisson and Poisson MBO algorithms. In Section 12 we present numerical
experiments with semi-supervised classification on the MNIST, FashionMNIST,
and Cifar-10 datasets. The proofs of the results are available in the supplementary
materials.
10 Poisson learning
Let X = {x1 , x2 , . . . , xn } denote the vertices of a graph with edge weights wij ≥ 0
between xi and xj . We assume the graph is symmetric, so wij = wji . We define
Pn
the degree di = j=1 wij . For a multi-class classification problem with k classes,
we let the standard basis vector ei ∈ Rk represent the ith class. We assume the
first m vertices x1 , x2 , . . . , xm are given labels y1 , y2 , . . . , ym ∈ {e1 , e2 , . . . , ek },
where m < n. The task of graph-based semi-supervised learning is to extend the
labels to the rest of the vertices xm+1 , xm+2 , . . . , xn .
The well-known Laplace learning algorithm [71] extends the labels by solving
the problem
\[
\left\{
\begin{aligned}
Lu(x_i) &= 0, && \text{if } m+1 \le i \le n,\\
u(x_i) &= y_i, && \text{if } 1 \le i \le m,
\end{aligned}
\right.
\tag{10.1}
\]
where $L$ is the unnormalized graph Laplacian given by
\[
Lu(x_i) = \sum_{j=1}^{n} w_{ij}\bigl(u(x_i) - u(x_j)\bigr).
\]
The label decision for vertex $x_i$ is determined by the largest component of $u(x_i)$:
\[
\ell(x_i) = \operatorname*{arg\,max}_{j \in \{1,\dots,k\}} u_j(x_i).
\tag{10.2}
\]
We note that Laplace learning is also called label propagation (LP) [72], since
the Laplace equation (10.1) can be solved by repeatedly replacing u(xi ) with
the weighted average of its neighbors, which can be viewed as dynamically
propagating labels.
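To make the label-propagation viewpoint concrete, here is a minimal sketch of this averaging iteration (our own illustration, not the paper's released code; the toy graph, labels, and variable names are made up for the example):

```python
import numpy as np

def laplace_learning(W, Y, labeled, num_iter=1000):
    """Label propagation for (10.1): repeatedly replace each unlabeled value
    with the weighted average of its neighbors, keeping labeled nodes fixed.

    W       : (n, n) symmetric weight matrix
    Y       : (m, k) one-hot label vectors for the labeled nodes
    labeled : list of the m labeled node indices
    """
    n, k = W.shape[0], Y.shape[1]
    d = W.sum(axis=1)
    U = np.zeros((n, k))
    U[labeled] = Y
    for _ in range(num_iter):
        U = (W @ U) / d[:, None]   # weighted average of neighbors
        U[labeled] = Y             # re-impose the hard label constraints
    return U

# Tiny example: a path graph on 4 nodes with the endpoints labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
U = laplace_learning(W, Y=np.eye(2), labeled=[0, 3])
print(U.argmax(axis=1))            # label decision by largest component
```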
At very low label rates, we propose to replace the problem (10.1) by Poisson learning: Let $\bar y = \frac{1}{m}\sum_{j=1}^{m} y_j$ be the average label vector and let $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$. One computes the solution of the Poisson equation
\[
Lu(x_i) = \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij} \quad \text{for } i = 1, \dots, n
\tag{10.3}
\]
satisfying $\sum_{i=1}^{n} d_i u(x_i) = 0$. While the label decision can be taken to be the same as (10.2), it is also simple to account for unbalanced classes or training data with the modified label decision
\[
\ell(x_i) = \operatorname*{arg\,max}_{j \in \{1,\dots,k\}} \left\{ \frac{b_j}{\bar y \cdot e_j}\, u_j(x_i) \right\},
\tag{10.4}
\]
where $b_j$ denotes the (true or estimated) fraction of data belonging to class $j$. The key difference between the two approaches is that in Laplace learning the labels appear as boundary conditions in a Laplace equation, while in Poisson learning, the labels appear as
a source term in a graph Poisson equation. In the sections below, we explain why
Poisson learning is a good idea for problems with very few labels. In particular,
we give random walk and variational interpretations of Poisson learning, and
we illustrate how Poisson learning arises as the low label rate limit of Laplace
learning.
10.1 Random walk interpretation

Let $X_0^x, X_1^x, X_2^x, \dots$ denote a random walk on the graph starting at $X_0^x = x$ with transition probabilities
\[
\mathbb{P}\bigl(X_k^x = x_j \mid X_{k-1}^x = x_i\bigr) = d_i^{-1} w_{ij}.
\]
Let $u$ be the solution of the Laplace learning problem (10.1). Define the stopping time to be the first time the walk hits a label, that is
\[
\tau = \inf\bigl\{ k \ge 0 : X_k^x \in \{x_1, \dots, x_m\} \bigr\}.
\]
Let $i_\tau \le m$ denote the index of the point $X_\tau^x$, so $X_\tau^x = x_{i_\tau}$. Then, by Doob's optional stopping theorem, we have
\[
u(x) = \mathbb{E}\bigl[ y_{i_\tau} \bigr].
\tag{10.5}
\]
This gives the standard representation formula for the solution of Laplace learning (10.1). The interpretation is that we release a random walker from $x$ and let it
Figure 2: Demonstration of spikes in Laplace learning. The graph consists of
n = 104 independent uniform random variables on [0, 1]2 and two points are
given labels of 0 and 1. Most values of the solution u of Laplace learning are
very close to 0.5.
walk until it hits a labeled vertex and then record that label. We average over
many random walkers to get the value of u(x).
When there are insufficiently many labels, the stopping time τ is so large that
we have passed the mixing time of the random walk, and the distribution of $X_\tau^x$ is very close to the invariant distribution of the random walk $\pi(x_i) = d_i \big/ \sum_{j=1}^{n} d_j$. In this case, (10.5) gives that
\[
u(x) \approx \frac{\sum_{i=1}^{m} d_i y_i}{\sum_{j=1}^{m} d_j} =: \bar y_w.
\tag{10.6}
\]
Thus, at very low label rates, the solution of Laplace learning is nearly equal to the constant label vector $\bar y_w$ at all unlabeled points, which explains the nearly constant values of $u$ observed in Figure 2 above.
In contrast, the Poisson equation (10.3) has a random walk interpretation
that involves the Green’s function for a random walk, and is in some sense dual
to Laplace learning. The source term on the right hand side of (10.3) represents
random walkers being released from labeled points and exploring the graph,
while carrying their label information with them. For $T > 0$ let us define
\[
w_T(x_i) = \mathbb{E}\left[ \sum_{k=0}^{T} \sum_{j=1}^{m} y_j\, \mathbb{1}_{\{X_k^{x_j} = x_i\}} \right].
\]
Essentially, each time the random walk starting from $x_j$, denoted $X_k^{x_j}$, visits $x_i$
we record the label yj and sum this over all visits from 0 ≤ k ≤ T . For short
time T , this quantity is meaningful, but since the random walk is recurrent (it
is a finite graph), wT (xi ) → ∞ as T → ∞. If we normalize by 1/T , we still
have the same issue as with Laplace learning, where we are only measuring the
invariant distribution. Indeed, we note that
\[
w_T(x_i) = \sum_{k=0}^{T} \sum_{j=1}^{m} y_j\, \mathbb{P}\bigl(X_k^{x_j} = x_i\bigr)
\quad\text{and}\quad
\lim_{k\to\infty} \mathbb{P}\bigl(X_k^{x_j} = x_i\bigr) = \frac{d_i}{\sum_{\ell=1}^{n} d_\ell}.
\]
Therefore, when $k$ is large we have
\[
\sum_{j=1}^{m} y_j\, \mathbb{P}\bigl(X_k^{x_j} = x_i\bigr) \approx \frac{d_i}{\sum_{\ell=1}^{n} d_\ell} \sum_{j=1}^{m} y_j,
\]
and so the tail of the sum defining wT (xi ) is recording a blind average of labels.
The discussion above suggests that we should subtract off this average tail
behavior from wT , so that we only record the short-time behavior of the random
walk, before the mixing time is reached. We also normalize by di , which leads
us to define
\[
u_T(x_i) = \mathbb{E}\left[ \sum_{k=0}^{T} \frac{1}{d_i} \sum_{j=1}^{m} (y_j - \bar y)\, \mathbb{1}_{\{X_k^{x_j} = x_i\}} \right],
\tag{10.7}
\]
where $\bar y = \frac{1}{m}\sum_{j=1}^{m} y_j$. It turns out that as $T \to \infty$, the function $u_T(x_i)$ converges to the solution of (10.3).
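As a sanity check on (10.7), the following sketch (our own, with made-up function names, not part of the paper's released code) estimates $u_T$ by Monte Carlo: walkers are released from each labeled point and deposit the centered label $(y_j - \bar y)/d_i$ at every vertex they visit up to time $T$.

```python
import numpy as np

def poisson_mc(W, Y, labeled, T=50, num_walks=2000, seed=0):
    """Monte Carlo estimate of u_T in (10.7)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    d = W.sum(axis=1)
    P = W / d[:, None]                 # transition probabilities d_i^{-1} w_ij
    ybar = Y.mean(axis=0)
    u = np.zeros((n, Y.shape[1]))
    for j, start in enumerate(labeled):
        for _ in range(num_walks):
            x = start
            for _ in range(T + 1):     # visits at times k = 0, 1, ..., T
                u[x] += (Y[j] - ybar) / d[x]
                x = rng.choice(n, p=P[x])
    return u / num_walks               # average over the released walkers
```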
Theorem 10.1. For every $T \ge 0$ we have
\[
u_{T+1}(x_i) = u_T(x_i) + d_i^{-1}\left( \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij} - Lu_T(x_i) \right).
\]
If the graph $G$ is connected and the Markov chain induced by the random walk is aperiodic, then $u_T \to u$ as $T \to \infty$, where $u$ is the unique solution of the Poisson equation (10.3) satisfying $\sum_{i=1}^{n} d_i u(x_i) = 0$.
Theorem 10.1 gives the foundation for Poisson learning through the random
walk perspective, and in fact, it also gives a numerical method for computing
the solution (see Algorithm 1).
Remark 10.2. The representation formula (10.7) for the solution of Poisson
learning (10.3) shows that the solution u is a linear function of the label vectors
$y_1, \dots, y_m$. That is, for any $A \in \mathbb{R}^{k \times k}$, the solution $u_A : X \to \mathbb{R}^k$ of
\[
Lu_A(x_i) = \sum_{j=1}^{m} (A y_j - A\bar y)\,\delta_{ij} \quad \text{for } i = 1, \dots, n
\]
satisfying $\sum_{i=1}^{n} d_i u_A(x_i) = 0$ is exactly $u_A = Au$, where $u$ is the solution of
(10.3). This shows that any reweighting of the point sources, by which we mean
yj 7→ Ayj , is equivalent to reweighting the solution by u 7→ Au.
If we set A = diag(s1 , . . . , sk ), then Au corresponds to multiplying the point
sources for class i by the weight si . We can use this reweighting to account
for unbalanced classes, or a discrepancy between the balancing of training and
testing data, in the following way. Let nj be the number of training examples
from class j, and let bj denote the true fraction of data in class j. We can choose
sj so that nj sj = bj to ensure that the mass of the point sources for each class,
weighted by sj , is proportional to the true fraction of data in that class. Since
nj is proportional to y · ej , this explains our modified label decision (10.4).
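In code, the modified label decision (10.4) amounts to rescaling the columns of the Poisson learning output before taking the arg max; the sketch below is our own illustration (the names `U`, `Y`, and `b` are assumptions, with `b` the vector of class proportions):

```python
import numpy as np

def label_decision(U, Y, b=None):
    """Modified label decision (10.4): scale class j by b_j / (ybar . e_j)."""
    k = U.shape[1]
    ybar = Y.mean(axis=0)              # average label vector of the training data
    if b is None:
        b = np.full(k, 1.0 / k)        # default: assume balanced classes
    return np.argmax(U * (b / ybar), axis=1)
```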
10.1.1 Proofs for Section 10.1

We recall that the random walk defined above has transition probabilities
\[
\mathbb{P}\bigl(X_k^x = x_j \mid X_{k-1}^x = x_i\bigr) = \frac{w_{ij}}{d_i}.
\]
Before giving the proof of Theorem 10.1, we recall some properties of random
walks and Markov chains. The random walk described above induces a Markov
chain with state space X. Since the graph is connected and X is finite, the
Markov chain is positive recurrent. We also assume the Markov chain is aperiodic.
This implies the distribution of the random walker converges to the invariant
distribution of the Markov chain as $k \to \infty$. In particular, choose any initial distribution $p_0 \in \ell^2(X)$ such that $\sum_{i=1}^{n} p_0(x_i) = 1$ and $p_0 \ge 0$, and define
\[
p_{k+1}(x_i) = \sum_{j=1}^{n} \frac{w_{ij}}{d_j}\, p_k(x_j).
\tag{10.8}
\]
Then $p_k$ is the distribution of the random walker after $k$ steps. Since the Markov chain is positive recurrent and aperiodic we have that
\[
\lim_{k\to\infty} p_k(x_i) = \pi(x_i) \sum_{j=1}^{n} p_0(x_j).
\tag{10.9}
\]
Proof of Theorem 10.1. For $T \ge 0$ define
\[
G_T(x_i, x_j) = \frac{1}{d_i} \sum_{k=0}^{T} \mathbb{P}\bigl(X_k^{x_j} = x_i\bigr).
\]
Conditioning on the previous step of the random walk, for $T \ge 1$ we have
\begin{align*}
d_i\, G_T(x_i, x_j) &= \delta_{ij} + \sum_{k=1}^{T} \sum_{\ell=1}^{n} \frac{w_{\ell i}}{d_\ell}\, \mathbb{P}\bigl(X_{k-1}^{x_j} = x_\ell\bigr)\\
&= \delta_{ij} + \sum_{\ell=1}^{n} \frac{w_{\ell i}}{d_\ell} \sum_{k=1}^{T} \mathbb{P}\bigl(X_{k-1}^{x_j} = x_\ell\bigr)\\
&= \delta_{ij} + \sum_{\ell=1}^{n} \frac{w_{\ell i}}{d_\ell} \sum_{k=0}^{T-1} \mathbb{P}\bigl(X_{k}^{x_j} = x_\ell\bigr)\\
&= \delta_{ij} + \sum_{\ell=1}^{n} w_{\ell i}\, G_{T-1}(x_\ell, x_j).
\end{align*}
Therefore we have
\[
d_i\bigl( G_T(x_i, x_j) - G_{T-1}(x_i, x_j) \bigr) + L G_{T-1}(x_i, x_j) = \delta_{ij},
\]
where the Laplacian $L$ is applied to the first variable of $G_{T-1}$ while the second variable is fixed (i.e. $LG_{T-1}(x_i, x_j) = [LG_{T-1}(\cdot, x_j)](x_i)$). Since
\[
u_T(x_i) = \sum_{j=1}^{m} (y_j - \bar y)\, G_T(x_i, x_j)
\]
we have
\[
d_i\bigl( u_T(x_i) - u_{T-1}(x_i) \bigr) + L u_{T-1}(x_i) = \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij},
\]
which is the first identity in Theorem 10.1. Since
\[
d_i\, u_0(x_i) = \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij},
\]
we have $(u_0)_{d,X} = 0$, and so $(u_T)_{d,X} = 0$ for all $T \ge 0$. Let $u \in \ell^2(X)$ be the
solution of
\[
Lu(x_i) = \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij}
\]
satisfying $(u)_{d,X} = 0$. Define $v_T(x_i) = d_i\bigl( u_T(x_i) - u(x_i) \bigr)$. We then check that $v_T$ satisfies
\[
v_T(x_i) = \sum_{j=1}^{n} \frac{w_{ij}}{d_j}\, v_{T-1}(x_j),
\]
and (vT )X = 0. Since the random walk is aperiodic and the graph is connected,
we have by (10.9) that limT →∞ vT (xi ) = π(xi )(v0 )X = 0, which completes the
proof.
10.2 Variational interpretation

We equip functions $u, v : X \to \mathbb{R}^k$ with the inner product $(u, v)_{\ell^2(X)} = \sum_{i=1}^{n} u(x_i) \cdot v(x_i)$. This induces a norm $\|u\|_{\ell^2(X)}^2 = (u, u)_{\ell^2(X)}$. We also define the space of mean-zero functions
\[
\ell^2_0(X) = \Bigl\{ u \in \ell^2(X) : \sum_{i=1}^{n} d_i\, u(x_i) = 0 \Bigr\}.
\]
We now consider the variational problem
\[
\min_{u \in \ell^2_0(X)} \left\{ \frac{1}{2}\|\nabla u\|_{\ell^2(X^2)}^2 - \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j) \right\}.
\tag{10.10}
\]
The following theorem makes the connection between Poisson learning and the variational problem (10.10).

Theorem 10.3. The variational problem (10.10) admits a unique minimizer $u \in \ell^2_0(X)$, and this minimizer is the solution of the Poisson learning equation (10.3) satisfying $\sum_{i=1}^{n} d_i u(x_i) = 0$.

Theorem 10.3 shows that the Poisson learning equation (10.3) arises as the
necessary conditions for a gradient regularized variational problem with an affine
loss function. We contrast this with the solution of Laplace learning (10.1),
which is the minimizer of the variational problem
\[
\min_{u \in \ell^2(X)} \Bigl\{ \|\nabla u\|_{\ell^2(X^2)}^2 : u(x_i) = y_i,\ 1 \le i \le m \Bigr\}.
\tag{10.11}
\]
Thus, while both Laplace and Poisson learning are gradient regularized variational
problems, the key difference is how each algorithm handles the labeled data;
Laplace learning enforces hard label constraints while Poisson learning adds an
affine loss function to the energy. Of course, many variants of Laplace learning
have been proposed with various types of soft label constraints in place of hard
constraints. These variations perform similarly poorly to Laplace learning at low
label rates, and the key feature of Poisson learning is the affine loss function
that can be easily centered.
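Both variational problems reduce to linear systems, which makes the contrast easy to see in code. The sketch below (our own, dense NumPy on a small graph, not the paper's implementation) solves the hard-constraint problem (10.11) by eliminating the labeled rows, and the Poisson problem (10.10)/(10.3) by a pseudoinverse solve followed by enforcing the degree-weighted mean-zero condition:

```python
import numpy as np

def laplace_solve(W, Y, labeled):
    """Laplace learning (10.11): solve L_uu u_u = -L_ul y_l with hard constraints."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    U = np.zeros((n, Y.shape[1]))
    U[labeled] = Y
    A = L[np.ix_(unlabeled, unlabeled)]
    U[unlabeled] = np.linalg.solve(A, -L[np.ix_(unlabeled, labeled)] @ Y)
    return U

def poisson_solve(W, Y, labeled):
    """Poisson learning (10.10): solve L u = source with sum_i d_i u(x_i) = 0."""
    n = W.shape[0]
    d = W.sum(axis=1)
    L = np.diag(d) - W
    source = np.zeros((n, Y.shape[1]))
    source[labeled] = Y - Y.mean(axis=0)       # centered point sources
    U = np.linalg.pinv(L) @ source             # L is singular; source sums to zero
    U -= (d @ U) / d.sum()                     # enforce sum_i d_i u(x_i) = 0
    return U
```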
We note that it would be natural to add a weight µ > 0 to one of the terms in
(10.10) to trade-off the importance of the two terms in the variational problem.
However, as we show in the supplementary material (see Lemma A.3), this
weight would have no effect on the final label decision. We also mention that the
variational problem (10.10) has a natural ℓp generalization that we also explore
in the supplementary material (Section A.2).
10.2.1 Proofs for Section 10.2

For a vector field $V \in \ell^2(X^2)$ and $p \ge 1$ we use the norm
\[
\|V\|_{\ell^p(X^2)}^p = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, |V(x_i, x_j)|^p.
\]
The divergence is the negative adjoint of the gradient; that is, for every vector field $V \in \ell^2(X^2)$ and function $u \in \ell^2(X)$
\[
(\nabla u, V)_{\ell^2(X^2)} = -(u, \operatorname{div} V)_{\ell^2(X)}.
\]
In particular $(Lu, v)_{\ell^2(X)} = (u, Lv)_{\ell^2(X)}$, and so the graph Laplacian $L$ is self-adjoint as an operator $L : \ell^2(X) \to \ell^2(X)$. We also note that $Lu = -\operatorname{div}(\nabla u)$.
For $p \ge 1$ and $\mu > 0$ we consider the variational problem
\[
\min_{u \in \ell^p_{a,0}(X)} \left\{ \frac{1}{p}\|\nabla u\|_{\ell^p(X^2)}^p - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j) \right\},
\tag{10.13}
\]
where $\bar y = \frac{1}{m}\sum_{j=1}^{m} y_j$ and $\ell^p_{a,0}(X)$ denotes the functions $u \in \ell^p(X)$ with weighted mean $(u)_{a,X} = 0$ for a fixed non-negative weight vector $a \in \mathbb{R}^n$. This generalizes the variational problem (10.10) for
Poisson learning, and the theorem below generalizes Theorem 10.3.

Theorem 10.4. Let $p > 1$ and $\mu > 0$. Then there exists a unique solution $u \in \ell^p_{a,0}(X)$ of (10.13), and it satisfies
\[
-\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr)(x_i) = \mu \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij} \quad \text{for } i = 1, \dots, n.
\tag{10.14}
\]

We give the proof of Theorem 10.4 below, after some remarks and other results.
Remark 10.5. When p = 1, solutions of (10.13) may not exist for all µ ≥ 0,
since the variational problem (10.13) may not be bounded from below. We
can show that there exists C > 0 such that if µ < C, the variational problem
is bounded from below and our argument for existence in Theorem 10.4 goes
through.
Lemma 10.6. Let $p > 1$ and for $\mu > 0$ let $u_\mu$ be the solution of (10.13). Then $u_\mu = \mu^{1/(p-1)} u_1$.
It follows from Lemma 10.6 that when p > 1, the fidelity parameter µ > 0
is completely irrelevant for classification problems, since the identity uµ =
µ1/(p−1) u1 implies that the label decision (10.2) gives the same labeling for any
value of µ > 0. Hence, in Poisson learning with p > 1 we always take µ = 1.
This remark is false for p = 1.
Before proving Theorem 10.4 we first record a Poincaré inequality. The proof
is standard but we include it for completeness. We can prove the Poincaré
inequality for non-negative vectors $a \in \mathbb{R}^n$, meaning that $a_i \ge 0$ for every $i = 1, \dots, n$, as long as $\sum_{i=1}^{n} a_i > 0$.
Proposition 10.7. Assume $G$ is connected, $a \in \mathbb{R}^n$ is non-negative with $\sum_{i=1}^{n} a_i > 0$, and $p \ge 1$. There exists $\lambda_p > 0$ such that
\[
\lambda_p\, \|u - (u)_{a,X}\|_{\ell^p(X)} \le \|\nabla u\|_{\ell^p(X^2)} \quad \text{for all } u \in \ell^p(X).
\]

Proof. Define
\[
\lambda_p = \min_{\substack{u \in \ell^p(X)\\ u \not\equiv (u)_{a,X}}} \frac{\|\nabla u\|_{\ell^p(X^2)}}{\|u - (u)_{a,X}\|_{\ell^p(X)}}.
\]
Suppose, by way of contradiction, that $\lambda_p = 0$. Then there exists a sequence $u_k \in \ell^p(X)$ with $u_k \not\equiv (u_k)_{a,X}$ such that
\[
\frac{\|\nabla u_k\|_{\ell^p(X^2)}}{\|u_k - (u_k)_{a,X}\|_{\ell^p(X)}} \le \frac{1}{k}.
\]
We may assume that $(u_k)_{a,X} = 0$ and $\|u_k\|_{\ell^p(X)} = 1$, and so
\[
\|\nabla u_k\|_{\ell^p(X^2)} \le \frac{1}{k}.
\tag{10.16}
\]
Since $|u_k(x_i)| \le \|u_k\|_{\ell^p(X)} = 1$, the sequence $u_k$ is uniformly bounded and by the Bolzano–Weierstrass Theorem there exists a subsequence $u_{k_j}$ such that $u_{k_j}(x_i)$ is a convergent sequence in $\mathbb{R}^k$ for every $i$. We denote $u(x_i) = \lim_{j\to\infty} u_{k_j}(x_i)$. Since $\|u_{k_j}\|_{\ell^p(X)} = 1$ we have $\|u\|_{\ell^p(X)} = 1$, and thus $u \not\equiv 0$. Similarly, since $(u_k)_{a,X} = 0$ we have $(u)_{a,X} = 0$ as well. On the other hand it follows from (10.16) that $\|\nabla u\|_{\ell^p(X^2)} = 0$, and so
\[
\frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, |u(x_i) - u(x_j)|^p = 0.
\]
It follows that u(xi ) = u(xj ) whenever wij > 0. Since the graph is connected,
it follows that u is constant. Since (u)a,X = 0 we must have u ≡ 0, which is
a contradiction, since ∥u∥ℓp (X) = 1. Therefore λp > 0, which completes the
proof.
Proof of Theorem 10.4. We first prove existence of a minimizer. Denote the energy in (10.13) by
\[
I_p(u) = \frac{1}{p}\|\nabla u\|_{\ell^p(X^2)}^p - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j),
\tag{10.17}
\]
and let $u_k \in \ell^p_{a,0}(X)$ be a minimizing sequence for (10.13). By Proposition 10.7 we have
\[
I_p(u) \ge \frac{1}{p}\lambda_p^p\, \|u\|_{\ell^p(X)}^p - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j)
\]
for all $u \in \ell^p_{a,0}(X)$. By Hölder's inequality,
\[
\sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j) \le \sum_{j=1}^{m} |y_j - \bar y|\, |u(x_j)| \le \left( \sum_{j=1}^{m} |y_j - \bar y|^q \right)^{\!1/q} \left( \sum_{j=1}^{m} |u(x_j)|^p \right)^{\!1/p} \le \left( \sum_{j=1}^{m} |y_j - \bar y|^q \right)^{\!1/q} \|u\|_{\ell^p(X)},
\]
where $q = p/(p-1)$. Letting $C = \bigl( \sum_{j=1}^{m} |y_j - \bar y|^q \bigr)^{1/q}$ we have
\[
I_p(u) \ge \frac{1}{p}\lambda_p^p\, \|u\|_{\ell^p(X)}^p - C\mu\, \|u\|_{\ell^p(X)}.
\tag{10.18}
\]
Since $u_k$ is a minimizing sequence, (10.18) yields
\[
\frac{1}{p}\lambda_p^p\, \|u_k\|_{\ell^p(X)}^p - C\mu\, \|u_k\|_{\ell^p(X)} \le \inf_{u \in \ell^p_{a,0}(X)} I_p(u) + 1
\]
for k sufficiently large. Since p > 1, it follows that there exists M > 0 such that
∥uk ∥ℓp (X) ≤ M for all k ≥ 1. Since |uk (xi )| ≤ ∥uk ∥ℓp (X) ≤ M for all i = 1, . . . , n,
we can apply the Bolzano–Weierstrass Theorem to extract a subsequence $u_{k_j}$
such that ukj (xi ) is a convergent sequence in Rk for all i = 1, . . . , n. We denote
by $u^*(x_i)$ the limit of $u_{k_j}(x_i)$ for all $i$. By continuity of $I_p$ we have
\[
I_p(u^*) = \lim_{j\to\infty} I_p(u_{k_j}) = \inf_{u \in \ell^p_{a,0}(X)} I_p(u)
\]
and $(u^*)_{a,X} = 0$. This shows that there exists a solution of (10.13).
We now show that any solution of (10.13) satisfies $-\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr) = \mu f$. The proof follows from taking a variation. Let $v \in \ell^p_{a,0}(X)$ and consider $g(t) := I_p(u + tv)$, where $I_p$ is defined in (10.17). Then $g$ has a minimum at $t = 0$ and hence $g'(0) = 0$. We now compute
\begin{align*}
g'(0) &= \frac{d}{dt}\Big|_{t=0} \left\{ \frac{1}{p}\|\nabla u + t\nabla v\|_{\ell^p(X^2)}^p - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot \bigl( u(x_j) + t v(x_j) \bigr) \right\}\\
&= \frac{1}{2p} \sum_{i,j=1}^{n} w_{ij}\, \frac{d}{dt}\Big|_{t=0} |\nabla u(x_i, x_j) + t\nabla v(x_i, x_j)|^p - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot v(x_j)\\
&= \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, |\nabla u(x_i, x_j)|^{p-2}\, \nabla u(x_i, x_j) \cdot \nabla v(x_i, x_j) - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot v(x_j)\\
&= \bigl( |\nabla u|^{p-2}\nabla u,\, \nabla v \bigr)_{\ell^2(X^2)} - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot v(x_j)\\
&= \bigl( -\operatorname{div}( |\nabla u|^{p-2}\nabla u ),\, v \bigr)_{\ell^2(X)} - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot v(x_j)\\
&= \bigl( -\operatorname{div}( |\nabla u|^{p-2}\nabla u ) - \mu f,\, v \bigr)_{\ell^2(X)},
\end{align*}
where
\[
f(x_i) = \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij}.
\]
We choose
\[
v(x_i) = \frac{1}{a_i}\Bigl( -\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr)(x_i) - \mu f(x_i) \Bigr);
\]
then
\[
(v)_{a,X} = \sum_{i=1}^{n} \Bigl( -\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr)(x_i) - \mu f(x_i) \Bigr) = 0,
\]
so $v \in \ell^p_{a,0}(X)$. Therefore
\begin{align*}
0 = g'(0) &= \sum_{i=1}^{n} \frac{1}{a_i} \Bigl( \operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr)(x_i) + \mu f(x_i) \Bigr)^2\\
&\ge \frac{1}{\max_i a_i}\, \Bigl\| \operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr) + \mu f \Bigr\|_{\ell^2(X)}^2.
\end{align*}
So $-\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr) = \mu f$, as required.
To prove uniqueness, let $u, v \in \ell^p_{a,0}(X)$ be minimizers of (10.13). Then $u$ and $v$ satisfy (10.14), and so uniqueness follows from the stability estimate in the following lemma, applied with $f = g = \mu \sum_{j=1}^{m} (y_j - \bar y)\delta_{ij}$.

Lemma 10.8. Let $p > 1$ and suppose $u, v \in \ell^p_{a,0}(X)$ satisfy
\[
-\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr)(x_i) = f(x_i) \quad\text{and}\quad -\operatorname{div}\bigl( |\nabla v|^{p-2}\nabla v \bigr)(x_i) = g(x_i).
\]
Then, for $p \ge 2$,
\[
\|u - v\|_{\ell^p(X)} \le C \lambda_p^{-q}\, \|f - g\|_{\ell^q(X)}^{1/(p-1)},
\]
and for $1 < p < 2$ an analogous bound holds with an additional factor of $\bigl( \|\nabla u\|_{\ell^p(X^2)} + \|\nabla v\|_{\ell^p(X^2)} \bigr)^{2-p}$ on the right hand side,
where $C$ is a constant depending only on $p$ and $q = \frac{p}{p-1}$.
Remark 10.9. If $-\operatorname{div}\bigl( |\nabla u|^{p-2}\nabla u \bigr) = f$ then we can write $\bigl( |\nabla u|^{p-2}\nabla u, \nabla\varphi \bigr)_{\ell^2(X^2)} = (f, \varphi)_{\ell^2(X)}$ for any $\varphi \in \ell^2(X)$. Choosing $\varphi = u$ implies $\|\nabla u\|_{\ell^p(X^2)}^p = (f, u)_{\ell^2(X)} \le \|f\|_{\ell^q(X)}\, \|u\|_{\ell^p(X)}$, so we could write the bound for $p \in (1, 2)$ in the above lemma without $\|\nabla u\|_{\ell^p(X^2)}$ and $\|\nabla v\|_{\ell^p(X^2)}$ on the right hand side.
Proof. We first consider the case $p \ge 2$, where we use the inequality
\[
|a - b|^p \le C\,\bigl( |a|^{p-2}a - |b|^{p-2}b \bigr)\cdot(a - b) \quad \text{for } a, b \in \mathbb{R}^k
\]
(which can be found in Lemma 4.4, Chapter I of [26]) to obtain
\begin{align*}
\|\nabla u - \nabla v\|_{\ell^p(X^2)}^p &= \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, |\nabla u(x_i, x_j) - \nabla v(x_i, x_j)|^p\\
&\le C \sum_{i,j=1}^{n} w_{ij}\, \bigl( |\nabla u(x_i, x_j)|^{p-2}\nabla u(x_i, x_j) - |\nabla v(x_i, x_j)|^{p-2}\nabla v(x_i, x_j) \bigr) \cdot \bigl( \nabla u(x_i, x_j) - \nabla v(x_i, x_j) \bigr)\\
&= C\bigl( |\nabla u|^{p-2}\nabla u - |\nabla v|^{p-2}\nabla v,\, \nabla(u - v) \bigr)_{\ell^2(X^2)}\\
&= C\bigl( -\operatorname{div}( |\nabla u|^{p-2}\nabla u ) + \operatorname{div}( |\nabla v|^{p-2}\nabla v ),\, u - v \bigr)_{\ell^2(X)}\\
&= C(f - g, u - v)_{\ell^2(X)}\\
&\le C\|f - g\|_{\ell^q(X)}\, \|u - v\|_{\ell^p(X)},
\end{align*}
where in the last line we used Hölder's inequality, $\frac{1}{p} + \frac{1}{q} = 1$, and the value of $C$ may change from line to line. By Proposition 10.7 we have
\[
\lambda_p^p\, \|u - v\|_{\ell^p(X)}^p \le \|\nabla u - \nabla v\|_{\ell^p(X^2)}^p \le C\|f - g\|_{\ell^q(X)}\, \|u - v\|_{\ell^p(X)}.
\]
Therefore we deduce
\[
\|u - v\|_{\ell^p(X)} \le C \lambda_p^{-q}\, \|f - g\|_{\ell^q(X)}^{1/(p-1)}.
\]
Now for $1 < p < 2$ we follow the proof of Lemma 4.4 in Chapter I of [26] to infer
\begin{align*}
\bigl( |a|^{p-2}a - |b|^{p-2}b \bigr)\cdot(a - b) &= \int_0^1 |sa + (1-s)b|^{p-2}\, |a - b|^2\, ds\\
&\quad + (p - 2) \int_0^1 |sa + (1-s)b|^{p-4}\, \bigl| (sa + (1-s)b)\cdot(a - b) \bigr|^2\, ds.
\end{align*}
In the sequel we make use of the resulting inequality
\[
\bigl( |a|^{p-2}a - |b|^{p-2}b \bigr)\cdot(a - b) \ge \frac{C|a - b|^2}{(|a| + |b|)^{2-p}}.
\]
By Hölder's inequality and the above inequality we have (where again the constant $C$ may change from line to line)
\begin{align*}
\|\nabla u - \nabla v\|_{\ell^p(X^2)}^p &= \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, |\nabla u(x_i, x_j) - \nabla v(x_i, x_j)|^p\\
&\le \left( \frac{1}{2} \sum_{i,j=1}^{n} \frac{w_{ij}\, |\nabla u(x_i, x_j) - \nabla v(x_i, x_j)|^2}{(|\nabla u(x_i, x_j)| + |\nabla v(x_i, x_j)|)^{2-p}} \right)^{\!p/2} \left( \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, \bigl( |\nabla u(x_i, x_j)| + |\nabla v(x_i, x_j)| \bigr)^p \right)^{\!\frac{2-p}{2}}\\
&\le C \left( \sum_{i,j=1}^{n} w_{ij}\, \bigl( |\nabla u(x_i, x_j)|^{p-2}\nabla u(x_i, x_j) - |\nabla v(x_i, x_j)|^{p-2}\nabla v(x_i, x_j) \bigr)\cdot\bigl( \nabla u(x_i, x_j) - \nabla v(x_i, x_j) \bigr) \right)^{\!p/2}\\
&\qquad \times \bigl( \|\nabla u\|_{\ell^p(X^2)} + \|\nabla v\|_{\ell^p(X^2)} \bigr)^{\frac{(2-p)p}{2}}\\
&= C\bigl( |\nabla u|^{p-2}\nabla u - |\nabla v|^{p-2}\nabla v,\, \nabla(u - v) \bigr)_{\ell^2(X^2)}^{p/2} \bigl( \|\nabla u\|_{\ell^p(X^2)} + \|\nabla v\|_{\ell^p(X^2)} \bigr)^{\frac{(2-p)p}{2}}\\
&= C\bigl( -\operatorname{div}( |\nabla u|^{p-2}\nabla u ) + \operatorname{div}( |\nabla v|^{p-2}\nabla v ),\, u - v \bigr)_{\ell^2(X)}^{p/2} \bigl( \|\nabla u\|_{\ell^p(X^2)} + \|\nabla v\|_{\ell^p(X^2)} \bigr)^{\frac{(2-p)p}{2}}\\
&= C\, (f - g, u - v)_{\ell^2(X)}^{p/2} \bigl( \|\nabla u\|_{\ell^p(X^2)} + \|\nabla v\|_{\ell^p(X^2)} \bigr)^{\frac{(2-p)p}{2}}\\
&\le C\, \|f - g\|_{\ell^2(X)}^{p/2}\, \|u - v\|_{\ell^2(X)}^{p/2}\, \bigl( \|\nabla u\|_{\ell^p(X^2)} + \|\nabla v\|_{\ell^p(X^2)} \bigr)^{\frac{(2-p)p}{2}}.
\end{align*}
Combining this with Proposition 10.7, as in the case $p \ge 2$, completes the proof.
Proof of Lemma 10.6. Writing $I_{p,\mu}$ for the energy in (10.13) with parameter $\mu$, we note that
\[
I_{p,\mu}\bigl( \mu^{1/(p-1)} u \bigr) = \mu^{p/(p-1)}\, I_{p,1}(u).
\]
Therefore, since $u_1$ minimizes $I_{p,1}$ and $u_\mu$ minimizes $I_{p,\mu}$,
\[
I_{p,\mu}\bigl( \mu^{1/(p-1)} u_1 \bigr) = \mu^{p/(p-1)} I_{p,1}(u_1) \le \mu^{p/(p-1)} I_{p,1}\bigl( \mu^{-1/(p-1)} u_\mu \bigr) = I_{p,\mu}(u_\mu).
\]
By uniqueness of the minimizer of $I_{p,\mu}$ we conclude that $u_\mu = \mu^{1/(p-1)} u_1$.
10.3 Laplace learning at low label rates

We now return to the degeneracy of Laplace learning, which, as discussed above, is largely explained by a large shift bias that occurs only at very low label rates. Fixing
this seems as simple as applying the appropriate shift before making a decision
on a label, but this does not lead to a well-grounded method, since the shifted
function u − y w is exactly the graph-harmonic extension of the shifted labels
yj − y w . Why should we have to use a harmonic extension of the wrong labels in
order to achieve a better result? On the other hand, Poisson learning, which we
introduced above, provides an intuitive and well-grounded way of fixing the shift
bias in Laplace learning.
To see the connection to Poisson learning, let us assume the solution u of the
Laplace learning equation (10.1) is nearly equal to the constant vector y w ∈ Rk
from (10.6) at all unlabeled points xm+1 , . . . , xn . For any labeled node xi with
i = 1, . . . , m we can compute (assuming wij = 0 for all j ∈ {1, . . . , m}) that
\begin{align*}
Lu(x_i) &= \sum_{j=1}^{n} w_{ij}\bigl( u(x_i) - u(x_j) \bigr)\\
&\approx \sum_{j=m+1}^{n} w_{ij}\,( y_i - \bar y_w ) = d_i\,( y_i - \bar y_w ).
\end{align*}
In other words, at low label rates the solution $u$ of Laplace learning approximately satisfies the Poisson equation
\[
Lu(x_i) = \sum_{j=1}^{m} d_j\,( y_j - \bar y_w )\,\delta_{ij} \quad \text{for } i = 1, \dots, n.
\tag{10.19}
\]
This gives a connection, at a heuristic level, between Laplace equations with hard
constraints, and Poisson equations with point sources, for problems with very
low label rates. We note that since constant functions are in the kernel of L,
u − y w also satisfies (10.19). We also note that the labels, and the constant y w ,
are weighted by the degree dj , which does not appear in our Poisson learning
equation (10.3). We have found that both models give good results, but that
(10.3) works slightly better, which is likely due to the rigorous foundation of
(10.3) via random walks.
10.4 The Poisson MBO algorithm

We now describe a graph-cut technique that incrementally adjusts the decision boundary so as to improve the label accuracy and account for prior knowledge of class sizes. The graph-cut method we propose is to apply several steps of gradient descent on the graph-cut problem
\[
\min_{\substack{u : X \to S_k\\ (u)_X = b}} \left\{ \frac{1}{2}\|\nabla u\|_{\ell^2(X^2)}^2 - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j) \right\},
\tag{10.20}
\]
where $S_k = \{e_1, e_2, \dots, e_k\}$, $b \in \mathbb{R}^k$ is given, and $(u)_X := \frac{1}{n}\sum_{i=1}^{n} u(x_i)$. Since we are restricting $u(x_i) \in S_k$, the term $\frac{1}{2}\|\nabla u\|_{\ell^2(X^2)}^2$ is exactly the graph-cut
energy of the classification given by u. Likewise, the components of the average
(u)X represent the fraction of points assigned to each class. The constraint
(u)X = b therefore allows us to incorporate prior knowledge about relative class
sizes through the vector b ∈ Rk , which should have positive entries and sum to
one. If there exists u : X → Sk with (u)X = b, then (10.20) admits a solution,
which in general may not be unique.
On its own, the graph-cut problem (10.20) can admit many local minimizers
that would yield poor classification results. The phenomenon is similar to the
degeneracy in Laplace learning at low label rates, since it is very inexpensive
to violate any of the label constraints. Our overall plan is to first use Poisson
learning to robustly propagate the labels, and then project onto the constraint set
for (10.20) and perform several steps of gradient-descent on (10.20) to improve
the classification accuracy. While Poisson learning propagates the labels in
a robust way, the cut energy is more suitable for locating the exact decision
boundary.
To relax the discrete problem (10.20), we approximate the graph-cut energy with the Ginzburg–Landau approximation
\[
\min_{\substack{u \in \ell^2(X)\\ (u)_X = b}} \left\{ GL_\tau(u) - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j) \right\},
\tag{10.21}
\]
where
\[
GL_\tau(u) = \frac{1}{2}\|\nabla u\|_{\ell^2(X^2)}^2 + \frac{1}{\tau} \sum_{i=1}^{n} \prod_{j=1}^{k} |u(x_i) - e_j|^2.
\]
Previous works have rigorously studied how $GL_\tau$ approximates graph-cut energies in the scalar
scalar setting [60]. Here, we extend the results to the vector multi-class setting.
Theorem 10.10 indicates that we can replace the graph-cut energy (10.20) with the simpler Ginzburg–Landau approximation (10.21). To descend on the energy (10.21), we use a time-splitting scheme that alternates gradient descent on
\[
E_1(u) := \frac{1}{2}\|\nabla u\|_{\ell^2(X^2)}^2 - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot u(x_j)
\quad\text{and}\quad
E_2(u) := \frac{1}{\tau} \sum_{i=1}^{n} \prod_{j=1}^{k} |u(x_i) - e_j|^2.
\]
The first term $E_1$ is exactly the energy for Poisson learning (10.10), and gradient descent amounts to the iteration
\[
u^{t+1}(x_i) = u^{t}(x_i) - dt\left( L u^{t}(x_i) - \mu \sum_{j=1}^{m} (y_j - \bar y)\,\delta_{ij} \right).
\]
We note that Lu and the source term above both have zero mean value. Hence,
the gradient descent equation for E1 is volume preserving, i.e., (ut+1 )X = (ut )X .
This would not be true for other fidelity terms, such as an ℓ2 fidelity, and this
volume conservation property plays an important role in ensuring the class size
constraint (u)X = b in (10.21).
Gradient descent on the second term E2 , when τ > 0 is small, amounts to
projecting each u(xi ) ∈ Rk to the closest label vector ej ∈ Sk , while preserving
the volume constraint (u)X = b. We approximate this by the following procedure:
Let ProjSk : Rk → Sk be the closest point projection, let s1 , . . . , sk > 0 be
positive weights, and set
\[
u^{t+1}(x_i) = \operatorname{Proj}_{S_k}\bigl( \operatorname{diag}(s_1, \dots, s_k)\, u^{t}(x_i) \bigr).
\]
We use a simple gradient descent scheme to choose the weights s1 , . . . , sk > 0 so
that the volume constraint (ut+1 )X = b holds (see Steps 9-14 in Algorithm 2).
By Remark 10.2, this procedure can be viewed as reweighting the point sources
in the Poisson equation (10.3) so that the volume constraint holds. In particular,
increasing or decreasing si grows or shrinks the size of class i.
We note that the work of [41] provides an alternative way to enforce explicit
class balance constraints with a volume constrained MBO method based on
auction dynamics. Their method uses a graph-cut based approach with a
Voronoi-cell based initialization.
By the Cauchy–Schwarz inequality we obtain
\begin{align*}
M &\ge \frac{\lambda_2^2}{2}\, \|u_\tau' - b\|_{\ell^2(X)}^2 + \underbrace{\frac{1}{\tau} \sum_{i=1}^{n} V\bigl( u_\tau'(x_i) \bigr)}_{\ge 0} - \mu \underbrace{\sqrt{\textstyle\sum_{j=1}^{m} (y_j - \bar y)^2}}_{=:C}\; \|u_\tau'\|_{\ell^2(X)}\\
&\ge \frac{\lambda_2^2}{2}\, \|u_\tau' - b\|_{\ell^2(X)}^2 - C\mu\, \|u_\tau' - b\|_{\ell^2(X)} - C\mu\, \|b\|_{\ell^2(X)}.
\end{align*}
Hence,
\[
\|u_\tau' - b\|_{\ell^2(X)} \le \frac{C\mu}{\lambda_2^2}\left( 1 + \sqrt{1 + \frac{2\lambda_2^2\bigl( M + C\mu\|b\|_{\ell^2(X)} \bigr)}{C^2\mu^2}} \right) =: \widetilde{C}.
\]
Now that we have shown that there exist convergent subsequences, we show that any limit must be a minimiser of $E_0$.
Taking the limit as $m \to \infty$ and applying (b), it follows that $E_0(u_0) \le E_0(v)$ for all $v \in \ell^2(X)$; hence $u_0$ is a minimiser of $E_0$.
To show (a), we easily notice that for $v : X \to S_k$
\[
E_\tau(v) = \frac{1}{2}\|\nabla v\|_{\ell^2(X^2)}^2 + \underbrace{\frac{1}{\tau} \sum_{i=1}^{n} V\bigl( v(x_i) \bigr)}_{=0} - \mu \sum_{j=1}^{m} (y_j - \bar y) \cdot v(x_j) = E_0(v).
\]
As $(u_\tau)_X = \frac{1}{n}\sum_{i=1}^{n} u_\tau(x_i) = b$ for all $\tau > 0$ and $u_\tau(x_i) \to u_0(x_i)$ for every $i \in \{1, \dots, n\}$, we have $(u_0)_X = \frac{1}{n}\sum_{i=1}^{n} u_0(x_i) = b$. And since $V\bigl( u_\tau(x_i) \bigr) \le \tau E_\tau(u_\tau) \to 0$, we have $V\bigl( u_0(x_i) \bigr) = 0$, hence $u_0(x_i) \in S_k$. Now,
\[
E_\tau(u_\tau) = \underbrace{\frac{1}{2}\|\nabla u_\tau\|_{\ell^2(X^2)}^2}_{\to\, \frac{1}{2}\|\nabla u_0\|_{\ell^2(X^2)}^2} + \underbrace{\frac{1}{\tau} \sum_{i=1}^{n} V\bigl( u_\tau(x_i) \bigr)}_{\ge 0} - \mu \underbrace{\sum_{j=1}^{m} (y_j - \bar y) \cdot u_\tau(x_j)}_{\to\, \sum_{j=1}^{m} (y_j - \bar y) \cdot u_0(x_j)}.
\]
Hence $\liminf_{\tau\to 0} E_\tau(u_\tau) \ge E_0(u_0)$, which establishes (b) and completes the proof.
Remark 10.11. If (a) and (b) in the proof of Theorem 10.10 are strengthened
to
(a′ ) for all v ∈ ℓ2 (X) there exists vτ → v such that limτ →0 Eτ (vτ ) = E0 (v),
(b′ ) for all v ∈ ℓ2 (X) and for all vτ → v then lim inf τ →0 Eτ (vτ ) ≥ E0 (v)
then one says that Eτ Γ-converges to E0 (and one can show that (a′ ) and (b′ )
hold in our case with a small modification of the above proof). The notion of
Γ-convergence is fundamental in the calculus of variations and is considered the
variational form of convergence as it implies (when combined with a compactness
result) the convergence of minimisers.
11 Poisson learning algorithms
We now present our proposed Poisson learning algorithms. The Python source code and simulation environment for reproducing our results are available online.1
We let W = (wij )ni,j=1 denote our symmetric weight matrix. We treat all
vectors as column vectors, and we let 1 and 0 denote the all-ones and all-zeros
column vectors, respectively, of the appropriate size based on context. We assume
that the first m data points x1 , x2 , . . . , xm are given labels y1 , y2 , . . . , ym ∈
{e1 , e2 , . . . , ek }, where the standard basis vector ei ∈ Rk represents the ith class.
We encode the class labels in a k × m matrix F, whose j th column is exactly
yj . Let b ∈ Rk be the vector whose ith entry bi is the fraction of data points
belonging to class $i$. If this information is not available, we set $b = \frac{1}{k}\mathbf{1}$.
Poisson learning is summarized in Algorithm 1. The label decision for node i
is $\ell_i = \arg\max_{1 \le j \le k} U_{ij}$, and the reweighting in Step 11 implements the label decision (10.4). In all our results, we always set $b = \frac{1}{k}\mathbf{1}$, so Poisson learning does
not use prior knowledge of class sizes (the true value for b is used in PoissonMBO
below). The complexity is O(T E), where E is the number of edges in the graph.
We note that before the reweighting in Step 11, the Poisson learning algorithm
computes exactly the function uT defined in (10.7). In view of this, there is
little to be gained from running the iterations beyond the mixing time of the
random walk. This can be recorded within the loop in Steps 8-10 by adding the
iteration pt+1 = WD−1 pt , where the initial value p0 is the vector with ones
in the positions of all labeled vertices, and zeros elsewhere. Up to a constant,
pt is the probability distribution of a random walker starting from a random
labeled node after t steps. Then the mixing time stopping condition is to run
the iterations until
\[
\|p_t - p_\infty\|_\infty \le \varepsilon,
\]
where $p_\infty = \lim_{t\to\infty} p_t$, which by (10.9) is proportional to the degree vector, and $\varepsilon > 0$ is a small tolerance.
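A minimal sketch of this stopping rule for a dense weight matrix (the tolerance and iteration cap below are our own choices, not values from the text):

```python
import numpy as np

def mixing_time(W, labeled, eps=1e-6, max_iter=100000):
    """Return the first t with ||p_t - p_inf||_inf <= eps, where
    p_{t+1} = W D^{-1} p_t, p_0 is the indicator of the labeled nodes,
    and p_inf = lim p_t is proportional to the degree vector (see (10.9))."""
    d = W.sum(axis=1)
    p = np.zeros(W.shape[0])
    p[labeled] = 1.0
    p_inf = len(labeled) * d / d.sum()
    for t in range(max_iter):
        if np.max(np.abs(p - p_inf)) <= eps:
            return t
        p = W @ (p / d)                # p_{t+1} = W D^{-1} p_t
    return max_iter
```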
Algorithm 1 PoissonLearning
1: Input: W, F, b, T
2: Output: U ∈ R^{n×k}
3: D ← diag(W1)
4: L ← D − W
5: ȳ ← (1/m) F1
6: B ← [F − ȳ, zeros(k, n − m)]
7: U ← zeros(n, k)
8: for i = 1 to T do
9:   U ← U + D^{−1}(B^T − LU)
10: end for
11: U ← U · diag(b/ȳ)
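A direct transcription of Algorithm 1 into Python with scipy.sparse might look as follows. This is a sketch rather than the released implementation; it assumes W is a sparse symmetric weight matrix, F is the k × m matrix of one-hot labels for the first m nodes, and b is the vector of class proportions (uniform if unknown):

```python
import numpy as np
from scipy import sparse

def poisson_learning(W, F, b, T):
    """Algorithm 1 (PoissonLearning)."""
    n = W.shape[0]
    k, m = F.shape
    d = np.asarray(W.sum(axis=1)).flatten()      # D = diag(W 1)
    L = sparse.diags(d) - W                       # unnormalized graph Laplacian
    ybar = np.asarray(F.mean(axis=1)).flatten()   # ybar = (1/m) F 1
    B = np.zeros((k, n))
    B[:, :m] = F - ybar[:, None]                  # B = [F - ybar, zeros(k, n - m)]
    U = np.zeros((n, k))
    for _ in range(T):
        U = U + (B.T - L @ U) / d[:, None]        # U <- U + D^{-1}(B^T - L U)
    return U * (b / ybar)                         # Step 11: U <- U diag(b / ybar)

# Label decision for node i: U[i].argmax()
```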
For the PoissonMBO iterations in Algorithm 2 below, we set the time step as dτ = 10 and set the clipping values in Step 12 to s_min = 0.5 and s_max = 2. We tested on datasets with balanced classes; on datasets with very unbalanced classes, one may wish to enlarge the interval [s_min, s_max].
The additional complexity of PoissonMBO on top of Poisson learning is
O(Ninner Nouter E). On large datasets like MNIST, FashionMNIST and Cifar-10,
our Poisson learning implementation in Python takes about 8 seconds to run on
a standard laptop computer, and about 1 second with GPU acceleration.2 The
additional 20 iterations of PoissonMBO take about 2 minutes on a laptop and
30 seconds on a GPU. These computational times do not include the time taken
to construct the weight matrix.
12 Experimental Results
We tested Poisson learning on three datasets: MNIST [49], FashionMNIST [62]
and Cifar-10 [46]. FashionMNIST is a drop-in replacement for MNIST consisting
of 10 classes of clothing items. To build good quality graphs, we trained
autoencoders to extract important features from the data. For MNIST and
FashionMNIST, we used variational autoencoders with 3 fully connected layers
of sizes (784,400,20) and (784,400,30), respectively, followed by a symmetrically
defined decoder. The autoencoder was trained for 100 epochs on each dataset.
The autoencoder architecture, loss, and training are similar to [44]. For Cifar-
10, we used the AutoEncodingTransformations architecture from [65], with all
the default parameters from their paper, and we normalized the features to
2 We used an NVIDIA RTX-2070 GPU, and it took 3 seconds to load data to/from the GPU.
Algorithm 2 PoissonMBO
1: Input: W, F, b, T, Ninner, Nouter, µ > 0
2: Output: U ∈ R^{n×k}
3: U ← µ · PoissonLearning(W, F, b, T)
4: dt ← 1/ max_{1≤i≤n} D_ii
5: for i = 1 to Nouter do
6:   for j = 1 to Ninner do
7:     U ← U − dt (LU − µB^T)
8:   end for
9:   s ← ones(1, k)
10:  for j = 1 to 100 do
11:    b̂ ← (1/n) 1^T Proj_{S_k}(U · diag(s))
12:    s ← max(min(s + dτ (b − b̂), smax), smin)
13:  end for
14:  U ← Proj_{S_k}(U · diag(s))
15: end for
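A matching sketch of Algorithm 2, reusing poisson_learning from the previous snippet (again our own transcription, not the released code; the time step dτ and clipping interval are taken from the discussion above, and µ defaults to 1):

```python
import numpy as np
from scipy import sparse

def proj_Sk(U):
    """Closest-point projection onto S_k = {e_1, ..., e_k}: one-hot of the argmax."""
    P = np.zeros_like(U)
    P[np.arange(U.shape[0]), U.argmax(axis=1)] = 1.0
    return P

def poisson_mbo(W, F, b, T, n_inner, n_outer, mu=1.0,
                dtau=10.0, s_min=0.5, s_max=2.0):
    """Algorithm 2 (PoissonMBO); assumes poisson_learning is defined as above."""
    n = W.shape[0]
    k, m = F.shape
    d = np.asarray(W.sum(axis=1)).flatten()
    L = sparse.diags(d) - W
    ybar = np.asarray(F.mean(axis=1)).flatten()
    B = np.zeros((k, n))
    B[:, :m] = F - ybar[:, None]
    U = mu * poisson_learning(W, F, b, T)
    dt = 1.0 / d.max()
    for _ in range(n_outer):
        for _ in range(n_inner):
            U = U - dt * (L @ U - mu * B.T)        # gradient descent on E_1
        s = np.ones(k)
        for _ in range(100):                       # tune s so class sizes match b
            bhat = proj_Sk(U * s).mean(axis=0)
            s = np.clip(s + dtau * (b - bhat), s_min, s_max)
        U = proj_Sk(U * s)                         # volume-constrained projection
    return U
```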
Figure 3: Accuracy of Poisson Learning for (a) different numbers of neighbors k used to construct the graph and (b) unbalanced training data. In (a) we used 5 labels per class and in (b) we used 1 label per class for the odd numbered classes, and m = 1, 2, 3, 4, 5 labels per class for the even numbered classes. Both figures show the difference in accuracy (in percent) compared to k = 10 and balanced training data, with curves for MNIST, FashionMNIST, and Cifar-10; the horizontal axis is the number of neighbors k in panel (a) and the number of labels per even class in panel (b).
unit-vectors.
For each dataset, we constructed a graph over the latent feature space. We
used all available data to construct the graph, giving n = 70, 000 nodes for
MNIST and FashionMNIST, and n = 60, 000 nodes for Cifar-10. The graph was
constructed as a K-nearest neighbor graph with Gaussian weights given by
\[
w_{ij} = \exp\left( -\frac{4|x_i - x_j|^2}{d_K(x_i)^2} \right),
\]
where $x_i$ represents the latent variables for image $i$, and $d_K(x_i)$ is the distance between $x_i$ and its $K$th nearest neighbor.
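A sketch of this graph construction using scikit-learn (our own code; the symmetrization step and the exact form of the self-tuning bandwidth follow the formula above and should be checked against the released implementation):

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def knn_gaussian_graph(X, K=10):
    """Symmetric K-nearest neighbor graph with Gaussian weights on features X (n, D)."""
    n = X.shape[0]
    dist, idx = NearestNeighbors(n_neighbors=K + 1).fit(X).kneighbors(X)
    dK = dist[:, -1]                              # distance to the K-th neighbor
    rows = np.repeat(np.arange(n), K)
    cols = idx[:, 1:].flatten()                   # column 0 is the point itself
    vals = np.exp(-4.0 * dist[:, 1:].flatten() ** 2 / dK[rows] ** 2)
    W = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    return W.maximum(W.T)                         # symmetrize: w_ij = w_ji
```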
Table 1: MNIST: Average accuracy scores over 100 trials with standard deviation
in brackets.
Table 2: FashionMNIST: Average accuracy scores over 100 trials with standard
deviation in brackets.
We see that in nearly all cases, PoissonMBO outperforms all other methods,
with PoissonMBO typically outperforming Poisson learning by a few percentage
points. The most drastic improvements are seen at the ultra low label rates,
and at the moderate label rates shown in Tables 2 and 3, several other methods
perform well. We note that VolumeMBO and PoissonMBO are the only methods
that incorporate prior knowledge of class sizes, and are most suitable for direct
comparison. Our results can be compared to the clustering results of 67.2% on
FashionMNIST [52] and 41.2% on Cifar-10 [32].
Figure 3(a) shows the accuracy of Poisson learning at 5 labels per class as a
function of the number of neighbors K used in constructing the graph, showing
that the algorithm is not particularly sensitive to this. Figure 3(b) shows the
accuracy of Poisson learning for unbalanced training data. We take 1 label
per class for half the classes and m = 1, 2, 3, 4, 5 labels per class for the other
half. Since the training data is unbalanced, y is not a constant vector and the
Table 3: Cifar-10: Average accuracy scores over 100 trials with standard deviation
in brackets.
13 Continuum limits
We briefly discuss continuum limits for the Poisson learning problem (10.3).
We take p = 2 for simplicity, but the arguments extend similarly to other
values of p ≥ 1. In order to analyze continuum limits of graph-based learning
algorithms, we make the manifold assumption, and assume G is a random
geometric graph sampled from an underlying manifold. To be precise, we assume
the vertices of the graph corresponding to unlabeled points x1 , . . . , xn are a
sequence of i.i.d. random variables drawn from a d-dimensional compact, closed,
and connected manifold M embedded in RD , where d ≪ D. We assume the
probability distribution of the random variables has the form dµ = ρdVolM ,
where VolM is the volume form on the manifold, and ρ is a smooth density. For
the labeled vertices in the graph, we take a fixed finite set of points Γ ⊂ M. The
vertices of the random geometric graph are
Xn := {x1 , . . . , xn } ∪ Γ.
The graph is given edge weights
\[
w_{xy} = \eta_\varepsilon(|x - y|) \quad \text{for } x, y \in X_n,
\]
where $\varepsilon > 0$ is the length scale on which we connect neighbors, $|x - y|$ is the Euclidean distance in $\mathbb{R}^D$, $\eta : [0, \infty) \to [0, \infty)$ is smooth with compact support, and $\eta_\varepsilon(t) = \frac{1}{\varepsilon^d}\, \eta\!\left( \frac{t}{\varepsilon} \right)$. We denote the solution of the Poisson learning problem (10.3)
by $u_{n,\varepsilon}$. In terms of the graph Laplacian
\[
L_{n,\varepsilon} u(x) = \frac{2}{\sigma_\eta\, n\varepsilon^2} \sum_{y \in X_n} \eta_\varepsilon(|x - y|)\bigl( u(x) - u(y) \bigr),
\]
we write
\[
L_{n,\varepsilon} u_{n,\varepsilon}(x) = n \sum_{y \in \Gamma} \bigl( g(y) - \bar y \bigr)\,\delta_{x=y} \quad \text{for } x \in X_n,
\tag{13.1}
\]
where $g(y) \in \mathbb{R}$ denotes the label associated to $y \in \Gamma$ and $\bar y = \frac{1}{|\Gamma|}\sum_{x \in \Gamma} g(x)$.
We restrict to the scalar case (binary classification) for now. Note that the
normalisation plays no role in the classification problem (10.2). To see what
should happen in the continuum, as n → ∞ and ε → 0, we multiply both sides
of (13.1) by a smooth test function $\varphi \in C^\infty(\mathcal{M})$, sum over $x \in X_n$, and divide by $n$ to obtain
\[
\frac{1}{n}\bigl( L_{n,\varepsilon} u_{n,\varepsilon},\, \varphi \bigr)_{\ell^2(X_n)} = \sum_{y \in \Gamma} \bigl( g(y) - \bar y \bigr)\varphi(y)
\tag{13.2}
\]
for every smooth φ ∈ C ∞ (M). Combining these observations with (13.2) we see
that
\[
\frac{1}{n}\bigl( u_{n,\varepsilon},\, \Delta_\rho \varphi \bigr)_{\ell^2(X_n)} + O\!\left( \frac{\lambda + \varepsilon}{n}\, \|u_{n,\varepsilon}\|_{\ell^1(X_n)} \right) = \int_{\mathcal{M}} \sum_{y \in \Gamma} \bigl( g(y) - \bar y \bigr)\,\delta_y(x)\, \varphi(x)\, d\mathrm{Vol}_{\mathcal{M}}(x).
\]
If we can extend $u_{n,\varepsilon}$ to a function on $\mathcal{M}$ in a suitable way, then the law of large numbers would yield
\[
\frac{1}{n}\bigl( u_{n,\varepsilon},\, \Delta_\rho \varphi \bigr)_{\ell^2(X_n)} \approx \int_{\mathcal{M}} u_{n,\varepsilon}(x)\, \rho(x)\, \Delta_\rho \varphi(x)\, d\mathrm{Vol}_{\mathcal{M}}(x).
\]
Passing formally to the limit, the limiting function $u : \mathcal{M} \to \mathbb{R}$ would satisfy
\[
-\int_{\mathcal{M}} u\, \operatorname{div}\bigl( \rho^2 \nabla\varphi \bigr)\, d\mathrm{Vol}_{\mathcal{M}} = \int_{\mathcal{M}} \sum_{y \in \Gamma} \bigl( g(y) - \bar y \bigr)\,\delta_y(x)\, \varphi(x)\, d\mathrm{Vol}_{\mathcal{M}}(x).
\]
Since $\varphi$ is arbitrary, this would show that $u$ is the solution of the Poisson problem
\[
-\operatorname{div}\bigl( \rho^2 \nabla u \bigr) = \sum_{y \in \Gamma} \bigl( g(y) - \bar y \bigr)\,\delta_y \quad \text{on } \mathcal{M}.
\tag{13.3}
\]
Let $u \in C^\infty(\mathcal{M}\setminus\Gamma)$ be the solution of the Poisson equation (13.3) and let $u_{n,\varepsilon}$ solve the graph Poisson problem (13.1). Then, with probability one, we expect $u_{n,\varepsilon}$ to converge to $u$ as $n \to \infty$ and $\varepsilon \to 0$ at appropriate rates, so that Poisson learning admits a meaningful continuum limit even when the labeled set $\Gamma$ is fixed and finite. In contrast, Laplace learning is not expected to have a well-posed continuum limit unless the number of labels grows to $\infty$ as $n \to \infty$ sufficiently
fast. This partially explains the superior performance of Poisson learning for
low label rate problems.
14 Conclusion
We proposed a new framework for graph-based semi-supervised learning at very
low label rates called Poisson learning. The method is efficient and simple to
implement. We performed a detailed analysis of Poisson learning, giving random
walk and variational interpretations. We also proposed a graph-cut enhancement
of Poisson learning, called Poisson MBO, that can give further improvements.
We presented numerical results showing that Poisson Learning outperforms all
other methods for semi-supervised learning at low label rates on several common
datasets.
References
[1] B. Abbasi, J. Calder, and A. M. Oberman. Anomaly detection and classifi-
cation for streaming data using partial differential equations. SIAM Journal
on Applied Mathematics, In press, 2017.
[5] M. Belkin and P. Niyogi. Using manifold structure for partially labelled
classification. In Advances in Neural Information Processing Systems, 2002.
[8] B. Bollobás and P. Winkler. The longest chain among random points
in Euclidean space. Proceedings of the American Mathematical Society,
103(2):347–353, 1988.
[12] J. Calder. Consistency of Lipschitz learning with infinite unlabeled data
and finite labeled data. arXiv:1710.10364, 2017.
[13] J. Calder. Numerical schemes and rates of convergence for the Hamilton–
Jacobi equation continuum limit of nondominated sorting. Numerische
Mathematik, 137(4):819–856, 2017.
[20] J. Calder and C. K. Smart. The limit shape of convex hull peeling. To
appear in Duke Mathematical Journal, 2020.
[21] T. Chan and L. Vese. Active contours without edges. IEEE Transactions
on Image Processing, 10(2):266–277, Feb. 2001.
[22] B. Cook and J. Calder. Rates of convergence for the continuum limit of
nondominated sorting. SIAM Journal on Mathematical Analysis, 54(1):872–
911, 2022.
[24] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multi-
objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary
Computation, 6(2):182–197, 2002.
[25] J.-D. Deuschel and O. Zeitouni. Limiting curves for i.i.d. records. The
Annals of Probability, 23(2):852–878, 1995.
[34] J. Handl and J. Knowles. Exploiting the trade-off – the benefits of multiple
objectives in data clustering. In Evolutionary Multi-Criterion Optimization,
pages 547–560. Springer, 2005.
[35] J. He, M. Li, H.-J. Zhang, H. Tong, and C. Zhang. Manifold-ranking
based image retrieval. In Proceedings of the 12th annual ACM International
Conference on Multimedia, pages 9–16. ACM, 2004.
[36] M. Hein, J.-Y. Audibert, and U. v. Luxburg. Graph Laplacians and their
convergence on random neighborhood graphs. Journal of Machine Learning
Research, 8(Jun):1325–1368, 2007.
[37] A. O. Hero III. Gene selection and ranking with microarray data. In
IEEE International Symposium on Signal Processing and its Applications,
volume 1, pages 457–464, 2003.
[38] K.-J. Hsiao, J. Calder, and A. O. Hero III. Pareto-depth for multiple-query
image retrieval. IEEE Transactions on Image Processing, 24(2):583–594,
2015.
[39] K.-J. Hsiao, K. Xu, J. Calder, and A. O. Hero III. Multi-criteria anomaly
detection using Pareto Depth Analysis. In Advances in Neural Information
Processing Systems 25, pages 854–862. 2012.
[40] K.-J. Hsiao, K. Xu, J. Calder, and A. O. Hero III. Multi-criteria similarity-
based anomaly detection using Pareto Depth Analysis. IEEE Transactions
on Neural Networks and Learning Systems, 2015. To appear.
[47] A. Kumar and A. Vladimirsky. An efficient method for multiobjective
optimal control and optimal control subject to integral constraints. Journal
of Computational Mathematics, 28(4):517–551, 2010.
[53] I. Mitchell and S. Sastry. Continuous path planning with multiple constraints.
In IEEE Conference on Decision and Control, volume 5, pages 5502–5507,
Dec. 2003.
[58] W. Thawinrak and J. Calder. High-order filtered schemes for the Hamilton-
Jacobi continuum limit of nondominated sorting. Journal of Mathematics
Research, 10(1), Nov 2017.
[63] B. Xu, J. Bu, C. Chen, D. Cai, X. He, W. Liu, and J. Luo. Efficient manifold
ranking for image retrieval. In Proceedings of the 34th International ACM
SIGIR Conference on Research and Development in Information Retrieval,
pages 525–534. ACM, 2011.
[64] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang. Saliency detection
via graph-based manifold ranking. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 3166–3173, 2013.
[65] L. Zhang, G.-J. Qi, L. Wang, and J. Luo. Aet vs. aed: Unsupervised
representation learning by auto-encoding transformations rather than data.
In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 2547–2555, 2019.
[67] D. Zhou, J. Huang, and B. Schölkopf. Learning from labeled and unla-
beled data on a directed graph. In Proceedings of the 22nd International
Conference on Machine Learning, pages 1036–1043. ACM, 2005.
[68] D. Zhou and B. Schölkopf. Learning from labeled and unlabeled data using
random walks. In Joint Pattern Recognition Symposium, pages 237–244.
Springer, 2004.