A Deterministic Strongly Polynomial Algorithm For Matrix Scaling and Approximate Permanents
Abstract
∗ Hebrew University. Work supported in part by a grant of the Binational Israel-US Science Foundation.
† Hebrew University.
‡ Hebrew University. Work partially supported by grant 032-7736 from the Israel Academy of Sciences. Part of this work was done during a visit to the Institute for Advanced Study, under the support of a Sloan Foundation grant 96-6-2.
1 Introduction
Our results: We achieve an $O(e^n)$-factor approximation deterministically. Our algorithm
is strongly polynomial.
Our approach to this problem is completely different from the previously taken routes.
It involves a natural reduction technique between problem instances: scaling. Observe the
following linearity of permanents: Multiplying a row or column by a constant c, multiplies
the permanent by c as well. More generally, we say that a matrix B is a scaling of A (by
positive vectors $x, y \in (\mathbb{R}_+)^n$) if $B = XAY$, where $X = \mathrm{diag}(x)$ and $Y = \mathrm{diag}(y)$ are
diagonal matrices with x (resp. y) on their diagonal (these being the factors that multiply
the rows and the columns respectively). As observed, $per(B) = (\prod_i x_i)(\prod_i y_i)\,per(A)$.
Thus scaling reductions not only allow us to compute per(A) from per(B), but in fact any
k-factor approximation of per(B) efficiently yields the same approximation for per(A).
The idea, then, is to scale an input matrix A into a matrix B whose permanent we can
efficiently approximate. A natural strategy is to seek an efficient algorithm for scaling A
to a doubly stochastic B. For suppose we succeed: the permanent of B is clearly at most
1, and $per(B) \ge n!/n^n > e^{-n}$ by the lower bound of [7, 8]. Consequently, per(A) is also
approximated to within an $e^n$ factor, as claimed.
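The scaling identity above can be checked numerically. The following is a minimal sketch (the brute-force permanent and all helper names are ours, for illustration only; it is feasible only for tiny n):

```python
import itertools
import numpy as np

def permanent(M):
    """Brute-force permanent; only sensible for very small matrices."""
    n = M.shape[0]
    return sum(np.prod([M[i, s[i]] for i in range(n)])
               for s in itertools.permutations(range(n)))

rng = np.random.default_rng(0)
n = 5
A = rng.random((n, n))           # nonnegative matrix
x = rng.random(n) + 0.1          # positive row multipliers
y = rng.random(n) + 0.1          # positive column multipliers

B = np.diag(x) @ A @ np.diag(y)  # B = XAY

lhs = permanent(B)
rhs = np.prod(x) * np.prod(y) * permanent(A)
print(lhs, rhs)                  # the two values agree up to rounding
```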
Note that such a scaling may not always exist - when per(A) = 0. Moreover, even if
scaling exists, the scaling vectors x, y may have irrational coordinates, so we may have
to settle for an approximately doubly stochastic matrix. The scaling algorithm must,
therefore, be accompanied by approximate versions of the van der Waerden bound, and
indeed we prove results of the following type (see also proposition 5.1).
Lemma 1.2: Let B be a nonnegative n × n matrix in which all row sums are 1 and in which no column sum exceeds $1 + \frac{1}{n^2}$. Then $per(B) \ge e^{-(n+1)}$.
1.2 Matrix Scaling
Background
Our discussion will be restricted to square matrices, though everything generalizes to
rectangular matrices (and some of it even to multidimensional arrays).
Let $r, c \in (\mathbb{R}_+)^n$ be two positive vectors with $\sum_i r_i = \sum_j c_j$. A matrix B is an (r, c)-matrix if r and c are the vectors of row and column sums of B respectively. An (r, c)-scaling of a matrix A consists of two vectors x, y for which B = XAY is an (r, c)-matrix.
(Again X = diag(x) and Y = diag(y).) Given A, r, c one is led to consider the existence
of a scaling, its uniqueness, and of course the complexity of deciding and finding (or
approximating) such a scaling. Note that scaling to a doubly stochastic matrix is exactly
(1̄, 1̄)-scaling. Since multiplying the vectors r and c by the same constant does not change
the problem, we will assume $\sum_i r_i = \sum_j c_j = n$ (to make it consistent with the doubly
stochastic case).
The (r, c)-scaling problem arose in such diverse contexts as statistics [30, 10], nu-
merical analysis [32], operations research [28], image reconstruction [17], engineering [6],
and as we were not surprised to discover, permanents [12]. Here are a couple of quick
examples: To solve the linear system of equations Av = u, we consider instead the scaled
system $B(Y^{-1}v) = Xu$. A judicious choice of scaling can increase numerical stability,
and indeed this method is used for matrix preconditioning [32]. In computer tomography
and some statistical applications we wish to reconstruct a (sometimes multidimensional)
array of new data, from complete old data and some additional information. Different
dimensions represent essential (spatial or statistical) parameters. The assumption is that
data was modified as a result of a (multiplicative) change in “intensity” or “importance”
of the parameters, leading naturally to the scaling problem. Finally, observe also that
scaling does not affect the zero/nonzero pattern of the matrix, and this (along with lin-
ear constraints on the marginals and positivity requirements) is a natural constraint in
various optimization problems.
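As a toy illustration of the linear-system use of scaling (a minimal sketch with an arbitrary choice of scaling vectors of our own, not a genuine preconditioner):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.random((n, n)) + n * np.eye(n)   # a generic nonsingular test matrix
u = rng.random(n)

# Some positive scaling vectors (arbitrary here; in practice chosen to balance A).
x = 1.0 / A.sum(axis=1)
y = 1.0 / A.sum(axis=0)

B = np.diag(x) @ A @ np.diag(y)          # B = XAY

# Av = u is equivalent to B (Y^{-1} v) = X u:
w = np.linalg.solve(B, x * u)            # w = Y^{-1} v
v = y * w                                # recover v = Y w
print(np.allclose(A @ v, u))             # True
```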
The mathematical literature abounds with equivalent formulations of the problem,
connections with other problems, and various necessary and sufficient conditions for
existence and uniqueness of an (r, c)-scaling (see e.g. [29] and the references therein).
Again our discussion is very brief: The max-flow-min-cut theorem provides a necessary
and sufficient condition for the existence of scaling. The scaling vectors are then the
(unique) optimal solution of a nonlinear program which can be made convex (but ugly)
after appropriate changes of variables.
Finally, the algorithmics: Here the literature seems quite sparse. We note that (r, c)-
scalability can be decided in polynomial time, using network flows. Thus we will assume
that the input matrix A is (r, c)-scalable, and the task is to find (or approximate to a
desired accuracy) the target matrix and the scaling vectors.
The original paper by Sinkhorn [30] already introduced an iterative algorithm for
this task.¹ The idea is simple – apply a sequence of scalings to the given matrix A,
alternately normalizing the row sums to r (say in odd steps) and the column sums to c
(in even steps). Let At be the matrix generated after t iterations. Sinkhorn proved that
this sequence converges to an (r, c)-matrix, but gave no bound on convergence rates. It
should be clear that each iteration is computationally trivial, and that the final scaling
vectors are obtained by keeping a running product of the scaling vectors used at odd and
even steps.
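A minimal sketch of this alternate normalization, keeping the running products of the odd- and even-step multipliers (our own transcription; names and the iteration cap are illustrative, and the matrix is assumed to have no zero rows or columns):

```python
import numpy as np

def sinkhorn(A, r, c, iters=1000):
    """Alternately normalize row sums to r (odd steps) and column sums to c
    (even steps), keeping the accumulated scaling vectors x, y."""
    B = A.astype(float).copy()
    x = np.ones(A.shape[0])
    y = np.ones(A.shape[1])
    for _ in range(iters):
        row_fix = r / B.sum(axis=1)      # odd step: scale rows
        B = B * row_fix[:, None]
        x *= row_fix
        col_fix = c / B.sum(axis=0)      # even step: scale columns
        B = B * col_fix[None, :]
        y *= col_fix
    return B, x, y                       # B is approximately diag(x) A diag(y)

# Example: scale a random 4x4 matrix towards doubly stochastic (r = c = all-ones).
rng = np.random.default_rng(2)
A = rng.random((4, 4))
B, x, y = sinkhorn(A, np.ones(4), np.ones(4))
print(B.sum(axis=0), B.sum(axis=1))      # both close to the all-ones vector
```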
To quantify convergence rates, let $\epsilon$ be the desired accuracy (say in $L_\infty$), and L(A)
the binary input length (i.e. the log of the ratio of the largest to smallest nonzero entries) of A.
The convergence rate of this procedure was first considered by Franklin and Lorenz
[11]. They showed that each step in Sinkhorn’s method is a contraction map in the
Hilbert projective metric. By estimating the contraction constant in terms of L(A), they
concluded that the number of iterations is bounded by
$O(L(A)\cdot 1/\epsilon).$
This shows that Sinkhorn's method is an approximation scheme, but of course not polynomial in $\log(1/\epsilon)$.
Moreover, simple examples show that the dependence on the input length cannot be
improved. This is the best existing bound for the general (r, c)-scaling problem.
For the (1, 1)-scaling problem, Kalantari and Khachiyan [21] have recently developed
an approximation scheme based on convex optimization via the ellipsoid method. Their
algorithm requires
$O(L(A)\,n^4\log(1/\epsilon))$
arithmetic operations on integers of length $O(L(A)\log(n/\epsilon))$. This is thus a polynomial
time approximation scheme, which is, however, not strongly polynomial.
This still leaves open the polynomiality of the general (r, c)-problem, and the quest
for a strongly polynomial scheme, both of which we accomplish in the present paper.
Our results: We develop two strongly polynomial algorithms. The first is restricted to
the (1, 1)-scaling problem, and the other works for the general (r, c)-scaling problem.
For (1, 1)-scaling our approach is this: We start with a certain preprocessing step
that’s followed by Sinkhorn’s procedure. This, along with a different convergence anal-
ysis, yield a strongly polynomial scheme. Our preprocessing is itself also a scaling step
that is attained through a weighted matching algorithm. Following this step, there is
a generalized diagonal of A each element of which is largest in its respective row. This
guarantees a lower bound of $n^{-n}$ on the permanent. The analysis then proves (and this
¹ Sinkhorn suggested his procedure (sometimes called the RAS procedure) only for the (1, 1)-scaling problem he was interested in, but it was naturally extended to the general problem.
is simple) that every step in Sinkhorn’s procedure increases the permanent. Moreover,
this increase can be quantified in terms of the distance of the matrix from being doubly
stochastic. This allows us to bound the number of iterations till an approximation is
attained by
$O((n/\epsilon)^2).$
Note that the number of iterations does not involve L(A) (naturally, the arithmetic
operations are applied to integers of length $L(A)^{O(1)}$). However, it is only polynomial
in $1/\epsilon$, so this is not a fully polynomial approximation scheme. Still, for the purpose of
approximating the permanent $\epsilon = n^{-2}$ suffices, and Theorem 1.1 (with a total running
time of $O(n^6)$) follows. These results are described in Section 3.
For the general (r, c)-scaling problem, we develop a new algorithm. Like Sinkhorn’s it
is iterative, and every odd step normalizes the row sums to r. In the even steps, however,
a more elaborate scaling procedure is performed on the columns. This scheme allows us
to quantify the progress we make: After odd steps, when row sums are as designated,
the Euclidean distance of the vector of current columns sums to its target c shrinks by
a constant factor. This comprises a fully polynomial approximation scheme that is also
strongly polynomial, with the number of iterations bounded by
$\tilde{O}(n^7\log(1/\epsilon)).$²
These results are described in Section 4.
2 Preliminaries
We start with a general overview of the setting in which our algorithms work and of the
features they share. We also discuss the complexity matters, and introduce the necessary
notation.
First we describe the setting in which our algorithms are applicable. Let A be a
nonnegative n × n matrix and let r = (r1 , ..., rn ), c = (c1 , ..., cn ) be two positive vectors.
We say that A is an (r, c) matrix if the i-th row sum of A is ri and the j-th column sum
is cj .
Next we would like to introduce the notion of (r, c)-scalability. It turns out that there
are several pertinent definitions, which we now present.
The matrix A is said to be exactly (r, c)-scalable, if there exist positive finite n-tuples
$(x_1, ..., x_n), (y_1, ..., y_n)$ such that $B = (b_{ij}) = (x_i a_{ij} y_j)$ is an (r, c) matrix.
² Leonid Gurvits [15] pointed out to us that algorithms of a similar spirit are presented, without performance analysis, in [27].
We are also interested in a weaker notion of approximate scalability: Given an $\epsilon > 0$,
we say that B is an “$\epsilon$-(r, c)” matrix if B's row sums are given by the vector r, whereas B's
column sums $c'_j$ satisfy³:
$$\sum_{j=1}^{n} (c'_j - c_j)^2 \ \le\ \epsilon. \qquad (1)$$
We say that A is $\epsilon$-(r, c)-scalable if there exist positive finite n-tuples $(x_1, ..., x_n)$,
$(y_1, ..., y_n)$ such that $B = (b_{ij}) = (x_i a_{ij} y_j)$ is an $\epsilon$-(r, c) matrix.
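In code, the acceptance test of definition (1) is simply the following (a trivial sketch; the function name is ours):

```python
import numpy as np

def is_eps_rc(B, r, c, eps):
    """Check that B has row sums r and column sums within eps of c in the sense of (1)."""
    return (np.allclose(B.sum(axis=1), r)
            and np.sum((B.sum(axis=0) - c) ** 2) <= eps)
```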
Since our official business (in the doubly stochastic case) is in approximating the
permanent, we will be satisfied if we can scale A arbitrarily close to an (r, c)-matrix. This
leads to the notion of an almost (r, c)-scalable matrix, which turns out to be precisely the
notion we need, and therefore deserves a formal definition: we call A almost (r, c)-scalable if it is $\epsilon$-(r, c)-scalable for every $\epsilon > 0$.
Proposition 2.2: A nonnegative matrix A is exactly (r, c)-scalable iff for every zero
minor Z × L of A,

1. $\sum_{i\in Z^c} r_i \ \ge\ \sum_{j\in L} c_j$, equivalently $\sum_{i\in Z} r_i \ \le\ \sum_{j\in L^c} c_j$;

2. if equality holds in 1, then the complementary minor $Z^c \times L^c$ of A is zero as well.

Moreover, A is almost (r, c)-scalable iff condition 1 alone holds for every zero minor.
Proof: This characterization for exact scalability is well known [29]. To see that the
first condition is already sufficient for almost-(r, c) scalability, note that the (r, c) scaling
algorithm presented in Theorem 4.5 accepts as input a matrix satisfying condition 1 of
the proposition and an $\epsilon > 0$, and proceeds to (efficiently) scale the matrix to an $\epsilon$-(r, c)
matrix.
We observe that almost (r, c)-scalability of A can be efficiently checked.
³ Note it is always possible to normalize the row sums to the desired $r_i$. (Or, alternatively, normalize the column sums. Doing both is the challenge.) Thus, henceforth all our matrices are row-normalized.
Lemma 2.3: The question whether a given matrix A satisfies condition 1 of proposi-
tion 2.2 can be decided in polynomial time (via a max-flow formulation [14]).
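One standard way to set up such a max-flow check (a sketch under our own modeling choices; the paper only asserts that a network-flow formulation exists [14]): route $r_i$ units from a source through row i, across the nonzero entries of A, and out of column j with capacity $c_j$; condition 1 then holds iff the maximum flow equals $\sum_i r_i$.

```python
import networkx as nx

def almost_rc_scalable(A, r, c, tol=1e-9):
    """Check condition 1 of Proposition 2.2 via a max-flow computation:
    s -> row i with capacity r_i, row i -> col j (unbounded) whenever a_ij > 0,
    col j -> t with capacity c_j.  Condition 1 holds iff max-flow = sum(r_i)."""
    n = len(r)
    G = nx.DiGraph()
    for i in range(n):
        G.add_edge("s", ("row", i), capacity=float(r[i]))
        for j in range(n):
            if A[i][j] > 0:
                G.add_edge(("row", i), ("col", j))  # no capacity attr = infinite
    for j in range(n):
        G.add_edge(("col", j), "t", capacity=float(c[j]))
    value, _ = nx.maximum_flow(G, "s", "t")
    return value >= sum(r) - tol
```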
We proceed to describe two scaling algorithms. We call our (1, 1) scaling algorithm
DSS - for Doubly Stochastic Scaling, and the general (r, c) scaling algorithm is RCS -
for (r, c)-Scaling.
Given a matrix A to be scaled, the two algorithms operate by producing a sequence
of matrices $A = A_0, A_1, \ldots, A_N$, where $A_{t+1} = f(A_t)$, $t = 1, 2, \ldots$, and the last matrix $A_N$
is nearly (r, c).⁴ The function f is specific to the algorithm and is denoted by $f_{DSS}$
and $f_{RCS}$ correspondingly. The matrices $A_t$ are always row-normalized, i.e., the i-th
row sum of $A_t$ is $r_i$, for all i and t. An iteration of the algorithm proceeds from $A_t$ to
$A_{t+1}$, and therefore N is the number of iterations required for the approximate scaling.
To quantify the notion of “proximity”, we define $N(\epsilon)$ to be the number of iterations
required for $\epsilon$-scaling, namely so that $A_N$ is an $\epsilon$-(r, c) matrix.
Let I denote the number of arithmetic operations needed for one iteration of the
algorithm. Then the total complexity of the $\epsilon$-scaling problem is $N_{DSS}(\epsilon)\cdot I_{DSS}$ in the
doubly stochastic case, and $N_{RCS}(\epsilon)\cdot I_{RCS}$ in the general case. We also let $L(\epsilon)$ denote
the maximal length of numbers encountered in the $\epsilon$-scaling process. (Recall that L(A)
denotes the binary input length of A.)
3.1 The DSS Algorithm
DSS(A)
Input: (1, 1)-scalable matrix A.
1. A1 = Preprocess(A)
2. For $t = 1, 2, \ldots, N = O(n^2\log n)$ do $A_{t+1} = f_{DSS}(A_t)$.
3. Output AN .
Preprocess(A)
Find a permutation $\sigma\in S_n$, such that $\sigma$ is the “heaviest” generalized diagonal of A,
i.e.,
$$\prod_{i=1}^n A_{i\sigma(i)} \ =\ \max_{\tau\in S_n}\ \prod_{i=1}^n A_{i\tau(i)}.$$
$f_{DSS}(\cdot)$
Input: Nonnegative matrix $A_t$ with unit row sums and positive column sums.
1. Normalize(columns)
2. Normalize(rows)
3. Output $A_{t+1}$.
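A hedged transcription of the DSS loop (the preprocessing of step 1 is the subject of Theorem 3.1 below and is omitted here; the constant C and the explicit stopping rule are our own illustrative choices, and we assume $n \ge 2$):

```python
import numpy as np

def f_dss(A):
    """One DSS iteration: Normalize(columns), then Normalize(rows)."""
    B = A / A.sum(axis=0, keepdims=True)
    return B / B.sum(axis=1, keepdims=True)

def dss_loop(A1, C=10):
    """Iterate f_dss starting from a preprocessed, row-normalized matrix A1,
    stopping once sum_j (c_j - 1)^2 <= 1/(n log n) or after O(n^2 log n) steps."""
    n = A1.shape[0]
    eps = 1.0 / (n * np.log(n))
    A = A1.copy()
    for _ in range(int(C * n * n * np.log(n)) + 1):
        if np.sum((A.sum(axis=0) - 1.0) ** 2) <= eps:
            break
        A = f_dss(A)
    return A
```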
3.2 Theorems
Theorem 3.1: (The preprocessing step)
Let A be a nonnegative matrix in which $\sigma$ is a “heaviest” generalized diagonal, i.e.,
$$\prod_{i=1}^n A_{i\sigma(i)} \ \ge\ \prod_{i=1}^n A_{i\tau(i)}$$
for every permutation $\tau$. Then there exists a positive diagonal matrix Y such that $B = AY$ satisfies: every element on the $\sigma$-diagonal of B is largest in its row.
The diagonal $\sigma$ and the matrix Y can be found in $O(n^5\log n)$ arithmetic operations.
Theorem 3.2: (Number of iterations)
The number $N_{DSS}(\frac{1}{n\log n})$ of DSS iterations required for $\frac{1}{n\log n}$-scaling is at most $O(n^2\log n)$.
Corollary 3.5: Given a (1, 1)-scalable matrix A, the modified Sinkhorn algorithm solves
the $\frac{1}{n\log n}$-scaling problem for A in at most $O(n^5\log n)$ elementary operations on numbers
of length at most poly$(L(A), n)$.
Remark 3.6: Below (Proposition 5.1) we show that if A is $\frac{1}{n\log n}$-doubly stochastic then
$per(A) \ge (\frac{1}{e})^{n+o(n)}$. Therefore Corollary 3.5 enables us to use a version of the van der
Waerden conjecture.
Remark 3.7: The number of iterations of the unmodified Sinkhorn procedure required
to solve the $\frac{1}{n\log n}$-scaling problem may be arbitrarily large. This is demonstrated by an
example matrix with an (arbitrarily) small positive parameter $\alpha$: the matrix satisfies the
conditions of Proposition 2.2, so Sinkhorn's algorithm converges to a doubly stochastic
matrix; however, $per(A) = O(\alpha)$, and each application of DSS increases the permanent by
a factor no bigger than 5. Consequently, it takes at least $\Omega(\log(\frac{1}{\alpha}))$ steps for the matrix A
to become close enough to doubly stochastic.
3.3 Proofs
Proof (of Theorem 3.1):
Given A, a “heaviest” generalized diagonal $\sigma$ corresponds to a maximum-weight perfect
matching in a bipartite graph, so $\sigma$ may be found in at most $O(n^3)$ arithmetic operations [1, 13]. It is convenient (and kosher) to assume henceforth that $\sigma$ is the identity
permutation.
We turn to the main part of the theorem, namely to show that the matrix Y exists
and can be found in polynomial time.
We start with the existential part. Define the matrix $\hat{A}$ via $\hat{A}_{ij} = \log(A_{ij})$, $1\le i, j\le n$
(with the convention $\log(0) = -\infty$), and restate the theorem thus: Let $\hat{A}$ be an $n\times n$
matrix where $\sum_{i=1}^n \hat{A}_{ii} \ge \sum_{i=1}^n \hat{A}_{i\tau(i)}$ for every permutation $\tau$. Then there exist numbers
$y_1, \ldots, y_n$ such that
$$\hat{A}_{ii} + y_i \ \ge\ \hat{A}_{ij} + y_j \qquad \text{for all } 1\le i, j\le n. \qquad (4)$$
Note that we are only concerned with indices i, j for which $\hat{A}_{ij} > -\infty$, since in the
other case the inequalities obviously hold. By LP-duality, if this system of inequalities is
inconsistent, this may be established by linearly combining these inequalities. So, assume
that there exist nonnegative weights $w_{ij}$, so that:

1. for every index k, $\sum_j w_{kj} = \sum_i w_{ik}$, i.e., the variables $y_k$ cancel out in the combination;

2. $\sum_{ij} w_{ij}(\hat{A}_{ij} - \hat{A}_{ii}) > 0$, i.e., the combination of constant terms is positive.

The first condition implies that the map which assigns value $w_{ij}$ to the edge (i, j) is
a circulation in $K_n$. By the Circulation Theorem, w is a positive combination of the
characteristic functions of directed cycles: $w = \sum_{\gamma\in\Gamma}\lambda_\gamma 1_\gamma$, where each $\gamma\in\Gamma$ is a directed
cycle and the $\lambda_\gamma$ are positive. But then, the second condition implies that for some $\gamma\in\Gamma$:
$$\sum_{i\in\gamma}\hat{A}_{i\gamma(i)} \ >\ \sum_{i\in\gamma}\hat{A}_{ii},$$
contradicting the maximality of the identity diagonal (extend $\gamma$ to a full permutation by fixing all indices outside the cycle).
We return to the computational part of the theorem. Note that (4) is a system of
$m = n^2$ linear inequalities with only two variables per inequality. Such a system is solvable [26] by a strongly polynomial algorithm in $O(mn^3\log n) = O(n^5\log n)$ arithmetic
operations.
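For intuition, here is a sketch of both computational ingredients (our own illustration: $\sigma$ is found with a standard assignment solver, and the two-variable system (4) is solved as a difference-constraint system by Bellman–Ford relaxation rather than by the algorithm of [26]; all names are ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def preprocess_scaling(A):
    """Find a heaviest generalized diagonal sigma of A and column multipliers y > 0
    such that in B = A @ diag(y) each sigma-diagonal entry is largest in its row."""
    n = A.shape[0]
    NEG = -1e18                                   # finite stand-in for log(0)
    logA = np.where(A > 0, np.log(np.where(A > 0, A, 1.0)), NEG)

    # Heaviest diagonal = maximum-weight assignment on the log-entries.
    row_ind, sigma = linear_sum_assignment(logA, maximize=True)

    # Reorder columns so that sigma becomes the identity, as in the proof.
    P = logA[:, sigma]

    # System (4):  P[i,i] + y_i >= P[i,j] + y_j,  i.e.  y_j - y_i <= P[i,i] - P[i,j].
    # Solve these difference constraints by Bellman-Ford style relaxation.
    dist = np.zeros(n)
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                if P[i, j] > NEG:
                    bound = dist[i] + (P[i, i] - P[i, j])
                    if bound < dist[j]:
                        dist[j] = bound

    y = np.zeros(n)
    y[sigma] = np.exp(dist)                       # undo the column reordering
    return sigma, y
```

After this step, row-normalizing $B = A\,\mathrm{diag}(y)$ makes every $\sigma$-diagonal entry at least $1/n$, which is the source of the $n^{-n}$ lower bound on the permanent used in the proof of Theorem 3.2.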
Proof (of Theorem 3.2):
Theorem 3.1 shows how to perform the preprocessing step in the modified Sinkhorn
algorithm. Our gain is that following this preprocessing, the permanent of the resulting
matrix $A_1$ is at least $\frac{1}{n^n}$, since all the elements on the $\sigma$-diagonal of $A_1$ are at least $\frac{1}{n}$.
Now, recall that At is the matrix generated from A1 after t − 1 iterations of DSS.
At+1 = fDSS (At ). Our goal is to estimate the convergence rate of At to the set of doubly
stochastic matrices. We measure our progress with the help of a potential function,
namely, per(At ).
Lemma 3.9: Let B be a nonnegative matrix with unit row sums. Then
$$per(f_{DSS}(B)) \ \ge\ per(B).$$
Proof: Let C be obtained by normalizing B's columns, and let $D = f_{DSS}(B)$ be the row
normalization of C. We claim $per(D) \ge per(C) \ge per(B)$, and the inequalities are strict
unless B = C = D are doubly stochastic.
Let $c_j$ be the j-th column sum of B. All $c_j$ are positive, and $\sum_{j=1}^n c_j = n$. Clearly,
$per(C) = per(B)\big/\prod_{j=1}^n c_j$. The arithmetic-geometric mean inequality applied to $\{c_j\}_{j=1}^n$
yields $\prod_{j=1}^n c_j \le 1$, with equality only if all the $c_j$ are 1. The same argument proves also
the second claim, switching rows and columns.
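For completeness, the AM–GM step reads as follows, together with a quantitative refinement (the refinement is our own hedged sketch, valid under the simplifying assumption $|c_j - 1| \le 1$ for all j, and is not necessarily the estimate the authors use):
$$\prod_{j=1}^{n} c_j \ \le\ \Bigl(\frac{1}{n}\sum_{j=1}^{n} c_j\Bigr)^{\!n} \ =\ 1,$$
and, writing $d_j = c_j - 1$ (so that $\sum_j d_j = 0$) and using $\log(1+x) \le x - x^2/4$ for $|x|\le 1$,
$$\log\prod_{j=1}^{n} c_j \ =\ \sum_{j=1}^{n}\log(1+d_j) \ \le\ \sum_{j=1}^{n}\Bigl(d_j - \frac{d_j^2}{4}\Bigr) \ =\ -\frac{1}{4}\sum_{j=1}^{n} d_j^2 .$$
Under that assumption, whenever $\sum_j (c_j-1)^2 > \epsilon$, a single column normalization multiplies the permanent by at least $e^{\epsilon/4} \ge 1 + \epsilon/4$.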
We have seen, in fact, that each iteration of DSS multiplies the permanent by at
least $1\big/\prod_{j=1}^n c_j$. The convergence rate can be estimated through the following lemma:
Now we have to show that $N_{DSS}(\frac{1}{n\log n}) \le O(n^2\log^2 n)$. Indeed, as long as Condition
(1), with $\epsilon = \frac{1}{n\log n}$, does not hold, each iteration of DSS multiplies the permanent by
at least $1 + \Omega(\frac{1}{n\log n})$. The bound on N follows, since $per(A_1) \ge \frac{1}{n^n}$ and, for any t,
$per(A_t) \le \prod_{i=1}^n r_i = 1$.
$f_{RCS}(\cdot)$
Input: (r, c)-scalable matrix $A_t$ with row sums $r_i$ and positive column sums $c'_j$.
1. Sort the differences $d_j = c'_j - c_j$. (We assume, for convenience, that they are
already ordered: $d_1 \le d_2 \le \ldots \le d_n$.)
2. If $\sum_{j=1}^n d_j^2 \le \epsilon$, stop.
4.2 Theorems
Theorem 4.1: (Existence of $\delta_0$)
An RCS iteration can always be performed, i.e., the positive real $\delta_0$ in step 4 of the
iteration is well defined.
Corollary 4.5: Given an (r, c)-scalable matrix A, the RCS algorithm solves the $\epsilon$-scaling
problem for A in at most $O(n^7\log(n)\log(\frac{n}{\epsilon}))$ elementary operations on numbers of length
at most poly$(L(A), n, \log(\frac{1}{\epsilon}))$.
4.3 Proofs
Proof (of Theorem 4.1):
For notational convenience we prove the existence and the uniqueness of $\delta_0$ for the first
iteration. In this case, $A_t$ is simply A. Let $\delta$ be a positive real number. The expression
for the (i, l) entry in $A(\delta)$ is:
$$a_{il}(\delta) = \begin{cases} (1+\delta)\cdot\dfrac{a_{il}\,r_i}{r_i+\delta w_i} & \text{if } l\in L,\\[6pt] \dfrac{a_{il}\,r_i}{r_i+\delta w_i} & \text{if } l\notin L, \end{cases}$$
where $w_i = \sum_{j\in L} a_{ij}$ is the contribution of the lacunary columns to the i-th row sum in
A. Since all $w_i \le r_i$, the function $a_{ij}(\delta)$ is nondecreasing for $j\in L$, and nonincreasing
for $j\notin L$. Therefore the quantity $d_j(\delta) = c'_j(\delta) - c_j$ is nondecreasing for $j\in L$, and
nonincreasing for $j\notin L$. We show that as $\delta$ grows, at least one of these quantities reaches
the midpoint M of the largest gap G.
Lemma 4.6: Let A be a nonnegative (r, c)-scalable n × n matrix with row sums $r_i$ and
column sums $c'_j$. Then, there are $j\in L$ and $k\notin L$ and a (possibly infinite) $\delta > 0$ for
which $d_j(\delta) = d_k(\delta)$. Consequently, there is an index l and $\delta > 0$ for which $d_l(\delta) = M$.
Proof: Let Z be the set of rows i for which $w_i = 0$. Note that $a_{ij} = 0$ for $i\in Z$ and $j\in L$,
so for $l\in L$, $c'_l(\delta) = (1+\delta)\cdot\sum_{i\in Z^c}\frac{a_{il}\,r_i}{r_i+\delta w_i}$. Therefore, for $l\in L$, when $\delta\to\infty$,
$$d_l(\delta) \ \to\ \sum_{i\in Z^c}\frac{a_{il}\,r_i}{w_i} - c_l, \qquad\text{whence}\qquad \sum_{l\in L} d_l(\delta) \ \to\ \sum_{i\in Z^c} r_i - \sum_{l\in L} c_l \ \ge\ 0,$$
the inequality being from Proposition 2.2. On the other hand, for $j\notin L$, as $\delta\to\infty$,
$$d_j(\delta) \ \to\ \sum_{i\in Z} a_{ij} - c_j,$$
whence
$$\sum_{j\notin L} d_j(\delta) \ \to\ \sum_{j\notin L}\sum_{i\in Z} a_{ij} - \sum_{j\notin L} c_j \ =\ \sum_{i\in Z} r_i - \sum_{j\notin L} c_j \ \le\ 0.$$
Our goal is to estimate the rate of convergence of $A_t$ to the set of (r, c)-matrices. We assess our progress through the decrease of the $\ell_2$ norm
$\|d\|_2 = \sqrt{\sum_{j=1}^n d_j^2}$. We now show that our choice of $\delta_0$ in the fourth step of the RCS
iteration suits our needs. Once again we assume, for convenience, that we deal with the
first iteration.
Lemma 4.7:
$$\|d(\delta_0)\|_2^2 \ \le\ \|d\|_2^2\cdot\big(1 - \Omega(n^{-5})\big).$$
Proof: Since the total entry sum is preserved, $\sum_j d_j(\delta) = \sum_j d_j = 0$ for every $\delta$, so shifting all coordinates by M changes $\|d\|_2^2$ and $\|d(\delta_0)\|_2^2$ by the same additive amount $nM^2$,
and so we may consider the change in $\|d - M\cdot j\|_2$ (here j denotes the all-ones vector). The definition of $\delta_0$ implies, for
all i, that $a_{ij} \le a_{ij}(\delta_0) \le a_{ij} + \frac{G}{8n}$ for all $j\in L$, and $a_{ik} \ge a_{ik}(\delta_0) \ge a_{ik} - \frac{G}{8n}$ for all
$k\notin L$. Therefore, for all $j\in L$, $k\notin L$, $d_j \le d_j(\delta_0) \le M \le d_k(\delta_0) \le d_k$, implying that
$|d_j(\delta_0) - M| \le |d_j - M|$ for all $1\le j\le n$. Our gain comes from the $\nu$-th coordinate, and
is at least $\frac{G^2}{64n^2}$.
Since G is the largest gap between the consecutive (sorted) differences, $|d_j - d_{j-1}| \le G$ for all j. The maximum of $\|d\|_2^2$ under these conditions is attained when
$d_j - d_{j-1} = G$ for all j, in which case $\|d\|_2^2 = \Theta(n^3 G^2)$. In other words, $G \ge \Omega(\|d\|_2\cdot n^{-3/2})$.
This, together with (5), implies the claim of the lemma.
It immediately follows that in $N = O(n^5\log(\frac{n}{\epsilon}))$ iterations we have $\|d\|_2^2 \le \epsilon$, and
therefore $A_N$ is an $\epsilon$-(r, c) matrix.
Proof (of Theorem 4.3):
To perform the t-th iteration of RCS we have to find $\delta_0$ and then compute the matrix
$A_t(\delta_0)$. Given $\delta_0$, the computation of $A_t(\delta_0)$ requires $O(n^2)$ elementary operations. To
find $\delta_0$ we define $\delta_{ij}$ for each pair of indices i, j via $|a_{ij}(\delta_{ij}) - a_{ij}| = \frac{G}{8n}$, which is a
linear equation in $\delta_{ij}$, and $\delta_0 = \min_{ij}\delta_{ij}$. Consequently, this involves only $O(n^2\log n)$
elementary operations.
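A sketch of this computation, using the explicit deformation formula (7) below (our own transcription: the lacunary set L, the gap G, and the row sums are taken as given inputs, since they are produced by the earlier steps of the iteration):

```python
import numpy as np

def delta_ij(a, r_i, w_i, in_L, threshold):
    """Smallest positive delta with |a_ij(delta) - a_ij| = threshold, or inf if none,
    where a_ij(delta) = (1+delta) a r_i/(r_i + delta w_i) for j in L and
    a_ij(delta) = a r_i/(r_i + delta w_i) otherwise (formula (7))."""
    if a == 0:
        return np.inf
    if in_L:
        denom = a * (r_i - w_i) - threshold * w_i   # entry increases by threshold
    else:
        denom = w_i * (a - threshold)               # entry decreases by threshold
    return threshold * r_i / denom if denom > 0 else np.inf

def find_delta0(A, r, L, G):
    """delta_0 = min over all entries of delta_ij with threshold G/(8n)."""
    n = A.shape[0]
    thr = G / (8 * n)
    w = A[:, sorted(L)].sum(axis=1)                 # mass of lacunary columns per row
    best = np.inf
    for i in range(n):
        for j in range(n):
            best = min(best, delta_ij(A[i, j], r[i], w[i], j in L, thr))
    return best
```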
Proof (of Theorem 4.4):
Let an iteration of RCS act on $A_t = (a_{ij})$ and produce $A_{t+1} = (a'_{ij})$. Since all the entries
in $A_t$, $A_{t+1}$ are numbers between zero and one, we only have to worry about an entry in
$A_{t+1}$ suddenly becoming very small, thus requiring a long representation.
This is probably the right moment to specify that the entries of $A_t$ are represented in
floating point. Namely, $a_{ij}$ is represented as $(b_{ij}, \alpha_{ij})$, where an integer $b_{ij}$ and $\frac12 < \alpha_{ij} \le 1$
are uniquely determined via $a_{ij} = 2^{-b_{ij}}\alpha_{ij}$.
Our first observation is that for our purposes it suffices to remember only the first
few bits of $\alpha_{ij}$. This is the content of the following lemma.
Lemma 4.8: Truncating all but the most significant $10\log(\frac{n}{\epsilon})$ bits in every $\alpha_{ij}$ changes
$\|d\|_2^2 = \sum_{j=1}^n (c'_j - c_j)^2$ by an additive term of at most $O\!\left(\frac{\epsilon^{10}}{n^{9}}\right)$.
Proof: Since all entries are bounded above by 1, the truncation affects each entry by an
additive term of at most $\frac{\epsilon^{10}}{n^{10}}$. Therefore each $c'_j$ is changed by at most $\frac{\epsilon^{10}}{n^{9}}$, and the
claim of the lemma trivially follows.
Corollary 4.9: The “truncated” RCS algorithm terminates in at most $O(n^5\log(\frac{n}{\epsilon}))$
iterations.
Denote by $\chi(A)$ the smallest nonzero entry of a matrix, and let B bound the binary lengths of the exponents $b_{ij}$ encountered. If $A'$ is produced from A by one iteration of RCS, then
$$\chi(A') \ >\ \Omega\!\left(\frac{\chi^2(A)}{n}\right). \qquad (6)$$
Consequently, after t iterations $B \le O\big(2t\cdot(2L(A) + t\log(n))\big)$. Since RCS is iterated
only poly(n) times, this suffices for our purposes.
Recall that we know exactly how the entries change:
$$a'_{il} = \begin{cases} (1+\delta)\cdot\dfrac{a_{il}\,r_i}{r_i+\delta w_i} & \text{if } l\in L,\\[6pt] \dfrac{a_{il}\,r_i}{r_i+\delta w_i} & \text{if } l\notin L, \end{cases} \qquad (7)$$
$$|a'_{\mu\nu} - a_{\mu\nu}| \ =\ \frac{G}{8n}, \qquad (8)$$
$$|a'_{st} - a_{st}| \ =\ \frac{G}{4n}. \qquad (9)$$
We will assume that the equality in (9) is obtained for $t\in L$ (the other case is quite
similar). It is convenient at this point to introduce the following notation: For any pair
of indices i and j and for any positive $\Delta$, let us denote by $\delta_{ij}(\Delta)$ the minimal positive x
such that $|a_{ij}(x) - a_{ij}| = \Delta$, if such an x exists.
By (9) and the explicit formula (7) we conclude:
$$\delta_{st}\!\left(\frac{G}{4n}\right) \ =\ \frac{G\,r_s}{4n\,a_{st}(r_s - w_s) - G\,w_s} \ <\ +\infty.$$
In particular, the denominator $\rho = 4n\,a_{st}(r_s - w_s) - G\,w_s$ is strictly positive, i.e. $4n\,a_{st}(r_s - w_s) > G\,w_s$. This implies:
$$\delta \ =\ \delta_{\mu\nu}\!\left(\frac{G}{8n}\right) \ \le\ \delta_{st}\!\left(\frac{G}{8n}\right) \ =\ \frac{G\,r_s}{8n\,a_{st}(r_s - w_s) - G\,w_s} \ \le\ \frac{G\,r_s}{G\,w_s} \ =\ \frac{r_s}{w_s}.$$
Since $r_s \le n$, we obtain $a'_{il} \ge \Omega\!\left(\frac{\chi^2(A)}{n}\right)$, proving (6), and we are done.
Proposition 5.1: Let A be a nonnegative n × n matrix in which all row sums are 1. Let
$c_j$ be the j-th column sum of A and let $\epsilon > 0$. If
$$\sum_{j=1}^n (c_j - 1)^2 \ <\ \frac{1}{n^{1+\epsilon}},$$
then
$$per(A) \ \ge\ \Omega\!\left(\frac{1}{e^{\,n(1+n^{-\epsilon/2})}}\right).$$
In particular,
$$\sum_{j=1}^n (c_j - 1)^2 \ <\ \frac{1}{n\log n}$$
implies
$$per(A) \ \ge\ \Omega\!\left(\frac{1}{e^{\,n(1+\frac{1}{\sqrt{\log n}})}}\right).$$
Lemma 5.2: With the same notations, if
$$\sum_{j=1}^n (c_j - 1)^2 \ <\ \frac{1}{n},$$
then $per(A) > 0$.
Proof: Suppose, to the contrary, that $per(A) = 0$. Then, by the Frobenius–König theorem, the rows and columns of A may be permuted so that
$$A \ =\ \begin{pmatrix} * & * \\ O & B \end{pmatrix},$$
where O is an $s\times(n-s+1)$ zero submatrix occupying the last s rows and the first $n-s+1$ columns, and B is the complementary $s\times(s-1)$ block in those rows (note $s \ge 2$, since a zero row would contradict the unit row sums). Since all row sums are 1, the sum of all entries
in B is s. Since A is nonnegative, it implies $\sum_{j=n-s+2}^n c_j \ge s$. By Cauchy–Schwarz:
$$\sum_{j=1}^n (c_j-1)^2 \ \ge\ \sum_{j=n-s+2}^n (c_j-1)^2 \ \ge\ \frac{1}{s-1}\Big(\sum_{j=n-s+2}^n (c_j-1)\Big)^2 \ \ge\ \frac{1}{s-1} \ >\ \frac{1}{n},$$
a contradiction.
$$per(A) \ \ge\ \Omega\!\left(\frac{1}{e^{\,n(1+n^{-\epsilon/2})}}\right).$$
Proposition 5.1 together with the estimates on the running time of algorithms RCS
and DSS imply:
Run the DSS algorithm on the bipartite adjacency matrix of G;
If the output satisfies $\sum_{j}(c_j - 1)^2 < \frac{1}{n}$ then
return(*Perfect Matching*);
else
return(*No Perfect Matchings*);
End

⁵ Essentially the same algorithm has been independently developed in [16].
The maximum matching problem [25] is, of course, a classical question in the theory
of algorithms. Among the more recent interesting findings is the fact that this problem
is in RNC [24, 23]. It is a major challenge to find an NC algorithm for it. While our
work is very far from resolving this difficulty, it may shed some new light, and may be
useful in discovering alternative routes to deciding the existence of perfect matchings.
The above algorithm is polynomial-time, though much inferior to standard algorithms.
We note that each iteration of the present scheme can be carried out in NC. If one can find
another iterative NC-realizable scheme that requires only polylogarithmically many iterations,
this would finally put the decision problem for perfect matchings in NC.
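An end-to-end hedged sketch of this decision procedure (our own illustration: it runs plain Sinkhorn iterations without the DSS preprocessing step, so the generous iteration cap and constant C are not the paper's bounds; the acceptance test is the Lemma 5.2 criterion):

```python
import numpy as np

def has_perfect_matching(adj, C=100):
    """Decide existence of a perfect matching in a bipartite graph via scaling:
    iterate row/column normalization and accept once the Lemma 5.2 criterion
    sum_j (c_j - 1)^2 < 1/n holds.  (Sketch: the paper's DSS also includes a
    preprocessing step, omitted here, which guarantees the iteration bound.)"""
    A = np.array(adj, dtype=float)
    n = A.shape[0]
    if (A.sum(axis=1) == 0).any() or (A.sum(axis=0) == 0).any():
        return False                           # an empty row or column: no matching
    A = A / A.sum(axis=1, keepdims=True)       # row-normalize
    for _ in range(int(C * n * n * np.log(n + 1)) + 1):
        c = A.sum(axis=0)
        if np.sum((c - 1.0) ** 2) < 1.0 / n:
            return True                        # Lemma 5.2: per(A) > 0
        A = A / c[None, :]                     # normalize columns...
        A = A / A.sum(axis=1, keepdims=True)   # ...then rows
    return False

# Example: a 3x3 bipartite graph with a perfect matching vs. one without.
print(has_perfect_matching([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))   # True
print(has_perfect_matching([[1, 1, 0], [1, 1, 0], [1, 1, 0]]))   # False
```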
6 Conclusions
With respect to matrix scaling, we have hardly begun investigating the advantage in
applying our new algorithms rather than existing ones. Nor have we studied the possible
extensions of our algorithm to more general scaling and nonlinear optimization problems
from this literature.
With respect to permanent approximation, it is again but a first step on a path that
advocates simple reduction of the input matrix into one that may be easily approximated.
An obvious challenge is of course to develop a polynomial time $(1+\epsilon)$-factor approximation
scheme. At present even finding a $(1+\epsilon)^n$-factor approximation scheme seems hard
and challenging; it is elusive even for 0, 1 matrices with a constant number
of 1's in every row and column.
7 Acknowledgement
We are grateful to Yuri Rabinovitch for opening our eyes to the fact that what we were
doing was actually matrix scaling. We also thank Leonid Gurvits for interesting remarks.
References
[1] M. O. Ball and U. Derigs, An analysis of alternate strategies for implementing
matching algorithms, Networks 13, 517-549, 1983.
[2] A. I. Barvinok, Computing Mixed Discriminants, Mixed Volumes, and Permanents,
Discrete & Computational Geometry , 18 (1997), 205-237.
[3] A. I. Barvinok, A simple polynomial time algorithm to approximate the permanent
within a simply exponential factor, Preprint, 1998.
[4] A. I. Barvinok, A simple polynomial time algorithm to approximate permanents and
mixed discriminants within a simply exponential factor, Preprint, 1998.
[5] L. M. Bregman, Certain properties of nonnegative matrices and their permanents,
Soviet Math. Dokl. 14, 945-949, 1973.
[6] D. T. Brown, A note on approximations of discrete probability distributions, Inform.
and Control, 2, 386-392, 1959.
[7] G.P. Egorychev, The solution of van der Waerden’s problem for permanents, Ad-
vances in Math., 42, 299-305, 1981.
[8] D. I. Falikman, Proof of the van der Waerden’s conjecture on the permanent of a
doubly stochastic matrix, Mat. Zametki 29, 6: 931-938, 957, 1981, (in Russian).
[9] U. Feige and C. Lund. On the hardness of computing the permanent of random
matrices, STOC 24, 643-654, 1992.
[10] S. E. Fienberg, The Analysis of Cross Classified Data, MIT press, Cambridge, MA,
1977.
[11] J. Franklin and J. Lorenz, On the scaling of multidimensional matrices, Linear Al-
gebra Appl. 114/115, 717-735, 1989.
[12] S. Friedland, C. Li, H. Schneider, Additive Decomposition of Nonnegative Matrices
with Applications to Permanents and Scaling, Linear and Multilinear Algebra, 23,
63–78, 1988.
[13] Z. Galil, S. Micali and H. Gabow, Priority queues with variable priority and an
O(EV log V ) algorithm for finding maximal weighted matching in a general graph,
FOCS 23, 255-261, 1982.
[14] A. V. Goldberg and R. E. Tarjan, A new approach to the maximum-flow problem,
J. Assoc. Comp. Mach., 35, 921–940, 1988.
[15] L. Gurvits, Private communication, 1998.
[16] L. Gurvits, P. N. Yianilos, The deflation-inflation method for certain semidefinite
programming and maximum determinant completion problems, Preprint, 1998.
[17] G. T. Herman and A. Lint, Iterative reconstruction algorithms, Comput. Biol. Med.
6, 276, 1976.
[18] M. Jerrum and A. Sinclair, Approximating the permanent, SIAM J. Comput., 18,
1149-1178, 1989.
[19] M. Jerrum and U. Vazirani, A mildly exponential approximation algorithm for the
permanent, Algorithmica, 16(4/5), 392-401, 1996.
[20] P. W. Kasteleyn, The statistics of dimers on a lattice 1. The number of dimer
arrangements on a quadratic lattice. Physica, 27, 1209–1225, 1961.
[21] B. Kalantari and L. Khachiyan, On the complexity of nonnegative matrix scaling,
Linear Algebra Appl., 240, 87-104, 1996.
[23] R. M. Karp, E. Upfal and A. Wigderson, Are search and decision problems compu-
tationally equivalent? STOC 17, 1985.
[27] B. N. Parlett, T. L. Landis, Methods for Scaling to Doubly Stochastic Form, Linear
Algebra Appl. 48, 53-79, 1982.
[29] U. Rothblum and H. Schneider, Scaling of matrices which have prespecified row
sums and column sums via optimization, Linear Algebra Appl. 114/115, 737-764,
1989.
[30] R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochas-
tic matrices, Ann. Math. Statist. 35, 876-879, 1964.