Perfect Sampling for (Atomic) Lovász Local Lemma
Kun He∗
Xiaoming Sun†
Kewen Wu‡
Abstract
We give a Markov chain based perfect sampler for uniformly sampling solutions of constraint satisfaction problems (CSPs). Under some mild Lovász local lemma conditions, where each constraint of the CSP has a small number of forbidden local configurations, our algorithm is accurate and efficient: it outputs a perfectly uniform random solution and its expected running time is quasilinear in the number of variables. Prior to our work, perfect samplers were only known to exist for CSPs under much more restrictive conditions (Guo, Jerrum, and Liu, JACM'19).
Our algorithm has two components:
• A simple perfect sampling algorithm using bounding chains (Huber, STOC’98; Haggstrom
and Nelander, Scandinavian Journal of Statistics’99).
This sampler is efficient if each variable domain is small.
• A simple but powerful state tensorization trick to reduce large domains to smaller ones.
This trick is a generalization of state compression (Feng, He, and Yin, STOC’21).
The crux of our analysis is a simple information percolation argument which allows us to achieve
bounds even beyond current best approximate samplers (Jain, Pham, and Vuong, ArXiv’21).
Previous related works either use intricate algorithms or need sophisticated analysis or even
both. Thus we view the simplicity of both our algorithm and analysis as a strength of our work.
1 Introduction
The constraint satisfaction problem (CSP) is one of the most important topics in computer science
(both theoretically and practically). A CSP is a collection of constraints defined on a set of variables,
and a solution to the CSP is an assignment of variables that satisfies all the constraints. For any
given CSP, it is natural to ask the following questions:
• Decision. Can we decide efficiently if the CSP has a solution?
• Search. If the CSP is satisfiable, can we find a solution efficiently?
• Sampling. If we can efficiently find a solution, can we efficiently sample a uniform random
solution from the whole solution space?
These questions, each deepening the one above it, progressively enhance our understanding of the computational complexity of CSPs. One can easily imagine the hardness of fully resolving such broad questions. Thus, not surprisingly, despite the enormous body of results centered around them, we only have partial answers. Here we mention those related to our work.
∗ Institute of Computing Technology, Chinese Academy of Sciences. Email: hekun.threebody@foxmail.com
† Institute of Computing Technology, Chinese Academy of Sciences. Email: sunxiaoming@ict.ac.cn
‡ Department of EECS, University of California at Berkeley. Email: shlw kevin@hotmail.com
The Decision Problem. A fundamental criterion for the existence of solutions is given by the
famous Lovász local lemma (LLL) [EL75]. Interpreting the space of all possible assignments as a
probability space and the violation of each constraint as a bad event, the local lemma provides a
sufficient condition for the existence of an assignment to avoid all the bad events. This sufficient
condition, commonly referred to as the local lemma regime, is characterized in terms of the violation
probability of each constraint and the dependency relation among the constraints.
The Search Problem. The algorithmic LLL (also called constructive LLL) provides efficient
algorithms to find a solution in the local lemma regime. Plenty of works have been devoted to this
topic [Bec91, Alo91, MR98, CS00, Mos09, MT10, KM11, HSS11, HS17, HS19]. The Moser-Tardos
algorithm [MT10] is a milestone along this line: it finds a solution efficiently up to a sharp condition
known as the Shearer’s bound [She85, KM11].
The Sampling Problem. The sampling LLL asks for efficient algorithms to sample a uniform
random solution from all solutions in the local lemma regime. It serves as a standard toolkit for
the probabilistic inference problem in graphical models [Moi19], and has many applications in the
theory of computing, such as all-terminal network reliability [GJL19, GJ19, GH20]. With a better
understanding of the decision and search problem, much attention has been devoted to the sampling
LLL in recent years [GJL19, Moi19, GLLZ19, GGGY20, FGYZ20, FHY20, JPV20, JPV21]. Since
this is also our focus, we elaborate it in the next subsection.
1.1 Sampling Lovász Local Lemma
To state the long list of results on sampling LLL, we need the following notations. Given a CSP, let
n be the number of variables, k be the maximum number of variables in each constraint, Q be the
maximum number of values that each variable can take, ∆ be the maximum constraint degree, p be
the maximum violation probability of a constraint, and N be the maximum number of forbidden
local configurations for each constraint. A constraint is called atomic if it only has one forbidden
local configuration. A CSP is called atomic if all of its constraints are atomic, i.e., N = 1.
For example, in a Boolean k-CNF formula, each constraint depends on exactly k Boolean variables and thus Q = 2, p = 2^{-k}, N = 1. For hypergraph coloring, each vertex is allowed to choose a color from {1, 2, . . . , Q} and each edge contains exactly k vertices. The constraints require that no edge is monochromatic. Therefore N = Q, p = Q^{1-k}, and ∆ equals the maximum edge degree in the hypergraph. For general CSPs, each constraint may depend on a different number of variables and each variable may have a different domain size.
The sampling LLL turns out to be computationally more challenging than the algorithmic LLL. For example, for k-CNF the Moser-Tardos algorithm can efficiently find a solution if ∆ ≲ 2^k, where ≲ informally hides lower-order terms. However, it is intractable to approximately sample a uniform solution if ∆ ≳ 2^{k/2}, even when the formula is monotone [BGG+19].
On the algorithmic side, most efforts are on approximate sampling, where the output distribution is close to uniform under total variation distance. The breakthrough of Moitra [Moi19] shows that k-CNF solutions can be sampled in time n^{poly(k∆)} if ∆ ≲ 2^{k/60}; the novel idea is to use the algorithmic LLL to mark/unmark variables and then convert the problem into solving linear programs of size n^{poly(k∆)}. We remark that this algorithm is deterministic if we only need a multiplicative approximation of the number of solutions, which is another topic closely related to approximate sampling [JVV86a]. Moitra's method has been successfully applied to hypergraph colorings [GLLZ19] and random CNF formulas [GGGY20]^1.
Recently, a much faster algorithm for sampling solutions of k-CNF was given in [FGYZ20], which implements a Markov chain on the assignments of the marked variables chosen via Moitra's method. The resulting sampling algorithm has a near-linear running time Õ(n^{1.001}) together with an improved regime ∆ ≲ 2^{k/20}, where Õ hides poly(N, k, ∆, Q, log(n)). We also remark that this algorithm is inherently randomized even if we move to approximate counting.
This nonadaptive mark/unmark approach seems to only work for Boolean variables, where each variable has two possible values. To extend the approach to general CSPs, Feng, He, and Yin [FHY20] introduced state compression, which considerably expands the applicability of the method used in [FGYZ20]. Their sampling algorithm runs in time Õ(n^{1.001}) if p^{1/350} · ∆ ≲ 1/N. This algorithm is limited to the special cases of CSPs where each constraint is violated by a small number of local configurations (i.e., N is small).
Recently, Jain, Pham, and Vuong [JPV20], removing the dependency on N, provided a sampling algorithm with running time n^{poly(∆,k,log(Q))} when p^{1/7} · ∆ ≲ 1. They revisit Moitra's mark/unmark framework and use it in an adaptive way. This is the first polynomial time algorithm (assuming ∆, k, Q = O(1)) for general CSPs in the local lemma regime. By a highly sophisticated information percolation argument, they [JPV21] also prove that the sampling algorithm in [FHY20] runs in time Õ(n^{1.001}) if p^{0.142} · ∆ ≲ 1/N.
Method     | k-CNF            | Hypergraph Coloring | General CSPs           | Time
[Moi19]    | ∆ ≲ 2^{k/60}     |                     |                        | n^{poly(k∆)}
[GLLZ19]   |                  | ∆ ≲ Q^{k/16}        |                        | n^{poly(k∆ log(Q))}
[FGYZ20]   | ∆ ≲ 2^{k/20}     |                     |                        | Õ(n^{1.001})
[FHY20]    | ∆ ≲ 2^{k/13}     | ∆ ≲ Q^{k/9}         | p^{1/350} · ∆ ≲ 1/N    | Õ(n^{1.001})
[JPV20]    | ∆ ≲ 2^{k/7}      | ∆ ≲ Q^{k/7}         | p^{1/7} · ∆ ≲ 1        | n^{poly(k∆ log(Q))}
[JPV21]    | ∆ ≲ 2^{0.175k}   | ∆ ≲ Q^{k/3}         | p^{0.142} · ∆ ≲ 1/N    | Õ(n^{1.001})

Table 1: Approximate sampling algorithms in the local lemma regime.
Table 1 summarizes the efficient regimes of these algorithms. We emphasize that all these sampling results, via standard reductions [JVV86b, ŠVV09a], also imply efficient algorithms for (random) approximate counting, which estimates the number of solutions within some multiplicative error. In addition, for algorithms using Moitra's linear programming approach [Moi19, GLLZ19, GGGY20, JPV20], their approximate counting counterparts are deterministic. For the approaches using Markov chains [FGYZ20, FHY20, JPV21], the running time of their approximate counting counterparts is Õ(m · T), where T is the running time of the corresponding approximate sampling algorithm and m is the number of constraints.
Though much progress has been made for approximate sampling, much less is known for perfect sampling. As far as we know, the only result on perfect sampling in the local lemma regime is due to Guo, Jerrum, and Liu [GJL19], which provides a perfect sampler for extremal CSPs, where any two constraints sharing common variables cannot be violated simultaneously by the same assignment. Though there are known reductions from approximate sampling/counting to perfect sampling [JVV86b], it is unclear how to adapt them here considering the local lemma conditions.
1 [GGGY20] only provides an approximate counting algorithm for random CNF formulas. But upon closer inspection, their algorithm can be turned into an approximate sampler. This follows from standard reductions and from noticing that fixing bad variables (defined in [GGGY20]) does not influence their (deterministic) algorithm.
Meanwhile, perfect sampling is an important topic in theoretical computer science. Plenty of
works have been devoted to the study of perfect samplers [JVV86b, HN99, Hub98, Hub04, BC20,
JSS20, Fil97, FMMR00, ACG12, FVY19]. Apart from its mathematical interest, one advantage of
a perfect sampler over an approximate sampler is that the quality of the output is never in question. In contrast, some solutions may never be output by an approximate sampler. Consider the following simple example: let D_1 and D_2 be two distributions where D_1 is uniform over {1, 2, . . . , n} and D_2 is uniform over {√n + 1, √n + 2, . . . , n}. Then the total variation distance between D_1 and D_2 is only 1/√n = o(1). Thus D_2 is considered a good approximation of D_1, yet the first √n items are never sampled. This is indeed the case for [FGYZ20, FHY20, JPV21], and is undesirable if the CSP is used for addressing social problems and the missing solutions are contributed by the minority or the underrepresented. Besides the potential drawback in social fairness, this also leads to the following reasonable worry: is it possible that some solutions are inherently harder to find than others? Fortunately, our work shows this is not the case.
Perfect sampling is also advantageous for practical purposes and heuristic algorithms. To perform a sampling task, Markov chains are arguably the most common approach. By the convergence theorem for Markov chains [LP17], it is usually easy to show that the chain mixes to the desired distribution almost surely. However, providing a good bound on the mixing time is in general a difficult task; and if the mixing time is unknown or poorly analyzed, it is unclear when to stop the chain so that the output distribution is close enough to the desired one. Nevertheless, given the Markov chain, there are known techniques, like coupling from the past [PW96], to convert it into a perfect sampler, which always gives the desired distribution when it stops, even if we may not know any bounds on its expected running time.
1.2 Our Results
In this paper, we provide perfect samplers for solutions of atomic CSPs in the local lemma regime. Though in previous paragraphs we only focused on sampling a perfectly uniform solution, our algorithm in fact works for general underlying distributions.
Let Φ be an atomic CSP with variable set V and |V | = n where each v ∈ V is endowed with a
distribution Dv supported on finite domain Ωv .
• Let p be the maximum violation probability of a constraint under the distribution ∏_{v∈V} D_v.
• Let ∆ be the maximum constraint degree.
• Let Q be the maximum size of a variable domain Ω_v.
• Let k be the maximum number of variables that a constraint depends on.
Let µ be the distribution of solutions of Φ under ∏_{v∈V} D_v, i.e.,

    µ(σ) = Pr_{σ′ ∼ ∏_{v∈V} D_v} [σ′ = σ | σ′ is a solution of Φ]   for each σ ∈ ∏_{v∈V} Ω_v.
The original Lovász local lemma [EL75] states that if p · ∆ ≲ 1 then Φ has a solution, i.e., µ is well-defined. Then the algorithmic LLL [MT10] shows that one can efficiently find a solution under the same condition. Our main theorem shows that one can efficiently sample a solution distributed as µ under a similar condition.
Theorem 1.1 (Theorem 5.11, Informal). If p^γ · ∆ ≲ 1/c where

    γ = (3 + ln(c + 1) − √(ln²(c + 1) + 6 ln(c + 1))) / 9   and   c = max{2, max_{v∈V} max_{q,q′∈Ω_v} D_v(q)/D_v(q′)},

then our algorithm runs in expected time poly(k, Q, ∆) · n log(n) and outputs a random solution distributed perfectly as µ.
We remark that for the uniform case (i.e., Dv is the uniform distribution), we have c = 2 and
γ > 0.145, which already beats 1/7 from [JPV20] and 0.142 from [JPV21].
Theorem 1.1 is proved by a black-box reduction, using our state tensorization trick, to the
perfect sampling algorithm on small variable domains. It is possible to get improved bounds for
specific underlying distributions by a finer analysis. We take the uniform distribution as a starting
example and improve 0.145 to 0.175 for general atomic CSPs.
Theorem 1.2 (Corollary 5.18, Informal). If p^{0.175} · ∆ ≲ 1 and each D_v is the uniform distribution, then our algorithm runs in expected time poly(k, Q, ∆) · n log(n) and outputs a perfectly uniform random solution.
We remark that this 0.175 also matches the previous best bound for approximately uniform sampling of solutions of k-CNF formulas [JPV21]. In fact, in our analysis binary domains are the worst case for general atomic CSPs: the bound for k-CNF formulas is the bottleneck for the bound on general atomic CSPs.
Indeed, Theorem 1.2 can be further improved if the variable domains are large. We use hypergraph coloring as an illustrative example; the bound we obtain matches the current best bound for approximate samplers [JPV21].
Theorem 1.3 (Theorem 5.14, Informal). Let H be a hypergraph on n vertices where each edge contains exactly k vertices. Let ∆ be the edge degree of H. Assume each vertex can choose a color from {1, 2, . . . , Q}, and a coloring of the vertices is proper if it produces no monochromatic edge. If ∆ ≲ Q^{k/3}, then our algorithm runs in expected time poly(k, Q, ∆) · n log(n) and outputs a perfectly uniform random proper coloring of H.
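For readers who prefer code, the way hypergraph coloring fits the atomic CSP framework (one atomic constraint per edge and per color, so N = Q forbidden configurations per edge, as noted in Subsection 1.1) can be written in a few lines. The following is a minimal Python sketch of our own, not code from the paper:

```python
from itertools import product

def coloring_to_atomic_csp(edges, num_colors):
    """Encode proper hypergraph coloring as an atomic CSP: for every edge e and
    every color c there is one atomic constraint whose unique forbidden local
    configuration assigns color c to every vertex of e (a monochromatic edge)."""
    return [(tuple(e), {v: c for v in e})
            for e, c in product(edges, range(num_colors))]

# Two 3-uniform hyperedges and Q = 3 colors: 2 * 3 atomic constraints in total.
edges = [(0, 1, 2), (2, 3, 4)]
atomic = coloring_to_atomic_csp(edges, num_colors=3)
assert len(atomic) == len(edges) * 3
```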
Finally we briefly discuss the connection between our result and other topics.
• Non-atomic CSPs. For a non-atomic CSP, let N be the maximum number of forbidden
local configurations. We can convert it into an atomic CSP by decomposing the non-atomic
constraints to atomic ones as in [FHY20], which only increases the constraint degree ∆ to at
most ∆ · N . Then our perfect sampler is still efficient if N is small.
• Approximate Sampling. Our perfect sampler is a Las Vegas algorithm with quasilinear
expected running time T . It is well-known that terminating the algorithm after T /ε steps
gives an ε-approximate sampler under total variation distance. In particular, the local lemma
condition of our approximate sampler is the same as our perfect sampler, which breaks the
current best record for atomic CSPs by [JPV20, JPV21].
We remark that one can obtain a better bound by analyzing the moments of the running
time of our perfect sampler. We do not make the effort here since this is not our focus and
this may require stronger local lemma conditions.
• Approximate Counting. One way to reduce counting to sampling is to start from a
CSP with no constraint, then add clauses one by one and use the self-reducibility [JVV86a].
Another strategy is to use the simulated annealing approach developed in [BSVV08, SVV09b,
FGYZ20]. Both reductions produce efficient randomized approximate counting algorithms.
We refer interested readers to their paper for details.
Similarly as in the approximate sampling case, the local lemma condition of our approximate
counting algorithm is the same as our perfect sampler, which breaks the current best record
for atomic CSPs by [JPV20, JPV21].
1.3 Proof Overview
To illustrate the idea, we first focus on sampling a uniform solution of k-CNF formula:
• There are n Boolean variables. Each variable is endowed with the uniform distribution over
{0, 1}, and appears in at most d constraints.
• Each constraint is a clause depending on exactly k variables and has exactly one forbidden
local assignment.
For example, (x1 ∨ x2 ∨ ¬x3) ∧ (x1 ∨ x5 ∨ x7) ∧ (x2 ∨ ¬x4 ∨ ¬x6) is a 3-CNF formula with n = 7, m = 3 and k = 3, d = 2.
Similar to previous works, to deal with the connectivity issue of Glauber dynamics [Wig19], the first step of our algorithm is to mark variables so that every clause has a certain number of marked and unmarked variables. Let V be the set of variables and M ⊆ V be the set of marked variables.
Then by the local lemma (Theorem 2.3), for any σ ∈ {0, 1}M and any v ∈ M the following two
distributions are close under total variation distance:
• An unbiased coin in {0, 1}.
• The distribution of σ ′ (v) where σ ′ ∈ {0, 1}V is a uniform random solution conditioning on
σ ′ (M \ {v}) = σ(M \ {v}).
We call this local uniformity.
To sample a solution approximately, previous works [FGYZ20, FHY20, JPV21] simulate an
idealized Glauber dynamics PGlauber (Algorithm 7) on the assignments of the marked variables as
follows:
(A) Initialize σ(v) ∼ {0, 1} uniformly and independently for each v ∈ M.
(B) Going forward from time 0 to T → +∞, let vt be the variable selected at time t ≥ 0.
Iteratively find all clauses that are not yet satisfied by σ(M \ {vt }) and are connected to vt
(Algorithm 1). Let Φ′ be this sub-k-CNF.
Then update σ(vt ) ← σ ′ (vt ), where σ ′ ∈ {0, 1}V is a uniform random solution of Φ′ conditioning on σ ′ (M \ {vt }) = σ(M \ {vt }). Algorithmically, σ ′ is provided via rejection sampling
(Algorithm 2) on Φ′ .
(C) After Step (A) (B), extend σ to unmarked variables V \ M by sampling a uniform random
solution conditioning on σ(M).
To sample a solution perfectly, we simulate bounding chains PBChains (Algorithm 5) of PGlauber
as follows: The algorithm guarantees at any point, each variable v ∈ M is assigned with a value in
{0, 1, ⋆} where ⋆ represents uncertainty.
(1) Initialize σ(v) = ⋆ for each v ∈ M.
(2) Going forward from time −T to −1, let vt be the variable selected at time −T ≤ t < 0.
Iteratively find all clauses that are not yet satisfied by σ(M \ {vt }) and are connected with
vt (Algorithm 1). Let Φ′ be this sub-CSP.
– If all marked variables connected to vt in Φ′ have value 0 or 1, we say vt is coupled. Then
we update σ(vt ) by rejection sampling on Φ′ and σ(vt ) is always updated to 0 or 1.
– Otherwise σ(vt ) is updated based on the local uniformity, which may be assigned to ⋆
with small probability (SafeSampling subroutine in Algorithm 5).
6
(3) After Step (1) (2),
– if some marked variable has value ⋆, then we double T and re-run Step (1) (2);2
– otherwise we stop and extend σ to unmarked variables V \ M by sampling a uniform
random solution conditioning on σ(M).
To simplify the analysis of the algorithm, we use systematic scan for PGlauber and PBChains rather
than random scan [HDSMR16]. Specifically, at time t ∈ Z the algorithm always updates the variable
with index (t mod m) (Algorithm 7) where m = |M|.
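To make Steps (1)-(3) and the systematic scan concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of a single bounding-chain update for k-CNF. A clause is encoded as the map from its variables to their unique falsifying values, marked plays the role of M, and the uncoupled branch uses placeholder probabilities where the actual algorithm would call SafeSampling (Algorithm 3).

```python
import random

STAR = "*"  # wildcard value: the marked variable is not yet coupled

def falsifiable(clause, sigma):
    """clause maps each of its variables to the unique value falsifying it;
    it can still be violated if every variable is STAR or at that value."""
    return all(sigma[x] in (STAR, b) for x, b in clause.items())

def component(clauses, sigma, marked, u):
    """Rough analogue of the Component subroutine (Algorithm 1): collect the
    falsifiable clauses connected to u through STAR variables.  Returns
    (sub_clauses, token); token is False once such a clause touches a marked
    STAR variable other than u, i.e. the update cannot be coupled locally."""
    sub, frontier, remaining = [], {u}, list(clauses)
    grew = True
    while grew:
        grew = False
        for cl in list(remaining):
            stars = {x for x in cl if sigma[x] == STAR}
            if falsifiable(cl, sigma) and stars & frontier:
                if not stars <= {u} | (set(cl) - marked):
                    return sub, False
                remaining.remove(cl)
                sub.append(cl)
                frontier |= stars
                grew = True
    return sub, True

def bounding_chain_update(clauses, sigma, marked, u, rng):
    """One update of a marked variable u in the bounding chain (Step (2))."""
    sigma[u] = STAR
    sub, token = component(clauses, sigma, marked, u)
    if token:  # coupled: rejection sampling on the local component
        stars = sorted({x for cl in sub for x in cl if sigma[x] == STAR} | {u})
        while True:
            trial = dict(sigma)
            for x in stars:
                trial[x] = rng.randint(0, 1)
            if all(any(trial[x] != b for x, b in cl.items()) for cl in sub):
                sigma[u] = trial[u]
                return
    else:      # not coupled: placeholder probabilities standing in for SafeSampling
        sigma[u] = rng.choice([0, 1, STAR])
```

Running such updates from time −T to −1 with reused randomness, and restarting with a doubled T whenever some marked variable is still STAR, is exactly the outer loop described in Step (3).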
Let µM be the distribution of a uniform random solution projected on the marked variables
M. Our goal is to prove the following claims for PBChains :
• Correctness. When we stop in Step (3), σ(M) has distribution µM (Subsection 4.2).
• Efficiency. In expectation, each update in Step (2) is efficient. (Subsection 4.1.1)
• Coalescence. In expectation, we stop with T = O(n log(n)) (Subsection 4.1.2).
Proof of Correctness. Firstly we show PGlauber converges to µM in Step (B) when T → +∞.
Though it is a time inhomogeneous Markov chain, we are able to embed it into a time homogeneous
Markov chain P ′ by viewing |M| consecutive updates as one step. Then it is easy to check P ′ is
aperiodic and irreducible with unique stationary distribution µM . After that, we unpack P ′ to
show PGlauber also converges to µM (Lemma 4.22).
Next, we use the idea of coupling from the past [PW96] and bounding chains [Hub98, HN99].
• Coupling from the Past. Observe that for any positive integer L if we run PGlauber from
−L · m to −1, it has the same distribution as we run it from 0 to L · m − 1. Thus by the
argument above, Step (B) also has distribution µM if we run PGlauber from time −∞ to −1.
• Bounding Chains. For each t ∈ Z, if σ(M) in PBChains has no ⋆, then the update process
is exactly PGlauber . This means PBChains is a coupling of PGlauber (Proposition 4.25). Note
that we use ⋆ to denote uncertainty which includes all possible assignments that we need to
couple. Thus when PBChains stops at time T with σ̂ ∈ {0, 1}^M at Step (3), any assignment, going through the updates from time −T to −1, converges to σ̂.
Combining the two observations above, we know σ̂ is distributed exactly as µM.
Proof of Efficiency. We first remark that only the rejection sampling is time consuming, and its running time follows a geometric distribution with expectation controlled by the local uniformity (Proposition 3.3). Thus to bound its expectation, it suffices to bound the size of Φ′ (Proposition 4.11). This uses the same 2-tree argument as in previous works, which we briefly explain here.
Since k-CNF has bounded degree, if Φ′ is large then we can find a large independent set S of
clauses in Φ′ . Note that clauses in Φ′ are connected, thus we can further assume S is a 2-tree — S
will be connected if we link any two clauses in S that are at distance 2. Intuitively, a 2-tree is an
independent set that is not very spread out. Then it suffices to union bound the probability that
some large 2-tree growing out of vt survives after previous updates.
One potential pitfall is that the total running time of PBChains depends on both T and the running time of each update, and these can be arbitrarily correlated. Thus we need to calculate the second moment of the update time (Subsection 4.1.1) and apply the Cauchy-Schwarz inequality to break the correlation (Subsection 4.4).
2 We remark that the randomness is reused. That is, the randomness used for time t < 0 is the same one regardless of the starting time −T.
Proof of Coalescence. To upper bound the round T, we employ an information percolation argument similar to the ones used in [LS16, HSZ19, JPV21]. For the sake of the analysis, we assume Step (2) in PBChains also includes unmarked variables, though it does nothing for their update, i.e., v_t is the variable with index (t mod n) where n = |V|.
The crucial observation is the following. If σ(v_{t_0}) is updated to ⋆ at time t_0, then at that point there must be some variable u ≠ v_{t_0} with value ⋆ and connected to v_{t_0}. Let t_1 be the last update time of u before t_0, and thus u = v_{t_1}. Then we can find another variable u′ ≠ v_{t_1} with value ⋆ and connected to v_{t_1} at time t_1. Continuing this process until we reach the initialization phase, we find a list of times 0 > t_0 > t_1 > · · · > t_ℓ ≥ −T such that for each time t_i,
• σ(v_{t_i}) is updated to ⋆,
• v_{t_i} is connected to v_{t_{i+1}} and v_{t_{i+1}} has value ⋆.
To express constraints through time, we define the extended constraint (e, C) (Definition 4.12), where C is a clause and e = {t′_1, . . . , t′_k} ⊆ {−T, . . . , −1} is a set of times such that
• v_{t′_1}, . . . , v_{t′_k} are the variables C depends on,
• t′_1, . . . , t′_k are consecutive rounds of update for v_{t′_1}, . . . , v_{t′_k}.
Since v_{t_i} and v_{t_{i+1}} are connected at time t_i as discussed above, we are able to find extended constraints (e^i_1, C^i_1), . . . , (e^i_{s_i}, C^i_{s_i}) such that t_i ∈ e^i_1, t_{i+1} ∈ e^i_{s_i}, and e^i_1, . . . , e^i_{s_i} are connected over {−T, . . . , −1}. Thus all the edges {e^i_j}_{i,j} form a connected sub-hypergraph H on vertex set {−T, . . . , −1}.
Since each extended constraint (e, C) represents consecutive rounds of update for the variables in C, e has range less than n = |V|, i.e., max_{t_1,t_2∈e} |t_1 − t_2| < n. Thus the longest path P in H has length |P| = Ω(T/n). On the other hand, Step (2) of PBChains only finds clauses that are not satisfied by the marked variables, which means each fixed extended constraint (e, C) appears in H with extremely low (roughly 2^{−k}) probability.
Putting everything together, if we do not stop at round T, we should find a path P of extended constraints of length |P| = Ω(T/n). Meanwhile, each fixed extended constraint is found with probability at most roughly 2^{−k}. Thus any fixed P exists with probability at most 2^{−k·|P|/2}, since the extended constraints in odd positions of P form an independent set. Moreover, it is easy to see each extended constraint overlaps with O(k²d) many other extended constraints, which provides an upper bound O(k²d)^{|P|} for the number of possible P. By a union bound, the probability that we do not stop at round T is roughly

    n · (k⁴d² / 2^k)^{Ω(T/n)} ≪ n · 2^{−T/n},

where we assume k⁴d² ≪ 2^k and the additional n comes from choosing t_0 ∈ {−1, . . . , −n}, i.e., the last update resulting in ⋆.
We remark that to deal with general atomic CSPs, we need to be more careful with the union bound (Subsection 4.1.2). This is because in general atomic CSPs constraints may depend on different numbers of variables. Nevertheless, our analysis is much simpler than the one in [JPV21]. One main reason is that we use systematic scan instead of random scan, which makes the updates of each variable well-behaved through time. Moreover, our main data structure, the extended constraint (Definition 4.12), is also much simpler than the discrepancy checks used in their argument.
State Tensorization. The marking process is only efficient when the variable domains are small;
otherwise it cannot guarantee useful local uniformity. Similarly for ⋆, it compresses too much
information when the domain is large. Therefore to deal with large domains, we introduce a simple
state tensorization trick to perform the reduction.
For intuition, let’s consider the following concrete atomic CSP Φ:
• The variables are u, v where u is endowed with distribution Du by Du (a) = Du (b) = Du (c) =
1/3; and v is endowed with Dv by Dv (A) = Dv (B) = 1/4, Dv (C) = 1/3, Dv (D) = 1/6.
• The constraints are C1 , C2 where C1 = False iff u = a; and C2 = False iff u = c, v = B.
Then we describe one possible state tensorization as follows (See Figure 1):
• Define variables u1 , u2 where u1 is endowed with Du1 by Du1 (0) = 2/3, Du1 (1) = 1/3; and u2
is endowed with Du2 (0) = Du2 (1) = 1/2.
We interpret u = a if (u1 , u2 ) = (0, 0); u = b if (u1 , u2 ) = (0, 1); and u = c if u1 = 1.
• Define variables v1 , v2 , v3 where v1 is endowed with Dv1 by Dv1 (0) = Dv1 (1) = 1/2; and v2 is
endowed with Dv2 (0) = Dv2 (1) = 1/2; and v3 is endowed with Dv3 (0) = 2/3, Dv3 (1) = 1/3.
We interpret v = A if (v1, v2) = (0, 0); v = C if (v1, v3) = (1, 0), etc.
[Figure 1: An example for state tensorization. (a) The original variables u and v with their distributions. (b) The new variables u1, u2 and v1, v2, v3 with their distributions.]
Hence after the state tensorization, C1 = False iff u1 = 0, u2 = 0; and C2 = False iff u1 = 1, v1 = 0, v2 = 1. Moreover, sampling the values of u, v from Du × Dv is equivalent to first sampling the values of u1, u2, v1, v2, v3 from Du1 × Du2 × Dv1 × Dv2 × Dv3 and then interpreting the values of u, v from them. Therefore, to obtain a random solution under the distribution Du × Dv, it suffices to first obtain a random solution under the product distribution of u1, u2, v1, v2, v3 and then interpret it back. Most importantly, this reduction does not change the violation probability of any individual constraint, nor the dependency relation among constraints. This essentially guarantees that the desired local lemma condition does not deteriorate after the reduction. The formal description of the reduction can be found in Subsection 5.2.
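As a sanity check on this particular example, the following short Python snippet (ours) verifies that sampling the sub-variables independently and decoding them back reproduces Dv exactly.

```python
from fractions import Fraction as F
from itertools import product

# Distributions of the sub-variables v1, v2, v3 from the example above.
D = {"v1": {0: F(1, 2), 1: F(1, 2)},
     "v2": {0: F(1, 2), 1: F(1, 2)},
     "v3": {0: F(2, 3), 1: F(1, 3)}}

def decode(v1, v2, v3):
    """Interpret the sub-variables back into the original value of v."""
    if v1 == 0:
        return "A" if v2 == 0 else "B"   # v2 distinguishes A from B
    return "C" if v3 == 0 else "D"       # v3 distinguishes C from D

induced = {}
for bits in product([0, 1], repeat=3):
    p = D["v1"][bits[0]] * D["v2"][bits[1]] * D["v3"][bits[2]]
    induced[decode(*bits)] = induced.get(decode(*bits), F(0)) + p

# The induced distribution of v equals the original D_v.
assert induced == {"A": F(1, 4), "B": F(1, 4), "C": F(1, 3), "D": F(1, 6)}
```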
We remark that state tensorization, combined with the marking M, generalizes the state compression technique in [FHY20]. On the other hand, state tensorization is similar to standard gadget reductions in the study of complexity theory. For example, by encoding large alphabets using binary bits, one can show that Boolean CSPs are no easier to solve than CSPs with large variable domains for polynomial time algorithms. However, we are not aware of such a simple idea being used to perform sampling tasks.
Organization. We give formal definitions in Section 2. Useful subroutines are provided in
Section 3 and then we describe and analyze our main algorithm in Section 4. We discuss our
result for different applications in Section 5.
2 Preliminaries
We use e ≈ 2.71828 to denote the natural base. We use log(·) and ln(·) to denote the logarithm with base 2 and e respectively. We use [N] to denote {1, 2, . . . , N}; and use Z to denote the set of all integers. We say V is a disjoint union of (V_i)_{i∈[s]} if V = ∪_{i∈[s]} V_i and V_i ∩ V_j = ∅ holds for any distinct i, j ∈ [s]. For a positive integer m, t mod m = t − m · ⌊t/m⌋ for a non-negative integer t; and t mod m = (t · (1 − m)) mod m for a negative integer t.
For any index set I and domains (Ω_i)_{i∈I}, we use ∏_{i∈I} Ω_i to denote their product space. For a vector vec ∈ ∏_{i∈I} Ω_i, we use vec(i) ∈ Ω_i to denote the entry of vec indexed by i; and use vec(J) ∈ ∏_{i∈J} Ω_i to denote the entries of vec on indices J ⊆ I.
For a finite set X and a distribution D over X , we use x ∼ D to denote that x is a random
variable sampled from X according to distribution D. For two events E1 , E2 with Pr [E2 ] = 0, we
define the conditional probability Pr [E1 (x) | E2 (x)] = 0. We say event E happens almost surely if
Pr [E] = 1.
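The mod convention for negative t matters later because the systematic scan in Section 4 picks the variable index as (t mod n) for negative times. As a quick check, the definition above agrees with the usual non-negative remainder, e.g., the one computed by Python's % operator (a tiny verification of ours):

```python
# For negative t, the definition t mod m = (t * (1 - m)) mod m gives the same
# non-negative remainder as the direct definition t - m * floor(t / m).
for m in range(2, 8):
    for t in range(-40, 0):
        assert (t * (1 - m)) % m == t - m * (t // m) == t % m
```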
Constraint Satisfaction Problems. Let V be a set of variables with finite domains (Ω_v)_{v∈V}. A constraint C on V is a mapping C : ∏_{v∈V} Ω_v → {True, False}. We say C depends on v ∈ V if there exist σ_1, σ_2 ∈ ∏_{v∈V} Ω_v such that C(σ_1) ≠ C(σ_2) and σ_1, σ_2 differ in (and only in) v. We use vbl(C) to denote the set of variables that C depends on; then C can be viewed as a mapping from ∏_{v∈vbl(C)} Ω_v to {True, False}.
For convenience we use σ^C_False ⊆ ∏_{v∈vbl(C)} Ω_v (resp., σ^C_True ⊆ ∏_{v∈vbl(C)} Ω_v) to denote the set of falsifying (resp., satisfying) assignments of C. More generally, for C being a set of constraints, we use σ^C_False (resp., σ^C_True) to denote the set of falsifying (resp., satisfying) assignments of C, i.e., C(σ) = False for all σ ∈ σ^C_False and some C ∈ C (resp., C(σ) = True for all σ ∈ σ^C_True and all C ∈ C).
For the sampling LLL, we also need to specify the underlying distribution. Assume each v ∈ V has some distribution D_v supported on Ω_v.^3 Define µ^C_True as the distribution of solutions of C induced by (D_v)_{v∈V}, i.e.,

    µ^C_True(σ) = Pr_{σ′ ∼ ∏_{v∈V} D_v} [σ′ = σ | σ′ ∈ σ^C_True]   for each σ ∈ ∏_{v∈V} Ω_v.
Definition 2.1 ((Atomic) Constraint Satisfaction Problem). A constraint satisfaction problem is specified by Φ = (V, (Ω_v, D_v)_{v∈V}, C) where C is a set of constraints on V and each D_v is a distribution supported on Ω_v.
We say Φ is atomic if |σ^C_False| = 1 for all C ∈ C. In this case, we abuse the notation to define σ^C_False as the unique falsifying assignment of C.
In addition, we define the following measures for Φ:
• the width is k = k(Φ) = max_{C∈C} |vbl(C)|;
• the variable degree is d = d(Φ) = max_{v∈V} |{C ∈ C | v ∈ vbl(C)}|;
• the constraint degree is ∆ = ∆(Φ) = max_{C∈C} |{C′ ∈ C | vbl(C) ∩ vbl(C′) ≠ ∅}|;^4
• the domain size is Q = Q(Φ) = max_{v∈V} |Ω_v|;
• the maximal individual falsifying probability is

    p = p(Φ) = max_{C∈C} Pr_{σ ∼ ∏_{v∈vbl(C)} D_v} [C(σ) = False] = max_{C∈C} Σ_{σ∈σ^C_False} ∏_{v∈vbl(C)} D_v(σ(v)).

3 This means D_v(U) > 0 for all U ∈ Ω_v. One natural choice is the uniform distribution.
4 Here ∆ is one plus the maximum degree of the dependency graph of Φ since C ∈ {C′ ∈ C | vbl(C) ∩ vbl(C′) ≠ ∅}.
We will simply use k, d, ∆, Q, p when Φ is clear from the context. In addition we assume ∆ ≥ 2,
d ≥ 2, and |V | ≥ 2 since otherwise the constraints in Φ are independent and the sampling problem
becomes trivial.
Lovász Local Lemma. The Lovász local lemma provides sufficient conditions to guarantee the existence of a solution of a CSP.
Theorem 2.2 ([EL75]). Let Φ = (V, (Ω_v, D_v)_{v∈V}, C) be a CSP. If ep∆ ≤ 1, then

    Pr_{σ ∼ ∏_{v∈V} D_v} [σ ∈ σ^C_True] ≥ (1 − ep)^{|C|} > 0.
The following more general version, first stated in [HSS11], can be proved with a minor modification of the original proof of the Lovász local lemma [EL75].
Theorem 2.3 ([HSS11, Theorem 2.1]). Let Φ = (V, (Ω_v, D_v)_{v∈V}, C) be a CSP. If ep∆ ≤ 1, then σ^C_True ≠ ∅ and for any constraint B (not necessarily from C) we have

    Pr_{σ ∼ µ^C_True} [B(σ) = True] ≤ (1 − ep)^{−|Γ(B)|} · Pr_{σ ∼ ∏_{v∈V} D_v} [B(σ) = True],

where Γ(B) = {C ∈ C | vbl(C) ∩ vbl(B) ≠ ∅}.
Hypergraphs. Here we give definitions related to hypergraphs. All the definitions directly translate to graphs when every edge in the hypergraph contains two vertices.
Let H be a hypergraph with finite vertex set V(H) and finite edge set E(H). Each edge e ∈ E(H) is a non-empty subset of V(H). We allow multiple occurrences of the same edge. For any CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C) we naturally view it as a hypergraph H(Φ) where

    V(H(Φ)) = V   and   E(H(Φ)) = {vbl(C)}_{C∈C}.   (1)
Similar to the measures of CSPs, we define the following measures for a hypergraph H:
• the width is k = k(H) = max_{e∈E(H)} |e|;
• the vertex degree is d = d(H) = max_{v∈V(H)} |{e ∈ E(H) | v ∈ e}|;
• the edge degree is ∆ = ∆(H) = max_{e∈E(H)} |{e′ ∈ E(H) | e ∩ e′ ≠ ∅}|.
For any two vertices u, v ∈ V(H), we say they are adjacent if there exists some e ∈ E(H) such that u ∈ e and v ∈ e; we say they are connected if there exists a vertex sequence w_1, w_2, . . . , w_ℓ ∈ V(H) such that w_1 = u, w_ℓ = v and each consecutive pair w_i, w_{i+1} is adjacent. The hypergraph H is connected if any two vertices u, v ∈ V(H) are connected. Furthermore, we have the following basic fact regarding connected hypergraphs.
Fact 2.4. Assume H is a connected hypergraph. Then for any e, e′ ∈ E(H), there exists a sequence of edges e_1, e_2, . . . , e_ℓ such that the following holds.
• e_1 = e, e_ℓ = e′, and e_i ∩ e_{i+1} ≠ ∅ for all i ∈ [ℓ − 1].
• e_i ∩ e_j = ∅ for all i, j ∈ [ℓ] with |i − j| > 1.
A hypergraph H ′ is a sub-hypergraph of H if V (H ′ ) ⊆ V (H) and E(H ′ ) ⊆ E(H). If in addition
e ∩ V (H ′ ) = ∅ holds for all e ∈ E(H) \ E(H ′ ), we say H ′ is an induced sub-hypergraph of H.
Marking. Apart from the CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C) itself, our algorithms will also need a subset of V which we call a marking. Both the correctness and efficiency of our algorithms rely on the marking.
Assume Φ is atomic and recall our measures for Φ from Definition 2.1. We define the following
constants given marking M, the meaning of which will be clear as we proceed to the next section:
• The maximal conditional falsifying probability of (Φ, M) is

    α = α(Φ, M) = max_{C∈C} ∏_{v∈vbl(C)\M} D_v(σ^C_False(v)).   (2)

• When eα ≤ 1, define
  – the multiplicative bias of (Φ, M) as

        β = β(Φ, M) = (1 − eα)^{−d};   (3)

  – the maximal multiplicative-biased falsifying probability of (Φ, M) as

        ρ = ρ(Φ, M) = max_{C∈C} ∏_{v∈vbl(C)∩M} β · D_v(σ^C_False(v));   (4)

  – the maximal multiplicative-biased unpercolated probability of (Φ, M) as

        λ = λ(Φ, M) = max_{C∈C} |vbl(C)|² ∏_{v∈vbl(C)∩M} (β · D_v(σ^C_False(v)) + (β − 1)(|Ω_v| − 2)).   (5)
When context is clear, we will just use α, β, ρ, λ.
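To fix ideas, here is a small Python sketch of our own (with hypothetical input conventions) computing the quantities (2)-(5) for an atomic CSP given as a list of constraints, each recorded as the map from its variables to their unique falsifying values.

```python
import math

def marking_measures(constraints, domains, dists, marked):
    """constraints: list of dicts {variable: sigma^C_False(variable)};
    domains[v]: list of values of Omega_v; dists[v][q]: D_v(q); marked: set M.
    Returns (alpha, beta, rho, lambda) as in (2)-(5); requires e * alpha <= 1."""
    d = max(sum(v in c for c in constraints) for v in domains)  # variable degree
    alpha = max(math.prod(dists[v][c[v]] for v in c if v not in marked)
                for c in constraints)
    beta = (1 - math.e * alpha) ** (-d)
    rho = max(math.prod(beta * dists[v][c[v]] for v in c if v in marked)
              for c in constraints)
    lam = max(len(c) ** 2 * math.prod(beta * dists[v][c[v]]
                                      + (beta - 1) * (len(domains[v]) - 2)
                                      for v in c if v in marked)
              for c in constraints)
    return alpha, beta, rho, lam

# A single clause (x1 or x2 or x3) over uniform bits, with x1 marked.
cons = [{"x1": 0, "x2": 0, "x3": 0}]
dom = {v: [0, 1] for v in ("x1", "x2", "x3")}
dist = {v: {0: 0.5, 1: 0.5} for v in dom}
print(marking_measures(cons, dom, dist, marked={"x1"}))
```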
Wildcard. We will reserve ⋆ as a wildcard symbol. Our algorithms will use ⋆ to represent all the possibilities in some domain.
Let Φ = (V, (Ω_v, D_v)_{v∈V}, C) be a CSP. For each v ∈ V we assume ⋆ ∉ Ω_v and define Ω^⋆_v = Ω_v ∪ {⋆}. For any C ∈ C and σ ∈ ∏_{v∈V} Ω^⋆_v we abuse notation to define

    C(σ) = False if ∃ σ′ ∈ ∏_{v∈V} Ω_v such that C(σ′) = False and σ(v) ∈ {σ′(v), ⋆} for all v ∈ V, and C(σ) = True otherwise.   (6)
Here we define CSPs projected on assignments with ⋆ coordinates. Intuitively, such an assignment fixes (and only fixes) the non-⋆ variables of the CSP. Pedantically, we provide the following definition.
Definition 2.5 (Projected Constraint Satisfaction Problem). For any σ ∈ ∏_{v∈V} Ω^⋆_v, we define the projected constraint satisfaction problem Φ|_σ = (V, (Ω_v|_σ, D_v|_σ)_{v∈V}, C|_σ) by setting, for all v ∈ V,

    (Ω_v|_σ, D_v|_σ) = (Ω_v, D_v) if σ(v) = ⋆, and ({σ(v)}, the point distribution) otherwise,

and C|_σ = {C|_σ | C ∈ C, C(σ) = False}, where C|_σ has the same evaluation rule as C ∈ C but depends on possibly fewer variables, i.e., vbl(C|_σ) = {v ∈ vbl(C) | σ(v) = ⋆} ⊆ vbl(C). Similarly, we sometimes view C|_σ as a constraint only on vbl(C|_σ).
Recall the measures defined in Definition 2.1. We note the following simple fact.
Fact 2.6. k(Φ|_σ) ≤ k(Φ), d(Φ|_σ) ≤ d(Φ), ∆(Φ|_σ) ≤ ∆(Φ), and Q(Φ|_σ) ≤ Q(Φ). Moreover, if Φ is an atomic CSP, then Φ|_σ is an atomic CSP.
3 Useful Subroutines
In this section, we provide some useful subroutines for later reference in our main algorithm.
3.1 A Component Subroutine
Recall the notations defined in Definition 2.5. We first set up the following Component(Φ, M, σ, u) subroutine, which uses u ∈ V and the current assignment σ ∈ ∏_{v∈V} Ω^⋆_v to (hopefully) decompose the projected CSP Φ|_σ into two disjoint parts: one containing u and one isolated from u. For our purpose, the input will guarantee that Φ is atomic, σ(u) = ⋆, and σ(v) ≡ ⋆ for all v ∈ V \ M.
Algorithm 1: The Component subroutine
Input: an atomic CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C), a marking M ⊆ V, an assignment σ ∈ ∏_{v∈M} Ω^⋆_v × {⋆}^{V\M}, and u ∈ V with σ(u) = ⋆
Output: (Φ′, Token) where Φ′ = (V′, (Ω_v|_σ, D_v|_σ)_{v∈V′}, C′) and Token ∈ {True, False}
1 Initialize V′ ← {u} and C′ ← ∅
2 while ∃ C|_σ ∈ C|_σ \ C′ with vbl(C|_σ) ∩ V′ ≠ ∅ do
3     if vbl(C|_σ) ⊆ {u} ∪ (V \ M) then Update V′ ← V′ ∪ vbl(C|_σ) and C′ ← C′ ∪ {C|_σ}
4     else return (Φ′, False)
5 end
6 return (Φ′, True)
Here we note the following observation regarding Algorithm 1.
Proposition 3.1. The following holds for Component(Φ, M, σ, u).
(1) It runs in time O(∆k|C′| + dk).
(2) u ∈ V′ ⊆ V, C′ ⊆ C|_σ, and σ(v) = ⋆ for all v ∈ V′. Moreover, σ(v) = σ^C_False(v) holds for any C|_σ ∈ C′ and v ∈ (vbl(C) ∩ M) \ {u}.
(3) Φ′ is an atomic CSP and the hypergraph H(Φ′) (defined in (1)) is connected.
(4) If Token = True, let V′′ = V \ V′ and C′′ = C|_σ \ C′. Then Φ′′ = (V′′, (Ω_v|_σ, D_v|_σ)_{v∈V′′}, C′′) is an atomic CSP. Moreover σ^{C|_σ}_True = σ^{C′}_True × σ^{C′′}_True and µ^{C|_σ}_True = µ^{C′}_True × µ^{C′′}_True.
Proof. Item (2) is evident from the algorithm, Φ being atomic, and Definition 2.5.
For Item (3), note that each time we add a constraint C|_σ into C′, we add vbl(C|_σ) into V′. Thus Φ′ is a CSP. Since Φ is atomic, by Fact 2.6 Φ′ is also atomic. In addition, we only consider C|_σ with vbl(C|_σ) ∩ V′ ≠ ∅ at Line 2, hence H(Φ′) is connected.
For Item (1), the algorithm can be executed by first checking all (at most d) constraints related to u, then iteratively checking (at most ∆|C′| in total) constraints related to the newly added constraints in C′. Thus the total running time is O(k) · (∆|C′| + d).
Now we focus on Item (4) when Token = True. The condition from Line 2 implies that for any C|_σ ∈ C′′, vbl(C|_σ) ∩ V′ = ∅ and thus vbl(C|_σ) ⊆ V′′. Therefore Φ′′ is a CSP. Since Φ is atomic, by Fact 2.6 Φ′′ is also atomic. Then the "moreover" part follows from V (resp., C|_σ) being the disjoint union of V′ and V′′ (resp., C′ and C′′).
3.2 A RejectionSampling Subroutine
The following simple perfect sampler, which is based on the standard rejection sampling technique, will be a building block for our main algorithm.
Algorithm 2: The RejectionSampling algorithm
Input: a CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C) and a randomness tape r
Output: an assignment σ ∼ µ^C_True
1 while True do
2     Sample σ ∼ ∏_{v∈V} D_v with fresh randomness from r
3     if C(σ) = True for all C ∈ C then return σ
4 end
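Algorithm 2 is completely standard; a minimal Python rendering (ours, using a pseudorandom generator in place of the randomness tape r) looks as follows.

```python
import random

def rejection_sampling(domains, dists, constraints, rng):
    """domains[v]: list of values; dists[v]: list of matching probabilities;
    constraints: list of predicates taking the assignment dict.  Repeatedly
    draw sigma from the product distribution until every constraint holds."""
    while True:
        sigma = {v: rng.choices(domains[v], weights=dists[v])[0] for v in domains}
        if all(c(sigma) for c in constraints):
            return sigma

# Example: two bits that are not allowed to be equal, under the uniform product.
rng = random.Random(0)
sample = rejection_sampling({"x": [0, 1], "y": [0, 1]},
                            {"x": [1, 1], "y": [1, 1]},
                            [lambda s: s["x"] != s["y"]], rng)
assert sample["x"] != sample["y"]
```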
We have the following result on Algorithm 2 by basic facts about geometric distributions.
Fact 3.2. The following holds for RejectionSampling(Φ, r) over random r if σ^C_True ≠ ∅.
• It halts almost surely, and outputs σ ∼ µ^C_True when it halts.
• Let T be the number of while iterations it takes before it halts. Then

    E[T] = 1 / Pr_{σ ∼ ∏_{v∈V} D_v} [σ ∈ σ^C_True]   and   E[T²] = 2 · (E[T])² − E[T].

• Let X be its total running time.^5 Then

    E[X] = O(E[T] · (k|C| + Q|V|))   and   E[X²] = O((E[T] · (k|C| + Q|V|))²).

5 Each while iteration can be performed in time O(k|C| + Q|V|), where O(Q)·|V| is for Line 2 and O(k)·|C| is for Line 3.
The following result is useful when we perform rejection sampling on a projected CSP, say Φ′ from Algorithm 1.
Proposition 3.3. Let Φ = (V, (Ω_v, D_v)_{v∈V}, C) be an atomic CSP and M ⊆ V be a marking. Let k = k(Φ), ∆ = ∆(Φ), Q = Q(Φ), α = α(Φ, M), and β = β(Φ, M). Let σ ∈ ∏_{v∈M} Ω^⋆_v × {⋆}^{V\M} be an arbitrary assignment and Φ′ = (V′, (Ω_v|_σ, D_v|_σ)_{v∈V′}, C′) be an arbitrary sub-CSP of Φ|_σ where V′ ⊆ V and C′ ⊆ C|_σ.
If eα∆ ≤ 1, then the following holds for RejectionSampling(Φ′, r) over random r.
(1) Let X be its running time. Then X < +∞ almost surely and, if the hypergraph H(Φ′) (defined in (1)) is connected, we have

    E[X] = O((kQ|C′| + Q) / (1 − eα)^{|C′|})   and   E[X²] = O((kQ|C′| + Q)² / (1 − eα)^{2·|C′|}).

(2) Let σ′ be its output. Then σ′ ∼ µ^{C′}_True. Moreover, for any v ∈ V′ and any q ∈ Ω_v|_σ, we have Pr[σ′(v) = q] ≤ β · D_v|_σ(q).
Proof. Firstly, by Fact 2.6, ∆(Φ′) ≤ ∆. Then for any C|_σ ∈ C′, we have

    Pr_{σ̃ ∼ ∏_{v∈vbl(C|_σ)} D_v|_σ} [C|_σ(σ̃) = False]
        = ∏_{v∈vbl(C|_σ)} D_v(σ^C_False(v))        (since Φ is atomic and by Definition 2.5)
        ≤ ∏_{v∈vbl(C)\M} D_v(σ^C_False(v))         (since σ(V \ M) = ⋆^{V\M} and thus vbl(C|_σ) ⊇ vbl(C) \ M)
        ≤ α.                                        (by (2))
Thus p(Φ′) ≤ α (recall p from Definition 2.1) and we apply Theorem 2.2 to obtain

    Pr_{σ̃ ∼ ∏_{v∈V′} D_v|_σ} [σ̃ ∈ σ^{C′}_True] ≥ (1 − eα)^{|C′|} > 0.

By Fact 2.6, k(Φ′) ≤ k. Then Item (1) follows immediately from Fact 3.2 by noticing |V′| ≤ k|C′| + 1 when H(Φ′) is connected (the additional +1 is for the case |V′| = 1 and C′ = ∅). The claim σ′ ∼ µ^{C′}_True in Item (2) follows from Fact 3.2 as well.
Note that β ≥ 1. Thus by Definition 2.5 we may safely assume σ(v) = ⋆ in Item (2). Let B(σ′) be the event (i.e., constraint) "σ′(v) = q". Then vbl(B) = {v} and

    Pr_{σ′ ∼ µ^{C′}_True} [B(σ′)] = Pr_{σ′ ∼ µ^{C′}_True} [σ′(v) = q]   and   Pr_{σ′ ∼ ∏_{v∈V′} D_v|_σ} [B(σ′)] = D_v|_σ(q).

By Fact 2.6, d(Φ′) ≤ d. Thus Item (2) follows naturally from Theorem 2.3 by the definition of β and by noticing

    |{C|_σ ∈ C′ | vbl(B) ∩ vbl(C|_σ) ≠ ∅}| = |{C|_σ ∈ C′ | v ∈ vbl(C|_σ)}| ≤ d(Φ′) ≤ d.
3.3 A SafeSampling Subroutine
The following simple SafeSampling subroutine complements RejectionSampling when there is uncertainty in how to update u ∈ M. The details will become clear in Subsection 4.2.2.
Algorithm 3: The SafeSampling algorithm
Input: an atomic CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C), a marking M ⊆ V, some u ∈ M, and a randomness tape r
Output: some value q ∈ Ω^⋆_u
1 Recall β = β(Φ, M) from (3)
2 Sample c from D^⋆_u using r, where D^⋆_u is the distribution over Ω^⋆_u given by

    D^⋆_u(q) = max{0, 1 − β · (1 − D_u(q))}   for q ∈ Ω_u,   and   D^⋆_u(⋆) = 1 − Σ_{q′∈Ω_u} D^⋆_u(q′)

3 return c
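The distribution D^⋆_u in Line 2 is easy to compute explicitly; the sketch below (ours) builds it and draws one sample, mirroring Algorithm 3.

```python
import random

def safe_sampling_dist(D_u, beta):
    """Build D*_u over Omega_u plus the wildcard '*': each value q keeps mass
    max(0, 1 - beta*(1 - D_u(q))) and '*' absorbs the remaining mass."""
    dist = {q: max(0.0, 1.0 - beta * (1.0 - p)) for q, p in D_u.items()}
    dist["*"] = max(0.0, 1.0 - sum(dist.values()))
    return dist

def safe_sampling(D_u, beta, rng):
    dist = safe_sampling_dist(D_u, beta)
    values, weights = zip(*dist.items())
    return rng.choices(values, weights=weights)[0]

# Uniform 3-element domain with a mild bias beta: most of the mass stays on the
# actual values and only a small remainder goes to the wildcard '*'.
rng = random.Random(1)
print(safe_sampling_dist({"a": 1/3, "b": 1/3, "c": 1/3}, beta=1.05))
print(safe_sampling({"a": 1/3, "b": 1/3, "c": 1/3}, beta=1.05, rng=rng))
```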
We note the following observation regarding SafeSampling.
Proposition 3.4. The following holds for SafeSampling(Φ, M, u, r) over random r if eα∆ ≤ 1.
(1) It runs in time O(Q), where Q = Q(Φ) is from Definition 2.1, and D^⋆_u from Line 2 is a distribution.
(2) For any q ∈ Ω_u, D^⋆_u(q) ≤ D_u(q) and D^⋆_u({q, ⋆}) ≤ β · D_u(q) + (β − 1)(|Ω_u| − 2).
(3) Let σ ∈ ∏_{v∈M} Ω^⋆_v × {⋆}^{V\M} be an arbitrary assignment with σ(u) = ⋆. Let Φ′ = (V′, (Ω_v|_σ, D_v|_σ)_{v∈V′}, C′) be an arbitrary sub-CSP of Φ|_σ where V′ ⊆ V and C′ ⊆ C|_σ. Let σ′ ∼ µ^{C′}_True.^7 Then for any q ∈ Ω_u|_σ = Ω_u, we have D^⋆_u(q) ≤ Pr[σ′(u) = q].
7 µ^{C′}_True is well-defined, as guaranteed by Item (2) of Proposition 3.3.
Proof. First note that β ≥ 1. For Item (2), observe that

    D^⋆_u(q) = max{0, 1 − β · (1 − D_u(q))} ≤ max{0, 1 − 1 · (1 − D_u(q))} = D_u(q)

and

    D^⋆_u({q, ⋆}) = 1 − Σ_{q′∈Ω_u\{q}} D^⋆_u(q′)
                  ≤ 1 − Σ_{q′∈Ω_u\{q}} (1 − β · (1 − D_u(q′)))
                  = 1 + (β − 1)(|Ω_u| − 1) − β · Σ_{q′∈Ω_u\{q}} D_u(q′)
                  = 1 + (β − 1)(|Ω_u| − 1) − β · (1 − D_u(q)) = β · D_u(q) + (β − 1)(|Ω_u| − 2).

For Item (1), it suffices to observe, using Item (2), that D^⋆_u(⋆) ≥ 1 − Σ_{q′∈Ω_u} D_u(q′) = 0.
Finally, for Item (3), we simply have

    Pr[σ′(u) = q] = 1 − Σ_{q′∈Ω_u\{q}} Pr[σ′(u) = q′]
                  ≥ 1 − β · Σ_{q′∈Ω_u\{q}} D_u(q′)        (by Item (2) of Proposition 3.3 and D_u|_σ = D_u)
                  = 1 − β · (1 − D_u(q))
                  ≥ D^⋆_u(q).                             (since Pr[σ′(u) = q] ≥ 0 and by the definition of D^⋆_u)

4 The AtomicCSPSampling Algorithm
We now formally describe our main algorithm AtomicCSPSampling in Algorithm 4. The missing subroutines will be provided as we prove the correctness and efficiency of Algorithm 4.
Intuitively, the σ̃ after the while iterations will be a random assignment over M (i.e., σ̃ ∈ ∏_{v∈M} Ω_v × {⋆}^{V\M}) with a certain distribution; and the final output σ simply extends the assignment σ̃ to V \ M. Putting them together, we will prove that σ is distributed as µ^C_True.
Recall the measures α, ρ, λ, k, d, ∆, Q defined in (2), (4), (5), and Definition 2.1. Now we present our main theorem, the proof of which is the focus of the rest of the section.
Theorem 4.1. Let Φ = (V, (Ω_v, D_v)_{v∈V}, C) be an atomic CSP. Let M ⊆ V be a marking. If eα∆ ≤ 1, e∆²ρ ≤ 1/32, and ∆²λ ≤ 1/16, then the following holds for AtomicCSPSampling(Φ, M).
• Correctness. It halts almost surely and outputs σ ∼ µ^C_True when it halts.
• Efficiency. Its expected total running time is O(kQ∆³|V| log(|V|)).
Algorithm 4: The AtomicCSPSampling algorithm
Input: an atomic CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C) and a marking M ⊆ V
Output: an assignment σ ∈ σ^C_True
1 Assign infinitely long randomness tapes r_i independently for each i ∈ Z
2 Initialize T ← 1
3 while True do
4     σ̃ ← BoundingChain(Φ, M, −T, r_{−T}, . . . , r_{−1})    /* σ̃ ∈ ∏_{v∈M} Ω^⋆_v × {⋆}^{V\M} */
5     if σ̃(v) ≠ ⋆ for all v ∈ M then break
6     else Update T ← 2 · T
7 end
8 σ ← FinalSampling(Φ, M, σ̃)
9 return σ
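The outer loop of Algorithm 4 is a standard coupling-from-the-past doubling scheme; the key point is that the randomness tape r_t for a given time t is fixed once and reused across restarts. A minimal Python skeleton of this loop (ours; bounding_chain and final_sampling are hypothetical stand-ins for Algorithm 5 and the FinalSampling step) looks as follows.

```python
import random

def perfect_sample(marked, bounding_chain, final_sampling, seed=0):
    """Coupling-from-the-past outer loop: the randomness tape of each time step
    is determined by a fixed per-step seed, so re-running with a larger horizon
    T reuses exactly the same randomness for the times already simulated."""
    T = 1
    while True:
        tapes = {t: random.Random(hash((seed, t))) for t in range(-T, 0)}
        sigma = bounding_chain(T, tapes)          # stand-in for Algorithm 5
        if all(sigma[v] != "*" for v in marked):  # coalesced on all marked vars
            return final_sampling(sigma)          # stand-in for FinalSampling
        T *= 2

# Toy usage: a fake chain that only coalesces once T reaches 4.
demo = perfect_sample(
    marked=["x"],
    bounding_chain=lambda T, tapes: {"x": tapes[-1].randint(0, 1) if T >= 4 else "*"},
    final_sampling=lambda s: s,
)
print(demo)
```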
Remark 4.2. We remark that λ ≥ ρ since β ≥ 1 and the domains have at least 2 elements.^8 Thus we can use, say, ∆²λ ≤ 1/100 to dominate both e∆²ρ ≤ 1/32 and ∆²λ ≤ 1/16 and thus simplify the conditions in Theorem 4.1. This indeed only loses minor factors. However, we choose to present the conditions in the current format to make the proofs cleaner, as e∆²ρ and ∆²λ come from different places.
8 If some domain has only 1 element, then the corresponding distribution must be a point distribution. Thus we can simply fix the variable to this value and simplify the CSP.
4.1 The BoundingChain Subroutine
Recall our SafeSampling subroutine from Subsection 3.3. We now present the BoundingChain subroutine.
Algorithm 5: The BoundingChain subroutine
Input: an atomic CSP Φ = (V, (Ω_v, D_v)_{v∈V}, C), a marking M ⊆ V, a starting time −T, and randomness tapes r_{−T}, . . . , r_{−1}
Output: an assignment σ ∈ ∏_{v∈V} Ω^⋆_v
1 Initialize σ(v) ← ⋆ for all v ∈ V    /* Assume V = {v_0, . . . , v_{n−1}} */
2 for t = −T to −1 do
3     i_t ← t mod n, and σ(v_{i_t}) ← ⋆    /* Update σ(v_{i_t}) in this round */
4     (Φ_t, Token_t) ← Component(Φ, M, σ, v_{i_t}) where Φ_t = (V_t, (Ω_v|_σ, D_v|_σ)_{v∈V_t}, C_t)
5     if v_{i_t} ∈ V \ M then Continue
6     else if (v_{i_t} ∈ M) ∧ (Token_t = True) then
7         σ′ ← RejectionSampling(Φ_t, r_t)
8         Update σ(v_{i_t}) ← σ′(v_{i_t})
9     else    /* (v_{i_t} ∈ M) ∧ (Token_t = False) */
10        Update σ(v_{i_t}) ← SafeSampling(Φ, M, v_{i_t}, r_t)
11    end
12 end
13 return σ
Remark 4.3. Algorithm 5 also works if we ignore the variables outside M. This is because σ(V \ M) is always kept ⋆^{V\M}. However, the current version is more convenient for our analysis in Subsection 4.1.2 and does not influence the running time much.
We first note the following fact regarding each round of update.
Lemma 4.4. Let t ∈ {−T, . . . , −1} be an arbitrary time with v_{i_t} ∈ M. Let σ ∈ ∏_{v∈M} Ω^⋆_v × {⋆}^{V\M} be an arbitrary assignment. Let q ∈ Ω^⋆_{v_{i_t}} be the update of σ(v_{i_t}) in the t-th for iteration of Algorithm 5 over random r_t. If eα∆ ≤ 1, then
(1) q is well-defined almost surely;
(2) for any q′ ∈ Ω_{v_{i_t}} we have

    Pr[q = q′ | σ, t] ≤ β · D_{v_{i_t}}(q′)   and   Pr[q ∈ {q′, ⋆} | σ, t] ≤ β · D_{v_{i_t}}(q′) + (β − 1)(|Ω_{v_{i_t}}| − 2).

Proof. Note that σ uniquely determines Token_t, i.e., whether we update σ(v_{i_t}) by RejectionSampling or SafeSampling. Thus we only need to consider the two possibilities separately.
• RejectionSampling. By Item (3) of Proposition 3.1 and Item (1) of Proposition 3.3, q is well-defined almost surely. Since σ(v_{i_t}) is set to ⋆ right before the update and q is never ⋆ in RejectionSampling, we have D_{v_{i_t}}|_σ = D_{v_{i_t}} and, by Item (2) of Proposition 3.3,

    Pr[q ∈ {q′, ⋆} | σ, t] = Pr[q = q′ | σ, t] ≤ β · D_{v_{i_t}}(q′).

• SafeSampling. By Item (1) of Proposition 3.4, q is always well-defined. Then the two bounds follow from Item (2) of Proposition 3.4.
We will show the following result for one single call of Algorithm 5.
Proposition 4.5. If eα∆ ≤ 1, e∆²ρ ≤ 1/32, and ∆²λ ≤ 1/16, then the following holds over random r_{−T}, . . . , r_{−1} for BoundingChain(Φ, M, −T, r_{−T}, . . . , r_{−1}).
• Efficiency. Let X_t be the running time of the t-th for iteration. Then X_t < +∞ almost surely and E[X_t²] = O(dk²∆⁵Q²).
• Coalescence. Let E be the event "in the returned assignment σ, there exists some u ∈ M such that σ(u) = ⋆". If T ≥ 2|V| − 1, we have Pr[E] ≤ 4|V| · 2^{−T/|V|}.
4.1.1 Moment Bounds on the Running Time
To establish the efficiency part of Proposition 4.5, we need to control the size of Φt in each for
iteration. This requires some additional definitions.
Definition 4.6 (2-tree). Let G be an undirected graph. A set of vertices S ⊆ V (G) is a 2-tree if
the following holds.
• distG (u, v) ≥ 2 holds for any distinct u, v ∈ S where distG (u, v) is the length of the shortest
path in G from u to v.9
• If we add an edge between every u, v ∈ S with distG (u, v) = 2, then S is connected.
Intuitively a 2-tree is an independent set that is not very spread out. The following lemmas
bound the number of 2-trees and show how to extract a large 2-tree from any connected subgraph.
9 For example, dist_G(u, u) ≡ 0 for all u ∈ V(G), and dist_G(u, v) = 1 iff (u, v) is an edge in E(G).
Lemma 4.7 ([FGYZ20, Corollary 5.7]). Let G be a graph with maximum degree d. Then for any v ∈ V(G) and integer ℓ ≥ 1, the number of 2-trees in G of size ℓ containing v is at most (ed²)^{ℓ−1}/2.
Lemma 4.8 ([JPV21, Lemma 4.5]). Let G be a graph with maximum degree d. Let G′ be a
connected subgraph of G. Then for any v ∈ V (G′ ), there exists a 2-tree S ⊆ V (G′ ) with v ∈ S and
size |S| ≥ |V (G′ )|/(d + 1).
Lemma 4.9 ([FGYZ20, Observation 5.5]). If a graph G has a 2-tree of size ℓ > 1 containing
v ∈ V (G), then G also has a 2-tree of size ℓ − 1 containing v.
The following result is an immediate corollary of Lemma 4.8 and Lemma 4.9.
Corollary 4.10. Let G be a graph with maximum degree d. Let G′ be a connected subgraph of G.
Then for any v ∈ V (G′ ) and any integer ℓ ≤ ⌈|V (G′ )|/(d + 1)⌉, there exists a 2-tree S ⊆ V (G′ )
with v ∈ S and size |S| = ℓ.
Now we show the following concentration bound.
Proposition 4.11. Let d = d(Φ), ∆ = ∆(Φ), α = α(Φ, M), and ρ = ρ(Φ, M). Assume eα∆ ≤ 1. For any t ∈ {−T, . . . , −1}, recall C_t from Line 4 of Algorithm 5. Then we have

    Pr[|C_t| ≥ ℓ · ∆] ≤ (d/2) · (e∆²ρ)^{ℓ−1}   for any integer ℓ ≥ 1.
Proof. Construct the line graph Lin(Φ) = (V^Φ, E^Φ) of Φ = (V, (Ω_v, D_v)_{v∈V}, C) where

    V^Φ = C   and   E^Φ = {{e_1, e_2} ∈ C × C | vbl(e_1) ∩ vbl(e_2) ≠ ∅, e_1 ≠ e_2}.

Then Lin(Φ) is an undirected graph with maximum degree ∆ − 1.
Let σ and v_{i_t} be the assignment and variable to update at time t respectively. Let G be the subgraph of Lin(Φ) induced by the vertex set V(G) = {C ∈ C | C|_σ ∈ C_t}. Then by Item (3) of Proposition 3.1, G is a connected subgraph of Lin(Φ).^{10} For any C|_σ ∈ C_t with v_{i_t} ∈ vbl(C), by Corollary 4.10 there exists a 2-tree S̃ ⊆ V(G) with C ∈ S̃ and size |S̃| = ℓ, provided ℓ ≤ ⌈|C_t|/∆⌉. Define

    𝒮 = {2-tree S ⊆ V^Φ | (|S| = ℓ) ∧ (∃C ∈ S, v_{i_t} ∈ vbl(C))}.

Then by Lemma 4.7, and noticing that there are at most d choices of C, we have

    |𝒮| ≤ d · (e(∆ − 1)²)^{ℓ−1} / 2 ≤ d · (e∆²)^{ℓ−1} / 2.
By Item (2) of Proposition 3.1, σ(v) = σ^C_False(v) for any C|_σ ∈ C_t and v ∈ (vbl(C) ∩ M) \ {v_{i_t}}. Note that σ(v) is initialized as ⋆. Thus σ(v) must be updated before time t. Let UpdTime(v, t) be the last update time of v before the t-th for iteration in Algorithm 5, and let E_{v,C} be the event "σ(v) is updated to σ^C_False(v) in the UpdTime(v, t)-th for iteration". Recall the definition of ρ from (4); then we have^{11}

    Pr[|C_t| ≥ ℓ · ∆] ≤ Pr[E_{v,C}, ∀C|_σ ∈ C_t, ∀v ∈ (vbl(C) ∩ M) \ {v_{i_t}}]
        ≤ Pr[E_{v,C}, ∀C ∈ S̃, ∀v ∈ (vbl(C) ∩ M) \ {v_{i_t}}]
        ≤ Σ_{S∈𝒮} Pr[E_{v,C}, ∀C ∈ S, ∀v ∈ (vbl(C) ∩ M) \ {v_{i_t}}]        (by union bound)
        ≤ Σ_{S∈𝒮} ∏_{C∈S} ∏_{v∈(vbl(C)∩M)\{v_{i_t}}} min{1, β · D_v(σ^C_False(v))}
                        (since (vbl(C))_{C∈S} are pairwise disjoint and by Item (2) of Lemma 4.4)
        ≤ Σ_{S∈𝒮} ∏_{C∈S, v_{i_t}∉vbl(C)} ∏_{v∈vbl(C)∩M} β · D_v(σ^C_False(v))
        ≤ Σ_{S∈𝒮} ρ^{|S|−1} ≤ (d/2) · (e∆²ρ)^{ℓ−1}.

10 Actually, Item (3) of Proposition 3.1 says the subgraph of Lin(Φ|_σ) induced by vertex set C_t is connected, which implies G is a connected subgraph of Lin(Φ) as vbl(C|_σ) ⊆ vbl(C).
11 We remark that for the fourth inequality above, we do not assume any independence between the events E_{v,C}. We simply use the chain rule of conditional probability in the order of time. For example, if UpdTime(v_1, t) < UpdTime(v_2, t), then Pr[E_{v_1,C_1} ∧ E_{v_2,C_2}] = Pr[E_{v_1,C_1}] · Pr[E_{v_2,C_2} | E_{v_1,C_1}], and then we apply Item (2) of Lemma 4.4 twice.
Now we obtain moment bounds for the running time of each for iteration in Algorithm 5.
Proof of the Efficiency Part of Proposition 4.5. Let Y_t and Z_t be the running times of Line 4 and Lines 5-11 respectively. By Item (1) of Proposition 3.1, we have E[Y_t² | |C_t|] = O((∆k|C_t| + dk)²). By Item (3) of Proposition 3.1, Item (1) of Proposition 3.3, and Item (1) of Proposition 3.4, we also have

    E[Z_t² | |C_t|] = O(max{1, Q², (kQ|C_t| + Q)² / (1 − eα)^{2·|C_t|}}) = O((kQ|C_t| + Q)² / (1 − eα)^{2·|C_t|}).

By Proposition 4.11, we have

    Pr[|C_t| ≥ ℓ · ∆] ≤ (d/2) · (e∆²ρ)^{ℓ−1} ≤ (d/2) · (1/32)^{ℓ−1}   for any integer ℓ ≥ 1.

Since X_t = Y_t + Z_t + O(1) and (a + b + c)² ≤ 4 · (a² + b² + c²), we have

    E[X_t²] = O(1 + E[Y_t²] + E[Z_t²]) = O(1 + Σ_{L=0}^{+∞} Pr[|C_t| = L] · E[Y_t² + Z_t² | |C_t| = L])
            ≤ O(d²k²) + Σ_{ℓ=0}^{+∞} Pr[|C_t| ≥ ℓ · ∆] · ∆ · O(∆²k²(∆(ℓ + 1))² + (Qk∆(ℓ + 1))² / (1 − eα)^{2(ℓ+1)∆})
                                                              (bucketing L ∈ [ℓ · ∆, (ℓ + 1) · ∆))
            ≤ O(d²k²) + Σ_{ℓ=0}^{+∞} Pr[|C_t| ≥ ℓ · ∆] · ∆ · O(∆²k²(∆(ℓ + 1))² + (Qk∆(ℓ + 1))² · 16^{ℓ+1})
                                                              (since eα ≤ 1/∆ and ∆ ≥ 2, we have (1 − eα)^∆ ≥ 1/4)
            ≤ O(d²k²) + O(d∆) · Σ_{ℓ=0}^{+∞} (1/32)^{ℓ−1} · (∆²k²(∆(ℓ + 1))² + (Qk∆(ℓ + 1))² · 16^{ℓ+1})
            = O(d²k² + dk²∆⁵ + dk²∆³Q²) = O(dk²∆⁵Q²).        (since d ≤ ∆)
4.1.2 Concentration Bounds for the Coalescence
Here we analyze the coalescence part of Proposition 4.5, which is also the stopping condition for the while iterations in Algorithm 4. We use an information percolation argument and need some additional setup.
We follow the notation convention of Subsection 4.1.1: V = {v_0, . . . , v_{n−1}} and i_t = t mod n for t ∈ {−T, . . . , −1}. UpdTime(v, t) is the last update time of v before time t, i.e.,

    UpdTime(v, t) = max{−T − 1, max{t′ < t | v_{i_{t′}} = v}}.   (7)

The additional −T − 1 is to set up the boundary condition corresponding to the initialization step.
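Under the systematic scan, UpdTime has a simple closed form. The short Python helper below (ours; it assumes variables are indexed 0, . . . , n−1 so that variable v is updated exactly at the times t′ with t′ mod n = v) matches (7):

```python
def upd_time(v, t, n, T):
    """UpdTime(v, t) for the systematic scan i_t = t mod n: the largest t' < t
    with t' % n == v, clipped to the boundary value -T - 1 (see (7))."""
    last = t - 1 - ((t - 1 - v) % n)
    return last if last >= -T else -T - 1

assert upd_time(v=2, t=0, n=5, T=20) == -3      # times ..., -13, -8, -3 update v_2
assert upd_time(v=2, t=-3, n=5, T=20) == -8
assert upd_time(v=2, t=-19, n=5, T=20) == -21   # never updated before t: -T - 1
```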
Definition 4.12 (Extended Constraints). For any C ∈ C and e = {t_1, . . . , t_m} ⊆ {−T, . . . , −1}, we say (e, C) is an extended constraint if the following holds.
(1) m = |vbl(C)| and vbl(C) = {v_{i_{t_1}}, v_{i_{t_2}}, . . . , v_{i_{t_m}}}.
(2) e = {UpdTime(v, t_max + 1) | v ∈ vbl(C)} where t_max = max_{t′∈e} t′.
Intuitively, an extended constraint of C records one window of consecutive rounds of updates of the variables in vbl(C).
Fact 4.13. The following holds for extended constraints.
(1) If (e_1, C_1) and (e_2, C_2) are two extended constraints and vbl(C_1) ∩ vbl(C_2) = ∅, then e_1 ∩ e_2 = ∅.
(2) If (e, C) is an extended constraint, then
  (2a) 0 ≤ t_max(e) − t_min(e) < n, where t_min(e) = min_{t′∈e} t′;
  (2b) e = {UpdTime(v, t′) | v ∈ vbl(C)} for any t_max + 1 ≤ t′ ≤ t_min + n;
  (2c) for any C′ ∈ C, we have

      |{e′ | (e′, C′) is an extended constraint with e ∩ e′ ≠ ∅}| < 2 · |vbl(C′)|.
Proof. Item (1) is evident from Item (1) of Definition 4.12.
For Item (2), we assume |vbl(C)| = m and vbl(C) = {v_{a_1}, . . . , v_{a_m}}. Let

    S = {−T, . . . , −1} ∩ {a_i − j · n | i ∈ [m], j ∈ Z} = {b_1, b_2, . . . , b_{T(C)}}

where −T ≤ b_1 < · · · < b_{T(C)} ≤ −1. Note that b_i ≡ b_{i+m} mod n for all i ∈ [T(C) − m]. If (e, C) is an extended constraint, then by Item (2) of Definition 4.12, e consists of a consecutive interval of S, i.e., e = {b_o, b_{o+1}, . . . , b_{o+m−1}} for some o ∈ [T(C) − m + 1]. Thus t_max(e) − t_min(e) = b_{o+m−1} − b_o < n (since b_i ≡ b_{i+m} mod n), which verifies Item (2a). Since either b_o + n = b_{o+m} or b_o + n ≥ 0, we know UpdTime(v_{a_i}, t′) = UpdTime(v_{a_i}, b_{o+m−1} + 1) for all i ∈ [m] and b_{o+m−1} + 1 ≤ t′ ≤ b_o + n, which verifies Item (2b).
Now we prove Item (2c). By Item (1), assume without loss of generality vbl(C) ∩ vbl(C′) ≠ ∅. Let m′ = |vbl(C′)| and vbl(C′) = {v_{a′_1}, . . . , v_{a′_{m′}}} and define

    S′ = {−T, . . . , −1} ∩ {a′_i − j · n | i ∈ [m′], j ∈ Z} = {b′_1, . . . , b′_{T(C′)}}

where −T ≤ b′_1 < · · · < b′_{T(C′)} ≤ −1. Then similarly, we have b′_i ≡ b′_{i+m′} mod n for all i ∈ [T(C′) − m′] and e′ = {b′_{o′}, . . . , b′_{o′+m′−1}} for some o′ ∈ [T(C′) − m′ + 1]. Let i_min = min{i ∈ [T(C′)] | b′_i ∈ e} and i_max = max{i ∈ [T(C′)] | b′_i ∈ e}. Then e ∩ e′ ≠ ∅ iff i_min ≤ o′ + m′ − 1 and i_max ≥ o′. Therefore there are at most i_max − i_min + m′ choices of o′. Since b′_{i_min}, b′_{i_max} ∈ e, by Item (2a) we know b′_{i_max} − b′_{i_min} < n. Hence i_max < i_min + m′. In all, there are at most 2m′ − 1 choices of e′.
21
Definition 4.14 (Extended Hypergraph). Extended hypergraph H ext = (V ext , E ext ) has vertex set
V ext = {−T, . . . , −1} and extended constraints as edges:
E ext = {e ⊆ {−T, . . . , −1} | (e, C) is an extended constraint} .
Moreover, we label each edge e with C if it is added into E ext by extended constraint (e, C). We
allow multiple occurrence of the same edge but the labels are different.
Define σ0 as the final returned assignment in Algorithm 5. For each t ∈ {−T, . . . , −1}, let σt
be the assignment at Line 4 of the t-th for iteration in Algorithm 5. In particular σt (vit ) = ⋆ due
to Line 3.
Now we present the following algorithm informally described in Subsection 1.3 to sequentially
find constraints that are not satisfied during the BoundingChain process. We remark that this
algorithm is only for our analysis, and we do not run it during AtomicCSPSampling.
Algorithm 6: Find failed constraints during the BoundingChain process
Input: assignments (σt )t∈{−T,...,0} defined above and some u ∈ M with σ0 (u) = ⋆
Output: H ′ = (V ′ , E ′ ) where V ′ ⊆ V ext and E ′ ⊆ E ext
1 Set t0 ← UpdTime(u, 0) and initialize V ′ ← {t0 } , E ′ ← ∅
2 FailedConstraints(t0)
3 return (V ′ , E ′ )
4 Procedure FailedConstraints(t):
5
if t < −T + n − 1 then return
/* (vit ∈ M) ∧ (Tokent = False) */
6
Initialize Vt ← {vit } and Ct ← ∅
7
while ∃C|σt ∈ C|σt \ Ct with vbl(C|σt ) ∩ Vt 6= ∅ do
8
e ← {UpdTime(v, t + 1) | v ∈ vbl(C)}
/* (e, C) is an extended constraint */
9
Update Ct ← Ct ∪ {C|σt } and Vt ← Vt ∪ vbl(C|σt )
10
Update E ′ ← E ′ ∪ {e} and V ′ ← V ′ ∪ e
/* e is labeled by C */
11
end
12
foreach v ∈ (Vt ∩ M) \ {vit } do
13
FailedConstraints(UpdTime(v, t))
14
end
15 end
We have the following observation regarding Algorithm 6.
Lemma 4.15. Algorithm 6 halts always. Furthermore, if T ≥ 2n − 1 then
(1) for each (e, C) from Line 8 when we execute FailedConstraints(t),
(1a) it is an extended constraint,
C (v) in the UpdTime(v, t+
(1b) for each v ∈ vbl(C), the assignment on v is updated to ⋆ or σFalse
1)-th for iteration in Algorithm 5;
(2) each time we call FailedConstraints(t), t is already in V ′ ;
(3) H ′ is a connected sub-hypergraph of H ext ;
(4) there exists some e0 , e1 ∈ E ′ such that tmax (e0 ) ≥ −n and tmin (e1 ) < −T + n − 1.
22
Proof. Since UpdTime(v, t) < t for all v ∈ V and t ∈ {−T, . . . , 0}, Algorithm 6 always halts.
We prove Item (1) by induction on the calls of FailedConstraints(t). The first call t0 represents the final update of the assignment on u ∈ M, which results in σ0 (u) = ⋆.
• Item (1a) for t0 . Note that 0 > t0 ≥ 0−n ≥ −T +n−1. Then UpdTime(v, t0 +1) = t0 ≥ −T
for v = vit0 ; and UpdTime(v, t0 + 1) ≥ −T for all v 6= vit0 . This means −T − 1 ∈
/ e and thus
(e, C) is an extended constraint.
• Item (1b) for t0 . Since C|σt0 ∈ C|σt0 at Line 7, we know C(σt0 ) = False and, by (6), σt0 (v) ∈
C
σFalse (v), ⋆ for each v ∈ vbl(C). This means, if v 6= vit0 , the assignment on v is updated
to such value in the UpdTime(v, t0 ) = UpdTime(v, t0 + 1)-th for iteration in Algorithm 5.
Meanwhile if v = vit0 , then the assignment on v is updated to ⋆ in this t0 = UpdTime(v, t0 +1)th for iteration, resulting in σ0 (v) = ⋆.
To complete the induction, we note that each later call of FailedConstraints relies on some v from
Line 13 when we execute some FailedConstraints(UpdTime(v, t)). This means v ∈ M \ {vit } and
σt (v) = ⋆ and t ≥ −T +n−1. Thus the assignment on v is updated to ⋆ in the −T ≤ UpdTime(v, t)th for iteration in Algorithm 5. Then the argument above also goes through with almost no change.
For Item (2), note that V ′ is initialized as {t0 }. Upon Line 13, we have UpdTime(v, t) =
UpdTime(v, t + 1) since v 6= vit , which has been added into V ′ by Line 10 earlier.
Now we turn to Item (3). By Item (1), E ′ ⊆ E ext . Meanwhile by Line 10, E ′ are edges over
vertex set V ′ . Thus it suffices to show H ′ = (V ′ , E ′ ) is connected. By Item (2), we only need
to show vertices added during the while iterations in FailedConstraints(t) is connected to t.
This can be proved by induction: When we find C|σt satisfying the condition at Line 7, fix some
v ′ ∈ vbl(C|σt ) ∩ Vt . Then for the edge e constructed in the next step, each UpdTime(v, t + 1) ∈ e is
connected to UpdTime(v ′ , t + 1) ∈ e. Note that either (A) v ′ = vit and thus UpdTime(v ′ , t + 1) = t,
or (B) UpdTime(v ′ , t + 1) was added into V ′ in an earlier time and connected to t. Hence each
UpdTime(v, t + 1) ∈ e is connected to t as desired.
Finally we prove Item (4). Each time we call FailedConstraints(t), it implies the assignment
on vit ∈ M is updated to ⋆ in the t-th for iteration in Algorithm 5. This means Tokent = False
and SafeSampling is performed in Algorithm 5. By comparing Algorithm 1 and FailedConstraints,
we know vit is connected by falsified constraints to some v ∈ (Vt ∩ M) \ {vit } that σt (v) = ⋆. This
implies at least one round of Line 13 here will be executed. Thus the recursion of Algorithm 6
continues until t < −T + n − 1. By Item (2), there exists some t1 ∈ V ′ with t1 < −T + n − 1.
Meanwhile t0 ∈ V ′ and t0 ≥ −n. Hence by Item (3), there exists e0 , e1 ∈ E ′ such that t0 ∈ e0 and
t1 ∈ e1 ; and tmax (e0 ) ≥ t0 ≥ −n, tmin (e1 ) ≤ t1 < −T + n − 1.
Finally we complete the proof of Proposition 4.5.
Proof of the Coalescence Part of Proposition 4.5. Since σ0 is defined as the final returned assignment, we know event E (Defined in Proposition 4.5) is “there exists some u ∈ M such that
σ0 (u) = ⋆”. Using Algorithm 6 we obtain H ′ = (V ′ , E ′ ). By Fact 2.4 and Item (3) (4) of
e1 , C
e2 , . . . , C
eℓ such that
Lemma 4.15, there exists a path ee1 , ee2 , . . . , eeℓ ∈ E ′ ⊆ E ext with labels C
(a) eei ∩ eei+1 6= ∅ and thus tmin (e
ei ) ≤ tmax (e
ei+1 ) for all i ∈ [ℓ − 1];
(b) eei ∩ eej = ∅ for all i, j ∈ [ℓ] with |i − j| > 1;
(c) −n ≤ tmax (e
e1 ) < 0;
(d) −T ≤ tmin (e
eℓ ) < −T + n − 1.
23
Meanwhile
tmax (e
e1 ) = tmax (e
eℓ ) +
ℓ−1
X
i=1
(tmax (e
ei ) − tmax (e
ei+1 ))
≤ tmin (e
eℓ ) + n − 1 +
ℓ−1
X
i=1
(tmin (e
ei ) + n − 1 − tmax (e
ei+1 ))
(by Item (2a) of Fact 4.13)
≤ (−T + n − 2) + ℓ · (n − 1).
(by Item (a) (d) above)
Thus by Item (c) above, we have ℓ ≥ −2 + ⌈T /(n − 1)⌉. For convenience we truncate the tail of the
path so that ℓ is the largest even number no more than −2+⌈T /(n−1)⌉. Thus ℓ ≥ −3+⌈T /(n−1)⌉.
We remark that since the tail is truncated, Item (d) above is not necessarily true now.
Now for any fixed path e1 , . . . , eℓ ∈ E ext with labels C1 , . . . , Cℓ satisfying Item (a) (b) (c) above,
we bound the probability that this path, denoted by P , appears in H ′ . Assume
ℓ/2
Y
i=1
|vbl(C2i−1 )| ≤
ℓ/2
Y
i=1
|vbl(C2i )|
(8)
and the other case can be analyzed similarly. For each t ∈ {−T, . . . , −1} and C ∈ C with vit ∈ vbl(C)
C (v) in the t-th for iteration
we define Et,C as the event “ the assignment on vit is updated to ⋆ or σFalse
V
in Algorithm 5”. By Item (1b) of Lemma 4.15, ei , labeled by Ci , appearing in H ′ implies t∈ei Et,Ci
happens. Recall the definition of λ from (5), then we have12
Pr P appears in H ′ ≤ Pr e2i appears in H ′ with label C2i for all i ∈ [ℓ/2]
ℓ/2
ℓ/2
^
^
^
^
≤ Pr
Et,C2i = Pr
Et,C2i
i=1 t∈e2i
≤
ℓ/2
Y
i=1 t∈e2i ,vit ∈M
Y
i=1 v∈vbl(C2i )∩M
C
(v)) + (β − 1)(|Ωv | − 2)
β · Dv (σFalse
(since (e2i )i∈[ℓ/2] are pairwise disjoint and by Item (2) of Lemma 4.4)
≤
ℓ/2
Y
i=1
(4∆ |vbl(C2i )|)
−2
≤
ℓ
Y
i=1
(4∆ |vbl(Ci )|)−1 .
(since ∆2 λ ≤ 1/16 and by (8))
P
Now it suffices to union bound over all possible paths, i.e., Pr [E] ≤ P Pr [P appears in H ′ ]
where P is some fixed path e1 , . . . , eℓ ∈ E ext with labels C1 , . . . , Cℓ satisfying Item (a) (b) (c) above.
First by Item (c), there are at most n possible tmax (e1 ). Then by Item (1) of Fact 4.13 there are
at most d possible C1 given tmax (e1 ). By Item (2) of Definition 4.12 this determines (e1 , C1 ). Now
given (e1 , C1 ), we enumerate the rest of the path by a rooted tree T (e1 , C1 ) constructed as follows:
• T (e1 , C1 ) has depth 2(ℓ − 1) and the root is labeled with (e1 , C1 ).
• Given a node z with label (ei , Ci ), i ∈ [ℓ − 1], we construct the next two layers differently.
– For each Ci+1 ∈ C with vbl(Ci ) ∩ vbl(Ci+1 ) 6= ∅, we create a child node z ′ with label Ci+1
and link to z.
12
We remark that for the third inequality below, we do not assume any independence between Et,C . For example,
if t1 < t2 , then Pr [Et1 ,C4 ∧ Et2 ,C2 ] = Pr [Et1 ,C4 ] · Pr [Et2 ,C2 | Et1 ,C4 ] and then we apply Item (2) of Lemma 4.4 twice.
24
(z ′ has at most ∆ possibilities.)
– For each z ′ , assume its label is Ci+1 . We find some ei+1 such that ei ∩ ei+1 6= ∅
and (ei+1 , Ci+1 ) is an extended constraint, and then create a child node z ′′ with label (ei+1 , Ci+1 ) and link to z ′ .
(z ′′ , given z ′ , has at most 2 · |vbl(Ci+1 )| possibilities by Item (2c) of Fact 4.13.)
– We move to each z ′′ and repeat the construction.
Each leaf of T (e1 , C1 ) represents a path P which satisfies Item (a) (c) already, and
• either, P does not satisfy Item (b) and thus does not contribute to the union bound;
Q
• or, P is valid and Pr [P appears in H ′ ] ≤ ℓi=1 (4∆ |vbl(Ci )|)−1 as above.
Now we put weight on each internal node z of T (e1 , C1 ) as follows:
√
• If z has label (e, C), then its weight is w(z) = ( 2∆)−1 .
√
−1
.
• Otherwise z has label C, then its weight is w(z) = 2 2 · |vbl(C)|
Q
This means Pr [P appears in H ′ ] ≤ (4∆)−1 internal node z in P w(z) for each valid P in T (e1 , C1 )
where the (4∆)−1 factor is because there are only ℓ − 1 internal nodes for each case along the path.
Thus
Y
X
X
w(z)
Pr P appears in H ′ ≤
(4∆)−1
valid P in T (e1 ,C1 )
internal node z in P
valid P in T (e1 ,C1 )
−1
≤ (4∆)
X
Y
w(z)
P in T (e1 ,C1 ) internal node z in P
√
≤ ( 2)−2(ℓ−1) /(4∆) = 2−ℓ /(2∆),
√
where the last inequality can be proved by induction on the depth and noticing ( 2 · w(z))−1 is at
least the number of child nodes of z for each internal node z ∈ T (e1 , C1 ).
Putting everything together, we have
X
X
Pr [E] ≤
Pr P appears in H ′
valid (e1 ,C1 ) valid P in T (e1 ,C1 )
≤ nd · 2−ℓ /(2∆)
T
2− n−1
≤ n·2
4.2
((e1 , C1 ) has at most nd choices)
− Tn
≤ 4n · 2
(since ℓ ≥ −3 + ⌈T /(n − 1)⌉ and d ≤ ∆)
.
The Distribution after BoundingChain Subroutines
Recall in AtomicCSPSampling(Φ, M) (Algorithm 4), we keep doubling T and performing the corresponding BoundingChain(Φ, M, −T, r−T , . . . , r−1 ) until the returned assignment has no ⋆ on M.
Thus before we present the FinalSampling subroutine, we pause for now to analyze these BoundingChain calls in a whole.
Definition 4.16 (Projected Distribution). Given an atomic CSP Φ = V, (Ωv , Dv )v∈V , C and a
Q
V \M
and define the projected distribution µM ∈ RΛ by
marking M ⊆ V , let Λ =
v∈M Ωv × {⋆}
for all σ ∈ Λ.
µM (σ) = Pr σ ′ (M) = σ(M)
σ′ ∼µC
True
25
We will show the distribution after all BoundingChain subroutines is exactly µM .
Proposition 4.17. If eα∆ ≤ 1, e∆2 ρ ≤ 1/32, and ∆2 λ ≤ 1/16, then the while iterations in
AtomicCSPSampling(Φ, M) halts almost surely and the final assignment σ
e has distribution µM .
We introduce and analyze the following SystematicScan(Φ, M, σin, L, R, rL , . . . , rR ) algorithm,
then show it couples with BoundingChain.
Algorithm 7: The SystematicScan algorithm
Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , an assignment
Q
V \M
, a starting time L, a stopping time R, and
σin ∈
v∈M Ωv × {⋆}
randomness tapes rL , .Q
. . , rR .
V \M
Output: an assignment σ ∈
.
v∈M Ωv × {⋆}
1 Initialize σ ← σin
2 for t = L to R do
/* Assume V = {v0 , . . . , vn−1 } */
3
it ← t mod n, and σ(vit ) ← ⋆
/* Update σ(vit ) in this round */
4
Φt = Vt , (Ωv |σ , Dv |σ )v∈Vt , Ct ← Component(Φ, M, σ, vit )
/* Ignore the returned Token since it is always True here */
5
if vit ∈ V \ M then Continue
6
else
7
σ ′ ← RejectionSampling(Φt , rt )
8
Update σ ← σ ′ (vit )
9
end
10 end
11 return σ
4.2.1
Convergence of SystematicScan
We first show SystematicScan converges to µM . We set up basic
chains.
for Markov
Q notations
V \M
Let Λ be a finite state space; for our purpose it will P
be
Ω
×
{⋆}
.
We
view any
v
v∈M
distribution µ over Λ as a horizontal vector in RΛ where a∈Λ µ(a) = 1 and µ(a) ≥ 0 holds for all
a ∈ Λ. We denote 1a ∈ RΛ as the point distribution of a ∈ Λ, i.e., 1a (b) = 1 iff a = b.
Λ×Λ
A Markov chain (Xt )t≥0 over
P Λ is given by its transition matrices (Pt )t≥0 where each Pt ∈ R
has non-negative entries and b∈Λ Pt (a, b) ≡ 1 for all a ∈ Λ. Then Xt = X0 P0 P1 · · · Pt−1 where
X0 ∈ RΛ is the starting distribution. In particular,
• if Pt ≡ P for all t ≥ 0, then (Xt )t≥0 is a time homogeneous Markov chain given by P;
• if (Pt )t≥0 are possibly different, then (Xt )t≥0 is a time inhomogeneous Markov chain.
Assume (Xt )t≥0 is a time homogeneous Markov chain over Λ given by transition matrix P. We
say P is
• irreducible if for any a, b ∈ Λ, there exists some integer t ≥ 0 such that Pt (a, b) > 0;
• aperiodic if for any a ∈ Λ, gcd integer t > 0 Pt (a, a) > 0 = 1;13
• stationary with respect to distribution µ if µP = µ;
13
gcd stands for greatest common divisor.
26
• reversible with respect to distribution µ if µ(a)P(a, b) = µ(b)P(b, a) holds for all a, b ∈ Λ.
Here we note the following two classical results.
Fact 4.18 (e.g., [LP17, Proposition 1.20]). If P is reversible with respect to µ, then P is also
stationary with respect to µ.
Theorem 4.19 (The Convergence Theorem, e.g., [LP17, Theorem 4.9]). Suppose (Xt )t≥0 is an
irreducible and aperiodic time homogeneous Markov chain over finite state space Λ with stationary
distribution µ and transition matrix P. Then for any X0 , we have limt→+∞ Xt = µ.14
Now we turn to SystematicScan and follow the notation
in Algorithm 7: V =
Q convention
V \M
×
{⋆}
.
Ω
{v0 , . . . , vn−1 } and it = t mod n. Recall we also fix Λ =
v∈M v
Definition 4.20 (One-step Transition Matrix). For any i ∈ {0, . . . , n − 1}, define the one-step
transition matrix on vi ∈ V as Pi ∈ RΛ×Λ where
Pi (σ1 , σ2 ) = Pr σ ′ (M) = σ2 (M) σ ′ (M \ {vi }) = σ1 (M \ {vi }) .
σ′ ∼µC
True
We first prove some useful facts and also connect one-step transition matrices to SystematicScan.
Proposition 4.21. If eα∆ ≤ 1, then the following holds.
(1) Each Pi is well-defined.
(2) For any i ∈ {0, . . . , n − 1} and σ1 , σ2 ∈ Λ, we have Pi (σ1 , σ2 ) ≡ 0 if σ1 (M \ {vi }) 6= σ2 (M \
{vi }); and Pi (σ1 , σ2 ) > 0 if otherwise.
(3) µM Pi1 · · · Pim = µM holds for any sequence i1 , i2 , . . . , im ∈ {0, . . . , n − 1} of finite length m.
(4) For any L ≤ R and σin , SystematicScan(Φ, M, σin , L, R, rL , . . . , rR ) halts almost surely over
random rL , . . . , rR and its output distribution is µ = 1σin PiL PiL+1 · · · PiR .
Proof. First we prove Item (1) (2). For any fixed i ∈ {0, . . . , n − 1} and σ1 ∈ Λ, define assignment
σ by setting σ(V \ {vi }) = σ1 (V \ {vi }) and σ(vi ) = ⋆. Then for any σ2 ∈ Λ we have
Pi (σ1 , σ2 ) = Pr σ ′ (M) = σ2 (M) ,
C|
σ
σ′ ∼µTrue
C|
σ
being well-defined, which
where C|σ is defined in Definition 2.5. Thus Item (1) is equivalent to µTrue
′
follows from Proposition 3.3. Since σ(M) has no ⋆, σ (M \ {vi }) ≡ σ(M \ {vi }) ≡ σ1 (M \ {vi }).
Thus the first part of Item (2) follows naturally. As for the second part, if σ1 (V \{vi }) = σ2 (V \{vi }),
then
Prσ′ ∼µC [σ ′ (M) = σ2 (M)]
True
.
(9)
Pi (σ1 , σ2 ) =
Prσ′ ∼µC [σ ′ (M \ {vi }) = σ1 (M \ {vi })]
True
C|σ
Thus Pi (σ1 , σ2 ) > 0 iff the enumerator is positive, which is equivalent to µTrue2 being well-defined
and is, again, guaranteed by Proposition 3.3.
Now we prove Item (3). Since µM Pi1 · · · Pim = µM Pi1 Pi2 · · · Pim , it suffices to show for
m = 1 and then apply induction. By Fact 4.18, it suffices to show for each i ∈ {0, . . . , n − 1}, Pi is
reversible with respect to µΦ|π , i.e.,
µM (σ1 )Pi (σ1 , σ2 ) = µM (σ2 )Pi (σ2 , σ1 ) for any σ1 , σ2 ∈ Λ.
14
The convergence is entry-wise.
27
(10)
By Item (2), we may safely assume σ1 (M \ {vi }) = σ2 (M \ {vi }) and observe that
µM (σ1 )Pi (σ1 , σ2 )
=
=
Pr
σ′ ∼µC
True
Pr
σ′ ∼µC
True
M
σ (M) = σ1 (M) ·
′
σ (M) = σ2 (M) ·
′
Prσ′ ∼µC [σ ′ (M) = σ2 (M)]
True
(by (9))
Prσ′ ∼µC [σ ′ (M \ {vi }) = σ1 (M \ {vi })]
True
Prσ′ ∼µC [σ ′ (M) = σ1 (M)]
True
Prσ′ ∼µC [σ ′ (M \ {vi }) = σ2 (M \ {vi })]
True
= µ (σ2 )Pi (σ2 , σ1 ).
Finally we turn to Item (4). By induction on R, it suffices to verify for L = R = t ∈ Z that
1σin Pit = Pit (σin , ·) is the distribution of σout ← SystematicScan(Φ, M, σin , t, t, rt ). Similarly as
above, define assignment σ
e by setting σ
e(V \ {vit }) = σin (V \ {vit }) and σ
e(vit ) = ⋆. Then for each
σ
b ∈ Λ we have
Pit (σin , σ
b) = Pr σ ′ (M) = σ
b(M) .
C|
σ
e
σ′ ∼µTrue
Now we consider two separate cases.
e(M). Thus Pit (σin , σ
b) equals 1 if σ
b = σin ; and equals 0 if
• vit ∈ V \ M. Then σ ′ (M) ≡ σ
otherwise. This agrees with σout ≡ σin from the algorithm.
• vi ∈ M. Let Φt = Vt , (Ωv |σe , Dv |σe )v∈Vt , Ct be from Line 4. Then by Item (2) of Proposition 3.3,
we have
(
Prσ′ ∼µCt [σ ′ (vit ) = σ
b(vit )] σ
b(V \ {vit }) = σ
e(V \ {vit }),
True
Pr [σout = σ
b] =
0
otherwise.
By Item (4) of Proposition 3.1, Prσ′ ∼µCt [σ ′ (vit ) = σ
b(vit )] = Pr
True
Thus
Pr [σout = σ
b] =
=
(
Pr
C|
σ
e
σ′ ∼µTrue
0
Pr
C|
σ
e
σ′ ∼µTrue
[σ ′ (vit ) = σ
b(vit )]
C|
σ
e
σ′ ∼µTrue
[σ ′ (vit ) = σ
b(vit )].
σ
b(V \ {vit }) = σ
e(V \ {vit }),
otherwise
σ ′ (M) = σ
b(M) (since σ
b ∈ Λ and σ ′ (M \ {vit }) ≡ σ
e(M \ {vit }))
= Pit (σin , σ
b).
Item (4) of Proposition 4.21 shows SystematicScan is a time inhomogeneous Markov chain.
General theory regarding time inhomogeneous Markov chains can be much more complicated but
luckily we can embed this one into a time homogeneous Markov chain.
Lemma 4.22. Assume eα∆ ≤ 1. Let L and σin ∈ Λ be arbitrary. Define µR = 1σin PiL · · · PiR .
Then limR→+∞ µR = µM .
Proof. Let F = PiL · · · PiL+n−1 . Since it = t mod n, the one-step transition matrices repeatedly
applied to 1σin has period n. Hence µR = 1σin Fm if R = L + m · n − 1 and m ≥ 1. Let Y0 = 1σin
and Yt = Y0 Fi for t ≥ 1, then (Yt )t≥0 is a time homogeneous Markov chain with transition matrix
F. Here we verify the following properties of F.
• Stationary with respect to µM . This follows immediately from Item (3) of Proposition 4.21.
28
• Aperiodic. By Item (2) of Proposition 4.21, for any i ∈ {0, . . . , n − 1} and any σ ∈ Λ we
have Pi (σ, σ) > 0. Thus F(σ, σ) > 0 which implies F is aperiodic.
• Irreducible. Let σ1 , σ2 ∈ Λ be arbitrary. For each j ∈ {0, . . . , n}, define σ j ∈ Λ by
(
σ2 (vi′ ) i′ ∈ {iL , . . . , iL+j−1 } ,
j
σ (vi′ ) =
σ1 (vi′ ) otherwise.
Then σ1 = σ 0 and σ2 = σ n . Moreover PiL+j (σ j , σ j+1 ) > 0 for all j ∈ {0, . . . , n − 1} by Item
(2) of Proposition 4.21. Thus F(σ1 , σ2 ) ≥ PiL (σ 0 , σ 1 )PiL+1 (σ 1 , σ 2 ) · · · PiL+n−1 (σ n−1 , σ n ) > 0.
Therefore by Theorem 4.19, limm→+∞ µL+m·n−1 = limt→+∞ Yt = µM . Since each Pi is stationary
with respect to µM by Item (3) of Proposition 4.21, for any finite integer o ≥ 0
lim µL+m·n−1+o =
lim µL+m·n−1 PiL · · · PiL+o−1 = µM PiL · · · PiL+o−1 = µM .
m→+∞
m→+∞
Hence limR→+∞ µR = µM .
4.2.2
Coupling from the Past and the Bounding Chain
We have showed SystematicScan converges to distribution µM , but to obtain a sample distributed
exactly according to µM we need to run for infinite time. The trick for making it finite is to
think backwards. That is the idea of coupling from the past [PW96]; then the bounding chain
[Hub98, HN99] is used to make the process more computationally efficient.
Let P ∈ RΛ×Λ be some transition matrix. We say f : Λ × [0, 1] → Λ is a coupling of P if for
all a, b ∈ Λ, Prr∼[0,1] [f (a, r) = b] = P(a, b). We use random function f r : Λ → Λ to denote the
coupling f with randomness r, i.e., f r (a) = f (a, r) for all a ∈ Λ.
Recall our definition of Pi from Definition 4.20 and it = t mod n from Algorithm 7.
Lemma 4.23 (Coupling from the Past). Let ft : Λ × [0, 1] → Λ be a coupling of Pit for all t ∈ Z.
Define random functions FL,R : Λ → Λ over random (rt )t∈Z for −∞ < L ≤ R < +∞ as
rR−1
· · · fLrL (a) · · ·
for all a ∈ Λ.
FL,R (a) = fRrR fR−1
Let M ≥ 1 be the smallest integer such that F−M,−1 is a constant function. Let A = F−M,−1 (Λ) be
the corresponding constant. Then F−M ′ ,−1 (Λ) ≡ A for any M ′ > M , and A is distributed as µM if
M < +∞ almost surely.
Proof. Since Pi = Pi+n , for any integer ℓ ≥ 1 and all a, b ∈ Λ we have
ℓ ⊤
Pr [F−ℓ·n,−1 (a, b)] = 1a (P−n P−n+1 · · · P−1 )ℓ 1⊤
b = 1a (P0 P1 · · · Pn−1 ) 1b = Pr [F0,ℓ·n−1 (a, b)] .
Thus for any a, b ∈ Λ, by Lemma 4.22 we have
lim Pr [F−ℓ·n,−1 (a) = b] = lim Pr [F0,ℓ·n−1 (a) = b] = µM (b).
ℓ→+∞
ℓ→+∞
On the other hand for M ′ > M , F−M ′ ,−1 (Λ) = F−M,−1 F−M ′ ,−M −1 (Λ) = A. If M < +∞ almost
surely, then we have
Pr [A = b] = lim Pr [F−ℓ·n,−1 (a) = b] = µM (b).
ℓ→+∞
29
Therefore to obtain a perfect sample from µM we only need to (1) design a coupling, then (2)
sample randomness tapes (rt )t≥0 , and lastly (3) find some M ′ ≥ 1 such that F−M ′ ,−1 is a constant
function and output the corresponding constant. Now we show in the following Algorithm 8 that
Algorithm 5 implicitly provides a coupling for (1) and an efficient way to check (3).
Q
Q
V \M
V \M
⋆
. We define Λ⋆ =
. Then we construct
Recall Λ =
v∈M Ωv × {⋆}
v∈M Ωv × {⋆}
⋆
⋆
gi : Λ × [0, 1] → Λ for each i ∈ {0, . . . , n − 1} as follows. We also remark that this coupling is
information-theoretic and is for analysis only.
Algorithm 8: A coupling gi : Λ⋆ × [0, 1] → Λ⋆ for each i ∈ {0, . . . , n − 1}
Input: an assignment σin ∈ Λ⋆ and a randomness tape r ∈ [0, 1].
Output: an assignment σout ∈ Λ⋆ .
Data: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , and an index
i ∈ {0, . . . , n − 1}. Assume V = {v0 , . . . , vn−1 }.
1 Initialize σout ← σin and divide r into two independent parts r1 , r2
2 if vi ∈ V \ M then return σout
3 Recall β = β(Φ, M) from (3) and define distribution Dv⋆i over Ω⋆
vi by setting
(
max {0, 1 − β · (1 − Dvi (q))}
P
Dv⋆i (q) =
1 − q′ ∈Ωv Dv⋆i (q ′ )
i
4
5
6
7
8
Dv⋆i
q ∈ Ωvi ,
q = ⋆.
Sample c1 ∼
using r1 and update σout (vi ) ← c1
if c1 6= ⋆ then return σout
(Φ′ , Token) ← Component(Φ, M, σout , vi ) where Φ′ = V ′ , (Ωv |σout , Dv |σout )v∈V ′ , C ′
if Token = False then return σout
Define distribution Dv†i over Ωvi by setting Dv†i (q) = Prσ′ ∼µC′ [σ ′ (vi ) = q] for q ∈ Ωvi .
True
9
10
11
Define distribution
Dv′ i
over Ωvi by setting
Dv′ i (q)
=
Sample c2 ∼ Dv′ i using r2 and update σout (vi ) ← c2
return σout
Dv†i (q)−Dv⋆i (q)
Dv⋆i (⋆)
for q ∈ Ωvi
We say σ1 ∈ σ2 for some σ1 , σ2 ∈ Λ⋆ if σ2 (v) ∈ {σ1 (v), ⋆} for all v ∈ V .
Fact 4.24. When σ1 , σ2 ∈ Λ, σ1 ∈ σ2 iff σ1 = σ2 .
Now we verify the following properties of Algorithm 8.
Proposition 4.25. If eα∆ ≤ 1 then the following holds for gi : Λ⋆ × [0, 1] → Λ⋆ from Algorithm 8.
(1) All the distributions in Algorithm 8 are well-defined.
(2) gi (σ1 , r) ∈ gi (σ2 , r) holds for any r ∈ [0, 1] and σ1 , σ2 ∈ Λ⋆ with σ1 ∈ σ2 .
(3) gi restricted on Λ is a coupling of Pi .
(4) For any t ≡ i mod n, gi is the same update procedure as the t-th for iteration in Algorithm 5.
Proof. First we prove Item (1). Note that Dv⋆i is the same one as in SafeSampling(Φ, M, vi, ·).
Thus it is well-defined by Item (1) of Proposition 3.4. Now we assume Token = True to check Dv†i
′
and Dv′ i . By Item (2) of Proposition 3.3, Dv†i is well-defined since µCTrue is well-defined. By Item (3)
of Proposition 3.4, Dv⋆i (q) ≤ Dv†i (q) for any q ∈ Ωvi . Hence Dv′ i is also well-defined.
For Item (2), we have the following cases, each of which satisfies gi (σ1 , r) ∈ gi (σ2 , r).
30
• c1 6= ⋆. Then both σ1 (vi ) and σ2 (vi ) are updated to c1 .
• c1 = ⋆ and Token = False for σ2 . Then σ2 (vi ) is updated to ⋆.
• c1 = ⋆ and Token = True for σ2 . Since σ1 ∈ σ2 , Token also equals True for σ1 . Moreover they
get the same CSP from Line 6. Thus σ1 (vi ) and σ2 (vi ) are updated to the same value c2 .
Item (3) is obviously true if vi ∈ V \ M thus we assume vi ∈ M. Observe that Dv†i (q) =
Dv′ i (q) · Dv⋆i (⋆) + Dv⋆i (q) for any q ∈ Ωvi . Then Item (3) follows from Item (4) of Proposition 4.21
with L = R = i.
Finally we prove Item (4). Assume vi ∈ M since otherwise it is trivial. To obtain the pseudocode
in Algorithm 5, we reorganize Algorithm 8 by first set σout ← ⋆ and call Component(Φ, M, σout, vi ),
then based on the value of Token we either (A) sample c1 only then update σout (vi ) ← c1 , or (B)
sample both c1 and c2 then update σout (vi ) ← c1 if c1 6= ⋆; and σout (vi ) ← c2 if otherwise. The
former is SafeSampling, and the latter, executed jointly, is exactly RejectionSampling as we analyzed
for Item (3).
One more ingredient we need is the following well-known Borel-Cantelli theorem.
Theorem 4.26
P (Borel-Cantelli Theorem, e.g., [GS01, Section 7.3]). Let T be a non-negative random
variable. If +∞
i=0 Pr [T > i] < +∞ then T < +∞ almost surely.
Now we are ready to prove Proposition 4.17.
Proof of Proposition 4.17. Assume the while iterations in Algorithm 4 stop at T = TFinal or TFinal =
+∞ if iterations never end. Since each iteration halts almost surely by Proposition 4.5, TFinal is
well-defined almost surely.
Let V = (v0 , . . . , vn−1 ) and define it = t mod n for all t ∈ Z as in Algorithm 5. Define random
functions GL,R : Λ⋆ → Λ⋆ over random (rt )t∈Z for −∞ < L ≤ R < +∞ as
rR−1
rR
GL,R (a) = gR
gR−1
for all a ∈ Λ⋆ ,
· · · gLrL (a) · · ·
where gt = git : Λ⋆ × [0, 1] → Λ⋆ is from Algorithm 8 and gtr (·) = gt (·, r).
Let M1 ≥ 1 be the smallest integer such that G−M1 ,−1 is a constant function on Λ, i.e.,
G−M1 ,−1 (Λ) ≡ A. By Item (3) of Proposition 4.25 and Lemma 4.23, G−M1′ ,−1 (Λ) ≡ A for all
M1′ ≥ M1 , and A is distributed as µM if M1 < +∞ almost surely.
Let M2 ≥ 1 be the smallest integer such that G−M2 ,−1 (⋆V ) ∈ Λ. Iteratively applying Item
(2) of Proposition 4.25, we know G−M2 ,−1 (Λ) ∈ G−M2 ,−1 (⋆V ). Then by Fact 4.24, G−M2 ,−1 (Λ) is
constant. Thus M2 ≥ M1 and G−M2 ,−1 (⋆V ) = A.
By Item (4) of Proposition 4.25, BoundingChain(Φ, M, −T, r−T , . . . , r−1 ) equals G−T,−1 (⋆V ),
which means TFinal ≥ M2 . Thus the final assignment σ
e equals A and has distribution µM provided
TFinal < +∞ almost surely.
Now we only need to show TFinal < +∞ almost surely. Note that either TFinal = 1 or, by
Algorithm 4, BoundingChain(Φ, M, −TFinal /2, r−TFinal /2 , . . . , r−1 ) = G−TFinal /2,−1 (⋆V ) ∈
/ Λ. Thus
TFinal ≤ 2 · M2 , which means it suffices to show M2 < +∞ almost surely. By Item (3) of
Proposition 4.25 and the analysis above, G−i,−1 (⋆V ) = A ∈ Λ for all i ≥ M2 ; thus
+∞
X
i=0
Pr [M2 > i] ≤ 2n − 1 +
+∞
X
i=2n−1
Pr G−i,−1 (⋆V ) ∈
/Λ
31
= 2n − 1 +
≤ 2n − 1 +
+∞
X
i=2n−1
+∞
X
i=2n−1
Pr [BoundingChain(Φ, M, −i, r−i , . . . , r−1 ) ∈
/ Λ]
i
4n · 2− n < +∞,
(by Proposition 4.5)
as desired by Theorem 4.26.
4.3
The FinalSampling Subroutine
Finally we give the missing FinalSampling(Φ, M, e
σ) subroutine, which simply completes the assignment on V \ M for σ
e.
Algorithm 9: The FinalSampling subroutine
Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , and an assignment
Q
V \M
σ
e∈
v∈M Ωv × {⋆}
C
Output: an assignment σ ∈ σTrue
e, v) for all v ∈ V
1 Φv = Vv , (Ωv |σ
e , Dv |σ
e )v∈Vv , Cv ← Component(Φ, M, σ
/* Ignore the returned Token since it is always True here */
2 Initialize V ′ ← ∅
3 while ∃v ∈ V \ V ′ do
4
σv ← RejectionSampling(Φv , rv )
/* rv is a fresh new randomness tape */
′
5
Assign σ(Vv ) ← σv (Vv ) and update V ← V ′ ∪ Vv
6 end
7 return σ
We observe the following results regarding Algorithm 9.
Lemma 4.27. If eα∆ ≤ 1, then the following holds for FinalSampling(Φ, M, e
σ).
C|
σ
e
• It halts almost surely, and outputs σ ∼ µTrue
when it halts.
P
• Its expected total running time is at most O kdQ v∈V (1 − eα)−|Cv | .
Proof. All the Φv can be easily constructed with one pass of C and V which takes time O (k|C| + |V |).
Cv
By Proposition
3.3, each iteration of Line 4 halts almost surely and generates σv ∼ µTrue in
expected time O (kQ |Cv | + Q) · (1 − eα)−|Cv | . Therefore FinalSampling halts almost surely and
its expected total running time is at most
X
O k|C| + |V | +
(kQ |Cv | + Q) · (1 − eα)−|Cv |
v needed by Line 4
!
X kQ |Cv | + Q
−|Cv |
· (1 − eα)
= O k|C| + |V | +
|Vv |
v∈V
!
X
−|Cv |
.
(since |Cv | ≤ d |Vv | and |C| ≤ d|V |)
≤ O kdQ
(1 − eα)
v∈V
32
Moreover when FinalSampling halts, by iteratively applying Item (4) of Proposition 3.1 we have
Y
C|σe
v
σ∼
= µTrue
.
µCTrue
v needed by Line 4
Combining Proposition 4.17, we analyze the performance of Algorithm 9 in Algorithm 4.
Proposition 4.28. If eα∆ ≤ 1, e∆2 ρ ≤ 1/32, and ∆2 λ ≤ 1/16, then the following holds for the
FinalSampling in Algorithm 4.
• It halts almost surely, and outputs σ ∼ µCTrue when it halts.
• Its expected running time is at most O d2 kQ∆|V | .
C . Define σ
e by
Proof. By Item (1) of Lemma 4.27, it halts almost surely. Fix an arbitrary σ ′ ∈ σTrue
′
V
\M
setting σ
e(M) = σ (M) and σ
e(V \ M) = ⋆
. Combining Proposition 4.17 and Definition 4.16,
we have
C|σe ′
C|σe
b(M) = σ ′ (M) · µTrue
(σ )
Pr σ = σ ′ = µM (e
σ ) · µTrue
(σ ′ ) = Pr σ
=
σ
b∼
=
QPr
v∈V
Prσb∼
Q
Dv
σ
b∼µC
True
C
σ
b(M) = σ ′ (M) σ
b ∈ σTrue
·
σ ′ (M), σ
b
σ
b(M) =
∈
Q
C
Prσb∼ v∈V Dv σ
b ∈ σTrue
v∈V
Dv
σ′′ ∼
C
σTrue
·
QPr
v∈V
Dv |σe
h
i
C|σe
σ ′′ = σ ′ σ ′′ ∈ σTrue
Prσ′′ ∼Qv∈V
Prσ′′ ∼Qv∈V
Dv |σe
Dv |σe
h
[σ ′′ = σ ′ ]
C|
σ
e
σ ′′ ∈ σTrue
i.
′
By Definition
v if v ∈ V \ M; and equals the point distribution of σ (v) if v ∈ M.
Q 2.5, Dv |σe equals DQ
Let σ1 ∼ v∈V \M Dv and σ2 ∼ v∈M Dv be independent. Then
Prσ′′ ∼Qv∈V
Prσ′′ ∼Qv∈V
Dv |σe
Dv |σe
h
[σ ′′ = σ ′ ]
C|
σ
e
σ ′′ ∈ σTrue
i=
=
Prσ1 [σ1 = σ ′ (V \ M)]
h
i
C|σe
Prσ1 σ1 ◦ σ ′ (M) ∈ σTrue
(◦ represents vector concatenation)
Prσ1 ,σ2 [σ1 = σ ′ (V \ M), σ2 = σ ′ (M)]
h
i
C|σe
, σ2 = σ ′ (M)
Prσ1 ,σ2 σ1 ◦ σ ′ (M) ∈ σTrue
(since σ1 , σ2 are independent)
=
=
=
σ′ ]
Pr
[σ ◦ σ2 =
h σ1 ,σ2 1
i
C|σe
, σ2 = σ ′ (M)
Prσ1 ,σ2 σ1 ◦ σ2 ∈ σTrue
Prσb∼Qv∈V Dv [b
σ = σ′]
h
i
C|σe
Prσb∼Qv∈V Dv σ
b ∈ σTrue
,σ
b(M) = σ ′ (M)
Prσb∼Qv∈V Dv [b
σ = σ′]
C ,σ
Prσb∼Qv∈V Dv σ
b ∈ σTrue
b(M) = σ ′ (M)
C|
σ
e
where the last equality is because when σ
b(M) = σ ′ (M), we have σ
b(M) = σ
e(M) and thus σ
b ∈ σTrue
C . Hence in all, we have
iff σ
b ∈ σTrue
′
Pr σ = σ =
as desired.
[b
σ = σ′]
= QPr
C
σ
b∼ v∈V
b ∈ σTrue
Dv σ
Prσb∼Qv∈V
Prσb∼Qv∈V
Dv
33
Dv
C
σ
b = σ′ σ
b ∈ σTrue
= µCTrue (σ ′ )
Note that µM is a stationary distribution for Pi by Item (3) Proposition 4.21. Meanwhile by
Item (4) of Proposition 4.25, the t-th for iteration in BoundingChain is a coupling of Pit . Thus
in FinalSampling, upon receiving σ
e which has distribution µM , we can execute |V | more rounds
e using fresh randomness, and the resulted assignment still has
of Line 3-11 in Algorithm 5 on σ
M
distribution µ . In other words, we may safely assume the last |V | rounds of update in the
final BoundingChain procedure are all using RejectionSampling. Thus each |Cv | in Lemma 4.27 also
satisfies the concentration bound in Proposition 4.11. By a similar calculation in the proof of the
efficiency part of Proposition 4.5 and noticing |C| ≤ d|V |, the expected running time here is at most
!
+∞
X
1 ℓ−1
ℓ+1
2
= O d2 kQ∆|V | .
·∆·4
O d kQ|V |
32
ℓ=1
4.4
Putting Everything Together
Now we put everything together to prove our main theorem.
Proof of Theorem 4.1. The correctness part follows immediately from Proposition 4.28 and Proposition 4.17.
Thus recall measures defined in Definition 2.1 and we focus on the efficiency part.
• Let X be the total running time of AtomicCSPSampling(Φ, π).
• Let A be the time for computing β(Φ, M) in (3). Then A = O(k|C|) ≤ O(dk|V |).
• For integer i ≥ 1 and j ∈ [i], let Xi,j be the running
for iteration in
h
i time of the (−j)-th
2
2
5
2
BoundingChain(Φ, M, −i, r−i, . . . , r−1 ). Then E Xi,j = O dk ∆ Q by Proposition 4.5.
• Let TFinal be the T when the while iterations stop.
Pr [TFinal ≥ t] ≤ 4|V | · 2−t/|V | for t ≥ 2|V | − 1.
Then by Proposition 4.5, we have
• Let Y be the running time of the FinalSampling in the end. Then by Proposition 4.28 we have
E[Y ] = O d2 kQ∆|V | .
Therefore we have X = A +
Plog(TFinal ) P2t
j=1 X2t ,j
t=0
+ Y 15 and
t
+∞ X
2
X
E X2t ,j · [TFinal ≥ 2t ]
E[X] = E[A] + E[Y ] +
t=0 j=1
≤ E[A] + E[Y ] +
+∞ X
2t r
X
t=0 j=1
t
≤ E[A] + E[Y ] +
m X
2
X
t=0 j=1
≤ O d2 kQ∆|V | +
p
i
h
E X22t ,j Pr [TFinal ≥ 2t ]
(by Cauchy-Schwarz inequality)
r h
+∞ X
2t r h
i
i
X
2
E X2t ,j +
E X22t ,j Pr [TFinal ≥ 2t ]
dk2 ∆5 Q2 ·
t=m+1 j=1
(m ≥ ⌊log(2|V | − 1)⌋ to be determined later)
!!
+∞
X
p
2t
− |V
m
t
2 + |V |
.
2 ·2 |
t=m+1
15
Technically we also need to initialize the randomness at Line 1 of Algorithm 4, and check for Line 5 in Algorithm 4,
and initialize the assignment at Line 1 of Algorithm 5. However these can be done on the fly and their cost will be
minor compared with the parts we listed.
34
We pick m = ⌈log(|V |) + log log(|V |) + 10⌉ then
m
2 +
p
|V |
+∞
X
t
2
t− |V
|
2
t=m+1
m
≤2 +
p
p
|V |
Z
+∞
x
2
x− |V
|
2
2x
(2x− n is decreasing when 2x ≥
dx
m
m
|V |
−2
· 2 |V |
2
ln (2)
= O (|V | log(|V |)) .
Since d ≤ ∆, we have E[X] = O kQ∆3 |V | log(|V |) .
= 2m +
|V | ·
(since
−n
ln2 (2)
2x
· 2− n
′
n
ln(2) )
2x
= 2x− n )
Remark 4.29. Computing higher moments of Xi,j , Y and using possibly stronger assumption, one
can improve the dependency on k, ∆, Q in the expected running time. However we view these as
constants compared with |V |. Thus we do not make the effort here.
5
Applications
Here we instantiate Theorem 4.1 to special CSPs. We will use the following algorithmic Lovász
local lemma for constructing the marking M.
C
Theorem 5.1 ([MT10]). Let Φ = V, (Ωv , Dv )v∈V , C be a CSP. If ep∆ ≤ 1, then σTrue
6= ∅ and
C
there exists a randomized algorithm which outputs some σ ∈ σTrue in time O(k∆|V |) with probability
at least 0.99.
We define the smooth parameter of a CSP Φ by
κ = κ(Φ) = max max
′
v∈V q,q ∈Ωv
Dv (q)
.
Dv (q ′ )
(11)
Note that κ(Φ) ≥ 1 always. When context is clear, we will simply write κ.
5.1
Binary Domains
Let M ⊆ V be a marking. We specialize α, β, λ when all domains are of size 2.
• α = α(Φ, M) = maxC∈C α(Φ, M, C) where
α(Φ, M, C) =
Y
v∈vbl(C)\M
C
Dv (σFalse
(v));
• When eα ≤ 1, define
– β = β(Φ, M) = (1 − eα)−d ≤ (1 − eα)−∆ ;
– λ = λ(Φ, M) = maxC∈C λ(Φ, M, C) where
λ(Φ, M, C) = |vbl(C)|2 · β |vbl(C)∩M| ·
Y
v∈vbl(C)∩M
By Remark 4.2, we will use the following version of Theorem 4.1.
35
C
Dv (σFalse
(v)).
Theorem 5.2 (Theorem 4.1, Binary Domains). Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP
such that |Ωv | = 2 for all v ∈ V . Let M ⊆ V be a marking. If
e · α(Φ, M, C) · ∆ ≤ 1
and
∆2 · λ(Φ, M, C) ≤ 1/100
for all C ∈ C,
then the following holds for AtomicCSPSampling(Φ, M).
• Correctness. It halts almost surely and outputs σ ∼ µCTrue when it halts.
• Efficiency. Its expected total running time is O k∆3 |V | log(|V |) .
We now construct a valid marking when the underlying distributions are arbitrary.
Lemma 5.3. Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP such that |Ωv | = 2 for all v ∈ V . For
any ζ ∈ (0, 1), if pγ · ∆ ≤ 0.01 · ζ/κ where
q
3 − 9ζ + ln(κ + 1) − ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1)
,
γ=
9
then there exists a marking M ⊆ V such that
e · α(Φ, M, C) · ∆ ≤ 1
and
∆2 · λ(Φ, M, C) ≤ 1/100
for all C ∈ C.
Moreover M can be constructed in time O(k∆|V |) with success probability at least 0.99.
Q
C (v)). Then p = p(Φ) = max
Proof. For each C ∈ C, define pC = v∈vbl(C) Dv (σFalse
C∈C pC . Mean|vbl(C)|
|vbl(C)|
while by the definition of κ, we know (1/(κ + 1))
≤ pC ≤ (κ/(κ + 1))
. Thus
ln(1/pC )
.
ln(1 + 1/κ)
|vbl(C)| ≤
(12)
Since xζ ≥ ζ ln(x) holds for any x > 0, we also have
ln(1/pC ) ≤ p−ζ
C /ζ.
(13)
Let η, τ ∈ (0, 1) be parameters and M be the marking to be determined later. We will ensure
1 − η − τ ≥ 0 and η − τ − 3ζ ≥ 0. For each C ∈ C, let EC be the event (i.e., constraint)
Y
C
EC = “ ln
Dv (σFalse
(v)) − η ln(pC ) > τ ln(1/pC ) ”.
v∈vbl(C)∩M
Now we check e · α(Φ, M, C) · ∆ ≤ 1 and ∆2 · λ(Φ, M, C) ≤ 1/100 assuming no EC happens. Since
vbl(C) is the disjoint union of vbl(C) ∩ M and vbl(C) \ M, if EC does not happen, then
α(Φ, M, C) ≤ p1−η−τ
≤ p1−η−τ .
C
Since (1 − x)−1/x ≤ 4 holds for any x ∈ (0, 1/2], we have
β ≤ 1−e·p
1−η−τ −∆
e·p1−η−τ ·∆
≤4
≤
κ+1
κ
ζ
36
if
2e ln(2)·p1−η−τ ·∆ ≤ ζ ·ln (1 + 1/κ) . (14)
Note that 2e ln(2) · p1−η−τ · ∆ ≤ ζ · ln (1 + 1/κ) already implies e · α(Φ, M, C) · ∆ ≤ 1. Combining
(12), (13), and (14), we also have
≤
λ(Φ, M, C) ≤ |vbl(C)|2 · β |vbl(C)| pη−τ
C
−ζ
−3ζ
ln2 (1/pC ) · pη−τ
pη−τ
pη−τ −3ζ
C
C
≤
≤
.
ln2 (1 + 1/κ)
ζ 2 ln2 (1 + 1/κ)
ζ 2 ln2 (1 + 1/κ)
Therefore ∆2 · λ(Φ, M, C) ≤ 1/100 is reduced to ∆2 · pη−τ −3ζ ≤ 0.01 · ζ 2 · ln2 (1 + 1/κ). In all, it
suffices to make sure 1 − η − τ ≥ 0, η − τ − 3ζ ≥ 0, and
2e ln(2) · p1−η−τ · ∆ ≤ ζ · ln(1 + 1/κ),
and ∆2 · pη−τ −3ζ ≤ 0.01 · ζ 2 · ln2 (1 + 1/κ).
(15)
Now we show how to set η, τ and construct M to make sure no EC happens. We put each
v ∈ V into M independently with probability η. For each v ∈ V , let xv ∈ {0, 1} be the indicator
for whether v is in M. Then
X
X
C
C
(v)) > τ ln(1/pC ) ”.
xv ln 1/Dv (σFalse
(v)) − E
xv ln 1/Dv (σFalse
EC = “
v∈vbl(C)
v∈vbl(C)
By Hoeffding’s inequality [Hoe94, Theorem 2], we have
(
)
−2τ 2 ln2 (1/pC )
Pr [EC ] ≤ 2 exp P
2
C
v∈vbl(C) ln 1/Dv (σFalse (v))
)
(
−2τ 2 ln2 (1/pC )
P
(by the definition of κ)
≤ 2 exp
C (v))
ln(κ + 1) · v∈vbl(C) ln 1/Dv (σFalse
P
−2τ 2 ln(1/pC )
C (v)) )
(since ln(pC ) = v∈vbl(C) ln Dv (σFalse
= 2 exp
ln(κ + 1)
2τ 2 / ln(κ+1)
= 2 · pC
≤ 2 · p2τ
2 / ln(κ+1)
.
Since vbl(EC ) = vbl(C) and thus it correlates with ∆ many EC ′ (including itself), by Theorem 5.1
we can construct M (i.e., (xv )v∈V ) to avoid all EC in time O(k∆|V |) with probability at least 0.99
as long as
2
2e · p2τ / ln(κ+1) · ∆ ≤ 1.
Now we set η = (2 − τ + 3ζ)/3 and
q
− ln(κ + 1) + ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1)
1 − 3ζ
<
τ=
6
2
so that 1 − η − τ , (η − τ − 3ζ)/2, and 2τ 2 / ln(κ + 1) all equal
q
− ln(κ + 1) + ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1)
1
> 0.
γ = −ζ −
3
9
Then all the conditions in (15) boil down to pγ · ∆ ≤ 0.1 · ζ · ln(1 + 1/κ). Since κ ≥ 1 and
ln(1 + x) ≥ 0.1x for all x ∈ [0, 1], we can safely replace ln(1 + 1/κ) with 0.1/κ as desired in the
statement.
37
Note that whether M satisfies the conditions in Theorem 5.2 can be easily checked in time
O(k|C|) = O(k∆|V |) by computing α(Φ, M, C) and λ(Φ, M, C) for each C ∈ C. Thus we can keep
performing Lemma 5.3 until the marking M is valid and then we run AtomicCSPSampling(Φ, M).
This provides a Las Vegas algorithm as below.
Corollary 5.4. There
exists a Las Vegas algorithm which takes as input an atomic CSP Φ =
V, (Ωv , Dv )v∈V , C and a parameter ζ ∈ (0, 1) such that the following holds.
If |Ωv | = 2 holds for all v ∈ V and pγ · ∆ ≤ 0.01 · ζ/κ where
q
3 − 9ζ + ln(κ + 1) − ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1)
,
γ=
9
then the algorithm outputs a random solution of Φ distributed perfectly as µCTrue in expected time
O k∆3 |V | log(|V |) .
Remark 5.5. One natural choice of the underlying distributions is the uniform distribution. In
this case κ = 1. By setting ζ → 0, we have
q
3 + ln(2) − ln2 (2) + 6 ln(2)
> 0.171.
γ→
9
For example we can set ζ = 10−10 and the local lemma condition is simply p0.171 · ∆ ≤ 10−12 /κ. In
Lemma 5.15 we will optimize it to 0.175 by a tighter concentration bound.
5.2
Large Domains: State Tensorization
Here we formally introduce the state tensorization technique, generalizing state compression from
[FHY20]. This, as emphasized in Subsection 1.3, allows us to transform a large domain into a
product of binary domains.
Let Ω be a finite domain of size at least 2 and D be a distribution supported on Ω. A state
tensorization for (Ω, D) (See Figure 2 for a concrete example) is a rooted tree T where
• T has |Ω| leaves and each internal node of T has at least two child nodes;
• the leaves of T have a one-to-one correspondence with elements in Ω.
For each node z ∈ T , let leafs(z) be the set of leaves in the sub-tree of z. Then leafs(rt) = Ω for
root rt. For each internal node z, we use childs(z) to denote its child nodes. For any z ′ ∈ childs(z),
we use z → z ′ to denote the edge from z to z ′ . Moreover, we define the weight of z → z ′ as
P
q∈leafs(z ′ ) D(q)
′
W (z → z ) = P
.
(16)
q∈leafs(z) D(q)
It is easy to see the total weight of outgoing edges of any internal node is 1.
If |Ω| = 1 and thus D is the point distribution, then the state tensorization T for (Ω, D) has
two nodes z and z ′ where z is the root and z ′ is the only leaf and W (z → z ′ ) = 1.
For each q ∈ Ω, we use path(q, T ) to denote the set of internal nodes in T on the path from the
root to the leaf z that corresponds to q. For example in Figure 2, path(b, T ) = {z0 , z1 , z3 }.
We first observe the following fact regarding edge weights in T .
Fact 5.6. Let q ∈ Ω be arbitrary. Let path(q, T ) = {z0 , . . . , zℓ } and zℓ+1 beQthe leaf node corresponding to q. Assume z0 , . . . , zℓ+1 is in the root-to-leaf order. Then D(q) = ℓi=0 W (zi → zi+1 ).
38
z0
3
5
2
5
z1
5
6
State tensorization T for (Ω, D)
z3
1
3
a
1
3
1
3
z2
1
6
1
4
d
e
3
4
f
c
b
Figure 2: One example of T for (Ω, D) where Ω = {a, b, c, d, e, f } and D(a) = D(b) = D(c) =
1/6, D(d) = D(e) = 1/10, D(f ) = 3/10. We omit the leaf nodes.
Proof. By (16), we have
ℓ
Y
i=0
W (zi → zi+1 ) =
ℓ
Y
P
′
q ′ ∈leafs(zi+1 ) D(q )
P
′
q ′ ∈leafs(zi ) D(q )
i=0
P
′
q ′ ∈leafs(zℓ+1 ) D(q )
′
q ′ ∈leafs(z0 ) D(q )
= P
= D(q)
Now we move to atomic CSPs and show formally how state tensorization helps reduce domain
sizes.
Definition 5.7 (Tensorized Atomic Constraint Satisfaction Problem). Let Φ = V, (Ωv , Dv )v∈V , C
be an atomic CSP. Let (Tv )v∈V be state tensorizations
where each Tv is a state tensorization for
⊗
⊗
16
as the tensorized atomic constraint satisfac(Ωv , Dv ). We construct Φ = Z, (Ωz , Dz )z∈Z , C
tion problem:17
• Z is the set of internal nodes of all Tv .
• For each z ∈ Z, Ωz = childs(z) and Dz is a distribution supported on Ωz by setting Dz (z ′ ) =
W (z → z ′ ) for all z ′ ∈ childs(z).
• For each C ∈ C, we construct C ⊗ ∈ C ⊗ by setting
[
C
vbl(C ⊗ ) =
path(σFalse
(v), Tv )
v∈vbl(C)
C
C (v)) where
and C ⊗ (σ) = False iff σ(z) = z(σFalse
(v)) for all v ∈ vbl(C) and z ∈ path(σFalse
18
z(q) ∈ childs(z) is the child node of z such that q ∈ leafs(z(q)).
For example in Figure 2, we have Ωz0 = {z1 , z2 } and Dz0 (z1 ) = 3/5, Dz0 (z2 ) = 2/5. Assume
constraint C is false iff d is assigned. Then C ⊗ will have vbl(C ⊗ ) = path(d, T ) = {z0 , z1 } and, for
σ ∈ Ωz0 × Ωz1 × Ωz2 × Ωz3 , C ⊗ (σ) iff σ(z0 ) = z1 and σ(z1 ) equals the leaf node corresponding to d.
Recall measures defined in Definition 2.1. Here we note some basic facts for Definition 5.7.
16
We require each Tv uses different set of tree nodes. So there will be no confusion when using z without explicitly
providing Tv ∋ z.
17
Technically Φ⊗ depends on (Tv )v∈V which we omit here for simplicity. In addition we remark Φ⊗ may have
redundant variables that do not appear in any constraint; this is indeed consistent with our definition of CSPs as we
never require them to shave redundant variables.
18
Assume path(q, Tv ) = {z0 , . . . , zℓ } where z0 , . . . , zℓ is in the top-down order of Tv . Let zℓ+1 be the leaf node
corresponding to q. Then zi (q) = zi+1 for each i ∈ {0, . . . , ℓ}.
39
Fact 5.8. The following holds for Φ⊗ .
(1) Φ⊗ is an atomic CSP where |C ⊗ | = |C| and |Z| =
(2)
Q(Φ⊗ )
= maxz∈Z |childs(z)| and
k(Φ⊗ )
P
v∈V
| {internal nodes in Tv } | ≤ Q(Φ)|V |.
C (v), T )
≤ k(Φ) · maxC∈C,v∈vbl(C) path(σFalse
v
(3) ∆(Φ⊗ ) = ∆(Φ), d(Φ⊗ ) = d(Φ), and p(Φ⊗ ) = p(Φ). Moreover for all C ∈ C we have
⊗ ′
[C(σ) = False] =
C (σ ) = False .
Q Pr
Q Pr
σ∼
v∈vbl(C)
σ′ ∼
Dv
z∈vbl(C ⊗ )
Dz
Proof. Item (1) (2) are obvious. Now we focus on Item (3). Note that for any C ∈ C and v ∈ V ,
v ∈ vbl(C) iff the root of Tv is in vbl(C ⊗ ). On the other hand if some internal node z is in vbl(C ⊗ )
then all the ancestors of z are also in vbl(C ⊗ ). Thus ∆(Φ⊗ ) = ∆(Φ) and d(Φ⊗ ) = d(Φ). The
“moreover” part and p(Φ⊗ ) = p(Φ) follow from Fact 5.6.
Now Q
we formally describe how to Q
translate an assignment for Φ⊗ into an assignment for Φ. For
Trans
any σ ∈ z∈Z Ωz , we define σ
∈ v∈V Ωv by the following process for each v ∈ V :
• We start from the root of Tv .
• If we are at an internal node z of Tv , then proceed to its child node σ(z) and repeat.
• Otherwise we are at a leaf node, then set σ Trans (v) by its corresponding value in Ωv .
Finally we prove the following simple but powerful reduction result.
⊗
Proposition 5.9. If σ ∼ µCTrue , then σ Trans ∼ µCTrue .
Therefore to obtain a random solution of Φ distributed perfectly as µCTrue , it suffices to have a
⊗
random solution of Φ⊗ distributed perfectly as µCTrue and then perform Trans operation.
Proof. Recall RejectionSampling(Φ⊗, ·) in Algorithm 2:
Q
(1) Sample σ ∼ z∈Z Dz .
(2) If C ⊗ (σ) = True for all C ⊗ ∈ C ⊗ , then accept σ; otherwise resample σ.
⊗
Obviously σ ∼ µCTrue . By the definition of C ⊗ in Definition 5.7 and the definition of Trans above,
we have C ⊗ (σ) = True iff C(σ Trans ) = True. Thus we can safely replace Step (2) by
(2a) If C(σ Trans ) = True for all C ∈ C, then accept σ; otherwise resample σ.
On the other hand for Step (1), we can first sample the σ Trans part and then complete it to σ:
(1a) For each v ∈ V , we start from the root of Tv .
– If we are at an internal node z of Tv , then sample σ(z) ∼ Dz and proceed to σ(z).
– Otherwise we are at a leaf node, then move to the next variable in V .
(1b) For each z ∈ Z that σ(z) is not sampled in Step (1a), we complete it by σ(z) ∼ Dz .
Q
By Fact 5.6 and the definition of Trans, Step (1a) is equivalent to sample σ Trans ∼ v∈V Dv and
σ Trans does not depend on the values sampled in Step (1b). More intuitively, we express this hybrid
argument on rejection sampling through the following equivalence of flow charts:
equals
equals
equals
Step (1) ⇆ Step (2) → σ Trans
Step (1a) (1b) ⇆ Step (2a) → σ Trans
Step (1a) ⇆ Step (2a) → Step (1b) → σ Trans
Step (1a) ⇆ Step (2a) → σ Trans ,
where the last line is exactly RejectionSampling(Φ, ·). Thus σ Trans ∼ µCTrue .
40
5.3
General Atomic Constraint Satisfaction Problem: Arbitrary Distribution
Now we deal with general atomic CSPs where domains may be large. Firstly we prove the following
lemma which describes a balanced way to construct state tensorization.
Lemma 5.10. There exists a deterministic algorithm such that the following holds. Let Ω be a finite
domain of size at least 2 and D be a distribution supported on Ω. Let κ = maxq,q′ ∈Ω D(q)/D(q ′ ).
The algorithm constructs a state tensorization T for (Ω, D) where
(1) the algorithm runs in time O(|Ω| log(|Ω|)) and T is a binary tree;
(2) for any internal node z ∈ T and {z1 , z2 } = childs(z), we have
W (z→z1 )
W (z→z2 )
≤ max {κ, 2}.
Proof. T is constructed like a Huffman tree as follows:
• For each q ∈ Ω create a node zq with value val(zq ) = D(q). Initialize set S = {zq | q ∈ Ω}.
• While |S| ≥ 2,
– select two distinct nodes z1 , z2 ∈ S with minimum value, i.e., val(z1 ), val(z2 ) are the
smallest among all nodes in S,
– create a parent node z of z1 and z2 ; then set val(z) = val(z1 ) + val(z2 ) and update
val(z2 )
1)
S ← (S ∪ {z}) \ {z1 , z2 }. Note that W (z → z1 ) = val(z
val(z) and W (z → z2 ) = val(z) .
• The final node in S when |S| = 1 is the root of T .
Then Item (1) is obvious if S is implemented as a balanced binary search tree or a heap.
Now we turn to Item (2). Define κ(S) = maxz,z ′ ∈S val(z)/val(z ′ ) when |S| ≥ 2. Then each time
W (z→z1 )
we select z1 , z2 ∈ S and link them to z, we have W
(z→z2 ) ≤ κ(S). Therefore it suffices to show
κ(S) ≤ max {κ, 2} throughout the construction.
• Initialization. Then we simply have κ(S) = κ.
• Afterwards. Assume S = {z1 , z2 , . . . , zℓ } for ℓ ≥ 2 where val(z1 ) ≤ val(z2 ) ≤ · · · ≤ val(zℓ ).
Let S ′ = {z, z3 , . . . , zℓ } be S after the update where val(z) = val(z1 ) + val(z2 ). Now we have
two possible cases.
– If val(z) ≤ val(zℓ ), then
′
κ(S ) = max
val(zℓ ) val(zℓ )
,
val(z) val(z3 )
≤
val(zℓ )
= κ(S) ≤ max {κ, 2} .
val(z1 )
– If val(z) > val(zℓ ), then
κ(S ′ ) =
val(z)
val(z1 ) + val(z2 )
=
≤ 2.
val(z3 )
val(z3 )
By Lemma 5.10, Proposition 5.9, and Corollary 5.4, we have the following theorem.
Theorem 5.11. There
exists a Las Vegas algorithm which takes as input an atomic CSP Φ =
V, (Ωv , Dv )v∈V , C and a parameter ζ ∈ (0, 1) such that the following holds.
κ where
Let κ
e = max {κ, 2} (Recall κ = κ(Φ) from (11)). If pγ · ∆ ≤ 0.01 · ζ/e
q
3 − 9ζ + ln(e
κ + 1) − ln2 (e
κ + 1) + 6 · (1 − 3ζ) ln(e
κ + 1)
,
γ=
9
then the algorithm outputs
a random solution of Φ distributed perfectly as µCTrue in expected time
O k∆3 Q2 |V | log(Q|V |) .
41
Proof. By fixing the variable which has domain size 1, we may safely assume |Ωv | ≥ 2 for all v ∈ V .
We use Lemma 5.10 to construct Tv for each (Ωv , Dv ). This takes time O(Q(Φ) log(Q(Φ))) · |V |.
Then let Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗ be the tensorized atomic CSP (Defined in Definition 5.7). By
Item (2) of Lemma 5.10, we have κ(Φ⊗ ) ≤ max {κ(Φ), 2}. Also by Fact 5.8, we have ∆(Φ⊗ ) =
∆(Φ), p(Φ⊗ ) = p(Φ), k(Φ⊗ ) ≤ k(Φ) · Q(Φ)19 , and |Z| ≤ Q(Φ)|V |. By Proposition 5.9, it suffices
⊗
to obtain a random solution of Φ⊗ distributed perfectly as µCTrue , which, by Corollary 5.4, gives the
claimed bounds.
Remark 5.12. Applying Theorem 5.11 to the uniform distributions, i.e., κ = 1, and setting ζ → 0,
we have
q
3 + ln(3) − ln2 (3) + 6 ln(3)
> 0.145.
γ→
9
This already beats 1/7 from [JPV20] and 0.142 from [JPV21]. In Corollary 5.18 we will further
optimize it to 0.175 with a more refined construction.
5.4
Hypergraph Coloring
The previous bounds can be improved if the domains are large enough and the underlying distributions are smooth. Here we take the hypergraph coloring problem as an example.
Definition 5.13 (Hypergraph Coloring). Let Q and k be positive integers. Let H = (V, E) be a
k-uniform hypergraph, i.e., each edge e ∈ E contains exactly k distinct variables. We associate it
with an atomic CSP Φ = Φ(H, Q) = V, ([Q], U )V , C where U is the uniform distribution over [Q]
and C = Ce,i : [Q]V → {True, False} e ∈ E, i ∈ [Q] and Ce,i (σ) = False iff σ(v) = i for all v ∈ e.
A solution to Φ is called a proper coloring for H.
To avoid confusion, k, d, ∆ will only be referred to k(H), d(H), ∆(H). It is easy to see k(Φ) = k,
d(Φ) = Q · d, ∆(Φ) = Q · ∆, Q(Φ) = Q, and p(Φ) = Q−k .
Theorem 5.14. There exists a Las Vegas algorithm which takes as input a k-uniform hypergraph
H = (V, E) and an integer Q. If ∆ = ∆(H) ≤ Q(1/3−oQ,k (1))k , then the algorithm outputs
a perfect
uniform random proper coloring for H in expected time O k∆3 Q4 log(Q)|V | log(Q|V |) .
Proof. For each v ∈ V , we construct the state tensorization Tv for (Ωv , Dv ) = ([Q], U ) as a complete
binary tree, i.e., Tv has depth D = ⌈log(Q)⌉ and Tv has 2i nodes at level i for all 0 ≤ i < ⌈log(Q)⌉.
Given the state tensorizations, we obtain the tensorized atomic CSP Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗
⊗
for Φ = Φ(H, Q). By Definition 5.13 and Proposition 5.9 it suffices to obtain σ ∼ µCTrue .
Define R = ⌊ 23 log(Q)⌋. We construct the marking M for Φ⊗ by putting all internal nodes in
Tv of level at least R into M for each v ∈ V . To apply Theorem 5.2 to (Φ⊗ , M), we compute the
constants α, β, λ (See in Subsection 5.1) for (Φ⊗ , M).
C (v), T ) = {z , . . . , z } where z , . . . , z is
Fix any C ⊗ ∈ C ⊗ and v ∈ vbl(C). Assume path(σFalse
v
0
0
ℓ
ℓ
in the top-down order of Tv . Then ℓ ∈ {D − 2, D − 1} and zR , . . . , zℓ ∈ M. Since Tv is a complete
binary tree, by (16) and the definition of Dz in Definition 5.7, if ℓ ≥ R then we have
ℓ
Y
i=R
⊗
C
Dzi (σFalse
(zi )) =
1
≤ 2−(ℓ−R) ≤ 2−(D−2−R) ≤ 4/Q1/3 .
|leafs(zR )|
) by analyzing the depth
It is possible to get a better bound on k(Φ⊗ ) (for example k(Φ⊗ ) ≤ k(Φ) · ln(κ(Φ)Q(Φ)+1)
ln(1+1/κ(Φ))
of each Tv . This is because the construction of Tv is “balanced” and the support of Dv has size |Ωv | ≤ Q(Φ) only.
However this only slightly improves the bound on the running time which is not our main focus.
19
42
Thus
k
α(Φ⊗ , M, C ⊗ ) ≤ 4/Q1/3 .
(17)
By Item (3) of Fact 5.8, ∆(Φ⊗ ) = ∆(Φ) = Q · ∆. Since (1 − x)−1/x ≤ 4 holds for any x ∈ (0, 1/2],
we have
k
1
1
1/3 k
.
(18)
β ≤ 4e·(4/Q ) ·Q∆ ≤ 4 kD if e · 4/Q1/3 · Q∆ ≤
kD
Meanwhile
R−1
Y
2ℓ−R+2
2D−R+1
|leafs(zR )|
C⊗
≤
≤
≤ 8/Q2/3 .
Dzi (σFalse
(zi )) =
Q
Q
Q
i=0
By Item (2) of Fact 5.8, k(Φ⊗ ) ≤ k(Φ) · D = kD. Thus combining (18), we have
k
λ(Φ⊗ , M, C ⊗ ) ≤ k2 D 2 · 4 · 8/Q2/3 .
(19)
Therefore, it suffices to make sure
D − 2 ≥ R,
k
and k2 D 2 · 4 · 8/Q2/3 · (Q∆)2 ≤ 1/100.
k
1
,
e · 4/Q1/3 · Q∆ ≤
kD
Since D = ⌈log(Q)⌉ and R = ⌊ 23 log(Q)⌋, it suffices to ensure
Q≥5
and ∆ ≤
Q1/3
4
!k
·
1
= Q(1/3−oQ,k (1))k .
40kQ log(Q)
Now we compute the running time. Note that the reduction to Φ⊗ and the construction of
⊗
M only take O(Q|V |) time. Since |Z| ≤ Q|V |, k(Φ⊗ ) = O(k log(Q)),
and ∆(Φ ) = Q∆, by
3
4
Theorem 5.2 the algorithm runs in time O k∆ Q log(Q)|V | log(Q|V |) .
5.5
General Atomic Constraint Satisfaction Problem: Uniform Distribution
The analysis in Subsection 5.3 is purely a black-box reduction from large domains to binary domains: the construction of the marking does not use the fact that the CSP after reduction is actually
a tensorized atomic CSP. Here we provide a unified construction for the state tensorizations and
the marking when the underlying
distributions are the uniform distribution.
Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP such that |Ωv | ≥ 2 and Dv is the uniform
distribution for all v ∈ V . Our strategy is similar as before:
(1) Construct state tensorizations (Tv )v∈V to obtain Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗ as the tensorized
atomic CSP for Φ. We will make sure each Ωz is a binary domain, i.e., Q(Φ⊗ ) = 2, to apply
Theorem 5.2.
(2) Construct a marking M ⊆ Z to satisfy
e · α(Φ⊗ , M, C ⊗ ) · ∆(Φ⊗ ) ≤ 1
and ∆(Φ⊗ )2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100
for all C ⊗ ∈ C ⊗ ,
where we recall α, β, λ from Subsection 5.1.
(3) Apply Theorem 5.2 to Φ⊗ and M. By Proposition 5.9, this provides a perfect uniform solution
to Φ after performing Trans operation.
43
Q
Q
−1
⊗
⊗
C (v)) =
For each C ∈ C, define pC = v∈vbl(C) Dv (σFalse
v∈vbl(C) |Ωv | . For each C ∈ C , define
Q
C ⊗ (z)). To avoid confusion and by Item (3) of Fact 5.8, we will use ∆ for
pC ⊗ = z∈vbl(C ⊗ ) Dz (σFalse
both ∆(Φ⊗ ) and ∆(Φ); use p for both p(Φ⊗ ) and p(Φ), and use pC for both pC and pC ⊗ .
We construct the state tensorization Tv for each (Ωv , Dv ) in a random and independent way.
The marking M ⊆ Z will be constructed along with each Tv . This will be similar as Lemma 5.3:
(Tv )v∈V and M are constructed with high success probability using Theorem 5.1.
Recall path(·, ·) from Subsection 5.2 and z(·) from Definition 5.7. For each v ∈ V and q ∈ Ωv ,
define
Y
X(v, q) = log
Dz (z(q)) .
z∈path(q,Tv )∩M
By Fact 5.6 and noticing Dv is uniform, we know
X(v, q) = log |Ωv |−1 − log
Y
z∈path(q,Tv )\M
Dz (z(q)) .
Recall the definition of C ⊗ from Definition 5.7, then we have
X
C
log α(Φ⊗ , M, C ⊗ ) = log(pC ) −
X(v, σFalse
(v))
(20)
v∈vbl(C)
−∆
and β = β(Φ⊗ , M) ≤ (1 − e · maxC ⊗ ∈C ⊗ α (Φ⊗ , M, C ⊗ )) . Then we also have
X
⊗
2
C
(v)).
log λ(Φ⊗ , M, C ⊗ ) ≤ log vbl(C ⊗ ) · β |vbl(C )| +
X(v, σFalse
v∈vbl(C)
Let η, τ1 , τ2 , ζ ∈ (0, 1) be parameters to be determined later. We will ensure 1 − η − τ1 ≥ 0 and
(1)
η − τ2 − 3ζ ≥ 0. For each C ∈ C, let EC be the event (i.e., constraint)
X
(1)
C
(v)) − η log(pC ) < −τ1 log(1/pC ) ”
X(v, σFalse
EC = “
v∈vbl(C)
(2)
and let EC be
(2)
EC = “
X
v∈vbl(C)
C
X(v, σFalse
(v)) − η log(pC ) > τ2 log(1/pC ) ”.
(1)
(2)
Now we check e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1 and ∆2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 assuming no EC or EC
1−η−τ1
happens. Then firstly α(Φ⊗ , M, C ⊗ ) ≤ pC
≤ p1−η−τ1 . Since (1 − x)−1/x ≤ 4 holds for any
x ∈ (0, 1/2], we have
β ≤ 4e·p
1−η−τ1 ·∆
≤ (3/2)ζ
if
2e ln(2) · p1−η−τ1 · ∆ ≤ ζ · ln(3/2).
Note that 2e ln(2) · p1−η−τ1 · ∆ ≤ ζ · ln(3/2) already implies e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1.
Now we turn to λ. Since x^ζ ≥ ζ ln(x) holds for any x > 0, we have ln(1/p_C) ≤ p_C^{−ζ}/ζ. Our state tensorizations will have κ(Φ⊗) ≤ 2 (recall κ(·) from (11)), thus p_C ≤ (2/3)^{|vbl(C⊗)|} and
$$\lambda(\Phi^\otimes,\mathcal{M},C^\otimes) \le |\mathrm{vbl}(C^\otimes)|^2\cdot\beta^{|\mathrm{vbl}(C^\otimes)|}\cdot p_C^{\eta-\tau_2} \le \frac{\ln^2(1/p_C)\cdot p_C^{\eta-\tau_2-\zeta}}{\zeta^2\ln^2(3/2)} \le \frac{p_C^{\eta-\tau_2-3\zeta}}{\zeta^2\ln^2(3/2)} \le \frac{p^{\eta-\tau_2-3\zeta}}{\zeta^2\ln^2(3/2)}.$$
Therefore ∆²·λ(Φ⊗, M, C⊗) ≤ 1/100 is reduced to ∆²·p^{η−τ₂−3ζ} ≤ 0.01·ζ²·ln²(3/2).
In all, it suffices to make sure we can construct the state tensorizations and the marking to avoid all E^(1)_C and E^(2)_C with the following additional conditions:
$$1-\eta-\tau_1 \ge 0,\qquad 2e\ln(2)\cdot p^{1-\eta-\tau_1}\cdot\Delta \le \zeta\cdot\ln(3/2),\qquad \eta-\tau_2-3\zeta \ge 0,\qquad \Delta^2\cdot p^{\eta-\tau_2-3\zeta} \le 0.01\cdot\zeta^2\cdot\ln^2(3/2). \tag{21}$$
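As an illustration of how (21) is used, the following sketch checks the four conditions for given p, ∆ and parameter values; the helper name and the sample numbers are ours, not the paper's.

```python
import math

def satisfies_condition_21(p, Delta, eta, tau1, tau2, zeta):
    """Check the four conditions collected in (21)."""
    c1 = 1 - eta - tau1 >= 0
    c2 = 2 * math.e * math.log(2) * p ** (1 - eta - tau1) * Delta <= zeta * math.log(3 / 2)
    c3 = eta - tau2 - 3 * zeta >= 0
    c4 = Delta ** 2 * p ** (eta - tau2 - 3 * zeta) <= 0.01 * zeta ** 2 * math.log(3 / 2) ** 2
    return c1 and c2 and c3 and c4

# Example with the constants chosen below (eta = 0.595, tau1 = 0.23, tau2 = 0.245 - 3e-5,
# zeta = 1e-5) and an instance with p**0.175 * Delta <= 1e-7, e.g. p = 2**-200 and Delta = 100.
print(satisfies_condition_21(2 ** -200, 100, 0.595, 0.23, 0.245 - 3e-5, 1e-5))  # True
```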
We first consider the simple case where all domains are binary to verify Remark 5.5.
Lemma 5.15. Assume |Ωv| = 2 for all v ∈ V. If p^{0.175}·∆ ≤ 10^{−7}, then Φ⊗ and M can be constructed in time O(k(Φ)·∆|V|) with success probability at least 0.99 such that Q(Φ⊗) = 2, κ(Φ⊗) = 1, and
$$e\cdot\alpha(\Phi^\otimes,\mathcal{M},C^\otimes)\cdot\Delta \le 1 \quad\text{and}\quad \Delta^2\cdot\lambda(\Phi^\otimes,\mathcal{M},C^\otimes) \le 1/100 \quad\text{for all } C^\otimes\in\mathcal{C}^\otimes.$$
Proof. Here we simply let each Tv have one root with two child nodes. Then Φ⊗ is essentially Φ itself. We put each variable into M with probability η independently. Then for each v ∈ V and q ∈ Ωv, we have
$$X(v,q) = \log\tfrac12\cdot[v\in\mathcal{M}] = \begin{cases}-1 & \text{with probability } \eta,\\ 0 & \text{with probability } 1-\eta.\end{cases}$$
Hence for any t ∈ R, we have E[e^{t·X(v,q)}] = 1 − η + η·e^{−t}. Note that p_C = 2^{−|vbl(C)|}. Let t₁, t₂ > 0 be some parameters to be optimized soon. Then we have
$$\begin{aligned}
\Pr\bigl[E^{(1)}_C\bigr] &= \Pr\Bigl[-t_1\cdot\sum_{v\in\mathrm{vbl}(C)} X(v,\sigma^C_{\mathrm{False}}(v)) > (\eta+\tau_1)\cdot t_1\cdot|\mathrm{vbl}(C)|\Bigr]\\
&= \Pr\Bigl[\exp\Bigl\{-t_1\cdot\sum_{v\in\mathrm{vbl}(C)} X(v,\sigma^C_{\mathrm{False}}(v))\Bigr\} > \exp\bigl\{(\eta+\tau_1)\cdot t_1\cdot|\mathrm{vbl}(C)|\bigr\}\Bigr]\\
&\le \mathbb{E}\Bigl[\exp\Bigl\{-t_1\cdot\sum_{v\in\mathrm{vbl}(C)} X(v,\sigma^C_{\mathrm{False}}(v))\Bigr\}\Bigr]\cdot e^{-(\eta+\tau_1)\cdot t_1\cdot|\mathrm{vbl}(C)|} &&\text{(by Markov's inequality)}\\
&= \Bigl(\bigl(1-\eta+\eta\cdot e^{t_1}\bigr)\cdot e^{-(\eta+\tau_1)\cdot t_1}\Bigr)^{|\mathrm{vbl}(C)|} &&\text{(since the $X(v,q)$'s are independent for different $v$)}\\
&= \Biggl(\Bigl(\frac{1-\eta}{1-\eta-\tau_1}\Bigr)^{1-\eta-\tau_1}\Bigl(\frac{\eta}{\eta+\tau_1}\Bigr)^{\eta+\tau_1}\Biggr)^{|\mathrm{vbl}(C)|} &&\text{(setting $e^{t_1}=\tfrac{(1-\eta)(\eta+\tau_1)}{\eta\cdot(1-\eta-\tau_1)}$)}\\
&= 2^{-|\mathrm{vbl}(C)|\cdot\mathrm{KL}(\eta+\tau_1\|\eta)} \le p^{\mathrm{KL}(\eta+\tau_1\|\eta)}, &&\text{(since $2^{-|\mathrm{vbl}(C)|}=p_C\le p$)}
\end{aligned}$$
where KL(a‖b) = a log(a/b) + (1 − a) log((1 − a)/(1 − b)) is the Kullback–Leibler divergence. Similarly
$$\begin{aligned}
\Pr\bigl[E^{(2)}_C\bigr] &= \Pr\Bigl[t_2\cdot\sum_{v\in\mathrm{vbl}(C)} X(v,\sigma^C_{\mathrm{False}}(v)) > -(\eta-\tau_2)\cdot t_2\cdot|\mathrm{vbl}(C)|\Bigr]\\
&\le \Bigl(\bigl(1-\eta+\eta\cdot e^{-t_2}\bigr)\cdot e^{(\eta-\tau_2)\cdot t_2}\Bigr)^{|\mathrm{vbl}(C)|}\\
&= \Biggl(\Bigl(\frac{1-\eta}{1-\eta+\tau_2}\Bigr)^{1-\eta+\tau_2}\Bigl(\frac{\eta}{\eta-\tau_2}\Bigr)^{\eta-\tau_2}\Biggr)^{|\mathrm{vbl}(C)|} &&\text{(setting $e^{-t_2}=\tfrac{(1-\eta)(\eta-\tau_2)}{\eta\cdot(1-\eta+\tau_2)}$)}\\
&\le p^{\mathrm{KL}(\eta-\tau_2\|\eta)}.
\end{aligned}$$
Let E_C = E^(1)_C ∨ E^(2)_C. Since each variable is put into M independently, E_C depends only on the constructions over vbl(C) and thus correlates with at most ∆ many E_{C′} (including itself) and
$$\Pr[E_C] \le 2\cdot p^{\min\{\mathrm{KL}(\eta+\tau_1\|\eta),\,\mathrm{KL}(\eta-\tau_2\|\eta)\}}.$$
To apply Theorem 5.1, it suffices to make sure
$$2e\cdot p^{\min\{\mathrm{KL}(\eta+\tau_1\|\eta),\,\mathrm{KL}(\eta-\tau_2\|\eta)\}}\cdot\Delta \le 1. \tag{22}$$
Combining (21) and (22), we can pick η, τ₁, τ₂, ζ satisfying 1 − η − τ₁ ≥ 0 and η − τ₂ − 3ζ ≥ 0; then let
$$\gamma = \min\Bigl\{1-\eta-\tau_1,\ \frac{\eta-\tau_2-3\zeta}{2},\ \mathrm{KL}(\eta+\tau_1\|\eta),\ \mathrm{KL}(\eta-\tau_2\|\eta)\Bigr\}$$
and all the conditions boil down to p^γ·∆ ≤ 0.01·ζ. Numerically maximizing γ, we have γ = 0.175, together with η = 0.595, τ₁ = 0.23, τ₂ = 0.245 − 3·10^{−5}, and ζ = 10^{−5}.
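This numerical choice is easy to reproduce; the sketch below evaluates the four quantities entering γ for the stated constants (an illustrative computation only).

```python
import math

def kl(a, b):
    """Kullback-Leibler divergence KL(a || b) in bits for Bernoulli parameters."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

eta, tau1, tau2, zeta = 0.595, 0.23, 0.245 - 3e-5, 1e-5
candidates = {
    "1 - eta - tau1": 1 - eta - tau1,
    "(eta - tau2 - 3*zeta) / 2": (eta - tau2 - 3 * zeta) / 2,
    "KL(eta + tau1 || eta)": kl(eta + tau1, eta),
    "KL(eta - tau2 || eta)": kl(eta - tau2, eta),
}
for name, value in candidates.items():
    print(f"{name} = {value:.6f}")
print("gamma =", min(candidates.values()))  # 0.175
```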
Now we proceed to the general case. Since the binary case is a special case of the general one, we cannot hope for a better bound than p^{0.175}·∆ ≲ 1. Therefore our goal is to show this bound is attainable using the numerical constants η, τ₁, τ₂, ζ determined above.
Lemma 5.16. Assume |Ωv| ≥ 2 for all v ∈ V. If p^{0.175}·∆ ≤ 10^{−7}, then Φ⊗ and M can be constructed in time O(k(Φ)Q(Φ)·∆|V|) with success probability at least 0.99 such that Q(Φ⊗) = 2, κ(Φ⊗) ≤ 2, and
$$e\cdot\alpha(\Phi^\otimes,\mathcal{M},C^\otimes)\cdot\Delta \le 1 \quad\text{and}\quad \Delta^2\cdot\lambda(\Phi^\otimes,\mathcal{M},C^\otimes) \le 1/100 \quad\text{for all } C^\otimes\in\mathcal{C}^\otimes.$$
Proof. Note that the choice of η, τ₁, τ₂, ζ above and the assumption p^{0.175}·∆ ≤ 10^{−7} already ensure (21), which guarantees e·α(Φ⊗, M, C⊗)·∆ ≤ 1 and ∆²·λ(Φ⊗, M, C⊗) ≤ 1/100 for all C⊗ ∈ C⊗. Thus in the following we only need to show how to construct the state tensorizations (and thus Φ⊗) and the marking M so that Q(Φ⊗) = 2, κ(Φ⊗) ≤ 2, and no E^(1)_C or E^(2)_C happens.
For each C ∈ C and integer m ≥ 2, define S(C, m) = {v ∈ vbl(C) | |Ωv| = m}. Then p_C = ∏_{m=2}^{+∞} m^{−|S(C,m)|}. Let t₁, t₂ > 0 be the constants used in the proof of Lemma 5.15, i.e.,
$$t_1 = \ln\frac{(1-\eta)(\eta+\tau_1)}{\eta\cdot(1-\eta-\tau_1)} \in [1.1659, 1.1660] \quad\text{and}\quad t_2 = \ln\frac{\eta\cdot(1-\eta+\tau_2)}{(1-\eta)(\eta-\tau_2)} \in [1.0035, 1.0036].$$
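As a sanity check of the stated intervals (illustrative only):

```python
import math

eta, tau1, tau2 = 0.595, 0.23, 0.245 - 3e-5
t1 = math.log((1 - eta) * (eta + tau1) / (eta * (1 - eta - tau1)))
t2 = math.log(eta * (1 - eta + tau2) / ((1 - eta) * (eta - tau2)))
print(t1, 1.1659 <= t1 <= 1.1660)  # t1 is about 1.1659
print(t2, 1.0035 <= t2 <= 1.0036)  # t2 is about 1.0036
```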
Our construction will satisfy the following proposition. We will provide its details and proof shortly. The key part is Item (3), which intuitively says that the simple binary case is actually the worst case.
Proposition 5.17. For each v ∈ V , the construction of the state tensorization Tv and the marking
on internal nodes of Tv is randomized and satisfies the following properties.
(1) For each v ∈ V , the construction is independent and takes O(|Ωv |) = O(Q(Φ)) time.
(2) Each possible Tv is a binary tree where W(z→z₁)/W(z→z₂) ≤ 2 (recall W(·) from (16)) holds for any internal node z ∈ Tv and {z₁, z₂} = childs(z).
(3) For each v ∈ V and q ∈ Ωv, we have
$$\mathbb{E}\bigl[e^{-t_1\cdot X(v,q)-(\eta+\tau_1)\cdot t_1\cdot\log(|\Omega_v|)}\bigr] \le |\Omega_v|^{-\mathrm{KL}(\eta+\tau_1\|\eta)}$$
and
$$\mathbb{E}\bigl[e^{t_2\cdot X(v,q)+(\eta-\tau_2)\cdot t_2\cdot\log(|\Omega_v|)}\bigr] \le |\Omega_v|^{-\mathrm{KL}(\eta-\tau_2\|\eta)}.$$
We first finish the proof assuming Proposition 5.17. Note that both Q(Φ⊗) = 2 and κ(Φ⊗) ≤ 2 follow from Item (2) of Proposition 5.17. Thus we focus on how to make sure no E^(1)_C or E^(2)_C happens. As in the proof of Lemma 5.15, we have
$$\begin{aligned}
\Pr\bigl[E^{(1)}_C\bigr] &= \Pr\Bigl[-t_1\cdot\sum_{m=2}^{+\infty}\sum_{v\in S(C,m)} X(v,\sigma^C_{\mathrm{False}}(v)) > (\eta+\tau_1)\cdot t_1\cdot\sum_{m=2}^{+\infty}|S(C,m)|\cdot\log(m)\Bigr]\\
&\le \mathbb{E}\Bigl[\exp\Bigl\{-t_1\cdot\sum_{m=2}^{+\infty}\sum_{v\in S(C,m)} X(v,\sigma^C_{\mathrm{False}}(v)) - (\eta+\tau_1)\cdot t_1\cdot\sum_{m=2}^{+\infty}|S(C,m)|\cdot\log(m)\Bigr\}\Bigr]\\
&= \prod_{m=2}^{+\infty}\prod_{v\in S(C,m)} \mathbb{E}\Bigl[e^{-t_1\cdot X(v,\sigma^C_{\mathrm{False}}(v))-(\eta+\tau_1)\cdot t_1\cdot\log(m)}\Bigr] &&\text{(by Item (1) of Proposition 5.17)}\\
&\le \prod_{m=2}^{+\infty}\prod_{v\in S(C,m)} m^{-\mathrm{KL}(\eta+\tau_1\|\eta)} &&\text{(by Item (3) of Proposition 5.17)}\\
&= p_C^{\mathrm{KL}(\eta+\tau_1\|\eta)} \le p^{\mathrm{KL}(\eta+\tau_1\|\eta)}. &&\text{(since $p_C=\prod_{m=2}^{+\infty} m^{-|S(C,m)|}$)}
\end{aligned}$$
We also have
$$\begin{aligned}
\Pr\bigl[E^{(2)}_C\bigr] &= \Pr\Bigl[t_2\cdot\sum_{m=2}^{+\infty}\sum_{v\in S(C,m)} X(v,\sigma^C_{\mathrm{False}}(v)) > -(\eta-\tau_2)\cdot t_2\cdot\sum_{m=2}^{+\infty}|S(C,m)|\cdot\log(m)\Bigr]\\
&\le \prod_{m=2}^{+\infty}\prod_{v\in S(C,m)} \mathbb{E}\Bigl[e^{t_2\cdot X(v,\sigma^C_{\mathrm{False}}(v))+(\eta-\tau_2)\cdot t_2\cdot\log(m)}\Bigr] \le p^{\mathrm{KL}(\eta-\tau_2\|\eta)}.
\end{aligned}$$
Let E_C = E^(1)_C ∨ E^(2)_C. By Item (1) of Proposition 5.17, the constructions are independent for each variable. Therefore E_C depends only on the constructions over vbl(C) and thus correlates with at most ∆ many E_{C′} (including itself) and
$$\Pr[E_C] \le 2\cdot p^{\min\{\mathrm{KL}(\eta+\tau_1\|\eta),\,\mathrm{KL}(\eta-\tau_2\|\eta)\}} \le 2\cdot p^{0.175}.$$
By our assumption p^{0.175}·∆ ≤ 10^{−7}, we apply Theorem 5.1 to find a construction of the state tensorizations (and thus Φ⊗) and the marking M to avoid all E_C = E^(1)_C ∨ E^(2)_C. The running time is O(k(Φ)∆|V|)·O(Q(Φ)), where the O(Q(Φ)) factor is from Item (1) of Proposition 5.17.
Putting everything together, we have the following corollary which justifies Remark 5.12.
Corollary 5.18. There exists a Las Vegas algorithm which takes as input an atomic CSP Φ = (V, (Ωv, Dv)v∈V, C) such that the following holds.
If each Dv is the uniform distribution and p^{0.175}·∆ ≤ 10^{−7}, then the algorithm outputs a perfect uniform random solution of Φ in expected time O(k∆³Q²|V| log(Q|V|)).
Proof. By fixing the variables which have domain size 1, we may safely assume |Ωv| ≥ 2 for all v ∈ V.
We keep using Lemma 5.16 to construct Φ⊗ = (Z, (Ωz, Dz)z∈Z, C⊗) and M until they satisfy the conditions in Theorem 5.2. Then we run the algorithm in Theorem 5.2 to obtain a random solution of Φ⊗ distributed perfectly as µ_{C⊗_True}. Then by Proposition 5.9, we perform the Trans operation to obtain σ^Trans ∼ µ_{C_True}, which is just a perfect uniform random solution of Φ.
To check the running time, by Fact 5.8 we have ∆(Φ⊗) = ∆(Φ), p(Φ⊗) = p(Φ), k(Φ⊗) ≤ k(Φ)·Q(Φ), and |Z| ≤ Q(Φ)|V|. (Actually we can upper bound k(Φ⊗) by k(Φ)·O(log(Q(Φ))), since the state tensorizations here are all "balanced" binary trees of depth O(log(Q(Φ))); however, this only slightly improves the bound on the running time, which is not our main focus.) Therefore by Lemma 5.16 and Theorem 5.2, it takes O(kQ∆|V|) time in expectation to construct Φ⊗ and M, and O(k∆³Q²|V| log(Q|V|)) time in expectation to do the sampling.
Before proving Proposition 5.17, we set up some technical lemmas.
Fact 5.19 (e.g., [Hoe94, Lemma 1 and Equation (4.16)]). Let X be an arbitrary random variable with a ≤ X ≤ b almost surely. Let c = E[X]. Then for any t ∈ R, we have
$$\mathbb{E}\bigl[e^{t\cdot X}\bigr] \le \frac{b-c}{b-a}\cdot e^{a\cdot t} + \frac{c-a}{b-a}\cdot e^{b\cdot t} \le e^{\frac{(b-a)^2t^2}{8}}\cdot e^{c\cdot t}.$$
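As a quick numerical sanity check of Fact 5.19 (not part of any proof), one can compare E[e^{tX}] with both bounds for a two-point random variable; the numbers below are arbitrary.

```python
import math

# Two-point random variable: X = a with probability 1 - q, X = b with probability q.
a, b, q, t = -1.0, 0.0, 0.405, 1.2
c = (1 - q) * a + q * b  # mean
mgf = (1 - q) * math.exp(t * a) + q * math.exp(t * b)
first_bound = (b - c) / (b - a) * math.exp(a * t) + (c - a) / (b - a) * math.exp(b * t)
second_bound = math.exp((b - a) ** 2 * t ** 2 / 8) * math.exp(c * t)
print(mgf <= first_bound + 1e-12, first_bound <= second_bound + 1e-12)  # True True
```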
Fact 5.20. Our choice of η, τ₁, τ₂, t₁, t₂ satisfies
$$\frac{\log^2(3)\,t_1^2}{8} \le \Bigl(\tau_1 t_1 - \frac{\mathrm{KL}(\eta+\tau_1\|\eta)}{\log(e)}\Bigr)\log(x) \quad\text{and}\quad \frac{\log^2(3)\,t_2^2}{8} \le \Bigl(\tau_2 t_2 - \frac{\mathrm{KL}(\eta-\tau_2\|\eta)}{\log(e)}\Bigr)\log(x) \quad\text{for all } x\ge 8,$$
$$\frac{t_1^2}{8} \le \Bigl(\tau_1 t_1 - \frac{\mathrm{KL}(\eta+\tau_1\|\eta)}{\log(e)}\Bigr)\log(x) \quad\text{and}\quad \frac{t_2^2}{8} \le \Bigl(\tau_2 t_2 - \frac{\mathrm{KL}(\eta-\tau_2\|\eta)}{\log(e)}\Bigr)\log(x) \quad\text{for all } x\in\{3,4,6,7\},$$
$$\frac{\log^2(5/2)\,t_1^2}{8} \le \Bigl(\tau_1 t_1 - \frac{\mathrm{KL}(\eta+\tau_1\|\eta)}{\log(e)}\Bigr)\log(x) \quad\text{and}\quad \frac{\log^2(5/2)\,t_2^2}{8} \le \Bigl(\tau_2 t_2 - \frac{\mathrm{KL}(\eta-\tau_2\|\eta)}{\log(e)}\Bigr)\log(x) \quad\text{for } x=5.$$
Proof. Since log(x) is an increasing function for x > 0, it suffices to verify the first two inequalities at x = 8 and the middle two at x = 3. Then all of them can be verified numerically.
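The numerical verification can be reproduced as follows; this sketch uses base-2 logarithms as in the rest of the section and is only an illustration.

```python
import math

def kl(a, b):  # KL divergence in bits
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

eta, tau1, tau2 = 0.595, 0.23, 0.245 - 3e-5
t1 = math.log((1 - eta) * (eta + tau1) / (eta * (1 - eta - tau1)))
t2 = math.log(eta * (1 - eta + tau2) / ((1 - eta) * (eta - tau2)))
slope1 = tau1 * t1 - kl(eta + tau1, eta) / math.log2(math.e)
slope2 = tau2 * t2 - kl(eta - tau2, eta) / math.log2(math.e)

checks = [
    math.log2(3) ** 2 * t1 ** 2 / 8 <= slope1 * math.log2(8),      # first pair at x = 8
    math.log2(3) ** 2 * t2 ** 2 / 8 <= slope2 * math.log2(8),
    t1 ** 2 / 8 <= slope1 * math.log2(3),                          # middle pair at x = 3
    t2 ** 2 / 8 <= slope2 * math.log2(3),
    math.log2(5 / 2) ** 2 * t1 ** 2 / 8 <= slope1 * math.log2(5),  # last pair at x = 5
    math.log2(5 / 2) ** 2 * t2 ** 2 / 8 <= slope2 * math.log2(5),
]
print(all(checks))  # True
```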
Now we give the proof of Proposition 5.17.
Proof of Proposition 5.17. Our construction will be independent for each v ∈ V and depend only on |Ωv|. Items (1) and (2) will be evident as we describe the construction.
Let N = |Ωv| ≥ 2. The simplest case N = 2 is essentially Lemma 5.15: Tv has one root with two leaf nodes, and we put the root into M with probability η. By the choice of t₁, t₂ (see the calculation in Lemma 5.15), the two inequalities in Item (3) actually hold with equality.
With foresight, our construction for N ≥ 3 will satisfy the following conditions.
(A) E [X(v, q)] = η log(1/N ) for each q ∈ Ωv .
(B) If N = 5, then X(v, q) ∈ [aN , bN ] always holds for all q ∈ Ωv where bN − aN ≤ log(5/2).
(C) If N ∈ {3, 4, 6, 7}, then X(v, q) ∈ [aN , bN ] always holds for all q ∈ Ωv where bN − aN ≤ 1.
(D) If N ≥ 8, then X(v, q) ∈ [aN , bN ] always holds for all q ∈ Ωv where bN − aN ≤ log(3).
Then we verify Item (3) given Items (A), (B), (C), (D):
$$\begin{aligned}
\mathbb{E}\bigl[e^{-t_1\cdot X(v,q)-(\eta+\tau_1)\cdot t_1\cdot\log(N)}\bigr] &= \mathbb{E}\bigl[e^{-t_1\cdot(X(v,q)-\eta\log(1/N))-\tau_1\cdot t_1\cdot\log(N)}\bigr]\\
&\le \exp\Bigl\{\tfrac{(b_N-a_N)^2t_1^2}{8} - \tau_1\cdot t_1\cdot\log(N)\Bigr\} &&\text{(by Fact 5.19 and Item (A))}\\
&\le \exp\Bigl\{-\mathrm{KL}(\eta+\tau_1\|\eta)\cdot\tfrac{\log(N)}{\log(e)}\Bigr\} = N^{-\mathrm{KL}(\eta+\tau_1\|\eta)} &&\text{(by Items (B), (C), (D) and Fact 5.20)}
\end{aligned}$$
and similarly
$$\begin{aligned}
\mathbb{E}\bigl[e^{t_2\cdot X(v,q)+(\eta-\tau_2)\cdot t_2\cdot\log(N)}\bigr] &= \mathbb{E}\bigl[e^{t_2\cdot(X(v,q)-\eta\log(1/N))-\tau_2\cdot t_2\cdot\log(N)}\bigr]\\
&\le \exp\Bigl\{\tfrac{(b_N-a_N)^2t_2^2}{8} - \tau_2\cdot t_2\cdot\log(N)\Bigr\} &&\text{(by Fact 5.19 and Item (A))}\\
&\le N^{-\mathrm{KL}(\eta-\tau_2\|\eta)}. &&\text{(by Items (B), (C), (D) and Fact 5.20)}
\end{aligned}$$
Now we give the construction for N ≥ 3 and check Items (A), (B), (C), (D).
N = 5.
Let p5 ∈ [0, 1] be a constant to be determined. The construction for N = 5 is as follows.
• In Figure 3, we select the left tree with probability p5 and the right with probability 1 − p5 .
• After fixing the tree, we assign elements in Ωv to the leaf nodes uniformly to obtain Tv .
• Finally we put the internal nodes boxed by dashed squares into the marking M.
Figure 3: The construction for N = 5. Internal nodes boxed by dashed squares are put in M.
Recall X(v,q) = Σ_{z∈path(q,T_v)∩M} log(D_z(z(q))), where z(q) ∈ childs(z) is the child node of z such that q ∈ leafs(z(q)). Since the leaf nodes are uniform, for any fixed q ∈ Ωv we have
$$\mathbb{E}[X(v,q)\mid\text{the left tree}] = \bigl(4\log(2/5) + \log(1/5)\bigr)/5 < \eta\log(1/5)$$
and
$$\mathbb{E}[X(v,q)\mid\text{the right tree}] = \bigl(3\log(1/3) + 2\log(1/2)\bigr)/5 > \eta\log(1/5).$$
Thus we can easily set p5 to make sure
E [X(v, q)] = p5 · E [X(v, q) | the left tree] + (1 − p5 ) · E [X(v, q) | the right tree] = η log(1/5),
which proves Item (A). As for Item (B), it suffices to observe
X(v, q) ∈ {log(1/5), log(1/3), log(2/5), log(1/2)} ⊆ [log(1/5), log(1/2)].
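For concreteness, p5 can be computed in one line; the computation below (base-2 logarithms) is illustrative and not part of the construction itself.

```python
import math

eta = 0.595
left = (4 * math.log2(2 / 5) + math.log2(1 / 5)) / 5       # E[X | the left tree]
right = (3 * math.log2(1 / 3) + 2 * math.log2(1 / 2)) / 5  # E[X | the right tree]
target = eta * math.log2(1 / 5)
p5 = (right - target) / (right - left)  # solves p5*left + (1 - p5)*right = target
print(left < target < right, p5, 0 <= p5 <= 1)
```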
N ∈ {3, 4, 6, 7}. The construction for N ∈ {3, 4, 6, 7} is the same as for N = 5 above, except that we now use Figure 4: we mix the left and right trees to make sure E[X(v, q)] = η log(1/N) as demanded by Item (A). To do so, it suffices to check that η log(1/N) is sandwiched between E[X(v, q) | the left tree] and E[X(v, q) | the right tree].
Figure 4: The construction for N ∈ {3, 4, 6, 7}. (Panels: (a) the construction for N = 3; (b) the construction for N = 6; (c) the construction for N = 4; (d) the construction for N = 7.)
For N = 3 we use Figure 4a. Then X(v, q) ∈ {log(1/3), log(2/3)} which proves Item (C). Also
E [X(v, q) | the left tree] = log(1/3) < η log(1/3)
and
E [X(v, q) | the right tree] = (log(1/3) + 2 log(2/3)) /3 > η log(1/3).
For N = 4 we use Figure 4c. Then X(v, q) ∈ {log(1/2), log(1/4)} which proves Item (C). Also
E [X(v, q) | the left tree] = log(1/2) > η log(1/4)
and
E [X(v, q) | the right tree] = log(1/4) < η log(1/4).
For N = 6 we use Figure 4b. Then X(v, q) ∈ {log(1/3), log(1/2)} which proves Item (C). Also
E [X(v, q) | the left tree] = log(1/3) < η log(1/6)
and
E [X(v, q) | the right tree] = log(1/2) > η log(1/6).
For N = 7 we use Figure 4d. Then X(v, q) ∈ {log(2/7), log(3/7), log(1/4), log(1/3)} which
proves Item (C). Also
E [X(v, q) | the left tree] = (4 log(2/7) + 3 log(3/7)) /7 > η log(1/7)
and
E [X(v, q) | the right tree] = (4 log(1/4) + 3 log(1/3)) /7 < η log(1/7).
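These four sandwiching claims can be checked mechanically; the sketch below recomputes the conditional expectations in base-2 logarithms (illustrative only).

```python
import math

log2 = math.log2
eta = 0.595
# (E[X | the left tree], E[X | the right tree]) for N = 3, 4, 6, 7, as computed in the text.
cases = {
    3: (log2(1 / 3), (log2(1 / 3) + 2 * log2(2 / 3)) / 3),
    4: (log2(1 / 2), log2(1 / 4)),
    6: (log2(1 / 3), log2(1 / 2)),
    7: ((4 * log2(2 / 7) + 3 * log2(3 / 7)) / 7, (4 * log2(1 / 4) + 3 * log2(1 / 3)) / 7),
}
for N, (e_left, e_right) in cases.items():
    target = eta * log2(1 / N)
    lo, hi = min(e_left, e_right), max(e_left, e_right)
    print(N, lo <= target <= hi)  # True for each N
```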
N ≥ 8. For each x ∈ [N ] define A(N, x) = ⌊N/x⌋ · (x + 1) − N and B(N, x) = N − ⌊N/x⌋ · x.
Let R = ⌊N^{1−η}⌋. The proof of the following technical result is deferred to Appendix A.
Fact 5.21. If x ∈ {R − 1, R, R + 1}, then 1 ≤ x ≤ N and A(N, x), B(N, x) ≥ 0.
For each x ∈ {R − 1, R, R + 1} we have the following construction (see Figure 5 for an illustration), the correctness of which is guaranteed by Fact 5.21.
• Let T₁, ..., T_{A(N,x)} (resp., T_{A(N,x)+1}, ..., T_{A(N,x)+B(N,x)}) be balanced binary trees that each has x (resp., x + 1) leaf nodes. Here "balanced" simply means the sub-trees of sibling nodes have size difference at most 1; in particular, this guarantees Item (2) of Proposition 5.17.
• Define the distribution D_x supported on T₁, ..., T_{A(N,x)+B(N,x)} by setting
$$D_x(T_i) = \begin{cases} x/N & i \le A(N,x),\\ (x+1)/N & i > A(N,x).\end{cases}$$
Then construct a binary tree on top of T₁, ..., T_{A(N,x)+B(N,x)} using Lemma 5.10 and D_x. Note that κ(D_x) ≤ (x + 1)/x ≤ 2, so by Item (2) of Lemma 5.10, Item (2) of Proposition 5.17 is still preserved; meanwhile, since A(N,x) + B(N,x) ≤ N/x = O(N^η), this step takes O(N^η log(N)) = O(N) time.
• Assign elements in Ωv uniformly to the A(N,x)·x + B(N,x)·(x+1) = N leaf nodes to get Tv.
• Finally we put all the nodes on top of T₁, ..., T_{A(N,x)+B(N,x)} into the marking M.
Figure 5: The construction for N ≥ 8 and x ∈ {R − 1, R, R + 1}. Nodes inside the dashed square are put in M.
Observe that for any fixed x ∈ {R − 1, R, R + 1} and any q ∈ Ωv, the construction gives X(v,q) ∈ {log(x/N), log((x+1)/N)} and
$$\mathbb{E}[X(v,q)\mid x] = \frac{A(N,x)\cdot x}{N}\log\frac{x}{N} + \frac{B(N,x)\cdot(x+1)}{N}\log\frac{x+1}{N} \in \Bigl[\log\frac{x}{N},\ \log\frac{x+1}{N}\Bigr].$$
Therefore
$$\mathbb{E}[X(v,q)\mid x=R+1] \ge \log((R+1)/N) = \log\bigl(\bigl(\lfloor N^{1-\eta}\rfloor+1\bigr)/N\bigr) \ge \eta\log(1/N)$$
and
$$\mathbb{E}[X(v,q)\mid x=R-1] \le \log(R/N) = \log\bigl(\lfloor N^{1-\eta}\rfloor/N\bigr) \le \eta\log(1/N).$$
Now we have two cases:
• If E[X(v,q) | x = R] ≥ η log(1/N) (we emphasize that E[X(v,q) | x] has the same value for all q ∈ Ωv), we mix the construction of x = R − 1 and x = R to make sure E[X(v,q)] = η log(1/N) for Item (A). Then X(v,q) ∈ {log((R−1)/N), log(R/N), log((R+1)/N)}, which verifies Item (D) since R ≥ 2.
• Otherwise, we mix the construction of x = R and x = R + 1 in a similar way. Then X(v,q) ∈ {log(R/N), log((R+1)/N), log((R+2)/N)}, which also verifies Item (D).
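The mixing step above can likewise be carried out numerically; the sketch below computes, for a sample N ≥ 8, the conditional means at x ∈ {R − 1, R, R + 1} and picks the pair to mix (the helper name and the sample N are ours, not the paper's).

```python
import math

def conditional_mean(N, x):
    """E[X(v, q) | x] for the construction with A(N, x) trees of x leaves and
    B(N, x) trees of x + 1 leaves (base-2 logarithms)."""
    A = (N // x) * (x + 1) - N
    B = N - (N // x) * x
    return (A * x * math.log2(x / N) + B * (x + 1) * math.log2((x + 1) / N)) / N

eta = 0.595
N = 23  # any N >= 8
R = math.floor(N ** (1 - eta))
target = eta * math.log2(1 / N)
mid = conditional_mean(N, R)
# Mix x = R - 1 with x = R if the mean at R is already >= target, otherwise mix R with R + 1.
pair = (R - 1, R) if mid >= target else (R, R + 1)
lo, hi = sorted(conditional_mean(N, x) for x in pair)
print(pair, lo <= target <= hi)  # the target is sandwiched, so a suitable mixture exists
```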
Acknowledgement
KH wants to thank Weiming Feng for helpful discussion. KW wants to thank Chao Liao, Pinyan
Lu, Jiaheng Wang, Kuan Yang, Yitong Yin, Chihao Zhang for helpful discussion on related topics
in summer 2020. KW also wants to thank Kuan Yang for the help with Mathematica. Finally we
thank Weiming Feng, Chunyang Wang, and Kuan Yang for helpful comments on an earlier version
of the paper.
References
[ACG12] Ittai Abraham, Shiri Chechik, and Cyril Gavoille. Fully dynamic approximate distance
oracles for planar graphs via forbidden-set distance labels. In Proceedings of the forty-fourth annual ACM Symposium on Theory of Computing, pages 1199–1218. ACM,
2012. 4
[Alo91] Noga Alon. A parallel algorithmic version of the local lemma. Random Struct. Algorithms, 2(4):367–378, 1991. (Conference version in FOCS '91). doi:10.1002/rsa.3240020403. 2
[BC20] Siddharth Bhandari and Sayantan Chakraborty. Improved bounds for perfect sampling of k-colorings in graphs. In Konstantin Makarychev, Yury Makarychev, Madhur
Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL,
USA, June 22-26, 2020, pages 631–642. ACM, 2020. doi:10.1145/3357713.3384244.
4
[Bec91] József Beck. An algorithmic approach to the Lovász local lemma. Random Struct.
Algorithms, 2(4):343–365, 1991. doi:10.1002/rsa.3240020402. 2
[BGG+ 19] Ivona Bezáková, Andreas Galanis, Leslie A. Goldberg, Heng Guo, and Daniel
Štefankovič. Approximation via correlation decay when strong spatial mixing fails.
SIAM J. Comput., 48(2):279–349, 2019. doi:10.1137/16M1083906. 2
[BSVV08] Ivona Bezáková, Daniel Stefankovic, Vijay V. Vazirani, and Eric Vigoda. Accelerating
simulated annealing for the permanent and combinatorial counting problems. SIAM
J. Comput., 37(5):1429–1454, 2008. doi:10.1137/050644033. 5
[CS00] Artur Czumaj and Christian Scheideler. Coloring nonuniform hypergraphs: a new algorithmic approach to the general Lovász local lemma. Random Struct. Algorithms, 17(3-4):213–237, 2000. doi:10.1002/1098-2418(200010/12)17:3/4<213::AID-RSA3>3.0.CO;2-Y. 2
[EL75] Paul Erdös and László Lovász. Problems and results on 3-chromatic hypergraphs and
some related questions. Infinite and finite sets, 11:609–627, 1975. 2, 4, 11
[FGYZ20] Weiming Feng, Heng Guo, Yitong Yin, and Chihao Zhang. Fast sampling and counting
k-SAT solutions in the local lemma regime. In STOC, pages 854–867. ACM, 2020.
doi:10.1145/3357713.3384255. 2, 3, 4, 5, 6, 19
[FHY20] Weiming Feng, Kun He, and Yitong Yin. Sampling constraint satisfaction solutions
in the local lemma regime. arXiv preprint arXiv:2011.03915, 2020. 2, 3, 4, 5, 6, 9, 38
[Fil97] James Allen Fill. An interruptible algorithm for perfect sampling via Markov chains.
In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC),
pages 688–695, 1997. 4
[FMMR00] James Allen Fill, Motoya Machida, Duncan J Murdoch, and Jeffrey S Rosenthal.
Extension of Fill’s perfect rejection sampling algorithm to general chains. Random
Structures & Algorithms, 17(3-4):290–316, 2000. 4
[FVY19] Weiming Feng, Nisheeth K. Vishnoi, and Yitong Yin. Dynamic sampling from graphical models. In STOC, pages 1070–1081. ACM, 2019. 4
[GGGY20] Andreas Galanis, Leslie Ann Goldberg, Heng Guo, and Kuan Yang. Counting solutions
to random CNF formulas. In ICALP, volume 168 of LIPIcs, pages 53:1–53:14, 2020.
doi:10.4230/LIPIcs.ICALP.2020.53. 2, 3
[GH20] Heng Guo and Kun He. Tight bounds for popping algorithms. Random Struct.
Algorithms, 57(2):371–392, 2020. doi:10.1002/rsa.20928. 2
[GJ19] Heng Guo and Mark Jerrum. A polynomial-time approximation algorithm for all-terminal network reliability. SIAM J. Comput., 48(3):964–978, 2019. 2
[GJL19] Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the Lovász
local lemma. J. ACM, 66(3):18:1–18:31, 2019. (Conference version in STOC ’17).
doi:10.1145/3310131. 2, 3
[GLLZ19] Heng Guo, Chao Liao, Pinyan Lu, and Chihao Zhang. Counting hypergraph colorings
in the local lemma regime. SIAM J. Comput., 48(4):1397–1424, 2019. (Conference
version in STOC ’18). doi:10.1137/18M1202955. 2, 3
[GS01] Geoffrey R. Grimmett and David R. Stirzaker. Probability and random processes.
Oxford University Press, third edition, 2001. 31
[HDSMR16] Bryan He, Christopher De Sa, Ioannis Mitliagkas, and Christopher Ré. Scan order in Gibbs sampling: models in which it matters and bounds on how much. Advances in Neural Information Processing Systems, 29, 2016. 7
[HN99] Olle Haggstrom and Karin Nelander. On exact simulation of Markov random fields using coupling from the past. Scandinavian Journal of Statistics, 26(3):395–411, 1999. 4, 7, 29
[Hoe94] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In
The collected works of Wassily Hoeffding, pages 409–426. Springer, 1994. 37, 48
[HS17] David G. Harris and Aravind Srinivasan. A constructive Lovász local lemma for
permutations. Theory Comput., 13:Paper No. 17, 41, 2017. (Conference version in
SODA’14). doi:10.4086/toc.2017.v013a017. 2
[HS19] David G. Harris and Aravind Srinivasan. The Moser-Tardos framework with partial
resampling. J. ACM, 66(5):Art. 36, 45, 2019. (Conference version in FOCS ’13).
doi:10.1145/3342222. 2
[HSS11] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan. New constructive aspects
of the Lovász local lemma. J. ACM, 58(6):28, 2011. (Conference version in FOCS ’10).
doi:10.1145/2049697.2049702. 2, 11
[HSZ19] Jonathan Hermon, Allan Sly, and Yumeng Zhang. Rapid mixing of hypergraph independent sets. Random Struct. Algorithms, 54(4):730–767, 2019. 8
[Hub98] Mark Huber. Exact sampling and approximate counting techniques. In Proceedings
of the thirtieth annual ACM symposium on Theory of computing, pages 31–40. ACM,
1998. 4, 7, 29
[Hub04] Mark Huber. Perfect sampling using bounding chains. Ann. Appl. Probab., 14(2):734–
753, 2004. 4
[JPV20] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. Towards the sampling Lovász local lemma. CoRR, abs/2011.12196, 2020. URL: https://arxiv.org/abs/2011.12196, arXiv:2011.12196. 2, 3, 5, 42
[JPV21] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. On the sampling Lovász local lemma for atomic constraint satisfaction problems. CoRR, abs/2102.08342, 2021. URL: https://arxiv.org/abs/2102.08342, arXiv:2102.08342. 2, 3, 4, 5, 6, 8, 19, 42
[JSS20] Vishesh Jain, Ashwin Sah, and Mehtaab Sawhney. Perfectly sampling k ≥ (8/3 + o(1))∆-colorings in graphs. CoRR, abs/2007.06360, 2020. URL: https://arxiv.org/abs/2007.06360, arXiv:2007.06360. 4
[JVV86a] Mark Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–
188, 1986. 2, 5
[JVV86b] Mark R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of
combinatorial structures from a uniform distribution. Theoret. Comput. Sci., 43:169–
188, 1986. doi:10.1016/0304-3975(86)90174-X. 3, 4
[KM11] Kashyap Babu Rao Kolipaka and Mario Szegedy. Moser and Tardos meet Lovász. In STOC, pages 235–244, 2011. doi:10.1145/1993636.1993669. 2
[LP17] David A Levin and Yuval Peres. Markov chains and mixing times. American Mathematical Soc., 2017. doi:10.1090/mbk/107. 4, 27
[LS16] Eyal Lubetzky and Allan Sly. Information percolation and cutoff for the stochastic Ising model. Journal of the American Mathematical Society, 29(3):729–774, 2016. 8
[Moi19] Ankur Moitra. Approximate counting, the Lovász local lemma, and inference in
graphical models. J. ACM, 66(2):10:1–10:25, 2019. (Conference version in STOC ’17).
doi:10.1145/3268930. 2, 3
[Mos09] Robin A. Moser. A constructive proof of the Lovász local lemma. In STOC, pages
343–350, 2009. doi:10.1145/1536414.1536462. 2
[MR98] Michael Molloy and Bruce Reed. Further algorithmic aspects of the local lemma. In
STOC, pages 524–529, 1998. doi:10.1145/276698.276866. 2
[MT10] Robin A. Moser and Gábor Tardos. A constructive proof of the general Lovász local
lemma. J. ACM, 57(2):11, 2010. doi:10.1145/1667053.1667060. 2, 4, 35
[PW96] James G. Propp and David B. Wilson. Exact sampling with coupled Markov chains
and applications to statistical mechanics. Random Structures Algorithms, 9(1-2):223–
252, 1996. 4, 7, 29
[She85] James B. Shearer. On a problem of Spencer. Combinatorica, 5(3):241–245, 1985.
doi:10.1007/BF02579368. 2
[ŠVV09a] Daniel Štefankovič, Santosh Vempala, and Eric Vigoda. Adaptive simulated annealing:
A near-optimal connection between sampling and counting. J. ACM, 56(3):18, 2009.
doi:10.1145/1516512.1516520. 3
[SVV09b] Daniel Stefankovic, Santosh S. Vempala, and Eric Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. J. ACM, 56(3):18:1–
18:36, 2009. doi:10.1145/1516512.1516520. 5
[Wig19] Avi Wigderson. Mathematics and computation. Princeton University Press, 2019. 6
A Proof of Fact 5.21
Recall our setting: N ≥ 8 is an integer, R = ⌊N^{1−η}⌋ where η = 0.595, A(N, x) = ⌊N/x⌋·(x+1)−N,
and B(N, x) = N − ⌊N/x⌋ · x.
Fact (Fact 5.21 restated). If x ∈ {R − 1, R, R + 1}, then 1 ≤ x ≤ N and A(N, x), B(N, x) ≥ 0.
Proof. Since 1 ≤ x ≤ N is equivalent to 2 ≤ R ≤ N − 1, we only need to check
$$2 = \lfloor 8^{1-\eta}\rfloor \le \lfloor N^{1-\eta}\rfloor = R \le N^{1-\eta} \le \sqrt{N} \le N-1.$$
Also B(N, x) is always non-negative as ⌊N/x⌋ ≤ N/x. Hence we focus on the A(N, x) ≥ 0 part.
Since ⌊t⌋ > t − 1, we have
$$A(N,x) > \Bigl(\frac{N}{x}-1\Bigr)\cdot(x+1) - N = \frac{N}{x} - x - 1.$$
Since n > t is equivalent to n ≥ ⌊t + 1⌋ for all integers n, we have
$$A(N,x) \ge \Bigl\lfloor\frac{N}{x} - x\Bigr\rfloor.$$
Therefore if x ≤ √N, then A(N,x) ≥ 0. Since R = ⌊N^{1−η}⌋ ≤ √N, this shows A(N, R − 1) and A(N, R) are both non-negative. Now we deal with A(N, R + 1). Note that ⌊N^{1−η}⌋ ≤ N^{1−η} ≤ √N − 1 holds for all N ≥ 18, thus R + 1 ≤ √N and A(N, R + 1) ≥ 0 for all N ≥ 18. Finally for 8 ≤ N ≤ 17, A(N, R + 1) ≥ 0 can be verified numerically.
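This final numerical claim is easy to confirm; the sketch below checks all N with 8 ≤ N ≤ 17 and all three values of x (illustrative only).

```python
import math

eta = 0.595

def A(N, x):
    return (N // x) * (x + 1) - N

def B(N, x):
    return N - (N // x) * x

ok = True
for N in range(8, 18):  # the range 8 <= N <= 17 handled numerically in the proof
    R = math.floor(N ** (1 - eta))
    for x in (R - 1, R, R + 1):
        ok &= (1 <= x <= N) and A(N, x) >= 0 and B(N, x) >= 0
print(ok)  # True
```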