
Perfect Sampling for (Atomic) Lovász Local Lemma

2021, ArXiv


Perfect Sampling for (Atomic) Lovász Local Lemma arXiv:2107.03932v1 [cs.DS] 8 Jul 2021 Kun He∗ Xiaoming Sun† Kewen Wu‡ Abstract We give a Markov chain based perfect sampler for uniform sampling solutions of constraint satisfaction problems (CSP). Under some mild Lovász local lemma conditions where each constraint of the CSP has a small number of forbidden local configurations, our algorithm is accurate and efficient: it outputs a perfect uniform random solution and its expected running time is quasilinear in the number of variables. Prior to our work, perfect samplers are only shown to exist for CSPs under much more restrictive conditions (Guo, Jerrum, and Liu, JACM’19). Our algorithm has two components: • A simple perfect sampling algorithm using bounding chains (Huber, STOC’98; Haggstrom and Nelander, Scandinavian Journal of Statistics’99). This sampler is efficient if each variable domain is small. • A simple but powerful state tensorization trick to reduce large domains to smaller ones. This trick is a generalization of state compression (Feng, He, and Yin, STOC’21). The crux of our analysis is a simple information percolation argument which allows us to achieve bounds even beyond current best approximate samplers (Jain, Pham, and Vuong, ArXiv’21). Previous related works either use intricate algorithms or need sophisticated analysis or even both. Thus we view the simplicity of both our algorithm and analysis as a strength of our work. 1 Introduction The constraint satisfaction problem (CSP) is one of the most important topics in computer science (both theoretically and practically). A CSP is a collection of constraints defined on a set of variables, and a solution to the CSP is an assignment of variables that satisfies all the constraints. For any given CSP, it is natural to ask the following questions: • Decision. Can we decide efficiently if the CSP has a solution? • Search. If the CSP is satisfiable, can we find a solution efficiently? • Sampling. If we can efficiently find a solution, can we efficiently sample a uniform random solution from the whole solution space? These questions, each deepening one above, progressively enhance our understanding on the computational complexity of CSPs. One can easily imagine the hardness of fully resolving these broad questions. Thus, not surprisingly, despite enormous results centered around them, we only have partial answers. Here we mention those related to our work. ∗ Institute of Computing Technology, Chinese Academy of Sciences. Email: hekun.threebody@foxmail.com Institute of Computing Technology, Chinese Academy of Sciences. Email: sunxiaoming@ict.ac.cn ‡ Department of EECS, University of California at Berkeley. Email: shlw kevin@hotmail.com † 1 The Decision Problem. A fundamental criterion for the existence of solutions is given by the famous Lovász local lemma (LLL) [EL75]. Interpreting the space of all possible assignments as a probability space and the violation of each constraint as a bad event, the local lemma provides a sufficient condition for the existence of an assignment to avoid all the bad events. This sufficient condition, commonly referred to as the local lemma regime, is characterized in terms of the violation probability of each constraint and the dependency relation among the constraints. The Search Problem. The algorithmic LLL (also called constructive LLL) provides efficient algorithms to find a solution in the local lemma regime. 
Plenty of works have been devoted to this topic [Bec91, Alo91, MR98, CS00, Mos09, MT10, KM11, HSS11, HS17, HS19]. The Moser-Tardos algorithm [MT10] is a milestone along this line: it finds a solution efficiently up to a sharp condition known as Shearer's bound [She85, KM11].

The Sampling Problem. The sampling LLL asks for efficient algorithms to sample a uniform random solution from all solutions in the local lemma regime. It serves as a standard toolkit for the probabilistic inference problem in graphical models [Moi19], and has many applications in the theory of computing, such as all-terminal network reliability [GJL19, GJ19, GH20]. With a better understanding of the decision and search problems, much attention has been devoted to the sampling LLL in recent years [GJL19, Moi19, GLLZ19, GGGY20, FGYZ20, FHY20, JPV20, JPV21]. Since this is also our focus, we elaborate on it in the next subsection.

1.1 Sampling Lovász Local Lemma

To state the long list of results on the sampling LLL, we need the following notation. Given a CSP, let n be the number of variables, k be the maximum number of variables in each constraint, Q be the maximum number of values that each variable can take, ∆ be the maximum constraint degree, p be the maximum violation probability of a constraint, and N be the maximum number of forbidden local configurations for each constraint. A constraint is called atomic if it has only one forbidden local configuration. A CSP is called atomic if all of its constraints are atomic, i.e., N = 1. For example, in a Boolean k-CNF formula each constraint depends on exactly k Boolean variables, and thus Q = 2, p = 2^{-k}, N = 1. For hypergraph coloring, each vertex is allowed to choose a color from {1, 2, ..., Q} and each edge contains exactly k vertices. The constraints require that no edge is monochromatic. Therefore N = Q, p = Q^{1-k}, and ∆ equals the maximum edge degree in the hypergraph. For general CSPs, each constraint may depend on a different number of variables and each variable may have a different domain size.

The sampling LLL turns out to be computationally more challenging than the algorithmic LLL. For example, for k-CNF the Moser-Tardos algorithm can efficiently find a solution if ∆ ≲ 2^k, where ≲ (and ≳) informally hides lower order terms. However, it is intractable to approximately sample a uniform solution if ∆ ≳ 2^{k/2}, even when the formula is monotone [BGG+19].

On the algorithmic side, most efforts are on approximate sampling, where the output distribution is close to uniform under total variation distance. The breakthrough of Moitra [Moi19] shows that k-CNF solutions can be sampled in time n^{poly(k∆)} if ∆ ≲ 2^{k/60}; the novel idea is to use the algorithmic LLL to mark/unmark variables and then convert the problem into solving linear programs of size n^{poly(k∆)}. We remark that this algorithm is deterministic if we only need a multiplicative approximation of the number of solutions, which is another topic closely related to approximate sampling [JVV86a]. Moitra's method has been successfully applied to hypergraph colorings [GLLZ19] and random CNF formulas [GGGY20]^1. Recently, a much faster algorithm for sampling solutions of k-CNF was given in [FGYZ20], which implements a Markov chain on the assignments of the marked variables chosen via Moitra's method. The resulting sampling algorithm has a near linear running time Õ(n^{1.001}) together with an improved regime ∆ ≲ 2^{k/20}, where Õ hides poly(N, k, ∆, Q, log(n)) factors. We also remark that this algorithm is inherently randomized even if we move to approximate counting.

This nonadaptive mark/unmark approach seems to only work for Boolean variables, where each variable has two possible values. To extend the approach to general CSPs, Feng, He, and Yin [FHY20] introduced state compression, which considerably expands the applicability of the method used in [FGYZ20]. Their sampling algorithm runs in time Õ(n^{1.001}) if p^{1/350} · ∆ ≲ 1/N. This algorithm is limited to the special cases of CSPs where each constraint is violated by a small number of local configurations (i.e., N is small). Recently, Jain, Pham, and Vuong [JPV20], shaving the dependency on N, provided a sampling algorithm with running time n^{poly(∆, k, log(Q))} when p^{1/7} · ∆ ≲ 1. They revisit Moitra's mark/unmark framework and use it in an adaptive way. This is the first polynomial time algorithm (assuming ∆, k, Q = O(1)) for general CSPs in the local lemma regime. By a highly sophisticated information percolation argument, they [JPV21] also prove that the sampling algorithm in [FHY20] runs in time Õ(n^{1.001}) if p^{0.142} · ∆ ≲ 1/N.

Method   | k-CNF          | Hypergraph Coloring | General CSPs          | Time
[Moi19]  | ∆ ≲ 2^{k/60}   |                     |                       | n^{poly(k∆)}
[GLLZ19] |                | ∆ ≲ Q^{k/16}        |                       | n^{poly(k∆ log(Q))}
[FGYZ20] | ∆ ≲ 2^{k/20}   |                     |                       | Õ(n^{1.001})
[FHY20]  | ∆ ≲ 2^{k/13}   | ∆ ≲ Q^{k/9}         | p^{1/350} · ∆ ≲ 1/N   | Õ(n^{1.001})
[JPV20]  | ∆ ≲ 2^{k/7}    | ∆ ≲ Q^{k/7}         | p^{1/7} · ∆ ≲ 1       | n^{poly(k∆ log(Q))}
[JPV21]  | ∆ ≲ 2^{0.175k} | ∆ ≲ Q^{k/3}         | p^{0.142} · ∆ ≲ 1/N   | Õ(n^{1.001})

Table 1: Approximate sampling algorithms in the local lemma regime.

Table 1 summarizes the efficient regimes of these algorithms. We emphasize that all these sampling results, via standard reductions [JVV86b, ŠVV09a], also imply efficient algorithms for (random) approximate counting, which estimates the number of solutions within some multiplicative error. In addition, for the algorithms using Moitra's linear programming approach [Moi19, GLLZ19, GGGY20, JPV20], their approximate counting counterparts are deterministic. For the approaches using Markov chains [FGYZ20, FHY20, JPV21], the running time of their approximate counting counterparts is Õ(m · T), where T is the running time of the corresponding approximate sampling algorithm and m is the number of constraints.

Though much progress has been made on approximate sampling, much less is known about perfect sampling. As far as we know, the only result on perfect sampling in the local lemma regime is due to Guo, Jerrum, and Liu [GJL19], which provides a perfect sampler for extremal CSPs, where any two constraints sharing common variables cannot be violated simultaneously by the same assignment. Though there are known reductions from approximate sampling/counting to perfect sampling [JVV86b], it is unclear how to adapt them here considering the local lemma conditions.

^1 [GGGY20] only provides an approximate counting algorithm for random CNF formulas. But with a close inspection, their algorithm can be turned into one that does approximate sampling. This follows from standard reductions and from noticing that fixing bad variables (defined in [GGGY20]) does not influence their (deterministic) algorithm.

Meanwhile, perfect sampling is an important topic in theoretical computer science. Plenty of works have been devoted to the study of perfect samplers [JVV86b, HN99, Hub98, Hub04, BC20, JSS20, Fil97, FMMR00, ACG12, FVY19]. Apart from its mathematical interest, one advantage of a perfect sampler over an approximate sampler is that the quality of the output is never in question.
In contrast, some solutions may never be output by an approximate sampler. Consider the following simple example. Let D1 and D2 be two distributions, where D1 is uniform over {1, 2, ..., n} and D2 is uniform over {√n + 1, √n + 2, ..., n}. Then the total variation distance between D1 and D2 is only 1/√n = o(1). Thus D2 is considered a good approximation of D1, while the first √n items are never sampled. This is indeed the case for [FGYZ20, FHY20, JPV21], and it is undesirable if the CSP is used for addressing social problems and the missing solutions are contributed by the minority or the underrepresented. Besides the potential drawback in social fairness, this also leads to the following reasonable worry: is it possible that some solutions are inherently harder to find than others? Fortunately, our work shows this is not the case.

Perfect sampling is also advantageous for practical purposes and heuristic algorithms. To perform a sampling task, Markov chains are arguably the most common approach. By the convergence theorem for Markov chains [LP17], it is usually easy to show that the chain mixes to the desired distribution almost surely. However, providing a good bound on the mixing time is in general a difficult task, and if the mixing time is unknown or poorly analyzed, it is not clear when to stop the chain so that the output distribution is close enough to the desired one. On the other hand, given the Markov chain, there are known techniques, like coupling from the past [PW96], to convert it into a perfect sampler, which always gives the desired distribution when it stops, even if we may not know any bounds on its expected running time.

1.2 Our Results

In this paper, we provide perfect samplers for solutions of atomic CSPs in the local lemma regime. Though in the previous paragraphs we only focused on sampling a perfectly uniform solution, our algorithm in fact works for general underlying distributions. Let Φ be an atomic CSP with variable set V and |V| = n, where each v ∈ V is endowed with a distribution Dv supported on a finite domain Ωv.

• Let p be the maximum violation probability of a constraint under the distribution ∏_{v∈V} Dv.
• Let ∆ be the maximum constraint degree.
• Let Q be the maximum size of a variable domain Ωv.
• Let k be the maximum number of variables that a constraint depends on.

Let µ be the distribution of solutions of Φ under ∏_{v∈V} Dv, i.e.,

    µ(σ) = Pr_{σ' ∼ ∏_{v∈V} Dv} [σ' = σ | σ' is a solution of Φ]    for each σ ∈ ∏_{v∈V} Ωv.

The original Lovász local lemma [EL75] states that if p · ∆ ≲ 1 then Φ has a solution, i.e., µ is well-defined. The algorithmic LLL [MT10] then shows that one can efficiently find a solution under the same condition. Our main theorem shows that one can efficiently sample a solution distributed as µ under a similar condition.

Theorem 1.1 (Theorem 5.11, Informal). If p^γ · ∆ ≲ 1/c, where

    γ = (3 + ln(c + 1) − sqrt(ln²(c + 1) + 6 ln(c + 1))) / 9    and    c = max{2, max_{v∈V} max_{q,q'∈Ωv} Dv(q)/Dv(q')},

then our algorithm runs in expected time poly(k, Q, ∆) · n log(n) and outputs a random solution distributed perfectly as µ.

We remark that for the uniform case (i.e., each Dv is the uniform distribution), we have c = 2 and γ > 0.145, which already beats the exponent 1/7 from [JPV20] and 0.142 from [JPV21]. Theorem 1.1 is proved by a black-box reduction, using our state tensorization trick, to the perfect sampling algorithm on small variable domains. It is possible to get improved bounds for specific underlying distributions by a finer analysis.
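For the uniform case just mentioned, the claimed exponent can be checked numerically from the formula in Theorem 1.1; the following few lines of Python (ours, purely illustrative) evaluate γ at c = 2.

```python
import math

def gamma(c: float) -> float:
    """Exponent from Theorem 1.1: (3 + ln(c+1) - sqrt(ln^2(c+1) + 6*ln(c+1))) / 9."""
    l = math.log(c + 1)
    return (3 + l - math.sqrt(l * l + 6 * l)) / 9

print(gamma(2))          # ~0.14511, so gamma > 0.145 in the uniform case (c = 2)
assert gamma(2) > 0.145
```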
We take the uniform distribution as a starting example and improve 0.145 to 0.175 for general atomic CSPs. Theorem 1.2 (Corollary 5.18, Informal). If p0.175 · ∆ . 1 and each Dv is the uniform distribution, then our algorithm runs in expected time poly(k, Q, ∆) · n log(n) and outputs a perfect uniform random solution. We remark that this 0.175 also matches previous best bound for approximately uniform sampling solutions of k-CNF formula [JPV21]. In fact, in our analysis binary domains are the worst case for general atomic CSPs: the bound on k-CNF formula is the bottleneck for the bound on general atomic CSPs. Indeed, Theorem 1.2 can be further improved if variable domains are large. We use hypergraph coloring as an illustrating example, the bound of which matches the current best bound of approximate samplers [JPV21]. Theorem 1.3 (Theorem 5.14, Informal). Let H be a hypergraph on n vertices where each edge contains exactly k vertices. Let ∆ be the edge degree of H. Assume each vertex can choose a color from {1, 2, . . . , Q}, and a coloring of vertices is proper if it does not produce monochromatic edge. If ∆ . Qk/3 , then our algorithm runs in expected time poly(k, Q, ∆) · n log(n) and outputs a perfect uniform random proper coloring of H. Finally we briefly discuss the connection between our result and other topics. • Non-atomic CSPs. For a non-atomic CSP, let N be the maximum number of forbidden local configurations. We can convert it into an atomic CSP by decomposing the non-atomic constraints to atomic ones as in [FHY20], which only increases the constraint degree ∆ to at most ∆ · N . Then our perfect sampler is still efficient if N is small. • Approximate Sampling. Our perfect sampler is a Las Vegas algorithm with quasilinear expected running time T . It is well-known that terminating the algorithm after T /ε steps gives an ε-approximate sampler under total variation distance. In particular, the local lemma condition of our approximate sampler is the same as our perfect sampler, which breaks the current best record for atomic CSPs by [JPV20, JPV21]. We remark that one can obtain a better bound by analyzing the moments of the running time of our perfect sampler. We do not make the effort here since this is not our focus and this may require stronger local lemma conditions. • Approximate Counting. One way to reduce counting to sampling is to start from a CSP with no constraint, then add clauses one by one and use the self-reducibility [JVV86a]. Another strategy is to use the simulated annealing approach developed in [BSVV08, SVV09b, FGYZ20]. Both reductions produce efficient randomized approximate counting algorithms. We refer interested readers to their paper for details. Similarly as in the approximate sampling case, the local lemma condition of our approximate counting algorithm is the same as our perfect sampler, which breaks the current best record for atomic CSPs by [JPV20, JPV21]. 5 1.3 Proof Overview To illustrate the idea, we first focus on sampling a uniform solution of k-CNF formula: • There are n Boolean variables. Each variable is endowed with the uniform distribution over {0, 1}, and appears in at most d constraints. • Each constraint is a clause depending on exactly k variables and has exactly one forbidden local assignment. For example, (x1 ∨x2 ∨¬x3 )∧(x1 ∨x5 ∨x7 )∧(x2 ∨¬x4 ∨¬x6 ) is a 3-CNF formula where n = 6, m = 3 and k = 3, d = 2. 
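To make these measures concrete, here is a small Python sketch (ours, not part of the paper) that encodes the example 3-CNF above as an atomic CSP, with each clause stored by its unique forbidden local assignment, and computes the width k, variable degree d, constraint degree ∆, and p = 2^{-k} under the uniform distribution. The dictionary-based representation is only for illustration.

```python
# Each clause of the example 3-CNF above, stored as an atomic constraint:
# a map from its variables to the unique forbidden (falsifying) local assignment.
clauses = [
    {"x1": False, "x2": False, "x3": True},   # x1 or x2 or not x3
    {"x1": False, "x5": False, "x7": False},  # x1 or x5 or x7
    {"x2": False, "x4": True,  "x6": True},   # x2 or not x4 or not x6
]

k = max(len(c) for c in clauses)              # width: max #variables per constraint
p = 2 ** (-k)                                 # violation probability under uniform {0,1}

deg = {}                                      # variable degree d
for c in clauses:
    for v in c:
        deg[v] = deg.get(v, 0) + 1
d = max(deg.values())

# Constraint degree: a clause counts itself plus every clause sharing a variable.
Delta = max(sum(1 for c2 in clauses if set(c1) & set(c2)) for c1 in clauses)

print(k, d, Delta, p)                         # 3 2 3 0.125
```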
Similar as previous works to deal with the connectivity issue of Glauber dynamics [Wig19], the first step of our algorithm is to mark variables so every clause has a certain amount of marked and unmarked variables. Let V be the set of variables and M ⊆ V be the set of marked variables. Then by the local lemma (Theorem 2.3), for any σ ∈ {0, 1}M and any v ∈ M the following two distributions are close under total variation distance: • An unbiased coin in {0, 1}. • The distribution of σ ′ (v) where σ ′ ∈ {0, 1}V is a uniform random solution conditioning on σ ′ (M \ {v}) = σ(M \ {v}). We call this local uniformity. To sample a solution approximately, previous works [FGYZ20, FHY20, JPV21] simulate an idealized Glauber dynamics PGlauber (Algorithm 7) on the assignments of the marked variables as follows: (A) Initialize σ(v) ∼ {0, 1} uniformly and independently for each v ∈ M. (B) Going forward from time 0 to T → +∞, let vt be the variable selected at time t ≥ 0. Iteratively find all clauses that are not yet satisfied by σ(M \ {vt }) and are connected to vt (Algorithm 1). Let Φ′ be this sub-k-CNF. Then update σ(vt ) ← σ ′ (vt ), where σ ′ ∈ {0, 1}V is a uniform random solution of Φ′ conditioning on σ ′ (M \ {vt }) = σ(M \ {vt }). Algorithmically, σ ′ is provided via rejection sampling (Algorithm 2) on Φ′ . (C) After Step (A) (B), extend σ to unmarked variables V \ M by sampling a uniform random solution conditioning on σ(M). To sample a solution perfectly, we simulate bounding chains PBChains (Algorithm 5) of PGlauber as follows: The algorithm guarantees at any point, each variable v ∈ M is assigned with a value in {0, 1, ⋆} where ⋆ represents uncertainty. (1) Initialize σ(v) = ⋆ for each v ∈ M. (2) Going forward from time −T to −1, let vt be the variable selected at time −T ≤ t < 0. Iteratively find all clauses that are not yet satisfied by σ(M \ {vt }) and are connected with vt (Algorithm 1). Let Φ′ be this sub-CSP. – If all marked variables connected to vt in Φ′ have value 0 or 1, we say vt is coupled. Then we update σ(vt ) by rejection sampling on Φ′ and σ(vt ) is always updated to 0 or 1. – Otherwise σ(vt ) is updated based on the local uniformity, which may be assigned to ⋆ with small probability (SafeSampling subroutine in Algorithm 5). 6 (3) After Step (1) (2), – if some marked variable has value ⋆, then we double T and re-run Step (1) (2);2 – otherwise we stop and extend σ to unmarked variables V \ M by sampling a uniform random solution conditioning on σ(M). To simplify the analysis of the algorithm, we use systematic scan for PGlauber and PBChains rather than random scan [HDSMR16]. Specifically, at time t ∈ Z the algorithm always updates the variable with index (t mod m) (Algorithm 7) where m = |M|. Let µM be the distribution of a uniform random solution projected on the marked variables M. Our goal is to prove the following claims for PBChains : • Correctness. When we stop in Step (3), σ(M) has distribution µM (Subsection 4.2). • Efficiency. In expectation, each update in Step (2) is efficient. (Subsection 4.1.1) • Coalescence. In expectation, we stop with T = O(n log(n)) (Subsection 4.1.2). Proof of Correctness. Firstly we show PGlauber converges to µM in Step (B) when T → +∞. Though it is a time inhomogeneous Markov chain, we are able to embed it into a time homogeneous Markov chain P ′ by viewing |M| consecutive updates as one step. Then it is easy to check P ′ is aperiodic and irreducible with unique stationary distribution µM . 
After that, we unpack P ′ to show PGlauber also converges to µM (Lemma 4.22). Next, we use the idea of coupling from the past [PW96] and bounding chains [Hub98, HN99]. • Coupling from the Past. Observe that for any positive integer L if we run PGlauber from −L · m to −1, it has the same distribution as we run it from 0 to L · m − 1. Thus by the argument above, Step (B) also has distribution µM if we run PGlauber from time −∞ to −1. • Bounding Chains. For each t ∈ Z, if σ(M) in PBChains has no ⋆, then the update process is exactly PGlauber . This means PBChains is a coupling of PGlauber (Proposition 4.25). Note that we use ⋆ to denote uncertainty which includes all possible assignments that we need to couple. Thus when PBChains stops at time T with σ b ∈ {0, 1}M at Step (3), any assignment, going through the updates from time −T to −1, converges to σ b. Combining the two observations above, we know σ b is distributed exactly as µM . Proof of Efficiency. We first remark that only the rejection sampling is time consuming, and its running time is a geometric distribution with expectation controlled by the local uniformity (Proposition 3.3). Thus to bound its expectation, it suffices to bound the size of Φ′ (Proposition 4.11). This uses the same 2-tree argument as in previous works and we briefly explain here. Since k-CNF has bounded degree, if Φ′ is large then we can find a large independent set S of clauses in Φ′ . Note that clauses in Φ′ are connected, thus we can further assume S is a 2-tree — S will be connected if we link any two clauses in S that are at distance 2. Intuitively, a 2-tree is an independent set that is not very spread out. Then it suffices to union bound the probability that some large 2-tree growing out of vt survives after previous updates. One potential pitfall is the total running time of PBChains depends on both T and the running time of each update, where they can be arbitrarily correlated. Thus we need to calculate the second 2 We remark that the randomness is reused. That is, the randomness used for time t < 0 is the same one regardless of the starting time −T . 7 moment of the update time (Subsection 4.1.1) and apply Cauchy-Schwarz inequality to break the correlation (Subsection 4.4). Proof of Coalescence. To upper bound the round T , we employ the information percolation argument similarly used in [LS16, HSZ19, JPV21]. For the sake of analysis, we assume Step (2) in PBChains also includes unmarked variables though it does nothing for the update, i.e., vt is the variable with index (t mod n) where n = |V |. The crucial observation is the following. If σ(vt0 ) is updated to ⋆ at time t0 , then at that point there must be some variable u 6= vt0 with value ⋆ and connected to vt0 . Let t1 be the last update time of u before t0 , and thus u = vt1 . Then we can find another variable u′ 6= vt1 with value ⋆ and connected to vt1 at time t1 . Continuing this process until we reach the initialization phase, we will find a list of time 0 > t0 > t1 > · · · > tℓ ≥ −T such that for each time ti , • σ(vti ) is updated to ⋆, • vti is connected to vti+1 and vti+1 has value ⋆. To express constraints through time, we define the extended constraint (e, C) (Definition 4.12), where C is a clause and e = {t′1 , . . . , t′k } ⊆ {−T, . . . , −1} is a time sequence such that • vt′1 , . . . , vt′k are the variables C depends on, • t′1 , . . . , t′k are succinct rounds of update for each vt′1 , . . . , vt′k . 
Since v_{t_i} and v_{t_{i+1}} are connected at time t_i, as discussed above, we are able to find extended constraints (e^i_1, C^i_1), ..., (e^i_{s_i}, C^i_{s_i}) such that t_i ∈ e^i_1, t_{i+1} ∈ e^i_{s_i}, and e^i_1, ..., e^i_{s_i} are connected over {−T, ..., −1}. Thus all the edges {e^i_j}_{i,j} form a connected sub-hypergraph H on the vertex set {−T, ..., −1}. Since each extended constraint (e, C) represents succinct rounds of update for the variables in C, e has range less than n = |V|, i.e., max_{t_1, t_2 ∈ e} |t_1 − t_2| < n. Thus the longest path P in H has length |P| = Ω(T/n). On the other hand, Step (2) of PBChains only finds clauses that are not satisfied by marked variables, which means each fixed extended constraint (e, C) appears in H with extremely low (roughly 2^{−k}) probability.

Putting everything together, if we do not stop at round T, we should find a path P of extended constraints of length |P| = Ω(T/n). Meanwhile, each fixed extended constraint is found with probability at most roughly 2^{−k}. Thus any fixed P exists with probability at most 2^{−k·|P|/2}, since extended constraints in odd positions of P form an independent set. Moreover, it is easy to see that each extended constraint overlaps with O(k²d) many other extended constraints, which provides an upper bound O(k²d)^{|P|} on the number of possible P. By a union bound, the probability that we do not stop at round T is roughly

    n · (k⁴d² / 2^k)^{Ω(T/n)} ≪ n · 2^{−T/n},

where we assume k⁴d² ≪ 2^k, and the additional factor n comes from choosing t_0 ∈ {−1, ..., −n}, i.e., the last update resulting in ⋆.

We remark that to deal with general atomic CSPs, we need to be more careful with the union bound (Subsection 4.1.2). This is because in general atomic CSPs constraints may depend on different numbers of variables. Nevertheless, our analysis is much simpler than the one in [JPV21]. One main reason is that we use systematic scan instead of random scan, which makes the updates of each variable well behaved through time. Moreover, our main data structure, the extended constraint (Definition 4.12), is also much simpler than the discrepancy checks used in their argument.

State Tensorization. The marking process is only efficient when the variable domains are small; otherwise it cannot guarantee useful local uniformity. Similarly for ⋆: it compresses too much information when the domain is large. Therefore, to deal with large domains, we introduce a simple state tensorization trick to perform the reduction. For intuition, let us consider the following concrete atomic CSP Φ:

• The variables are u, v, where u is endowed with the distribution Du given by Du(a) = Du(b) = Du(c) = 1/3, and v is endowed with Dv given by Dv(A) = Dv(B) = 1/4, Dv(C) = 1/3, Dv(D) = 1/6.
• The constraints are C1, C2, where C1 = False iff u = a, and C2 = False iff u = c, v = B.

Then we describe one possible state tensorization as follows (see Figure 1):

• Define variables u1, u2, where u1 is endowed with Du1 given by Du1(0) = 2/3, Du1(1) = 1/3, and u2 is endowed with Du2(0) = Du2(1) = 1/2. We interpret u = a if (u1, u2) = (0, 0); u = b if (u1, u2) = (0, 1); and u = c if u1 = 1.
• Define variables v1, v2, v3, where v1 is endowed with Dv1 given by Dv1(0) = Dv1(1) = 1/2, v2 is endowed with Dv2(0) = Dv2(1) = 1/2, and v3 is endowed with Dv3(0) = 2/3, Dv3(1) = 1/3. We interpret v = A if (v1, v2) = (0, 0); v = C if (v1, v3) = (1, 0), etc.

[Figure 1: An example for state tensorization. (a) The original variables u and v with their distributions Du and Dv. (b) The new variables u1, u2 and v1, v2, v3, drawn as probability trees whose leaves recover the original values a, b, c and A, B, C, D.]
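The following short Python check (ours) verifies the two claims behind this example: interpreting the new variables reproduces Du and Dv exactly, and the violation probabilities of C1 and C2 are unchanged.

```python
from fractions import Fraction as F
from itertools import product

# Distributions of the new variables from the example above.
D = {
    "u1": {0: F(2, 3), 1: F(1, 3)}, "u2": {0: F(1, 2), 1: F(1, 2)},
    "v1": {0: F(1, 2), 1: F(1, 2)}, "v2": {0: F(1, 2), 1: F(1, 2)},
    "v3": {0: F(2, 3), 1: F(1, 3)},
}

def decode_u(u1, u2):
    return "a" if (u1, u2) == (0, 0) else ("b" if (u1, u2) == (0, 1) else "c")

def decode_v(v1, v2, v3):
    if v1 == 0:
        return "A" if v2 == 0 else "B"
    return "C" if v3 == 0 else "D"

# Induced distributions on u and v after interpreting the new variables.
induced_u, induced_v = {}, {}
for u1, u2 in product([0, 1], repeat=2):
    key = decode_u(u1, u2)
    induced_u[key] = induced_u.get(key, 0) + D["u1"][u1] * D["u2"][u2]
for v1, v2, v3 in product([0, 1], repeat=3):
    key = decode_v(v1, v2, v3)
    induced_v[key] = induced_v.get(key, 0) + D["v1"][v1] * D["v2"][v2] * D["v3"][v3]

assert induced_u == {"a": F(1, 3), "b": F(1, 3), "c": F(1, 3)}                 # matches Du
assert induced_v == {"A": F(1, 4), "B": F(1, 4), "C": F(1, 3), "D": F(1, 6)}   # matches Dv

# Violation probabilities are also unchanged:
# C1 (u = a):        before 1/3;  after Pr[u1 = 0, u2 = 0].
# C2 (u = c, v = B): before 1/12; after Pr[u1 = 1] * Pr[v1 = 0, v2 = 1].
assert D["u1"][0] * D["u2"][0] == F(1, 3)
assert D["u1"][1] * D["v1"][0] * D["v2"][1] == F(1, 12)
```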
Hence after the state tensorization, C1 = False iff u1 = 0, u2 = 0; and C2 = False iff u1 = 1, v1 = 0, v2 = 1. Moreover, sampling the value of u, v from Du × Dv is equivalent to first sampling the value of u1, u2, v1, v2, v3 from Du1 × Du2 × Dv1 × Dv2 × Dv3 and then interpreting the value of u, v from them. Therefore, to obtain a random solution under the distribution Du × Dv, it suffices to first obtain a random solution under the product distribution of u1, u2, v1, v2, v3 and then interpret them back. Most importantly, this reduction changes neither the violation probability of any individual constraint nor the dependency relation among constraints. This essentially guarantees that the desired local lemma condition does not deteriorate after the reduction. The formal description of the reduction can be found in Subsection 5.2.

We remark that state tensorization, combined with the marking M, generalizes the state compression technique in [FHY20]. On the other hand, state tensorization is similar to standard gadget reductions in the study of complexity theory. For example, by encoding large alphabets using binary bits, one can show that Boolean CSPs are no easier to solve than CSPs with large variable domains for polynomial time algorithms. However, we are not aware of such a simple idea being used to perform sampling tasks.

Organization. We give formal definitions in Section 2. Useful subroutines are provided in Section 3, and then we describe and analyze our main algorithm in Section 4. We discuss our result for different applications in Section 5.

2 Preliminaries

We use e ≈ 2.71828 to denote the natural base. We use log(·) and ln(·) to denote the logarithms with base 2 and e respectively. We use [N] to denote {1, 2, ..., N}, and use Z to denote the set of all integers. We say V is a disjoint union of (V_i)_{i∈[s]} if V = ∪_{i∈[s]} V_i and V_i ∩ V_j = ∅ holds for any distinct i, j ∈ [s]. For a positive integer m, t mod m = t − m · ⌊t/m⌋ for non-negative integers t, and t mod m = (t · (1 − m)) mod m for negative integers t.

For any index set I and domains (Ω_i)_{i∈I}, we use ∏_{i∈I} Ω_i to denote their product space. For a vector vec ∈ ∏_{i∈I} Ω_i, we use vec(i) ∈ Ω_i to denote the entry of vec indexed by i, and use vec(J) ∈ ∏_{i∈J} Ω_i to denote the entries of vec on indices J ⊆ I. For a finite set X and a distribution D over X, we use x ∼ D to denote that x is a random variable sampled from X according to the distribution D. For two events E1, E2 with Pr[E2] = 0, we define the conditional probability Pr[E1(x) | E2(x)] = 0. We say event E happens almost surely if Pr[E] = 1.

Constraint Satisfaction Problems. Let V be a set of variables with finite domains (Ωv)_{v∈V}. A constraint C on V is a mapping C: ∏_{v∈V} Ωv → {True, False}. We say C depends on v ∈ V if there exist σ1, σ2 ∈ ∏_{v∈V} Ωv such that C(σ1) ≠ C(σ2) and σ1, σ2 differ in (and only in) v. We use vbl(C) to denote the set of variables that C depends on; then C can be viewed as a mapping from ∏_{v∈vbl(C)} Ωv to {True, False}.

For convenience we use σ^C_False ⊆ ∏_{v∈vbl(C)} Ωv (resp., σ^C_True ⊆ ∏_{v∈vbl(C)} Ωv) to denote the set of falsifying (resp., satisfying) assignments of C. More generally, for C being a set of constraints, we use σ^C_False (resp., σ^C_True) to denote the set of falsifying (resp., satisfying) assignments of C, i.e., every σ ∈ σ^C_False has C(σ) = False for some C ∈ C (resp., every σ ∈ σ^C_True has C(σ) = True for all C ∈ C).
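As a concrete illustration of these sets, the short Python sketch below (ours) enumerates σ^C_False and σ^C_True by brute force for the two-variable example from Figure 1, where C1 is falsified iff u = a and C2 is falsified iff u = c and v = B.

```python
from itertools import product

Omega = {"u": ["a", "b", "c"], "v": ["A", "B", "C", "D"]}
constraints = {
    "C1": lambda s: not (s["u"] == "a"),                    # False iff u = a
    "C2": lambda s: not (s["u"] == "c" and s["v"] == "B"),  # False iff u = c, v = B
}

sigma_true, sigma_false = [], []
for u_val, v_val in product(Omega["u"], Omega["v"]):
    s = {"u": u_val, "v": v_val}
    # sigma^C_True: satisfies every constraint; sigma^C_False: falsifies some constraint.
    (sigma_true if all(C(s) for C in constraints.values()) else sigma_false).append(s)

print(len(sigma_true), len(sigma_false))   # 7 satisfying, 5 falsifying out of 12
```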
For sampling LLL, we also need to specify the underlying distribution. Assume each v ∈ V has some distribution Dv supported on Ωv .3 Define µCTrue as the distribution of solutions of C induced by (Dv )v∈V , i.e., Y  ′  C µCTrue (σ) = Pr σ = σ σ ′ ∈ σTrue for each σ ∈ Ωv . Q σ′ ∼ v∈V Dv v∈V Definition 2.1 ((Atomic) Constraint Satisfaction Problem). A constraint satisfaction problem is specified by Φ = V, (Ωv , Dv )v∈V , C where C is a set of constraints on V and each Dv is a distribution supported on Ωv . C = 1 for all C ∈ C. In this case, we abuse the notation to define We say Φ is atomic if σFalse C σFalse as the unique falsifying assignment of C. In addition, we define the following measures for Φ: • the width is k = k(Φ) = maxC∈C |vbl(C)|; • the variable degree is d = d(Φ) = maxv∈V |{C ∈ C | v ∈ vbl(C)}|; • the constraint degree is ∆ = ∆(Φ) = maxC∈C |{C ′ ∈ C | vbl(C) ∩ vbl(C ′ ) 6= ∅}|;4 • the domain size is Q = Q(Φ) = maxv∈V |Ωv |; • the maximal individual falsifying probability is p = p(Φ) = max C∈C σ∼ 3 4 Q Pr v∈vbl(C) Dv [C(σ) = False] = max C∈C X C σ∈σFalse Y v∈vbl(C) Dv (σ(v)). This means Dv (U ) > 0 for all U ∈ Ωv . One natural choice is the uniform distribution. Here ∆ is one plus the maximum degree of the dependency graph of Φ since C ∈ {C ′ ∈ C | vbl(C) ∩ vbl(C ′ ) 6= ∅}. 10 We will simply use k, d, ∆, Q, p when Φ is clear from the context. In addition we assume ∆ ≥ 2, d ≥ 2, and |V | ≥ 2 since otherwise the constraints in Φ are independent and the sampling problem becomes trivial. Lovász Local Lemma. The Lovász local lemma provides sufficient conditions to guarantee the existence of a solution of CSPs.  Theorem 2.2 ([EL75]). Let Φ = V, (Ωv , Dv )v∈V , C be a CSP. If ep∆ ≤ 1, then σ∼ QPr v∈V Dv   C σ ∈ σTrue ≥ (1 − ep)|C| > 0. The following more general version, first stated in [HSS11], can be proved with minor modification of the original proof of the Lovász local lemma [EL75].  Theorem 2.3 ([HSS11, Theorem 2.1]). Let Φ = V, (Ωv , Dv )v∈V , C be a CSP. If ep∆ ≤ 1, then C 6= ∅ and for any constraint B (not necessarily from C) we have σTrue Pr [B(σ) = True] ≤ (1 − ep)−|Γ(B)| σ∼µC True σ∼ QPr v∈V Dv [B(σ) = True] , where Γ(B) = {C ∈ C | vbl(C) ∩ vbl(B) 6= ∅}. Hypergraphs. Here we give definitions related with hypergraphs. All the definitions directly translate to graphs when every edge in the hypergraph contains two vertices. Let H be a hypergraph with finite vertex set V (H) and finite edge set E(H). Each edge e ∈ E(H) is a non-empty subset of V (H). We allow multiple occurrence of a same edge. For any  CSP Φ = V, (Ωv , Dv )v∈V , C we naturally view it as a hypergraph H(Φ) where V (H(Φ)) = V and E(H(Φ)) = {vbl(C)}C∈C . (1) Similar as the measures of CSPs, we define the following measures for a hypergraph H: • the width is k = k(H) = maxe∈E(H) |e|; • the vertex degree is d = d(H) = maxv∈V (H) |{e ∈ E(H) | v ∈ e}|; • the edge degree is ∆ = ∆(H) = maxe∈E(H) |{e′ ∈ E(H) | e ∩ e′ 6= ∅}|. For any two vertices u, v ∈ V (H), we say they are adjacent if there exists some e ∈ E(H) such that u ∈ e and v ∈ e; we say they are connected if there exists a vertex sequence w1 , w2 , . . . , wd ∈ V (H) such that w1 = u, wd = v and each wi , wi+1 are adjacent. Then hypergraph H is connected if any two vertices u, v ∈ V (H) are connected. Furthermore, we have the following basic fact regarding connected hypergraphs. Fact 2.4. Assume H is a connected hypergraph. Then for any e, e′ ∈ E(H), there exists a sequence of edges e1 , e2 , . . . , eℓ such that the following holds. 
• e1 = e, eℓ = e′ , and ei ∩ ei+1 6= ∅ for all i ∈ [ℓ − 1]. • ei ∩ ej = ∅ for all i, j ∈ [ℓ] with |i − j| > 1. A hypergraph H ′ is a sub-hypergraph of H if V (H ′ ) ⊆ V (H) and E(H ′ ) ⊆ E(H). If in addition e ∩ V (H ′ ) = ∅ holds for all e ∈ E(H) \ E(H ′ ), we say H ′ is an induced sub-hypergraph of H. 11  Marking. Apart from the CSP Φ = V, (Ωv , Dv )v∈V , C itself, our algorithms will also need a subset of V which we call marking. Both the correctness and efficiency of our algorithms rely on the marking. Assume Φ is atomic and recall our measures for Φ from Definition 2.1. We define the following constants given marking M, the meaning of which will be clear as we proceed to the next section: • The maximal conditional falsifying probability of (Φ, M) is Y C α = α(Φ, M) = max Dv (σFalse (v)). C∈C (2) v∈vbl(C)\M • When eα ≤ 1, define – the multiplicative bias of (Φ, M) as β = β(Φ, M) = (1 − eα)−d ; (3) – the maximal multiplicative-biased falsifying probability of (Φ, M) as Y C (v)); ρ = ρ(Φ, M) = max β · Dv (σFalse (4) – the maximal multiplicative-biased unpercolated probability of (Φ, M) as Y  C (v)) + (β − 1) (|Ωv | − 2) . β · Dv (σFalse λ = λ(Φ, M) = max |vbl(C)|2 (5) C∈C C∈C v∈vbl(C)∩M v∈vbl(C)∩M When context is clear, we will just use α, β, ρ, λ. Wildcard. We will reserve ⋆ as a wildcard symbol. Our algorithms will use ⋆ to represent all the possibilities in some domain.  Let Φ = V, (Ωv , Dv )v∈V , C be For each v ∈ V we assume ⋆ ∈ / Ωv and define Ω⋆v = Q a CSP. Ωv ∪ {⋆}. For any C ∈ C and σ ∈ v∈V Ω⋆v we abuse notation to define ( Q False ∃σ ′ ∈ v∈V Ωv such that C(σ ′ ) = False and σ(v) ∈ {σ ′ (v), ⋆} for all v ∈ V, (6) C(σ) = True otherwise. Here we define CSPs projected on assignments with ⋆ coordinates. Intuitively this assignment fixes (and only fixes) the non-⋆ variables for the CSP. But pedantically we provide the following definition. Q Definition 2.5 (Projected Constraint Satisfaction Problem). For any σ ∈ v∈V Ω⋆v , we define the projected constraint satisfaction problem Φ|σ = V, (Ωv |σ , Dv |σ )v∈V , C|σ by setting ( (Ωv , Dv ) σ(v) = ⋆, (Ωv |σ , Dv |σ ) = for all v ∈ V ({σ(v)} , point distribution) otherwise and C|σ = {C|σ | C ∈ C, C(σ) = False} where C|σ has the same evaluation rule as C ∈ C but depends on possibly fewer variables, i.e., vbl(C|σ ) = {v ∈ vbl(C) | σ(v) = ⋆} ⊆ vbl(C). Similarly we sometimes view C|σ as constraint only on vbl(C|σ ). Recall the measures defined in Definition 2.1. We note the following simple fact. Fact 2.6. k(Φ|σ ) ≤ k(Φ), d(Φ|σ ) ≤ d(Φ|σ ), ∆(Φ|σ ) ≤ ∆(Φ), and Q(Φ|σ ) ≤ Q(Φ). Moreover, if Φ is an atomic CSP, then Φ|σ is an atomic CSP. 12 3 Useful Subroutines In this section, we provide some useful subroutines for later reference in our main algorithm. 3.1 A Component Subroutine Recall notations defined in Definition 2.5. We first set up the Q following Component(Φ, M, σ, u) subroutine, which uses u ∈ V and current assignment σ ∈ v∈V Ω⋆v to (hopefully) decompose projected CSP Φ|σ into two disjoint parts: One containing u and one isolated from u. For our purpose, the input will guarantee that Φ is atomic, σ(u) = ⋆, and σ(v) ≡ ⋆ for all v ∈ V \ M. 
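Algorithm 1 below gives the pseudocode; as a reading aid, here is a rough Python transcription of the same decomposition (ours, with a simplified data representation and hypothetical helper names, not the paper's implementation). `Constraint` stores vbl(C) together with its unique falsifying local assignment, and `STAR` plays the role of ⋆.

```python
STAR = "*"   # stands in for the wildcard symbol

class Constraint:
    """An atomic constraint: its variables and the unique falsifying local assignment."""
    def __init__(self, falsifying):           # e.g. {"x1": 0, "x2": 1}
        self.falsifying = falsifying
        self.vbl = set(falsifying)

    def possibly_false(self, sigma):
        # C(sigma) = False under the wildcard semantics: every already-fixed
        # variable of C agrees with the falsifying assignment.
        return all(sigma[v] in (q, STAR) for v, q in self.falsifying.items())

    def free_vars(self, sigma):                # vbl(C|sigma): the still-undetermined variables
        return {v for v in self.vbl if sigma[v] == STAR}

def component(constraints, marked, sigma, u):
    """Rough sketch of Component(Phi, M, sigma, u): grow the falsified sub-CSP around u.
    Token = False signals that some collected constraint still involves an
    undetermined marked variable other than u."""
    V_prime, C_prime = {u}, []
    grew = True
    while grew:
        grew = False
        for C in constraints:
            if C in C_prime or not C.possibly_false(sigma):
                continue
            if not (C.free_vars(sigma) & V_prime):
                continue
            if C.free_vars(sigma) <= {u} | (C.vbl - marked):
                C_prime.append(C)
                V_prime |= C.free_vars(sigma)
                grew = True
            else:
                return C_prime, False
    return C_prime, True

# Tiny usage: one constraint forbidding (x1=0, x2=0); x1 is marked and being updated.
C1 = Constraint({"x1": 0, "x2": 0})
print(component([C1], marked={"x1"}, sigma={"x1": STAR, "x2": STAR}, u="x1")[1])  # True
```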
Algorithm 1: The Component subroutine  Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , an assignment  Q V \M ⋆ σ∈ , and u ∈ V with σ(u) = ⋆ v∈M Ωv × {⋆} ′ ′ Output: (Φ , Token) where Φ = V ′ , (Ωv |σ , Dv |σ )v∈V ′ , C ′ and Token ∈ {True, False} 1 Initialize V ′ ← {u} and C ′ ← ∅ 2 while ∃C|σ ∈ C|σ \ C ′ with vbl(C|σ ) ∩ V ′ 6= ∅ do 3 if vbl(C|σ ) ⊆ {u} ∪ (V \ M) then Update V ′ ← V ′ ∪ vbl(C|σ ) and C ′ ← C ′ ∪ {C|σ } 4 else return (Φ′ , False) 5 end 6 return (Φ′ , True) Here we note the following observation regarding Algorithm 1. Proposition 3.1. The following holds for Component(Φ, M, σ, u). (1) It runs in time O (∆k|C ′ | + dk). C (v) holds for any (2) u ∈ V ′ ⊆ V , C ′ ⊆ C|σ , and σ(v) = ⋆ for all v ∈ V ′ . Moreover, σ(v) = σFalse ′ C|σ ∈ C and v ∈ (vbl(C) ∩ M) \ {u}. (3) Φ′ is an atomic CSP and hypergraph H(Φ′ ) (Defined in (1)) is connected.  (4) If Token = True, let V ′′ = V \ V ′ and C ′′ = C|σ \ C ′ . Then Φ′′ = V ′′ , (Ωv |σ , Dv |σ )v∈V ′′ , C ′′ is C| ′ ′′ C| ′ ′′ σ σ C C = σTrue an atomic CSP. Moreover σTrue × σTrue = µCTrue × µCTrue . and µTrue Proof. Item (2) is evident from the algorithm, Φ being atomic, and Definition 2.5. For Item (3), note that each time we add a constraint C|σ into C ′ , we add vbl(C|σ ) into V ′ . Thus Φ′ is a CSP. Since Φ is atomic, by Fact 2.6 Φ′ is also atomic. In addition, we only consider C|σ with vbl(C|σ ) ∩ V ′ 6= ∅ at Line 2, hence H(Φ′ ) is connected. For Item (1), the algorithm can be executed by first checking all (at most d) constraints related with u, then iteratively checking (at most ∆|C ′ | in total) constraints related with the newly added constraints in C ′ . Thus the total running time is O(k) · (∆|C ′ | + d). Now we focus on Item (4) when Token = True. The condition from Line 2 implies for any C|σ ∈ C ′′ , vbl(C|σ ) ∩ V ′ = ∅ and thus vbl(C|σ ) ⊆ V ′′ . Therefore Φ′′ is a CSP. Since Φ is atomic, by Fact 2.6 Φ′′ is also atomic. Then the “moreover” part follows from V (resp., C|σ ) being a disjoint union of V ′ and V ′′ (resp., C ′ and C ′′ ). 13 3.2 A RejectionSampling Subroutine The following simple perfect sampler, which is based on the standard rejection sampling technique, will be a building block for our main algorithm. Algorithm 2: The RejectionSampling algorithm  Input: a CSP Φ = V, (Ωv , Dv )v∈V , C and a randomness tape r Output: an assignment σ ∈ µCTrue 1 while True do Q 2 Sample σ ∼ v∈V Dv with fresh randomness from r 3 if C(σ) = True for all C ∈ C then return σ 4 end We have the following result on Algorithm 2 by basic facts of geometric distributions. C 6= ∅. Fact 3.2. The following holds for RejectionSampling(Φ, r) over random r if σTrue • It halts almost surely, and outputs σ ∼ µCTrue when it halts. • Let T be the number of while iterations it takes before it halts. Then E[T ] = 1 Prσ∼Qv∈V Dv  C σ ∈ σTrue • Let X be its total running time.5 Then E[X] = O (E[T ] · (k|C| + Q|V |))  and and   E T 2 = 2 · (E[T ])2 − E[T ].     E X 2 = O (E[T ] · (k|C| + Q|V |))2 . The following result is useful when we perform rejection sampling on a projected CSP, say Φ′ from Algorithm 1.  Proposition 3.3. Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP and M ⊆ V be a marking. Let  Q V \M ⋆ k = k(Φ), ∆ = ∆(Φ), Q = Q(Φ), α = α(Φ, M), and β = β(Φ, M). Let σ ∈ v∈M Ωv × {⋆}  ′ ′ ′ be an arbitrary assignment and Φ = V , (Ωv |σ , Dv |σ )v∈V ′ , C be an arbitrary sub-CSP of Φ|σ where V ′ ⊆ V and C ′ ⊆ C|σ . If eα∆ ≤ 1, then the following holds for RejectionSampling(Φ′, r) over random r. (1) Let X be its running time. 
Then X < +∞ almost surely and, if hypergraph H(Φ′ ) (Defined in (1)) is connected, we have ! !  2 kQ|C ′ | + Q (kQ|C ′ | + Q)2 E[X] = O and E X = O . ′ ′ (1 − eα)|C | (1 − eα)2·|C | ′ (2) Let σ ′ be its output. Then σ ′ ∼ µCTrue . Moreover for any v ∈ V ′ and any q ∈ Ωv |σ , we have   Pr σ ′ (v) = q ≤ β · Dv |σ (q). 5 Each while iteration can be performed in time O(k|C| + Q|V |) where O(Q) · |V |) is for Line 2 and O(k) · |C| is for Line 3. 14 Proof. Firstly by Fact 2.6, ∆(Φ′ ) ≤ ∆. Then for any C|σ ∈ C ′ , we have σ e∼ Q = Pr v∈vbl(C|σ ) Y v∈vbl(C|σ ) ≤ Y [C|σ (e σ ) = False] C Dv (σFalse (v)) v∈vbl(C)\M ≤ α. Dv |σ (since Φ is atomic and by Definition 2.5) C (v)) Dv (σFalse (since σ(V \ M) = ⋆V \M and thus vbl(C|σ ) ⊆ V \ M) (by (2)) Thus p(Φ′ ) ≤ α (Recall p from Definition 2.1) and we apply Theorem 2.2 to obtain i h ′ C′ ≥ (1 − eα)|C | > 0. σ e ∈ σTrue Q Pr σ e∼ v∈V ′ Dv |σ By Fact 2.6, k(Φ′ ) ≤ k. Then Item (1) follows immediately from Fact 3.2 by noticing |V ′ | ≤ ′ k|C ′ | + 16 when H(Φ′ ) is connected. σ ′ ∼ µCTrue in Item (2) follows from Fact 3.2 as well. Note that β ≥ 1. Thus by Definition 2.5 we may safely assume σ(v) = ⋆ in Item (2). Let B(σ ′ ) be the event (i.e., constraint) “σ ′ (v) = q”. Then vbl(B) = {v} and       Pr ′ B(σ ′ ) = Pr ′ σ ′ (v) = q and B(σ ′ ) = Dv |σ (q). Q Pr σ′ ∼µC True σ′ ∼ σ′ ∼µC True v∈V ′ Dv |σ By Fact 2.6, d(Φ′ ) ≤ d. Thus Item (2) follows naturally from Theorem 2.3 by the definition of β and noticing   C|σ ∈ C ′ vbl(B) ∩ vbl(C|σ ) 6= ∅ = C|σ ∈ C ′ v ∈ vbl(C|σ ) ≤ d(Φ′ ) ≤ d. 3.3 A SafeSampling Subroutine The following simple SafeSampling complements RejectionSampling when there is uncertainty to update u ∈ M. Details will be clear in Subsection 4.2.2. Algorithm 3: The SafeSampling algorithm  Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , some u ∈ M, and a randomness tape r Output: some value q ∈ Ω⋆u 1 Recall β = β(Φ, M) from (3) ⋆ using r where D ⋆ is a distribution over Ω⋆ by 2 Sample c from Du u u Du⋆ (q) return c 3 ( max {0, 1 − β · (1 − Du (q))} q ∈ Ωu , = P q = ⋆. 1 − q′ ∈Ωu Du⋆ (q ′ ) We note the following observation regarding SafeSampling. Proposition 3.4. The following holds for SafeSampling(Φ, M, u, r) over random r if eα∆ ≤ 1. 6 The additional +1 is for the case |V ′ | = 1 and C ′ = ∅. 15 (1) It runs in time O(Q) where Q = Q(Φ) from Definition 2.1 and Du⋆ from Line 2 is a distribution. (2) For any q ∈ Ωu , Du⋆ (q) ≤ Du (q) and Du⋆ ({q, ⋆}) ≤ β · Du (q) + (β − 1)(|Ωu | − 2).   Q V \M ⋆ (3) Let σ ∈ be an arbitrary assignment with σ(u) = ⋆. Let Φ′ = V ′ , (Ωv |σ , Dv |σ )v∈V ′ , C ′ v∈M Ωv ×{⋆} ′ be an arbitrary sub-CSP of Φ|σ where V ′ ⊆ V and C ′ ⊆ C|σ . Let σ ′ ∼ µCTrue .7 Then for any q ∈ Ωu |σ = Ωu , we have Du⋆ (q) ≤ Pr [σ ′ (u) = q]. Proof. First note that β ≥ 1. For Item (2), observe that Du⋆ (q) = max {0, 1 − β · (1 − Du (q))} ≤ max {0, 1 − 1 · (1 − Du (q))} = Du (q) and Du⋆ ({q, ⋆}) = 1 − X q ′ ∈Ωu \{q} Du⋆ (q ′ ) ≤ 1 − X q ′ ∈Ωu \{q} = 1 + (β − 1)(|Ωu | − 1) − β · X q ′ ∈Ωu \{q}  1 − β · (1 − Du (q ′ )) Du (q ′ ) = 1 − (β − 1)(|Ωu | − 1) − β · (1 − Du (q)) = β · Du (q) + (β − 1)(|Ωu | − 2). P For Item (1), it suffices to observe, using Item (2), that Du⋆ (⋆) ≥ 1 − q′ ∈Ωu Du (q ′ ) = 0. Finally for Item (3), we simply have X     Pr σ ′ (u) = q = 1 − Pr σ ′ (u) = q ′ q ′ ∈Ωu \{q} ≥1−β · X q ′ ∈Ωu \{q} Du (q ′ ) (by Item (2) of Proposition 3.3 and Du |σ = Du ) = 1 − β · (1 − Du (q)) ≥ Du (q) ≥ Du⋆ (q). 
4 (since β ≥ 1 and by Item (2)) The AtomicCSPSampling Algorithm We now formally describe our main algorithm AtomicCSPSampling in Algorithm 4. The missing subroutines will be provided as we prove the correctness and efficiency of Algorithm 4. Intuitively, the σ e after the while iterations will be a random assignment over M (i.e., σ e ∈  Q V \M ) with certain distribution; and the final output σ simply extends the assignv∈M Ωv × {⋆} ment of σ e to V \ M. Putting them together, we will prove σ is distributed as µCTrue . Recall measures α, ρ, λ, k, d, ∆, Q defined in (2), (4), (5), and Definition 2.1. Now we present our main theorem, the proof of which is the focus of the rest of the section.  Theorem 4.1. Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP. Let M ⊆ V be a marking. If eα∆ ≤ 1, e∆2 ρ ≤ 1/32, and ∆2 λ ≤ 1/16, then the following holds for AtomicCSPSampling(Φ, M). • Correctness. It halts almost surely and outputs σ ∼ µCTrue when it halts.  • Efficiency. Its expected total running time is O kQ∆3 |V | log(|V |) . 7 C′ µTrue is well-defined guaranteed by Item (2) of Proposition 3.3. 16 Algorithm 4: The AtomicCSPSampling algorithm  Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C and a marking M ⊆ V C Output: an assignment σ ∈ σTrue 1 Assign infinitely long randomness ri independently for each i ∈ Z 2 Initialize T ← 1 3 while True do  Q V \M ⋆ 4 σ e ← BoundingChain(Φ, M, −T, r−T , . . . , r−1 ) /* σ e∈ */ v∈M Ωv × {⋆} 5 if σ e(v) 6= ⋆ for all v ∈ M then break 6 else Update T ← 2 · T 7 end 8 σ ← FinalSampling(Φ, M, σ e) 9 return σ Remark 4.2. We remark that λ ≥ ρ since β ≥ 1 and domains have at least 2 elements8 . Thus we can use, say, ∆2 λ ≤ 1/100 to dominate both e∆2 ρ ≤ 1/32 and ∆2 λ ≤ 1/16 and thus simplify the conditions in Theorem 4.1. This indeed only loses minor factors. However, we choose to present in the current format to make the proofs cleaner as e∆2 ρ and ∆2 λ comes from different places. 4.1 The BoundingChain Subroutine Recall our SafeSampling subroutine from Subsection 3.3. We present the BoundingChain subroutine. Algorithm 5: The BoundingChain subroutine  Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , a starting time −T , and randomness tapesQr−T , . . . , r−1 Output: an assignment σ ∈ v∈V Ω⋆v 1 Initialize σ(v) ← ⋆ for all v ∈ V /* Assume V = {v0 , . . . , vn−1 } */ 2 for t = −T to −1 do 3 it ← t mod n, and σ(vit ) ← ⋆ /* Update σ(vit ) in this round */ 4 (Φt , Token t ) ← Component(Φ, M, σ, vit ) where Φt = Vt , (Ωv |σ , Dv |σ )v∈Vt , Ct 5 if vit ∈ V \ M then Continue 6 else if (vit ∈ M) ∧ (Token t = True) then 7 σ ′ ← RejectionSampling(Φt , rt ) 8 Update σ(vit ) ← σ ′ (vit ) 9 else /* (vit ∈ M) ∧ (Tokent = False) */ 10 Update σ(vit ) ← SafeSampling(Φ, M, vit , rt ) 11 end 12 end 13 return σ Remark 4.3. Algorithm 5 also works if we ignore variables outside M. This is because σ(V \ M) is always kept ⋆V \M . However the current version is more convenient for our analysis in Subsection 4.1.2 and does not influence the running time much. 8 If some domain has only 1 element, then the corresponding distribution must be a point distribution. Thus we can simply fix the variable to this value and simplify the CSP. 17 We first note the following fact regarding each round of update.  Q ⋆ Lemma 4.4. Let t ∈ {−T, . . . , −1} be an arbitrary time with vit ∈ M. Let σ ∈ v∈M Ωv × {⋆}V \M be an arbitrary assignment. Let q ∈ Ω⋆vit be the update of σ(vit ) in the t-th for iteration of Algorithm 5 over random rt . 
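Algorithm 4 wraps the bounding chain in a standard coupling-from-the-past loop: the randomness tape r_t is attached to its time step and reused when T doubles, and the algorithm stops once the chain has coalesced (no ⋆ remains). The toy Python demo below (ours; the chain is a lazy walk on {0,...,4}, not the paper's PBChains) illustrates only this doubling-and-reuse mechanism: all possible starting states are run forward from time −T with a shared tape, and coalescence certifies that the common value is an exact sample from the stationary distribution (here, uniform).

```python
import random

def update(state, u):
    """Grand-coupling update rule for a toy lazy walk on {0,...,4}."""
    if u < 1 / 3:
        return max(state - 1, 0)
    if u < 2 / 3:
        return min(state + 1, 4)
    return state

def cftp(seed=0):
    rng = random.Random(seed)
    tape = {}                      # randomness indexed by time, reused across restarts
    T = 1
    while True:
        for t in range(-T, 0):
            if t not in tape:      # reuse old randomness; only draw for new time steps
                tape[t] = rng.random()
        states = set(range(5))     # run every possible start from time -T with the shared tape
        for t in range(-T, 0):
            states = {update(s, tape[t]) for s in states}
        if len(states) == 1:       # coalesced: the common value is an exact sample
            return states.pop()
        T *= 2                     # not coalesced: double T, keep the old randomness

print([cftp(seed) for seed in range(5)])
```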
If eα∆ ≤ 1, then (1) q is well-defined almost surely; (2) for any q ′ ∈ Ωvit we have   Pr q = q ′ σ, t ≤ β ·Dvit (q ′ ) and   Pr q ∈ q ′ , ⋆   σ, t ≤ β ·Dvit (q ′ )+(β −1) Ωvit − 2 . Proof. Note that σ uniquely determines Tokent , i.e., whether we update σ(vit ) by RejectionSampling or SafeSampling. Thus we only need to consider two possibilities separately. • RejectionSampling. By Item (3) of Proposition 3.1 and Item (1) of Proposition 3.3, q is welldefined almost surely. Since σ(vit ) is set to ⋆ right before the update and q is never ⋆ in RejectionSampling, we have Dvit |σ = Dvit and, by Item (2) of Proposition 3.3,   Pr q ∈ q ′ , ⋆    σ, t = Pr q = q ′ σ, t ≤ β · Dvit (q ′ ). • SafeSampling. By Item (1) of Proposition 3.4, q is always well-defined. Then the two bounds follow from Item (2) of Proposition 3.4. We will show the following result for one single call of Algorithm 5. Proposition 4.5. If eα∆ ≤ 1, e∆2 ρ ≤ 1/32, and ∆2 λ ≤ 1/16, then the following holds over random r−T , . . . , r−1 for BoundingChain(Φ, M, −T, r−T , . . . , r−1 ). • Efficiency. Let Xt be the running time of the t-th for iteration. Then Xt < +∞ almost  2 2 5 2 surely and E Xt = O dk ∆ Q . • Coalescence. Let E be the event “in the returned assignment σ, there exists some u ∈ M such that σ(u) = ⋆”. If T ≥ 2|V | − 1, we have Pr [E] ≤ 4|V | · 2−T /|V | . 4.1.1 Moment Bounds on the Running Time To establish the efficiency part of Proposition 4.5, we need to control the size of Φt in each for iteration. This requires some additional definitions. Definition 4.6 (2-tree). Let G be an undirected graph. A set of vertices S ⊆ V (G) is a 2-tree if the following holds. • distG (u, v) ≥ 2 holds for any distinct u, v ∈ S where distG (u, v) is the length of the shortest path in G from u to v.9 • If we add an edge between every u, v ∈ S with distG (u, v) = 2, then S is connected. Intuitively a 2-tree is an independent set that is not very spread out. The following lemmas bound the number of 2-trees and show how to extract a large 2-tree from any connected subgraph. 9 For example distG (u, u) ≡ 0 for all u ∈ V (G), and distG (u, v) = 1 iff (u, v) is an edge in E(G). 18 Lemma 4.7 ([FGYZ20, Corollary 5.7]). Let G be a graph with maximum degree d. Then for any ℓ−1 /2. v ∈ V (G) and integer ℓ ≥ 1, the number of 2-trees in G of size ℓ containing v is at most ed2 Lemma 4.8 ([JPV21, Lemma 4.5]). Let G be a graph with maximum degree d. Let G′ be a connected subgraph of G. Then for any v ∈ V (G′ ), there exists a 2-tree S ⊆ V (G′ ) with v ∈ S and size |S| ≥ |V (G′ )|/(d + 1). Lemma 4.9 ([FGYZ20, Observation 5.5]). If a graph G has a 2-tree of size ℓ > 1 containing v ∈ V (G), then G also has a 2-tree of size ℓ − 1 containing v. The following result is an immediate corollary of Lemma 4.8 and Lemma 4.9. Corollary 4.10. Let G be a graph with maximum degree d. Let G′ be a connected subgraph of G. Then for any v ∈ V (G′ ) and any integer ℓ ≤ ⌈|V (G′ )|/(d + 1)⌉, there exists a 2-tree S ⊆ V (G′ ) with v ∈ S and size |S| = ℓ. Now we show the following concentration bound. Proposition 4.11. Let d = d(Φ), ∆ = ∆(Φ), α = α(Φ, M), and ρ = ρ(Φ, M). Assume eα∆ ≤ 1. For any t ∈ {−T, . . . , −1}, recall Ct from Line 4 of Algorithm 5. Then we have ℓ−1 d · e∆2 ρ for any integer ℓ ≥ 1. 2   Proof. Construct the line graph Lin(Φ) = V Φ , E Φ of Φ = V, (Ωv , Dv )v∈V , C where Pr [|Ct | ≥ ℓ · ∆] ≤ VΦ =C and E Φ = {{e1 , e2 } ∈ C × C | vbl(e1 ) ∩ vbl(e2 ) 6= ∅, e1 6= e2 } . Then Lin(Φ) is an undirected graph with maximum degree ∆ − 1. 
Let σ and vit be the assignment and variable to update at time t respectively. Let G be the subgraph of Lin(Φ) induced by vertex set V (G) = {C ∈ C | C|σ ∈ Ct }. Then by Item (3) of Proposition 3.1, G is a connected subgraph of Lin(Φ).10 For any C|σ ∈ Ct with vit ∈ vbl(C), by e = ℓ provided ℓ ≤ ⌈|Ct | /∆⌉. Corollary 4.10 there exists a 2-tree Se ⊆ V (G) with C ∈ Se and size |S| Define  S = 2-tree S ⊆ V Φ (|S| = ℓ) ∧ (∃C ∈ S, vit ∈ vbl(C)) . Then by Lemma 4.7 and noticing there are at most d choices of C, we have ℓ−1  ℓ−1 d · e (∆ − 1)2 d · e∆2 |S| ≤ ≤ . 2 2 C (v) for any C| ∈ C and v ∈ (vbl(C) ∩ M) \ {v }. By Item (2) of Proposition 3.1, σ(v) = σFalse σ t it Note that σ(v) is initialized as ⋆. Thus σ(v) must be updated before time t. Let UpdTime(v, t) be the last update time of v before the t-th for iteration in Algorithm 5, and let Ev,C be the event C (v) in the UpdTime(v, t)-th for iteration”. Recall the definition of ρ from “σ(v) is updated to σFalse 11 (4), then we have Pr [|Ct | ≥ ℓ · ∆] ≤ Pr [Ev,C , ∀C|σ ∈ Ct , ∀v ∈ (vbl(C) ∩ M) \ {vit }] 10 Actually Item (3) of Proposition 3.1 says the subgraph of Lin(Φ|σ ) induced by vertex set Ct is connected, which implies G is a connected subgraph of Lin(Φ) as vbl(C|σ ) ⊆ vbl(C). 11 We remark that for the fourth inequality below, we do not assume any independence between Ev,C . We simply use the chain rule of conditional probability in the order of time. For example, if UpdTime(v1 , t) < UpdTime(v2 , t), then Pr [Ev1 ,C1 ∧ Ev2 ,C2 ] = Pr [Ev1 ,C1 ] · Pr [Ev2 ,C2 | Ev1 ,C1 ] and then we apply Item (2) of Lemma 4.4 twice. 19 h i e ∀v ∈ (vbl(C) ∩ M) \ {vit } ≤ Pr Ev,C , ∀C ∈ S, X ≤ Pr [Ev,C , ∀C ∈ S, ∀v ∈ (vbl(C) ∩ M) \ {vit }] S∈S ≤ X Y Y S∈S C∈S v∈(vbl(C)∩M)\{vi } t (by union bound)  C (v)) min 1, β · Dv (σFalse (since (vbl(C))C∈S are pairwise disjoint and by Item (2) of Lemma 4.4) X Y Y C ≤ (v)) β · Dv (σFalse S∈S C∈S,vit ∈vbl(C) / v∈vbl(C)∩M ≤ X S∈S ρ|S|−1 ≤ ℓ−1 d · e∆2 ρ . 2 Now we obtain moment bounds for the running time of each for iteration in Algorithm 5. Proof of the Efficiency Part of Proposition 4.5. Let Yt and Zt be the running time   of Line 4 and  2  Line 5-11 respectively. By Item (1) of Proposition 3.1, we have E Yt |Ct | = O (∆k |Ct | + dk)2 . By Item (3) of Proposition 3.1, Item (1) of Proposition 3.3, and Item (1) of Proposition 3.4, we also have ( )! ! 2 2  2  (kQ |C | + Q) (kQ |C | + Q) t t E Zt |Ct | = O max 1, Q2 , =O . (1 − eα)2·|Ct | (1 − eα)2·|Ct | By Proposition 4.11, we have ℓ−1 d d Pr [|Ct | ≥ ℓ · ∆] ≤ · e∆2 ρ ≤ · 2 2  1 32 ℓ−1 for any integer ℓ ≥ 1. Since Xt = Yt + Zt + O(1) and (a + b + c)2 ≤ 4 · (a2 + b2 + c2 ), we have E  Xt2  =O 1+E ≤ O d2 k  2  Yt2 +  +E +∞ X  Zt2  =O 1+ +∞ X L=0 Pr [|Ct | = L] E  Yt2 + Pr [|Ct | ≥ ℓ · ∆] · ∆ · O ∆2 k2 (∆(ℓ + 1))2 + Zt2 |Ct | = L  ! (Qk∆(ℓ + 1))2 ! (1 − eα)2(ℓ+1)∆ (bucketing L ∈ [ℓ · ∆, (ℓ + 1) · ∆)) +∞    X Pr [|Ct | ≥ ℓ · ∆] · ∆ · O ∆2 k2 (∆(ℓ + 1))2 + (Qk∆(ℓ + 1))2 · 16ℓ+1 ≤ O d2 k2 + ℓ=0 ℓ=0 (since eα ≥ 1/∆ and ∆ ≥ 2, we have (1 − eα)∆ ≥ 1/4)  +∞   X  1 ℓ−1  2 2 2 2 ≤ O d k + O (d∆) ∆ k (∆(ℓ + 1))2 + (Qk∆(ℓ + 1))2 · 16ℓ+1 32 ℓ=1   = O d2 k2 + dk2 ∆5 + dk 2 ∆3 Q2 = O dk 2 ∆5 Q2 . (since d ≤ ∆) 20 4.1.2 Concentration Bounds for the Coalescence Here we analyze the coalescence part of Proposition 4.5, which is also the stopping condition for the while iterations in Algorithm 4. We use information percolation argument and need additional setup. We follow the notation convention in Subsection 4.1.1: V = {v0 , . . . , vn−1 } and it = t mod n for t ∈ {−T, . . . , −1}. 
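As a tiny sanity check (ours), the definition of t mod n for negative t from Section 2 agrees with Python's % operator, so the scan order i_t can be read off directly:

```python
def paper_mod(t: int, m: int) -> int:
    """'t mod m' as in Section 2: t - m*floor(t/m) for t >= 0, and (t*(1-m)) mod m for t < 0."""
    return t - m * (t // m) if t >= 0 else paper_mod(t * (1 - m), m)

n = 5
for t in range(-8, 0):
    assert paper_mod(t, n) == t % n         # agrees with Python's % on negative t
print([t % n for t in range(-8, 0)])        # scan order i_t: [2, 3, 4, 0, 1, 2, 3, 4]
```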
UpdTime(v, t) is the last update time of v before time t, i.e.,   UpdTime(v, t) = max −T − 1, max t′ < t vit′ = v . (7) The additional −T − 1 is to set up the boundary condition corresponding to the initialization step. Definition 4.12 (Extended Constraints). For any C ∈ C and e = {t1 , . . . , tm } ⊆ {−T, . . . , −1}, we say (e, C) is an extended constraint if the following holds.  (1) m = |vbl(C)| and vbl(C) = vit1 , vit2 , . . . , vitm . (2) e = {UpdTime(v, tmax + 1) | v ∈ vbl(C)} where tmax = maxt′ ∈e t′ . Intuitively an extended constraint of C is a consecutive rounds of updates in vbl(C). Fact 4.13. The following holds for extended constraints. (1) If (e1 , C1 ) and (e2 , C2 ) are two extended constraints and vbl(C1 )∩vbl(C2 ) = ∅, then e1 ∩e2 = ∅. (2) If (e, C) is an extended constraint, then (2a) 0 ≤ tmax (e) − tmin (e) < n, where tmin (e) = mint′ ∈e t′ ; (2b) e = {UpdTime(v, t′ ) | v ∈ vbl(C)} for any tmax + 1 ≤ t′ ≤ tmin + n; (2c) for any C ′ ∈ C, we have  ′ e (e′ , C ′ ) is an extended constraint with e ∩ e′ 6= ∅ < 2 · vbl(C ′ ) . Proof. Item (1) is evident from Item (1) of Definition 4.12. For Item (2), we assume |vbl(C)| = m and vbl(C) = {va1 , . . . , vam }. Let  S = {−T, . . . , −1} ∩ {ai − j · n | i ∈ [m], j ∈ Z} = b1 , b2 , . . . , bT (C) where −T ≤ b1 < · · · < bT (C) ≤ −1. Note that bi ≡ bi+m mod n for all i ∈ [T (C) − m]. If (e, C) is an extended constraint, then by Item (2) of Definition 4.12, e consists of a consecutive interval of S, i.e., e = {bo , bo+1 , . . . , bo+m−1 } for some o ∈ [T (C)−m+1]. Thus tmax (e)−tmin (e) = bo+m−1 −bo < n (since bi ≡ bi+m mod n) which verifies Item (2a). Since either bo + n = bo+m or bo + n ≥ 0, we know UpdTime(vai , t′ ) = UpdTime(vai , bo+m−1 + 1) for all i ∈ [m] and bo+m−1 + 1 ≤ t′ ≤ bo + n which verifies Item (2b). ′ Now we prove Item (2c). By n Item (1), assume o without loss of generality vbl(C) ∩ vbl(C ) 6= ∅. Let m′ = |vbl(C ′ )| and vbl(C ′ ) = va′1 , . . . , va′ ′ and define m n o  S ′ = {−T, . . . , −1} ∩ a′i − j · n i ∈ [m′ ], j ∈ Z = b′1 , . . . , b′T (C ′ ) where −T ≤ b′1 < · · · < b′T (C ′ ) ≤ −1. Then similarly, we have b′i ≡ b′i+m′ mod n for all i ∈ [T (C ′ ) −  m′ ] and e′ = b′o′ , . . . , b′o′ +m′ −1 for some o′ ∈ [T (C ′ ) − m′ + 1]. Let imin = min {i ∈ [T (C ′ )] | b′i ∈ e} and imax = max {i ∈ [T (C ′ )] | b′i ∈ e}. Then e ∩ e′ 6= ∅ iff imin ≤ o′ + m′ − 1 and imax ≥ o′ . Therefore there are at most imax − imin + m′ choices of o′ . Since b′imin , b′imax ∈ e, by Item (2a) we know b′imax − b′imin < n. Hence imax < imin + m′ . In all, there are at most 2m′ − 1 choices of e′ . 21 Definition 4.14 (Extended Hypergraph). Extended hypergraph H ext = (V ext , E ext ) has vertex set V ext = {−T, . . . , −1} and extended constraints as edges: E ext = {e ⊆ {−T, . . . , −1} | (e, C) is an extended constraint} . Moreover, we label each edge e with C if it is added into E ext by extended constraint (e, C). We allow multiple occurrence of the same edge but the labels are different. Define σ0 as the final returned assignment in Algorithm 5. For each t ∈ {−T, . . . , −1}, let σt be the assignment at Line 4 of the t-th for iteration in Algorithm 5. In particular σt (vit ) = ⋆ due to Line 3. Now we present the following algorithm informally described in Subsection 1.3 to sequentially find constraints that are not satisfied during the BoundingChain process. We remark that this algorithm is only for our analysis, and we do not run it during AtomicCSPSampling. 
Algorithm 6: Find failed constraints during the BoundingChain process Input: assignments (σt )t∈{−T,...,0} defined above and some u ∈ M with σ0 (u) = ⋆ Output: H ′ = (V ′ , E ′ ) where V ′ ⊆ V ext and E ′ ⊆ E ext 1 Set t0 ← UpdTime(u, 0) and initialize V ′ ← {t0 } , E ′ ← ∅ 2 FailedConstraints(t0) 3 return (V ′ , E ′ ) 4 Procedure FailedConstraints(t): 5 if t < −T + n − 1 then return /* (vit ∈ M) ∧ (Tokent = False) */ 6 Initialize Vt ← {vit } and Ct ← ∅ 7 while ∃C|σt ∈ C|σt \ Ct with vbl(C|σt ) ∩ Vt 6= ∅ do 8 e ← {UpdTime(v, t + 1) | v ∈ vbl(C)} /* (e, C) is an extended constraint */ 9 Update Ct ← Ct ∪ {C|σt } and Vt ← Vt ∪ vbl(C|σt ) 10 Update E ′ ← E ′ ∪ {e} and V ′ ← V ′ ∪ e /* e is labeled by C */ 11 end 12 foreach v ∈ (Vt ∩ M) \ {vit } do 13 FailedConstraints(UpdTime(v, t)) 14 end 15 end We have the following observation regarding Algorithm 6. Lemma 4.15. Algorithm 6 halts always. Furthermore, if T ≥ 2n − 1 then (1) for each (e, C) from Line 8 when we execute FailedConstraints(t), (1a) it is an extended constraint, C (v) in the UpdTime(v, t+ (1b) for each v ∈ vbl(C), the assignment on v is updated to ⋆ or σFalse 1)-th for iteration in Algorithm 5; (2) each time we call FailedConstraints(t), t is already in V ′ ; (3) H ′ is a connected sub-hypergraph of H ext ; (4) there exists some e0 , e1 ∈ E ′ such that tmax (e0 ) ≥ −n and tmin (e1 ) < −T + n − 1. 22 Proof. Since UpdTime(v, t) < t for all v ∈ V and t ∈ {−T, . . . , 0}, Algorithm 6 always halts. We prove Item (1) by induction on the calls of FailedConstraints(t). The first call t0 represents the final update of the assignment on u ∈ M, which results in σ0 (u) = ⋆. • Item (1a) for t0 . Note that 0 > t0 ≥ 0−n ≥ −T +n−1. Then UpdTime(v, t0 +1) = t0 ≥ −T for v = vit0 ; and UpdTime(v, t0 + 1) ≥ −T for all v 6= vit0 . This means −T − 1 ∈ / e and thus (e, C) is an extended constraint. • Item (1b) for t0 . Since C|σt0 ∈ C|σt0 at Line 7, we know C(σt0 ) = False and, by (6), σt0 (v) ∈  C σFalse (v), ⋆ for each v ∈ vbl(C). This means, if v 6= vit0 , the assignment on v is updated to such value in the UpdTime(v, t0 ) = UpdTime(v, t0 + 1)-th for iteration in Algorithm 5. Meanwhile if v = vit0 , then the assignment on v is updated to ⋆ in this t0 = UpdTime(v, t0 +1)th for iteration, resulting in σ0 (v) = ⋆. To complete the induction, we note that each later call of FailedConstraints relies on some v from Line 13 when we execute some FailedConstraints(UpdTime(v, t)). This means v ∈ M \ {vit } and σt (v) = ⋆ and t ≥ −T +n−1. Thus the assignment on v is updated to ⋆ in the −T ≤ UpdTime(v, t)th for iteration in Algorithm 5. Then the argument above also goes through with almost no change. For Item (2), note that V ′ is initialized as {t0 }. Upon Line 13, we have UpdTime(v, t) = UpdTime(v, t + 1) since v 6= vit , which has been added into V ′ by Line 10 earlier. Now we turn to Item (3). By Item (1), E ′ ⊆ E ext . Meanwhile by Line 10, E ′ are edges over vertex set V ′ . Thus it suffices to show H ′ = (V ′ , E ′ ) is connected. By Item (2), we only need to show vertices added during the while iterations in FailedConstraints(t) is connected to t. This can be proved by induction: When we find C|σt satisfying the condition at Line 7, fix some v ′ ∈ vbl(C|σt ) ∩ Vt . Then for the edge e constructed in the next step, each UpdTime(v, t + 1) ∈ e is connected to UpdTime(v ′ , t + 1) ∈ e. Note that either (A) v ′ = vit and thus UpdTime(v ′ , t + 1) = t, or (B) UpdTime(v ′ , t + 1) was added into V ′ in an earlier time and connected to t. 
Hence each UpdTime(v, t + 1) ∈ e is connected to t as desired. Finally we prove Item (4). Each time we call FailedConstraints(t), it implies the assignment on vit ∈ M is updated to ⋆ in the t-th for iteration in Algorithm 5. This means Tokent = False and SafeSampling is performed in Algorithm 5. By comparing Algorithm 1 and FailedConstraints, we know vit is connected by falsified constraints to some v ∈ (Vt ∩ M) \ {vit } that σt (v) = ⋆. This implies at least one round of Line 13 here will be executed. Thus the recursion of Algorithm 6 continues until t < −T + n − 1. By Item (2), there exists some t1 ∈ V ′ with t1 < −T + n − 1. Meanwhile t0 ∈ V ′ and t0 ≥ −n. Hence by Item (3), there exists e0 , e1 ∈ E ′ such that t0 ∈ e0 and t1 ∈ e1 ; and tmax (e0 ) ≥ t0 ≥ −n, tmin (e1 ) ≤ t1 < −T + n − 1. Finally we complete the proof of Proposition 4.5. Proof of the Coalescence Part of Proposition 4.5. Since σ0 is defined as the final returned assignment, we know event E (Defined in Proposition 4.5) is “there exists some u ∈ M such that σ0 (u) = ⋆”. Using Algorithm 6 we obtain H ′ = (V ′ , E ′ ). By Fact 2.4 and Item (3) (4) of e1 , C e2 , . . . , C eℓ such that Lemma 4.15, there exists a path ee1 , ee2 , . . . , eeℓ ∈ E ′ ⊆ E ext with labels C (a) eei ∩ eei+1 6= ∅ and thus tmin (e ei ) ≤ tmax (e ei+1 ) for all i ∈ [ℓ − 1]; (b) eei ∩ eej = ∅ for all i, j ∈ [ℓ] with |i − j| > 1; (c) −n ≤ tmax (e e1 ) < 0; (d) −T ≤ tmin (e eℓ ) < −T + n − 1. 23 Meanwhile tmax (e e1 ) = tmax (e eℓ ) + ℓ−1 X i=1 (tmax (e ei ) − tmax (e ei+1 )) ≤ tmin (e eℓ ) + n − 1 + ℓ−1 X i=1 (tmin (e ei ) + n − 1 − tmax (e ei+1 )) (by Item (2a) of Fact 4.13) ≤ (−T + n − 2) + ℓ · (n − 1). (by Item (a) (d) above) Thus by Item (c) above, we have ℓ ≥ −2 + ⌈T /(n − 1)⌉. For convenience we truncate the tail of the path so that ℓ is the largest even number no more than −2+⌈T /(n−1)⌉. Thus ℓ ≥ −3+⌈T /(n−1)⌉. We remark that since the tail is truncated, Item (d) above is not necessarily true now. Now for any fixed path e1 , . . . , eℓ ∈ E ext with labels C1 , . . . , Cℓ satisfying Item (a) (b) (c) above, we bound the probability that this path, denoted by P , appears in H ′ . Assume ℓ/2 Y i=1 |vbl(C2i−1 )| ≤ ℓ/2 Y i=1 |vbl(C2i )| (8) and the other case can be analyzed similarly. For each t ∈ {−T, . . . , −1} and C ∈ C with vit ∈ vbl(C) C (v) in the t-th for iteration we define Et,C as the event “ the assignment on vit is updated to ⋆ or σFalse V in Algorithm 5”. By Item (1b) of Lemma 4.15, ei , labeled by Ci , appearing in H ′ implies t∈ei Et,Ci happens. Recall the definition of λ from (5), then we have12     Pr P appears in H ′ ≤ Pr e2i appears in H ′ with label C2i for all i ∈ [ℓ/2]     ℓ/2 ℓ/2 ^ ^ ^ ^ ≤ Pr  Et,C2i  = Pr  Et,C2i  i=1 t∈e2i ≤ ℓ/2 Y i=1 t∈e2i ,vit ∈M Y i=1 v∈vbl(C2i )∩M  C (v)) + (β − 1)(|Ωv | − 2) β · Dv (σFalse (since (e2i )i∈[ℓ/2] are pairwise disjoint and by Item (2) of Lemma 4.4) ≤ ℓ/2 Y i=1 (4∆ |vbl(C2i )|) −2 ≤ ℓ Y i=1 (4∆ |vbl(Ci )|)−1 . (since ∆2 λ ≤ 1/16 and by (8)) P Now it suffices to union bound over all possible paths, i.e., Pr [E] ≤ P Pr [P appears in H ′ ] where P is some fixed path e1 , . . . , eℓ ∈ E ext with labels C1 , . . . , Cℓ satisfying Item (a) (b) (c) above. First by Item (c), there are at most n possible tmax (e1 ). Then by Item (1) of Fact 4.13 there are at most d possible C1 given tmax (e1 ). By Item (2) of Definition 4.12 this determines (e1 , C1 ). 
Now given (e1 , C1 ), we enumerate the rest of the path by a rooted tree T (e1 , C1 ) constructed as follows: • T (e1 , C1 ) has depth 2(ℓ − 1) and the root is labeled with (e1 , C1 ). • Given a node z with label (ei , Ci ), i ∈ [ℓ − 1], we construct the next two layers differently. – For each Ci+1 ∈ C with vbl(Ci ) ∩ vbl(Ci+1 ) 6= ∅, we create a child node z ′ with label Ci+1 and link to z. 12 We remark that for the third inequality below, we do not assume any independence between Et,C . For example, if t1 < t2 , then Pr [Et1 ,C4 ∧ Et2 ,C2 ] = Pr [Et1 ,C4 ] · Pr [Et2 ,C2 | Et1 ,C4 ] and then we apply Item (2) of Lemma 4.4 twice. 24 (z ′ has at most ∆ possibilities.) – For each z ′ , assume its label is Ci+1 . We find some ei+1 such that ei ∩ ei+1 6= ∅ and (ei+1 , Ci+1 ) is an extended constraint, and then create a child node z ′′ with label (ei+1 , Ci+1 ) and link to z ′ . (z ′′ , given z ′ , has at most 2 · |vbl(Ci+1 )| possibilities by Item (2c) of Fact 4.13.) – We move to each z ′′ and repeat the construction. Each leaf of T (e1 , C1 ) represents a path P which satisfies Item (a) (c) already, and • either, P does not satisfy Item (b) and thus does not contribute to the union bound; Q • or, P is valid and Pr [P appears in H ′ ] ≤ ℓi=1 (4∆ |vbl(Ci )|)−1 as above. Now we put weight on each internal node z of T (e1 , C1 ) as follows: √ • If z has label (e, C), then its weight is w(z) = ( 2∆)−1 . √ −1 . • Otherwise z has label C, then its weight is w(z) = 2 2 · |vbl(C)| Q This means Pr [P appears in H ′ ] ≤ (4∆)−1 internal node z in P w(z) for each valid P in T (e1 , C1 ) where the (4∆)−1 factor is because there are only ℓ − 1 internal nodes for each case along the path. Thus Y X X   w(z) Pr P appears in H ′ ≤ (4∆)−1 valid P in T (e1 ,C1 ) internal node z in P valid P in T (e1 ,C1 ) −1 ≤ (4∆) X Y w(z) P in T (e1 ,C1 ) internal node z in P √ ≤ ( 2)−2(ℓ−1) /(4∆) = 2−ℓ /(2∆), √ where the last inequality can be proved by induction on the depth and noticing ( 2 · w(z))−1 is at least the number of child nodes of z for each internal node z ∈ T (e1 , C1 ). Putting everything together, we have X X   Pr [E] ≤ Pr P appears in H ′ valid (e1 ,C1 ) valid P in T (e1 ,C1 ) ≤ nd · 2−ℓ /(2∆) T 2− n−1 ≤ n·2 4.2 ((e1 , C1 ) has at most nd choices) − Tn ≤ 4n · 2 (since ℓ ≥ −3 + ⌈T /(n − 1)⌉ and d ≤ ∆) . The Distribution after BoundingChain Subroutines Recall in AtomicCSPSampling(Φ, M) (Algorithm 4), we keep doubling T and performing the corresponding BoundingChain(Φ, M, −T, r−T , . . . , r−1 ) until the returned assignment has no ⋆ on M. Thus before we present the FinalSampling subroutine, we pause for now to analyze these BoundingChain calls in a whole.  Definition 4.16 (Projected Distribution). Given an atomic CSP Φ = V, (Ωv , Dv )v∈V , C and a  Q V \M and define the projected distribution µM ∈ RΛ by marking M ⊆ V , let Λ = v∈M Ωv × {⋆}   for all σ ∈ Λ. µM (σ) = Pr σ ′ (M) = σ(M) σ′ ∼µC True 25 We will show the distribution after all BoundingChain subroutines is exactly µM . Proposition 4.17. If eα∆ ≤ 1, e∆2 ρ ≤ 1/32, and ∆2 λ ≤ 1/16, then the while iterations in AtomicCSPSampling(Φ, M) halts almost surely and the final assignment σ e has distribution µM . We introduce and analyze the following SystematicScan(Φ, M, σin, L, R, rL , . . . , rR ) algorithm, then show it couples with BoundingChain. 
Algorithm 7: The SystematicScan algorithm  Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , an assignment  Q V \M , a starting time L, a stopping time R, and σin ∈ v∈M Ωv × {⋆} randomness tapes rL , .Q . . , rR .  V \M Output: an assignment σ ∈ . v∈M Ωv × {⋆} 1 Initialize σ ← σin 2 for t = L to R do /* Assume V = {v0 , . . . , vn−1 } */ 3 it ← t mod n, and σ(vit ) ← ⋆ /* Update σ(vit ) in this round */ 4 Φt = Vt , (Ωv |σ , Dv |σ )v∈Vt , Ct ← Component(Φ, M, σ, vit ) /* Ignore the returned Token since it is always True here */ 5 if vit ∈ V \ M then Continue 6 else 7 σ ′ ← RejectionSampling(Φt , rt ) 8 Update σ ← σ ′ (vit ) 9 end 10 end 11 return σ 4.2.1 Convergence of SystematicScan We first show SystematicScan converges to µM . We set up basic chains.  for Markov Q notations V \M Let Λ be a finite state space; for our purpose it will P be Ω × {⋆} . We view any v v∈M distribution µ over Λ as a horizontal vector in RΛ where a∈Λ µ(a) = 1 and µ(a) ≥ 0 holds for all a ∈ Λ. We denote 1a ∈ RΛ as the point distribution of a ∈ Λ, i.e., 1a (b) = 1 iff a = b. Λ×Λ A Markov chain (Xt )t≥0 over P Λ is given by its transition matrices (Pt )t≥0 where each Pt ∈ R has non-negative entries and b∈Λ Pt (a, b) ≡ 1 for all a ∈ Λ. Then Xt = X0 P0 P1 · · · Pt−1 where X0 ∈ RΛ is the starting distribution. In particular, • if Pt ≡ P for all t ≥ 0, then (Xt )t≥0 is a time homogeneous Markov chain given by P; • if (Pt )t≥0 are possibly different, then (Xt )t≥0 is a time inhomogeneous Markov chain. Assume (Xt )t≥0 is a time homogeneous Markov chain over Λ given by transition matrix P. We say P is • irreducible if for any a, b ∈ Λ, there exists some integer t ≥ 0 such that Pt (a, b) > 0;  • aperiodic if for any a ∈ Λ, gcd integer t > 0 Pt (a, a) > 0 = 1;13 • stationary with respect to distribution µ if µP = µ; 13 gcd stands for greatest common divisor. 26 • reversible with respect to distribution µ if µ(a)P(a, b) = µ(b)P(b, a) holds for all a, b ∈ Λ. Here we note the following two classical results. Fact 4.18 (e.g., [LP17, Proposition 1.20]). If P is reversible with respect to µ, then P is also stationary with respect to µ. Theorem 4.19 (The Convergence Theorem, e.g., [LP17, Theorem 4.9]). Suppose (Xt )t≥0 is an irreducible and aperiodic time homogeneous Markov chain over finite state space Λ with stationary distribution µ and transition matrix P. Then for any X0 , we have limt→+∞ Xt = µ.14 Now we turn to SystematicScan and follow the notation in Algorithm 7: V =  Q convention V \M × {⋆} . Ω {v0 , . . . , vn−1 } and it = t mod n. Recall we also fix Λ = v∈M v Definition 4.20 (One-step Transition Matrix). For any i ∈ {0, . . . , n − 1}, define the one-step transition matrix on vi ∈ V as Pi ∈ RΛ×Λ where   Pi (σ1 , σ2 ) = Pr σ ′ (M) = σ2 (M) σ ′ (M \ {vi }) = σ1 (M \ {vi }) . σ′ ∼µC True We first prove some useful facts and also connect one-step transition matrices to SystematicScan. Proposition 4.21. If eα∆ ≤ 1, then the following holds. (1) Each Pi is well-defined. (2) For any i ∈ {0, . . . , n − 1} and σ1 , σ2 ∈ Λ, we have Pi (σ1 , σ2 ) ≡ 0 if σ1 (M \ {vi }) 6= σ2 (M \ {vi }); and Pi (σ1 , σ2 ) > 0 if otherwise. (3) µM Pi1 · · · Pim = µM holds for any sequence i1 , i2 , . . . , im ∈ {0, . . . , n − 1} of finite length m. (4) For any L ≤ R and σin , SystematicScan(Φ, M, σin , L, R, rL , . . . , rR ) halts almost surely over random rL , . . . , rR and its output distribution is µ = 1σin PiL PiL+1 · · · PiR . Proof. First we prove Item (1) (2). For any fixed i ∈ {0, . . . 
, n − 1} and σ1 ∈ Λ, define assignment σ by setting σ(V \ {vi }) = σ1 (V \ {vi }) and σ(vi ) = ⋆. Then for any σ2 ∈ Λ we have   Pi (σ1 , σ2 ) = Pr σ ′ (M) = σ2 (M) , C| σ σ′ ∼µTrue C| σ being well-defined, which where C|σ is defined in Definition 2.5. Thus Item (1) is equivalent to µTrue ′ follows from Proposition 3.3. Since σ(M) has no ⋆, σ (M \ {vi }) ≡ σ(M \ {vi }) ≡ σ1 (M \ {vi }). Thus the first part of Item (2) follows naturally. As for the second part, if σ1 (V \{vi }) = σ2 (V \{vi }), then Prσ′ ∼µC [σ ′ (M) = σ2 (M)] True . (9) Pi (σ1 , σ2 ) = Prσ′ ∼µC [σ ′ (M \ {vi }) = σ1 (M \ {vi })] True C|σ Thus Pi (σ1 , σ2 ) > 0 iff the enumerator is positive, which is equivalent to µTrue2 being well-defined and is, again, guaranteed by Proposition 3.3.  Now we prove Item (3). Since µM Pi1 · · · Pim = µM Pi1 Pi2 · · · Pim , it suffices to show for m = 1 and then apply induction. By Fact 4.18, it suffices to show for each i ∈ {0, . . . , n − 1}, Pi is reversible with respect to µΦ|π , i.e., µM (σ1 )Pi (σ1 , σ2 ) = µM (σ2 )Pi (σ2 , σ1 ) for any σ1 , σ2 ∈ Λ. 14 The convergence is entry-wise. 27 (10) By Item (2), we may safely assume σ1 (M \ {vi }) = σ2 (M \ {vi }) and observe that µM (σ1 )Pi (σ1 , σ2 ) = = Pr σ′ ∼µC True Pr σ′ ∼µC True M    σ (M) = σ1 (M) · ′  σ (M) = σ2 (M) · ′ Prσ′ ∼µC [σ ′ (M) = σ2 (M)] True (by (9)) Prσ′ ∼µC [σ ′ (M \ {vi }) = σ1 (M \ {vi })] True Prσ′ ∼µC [σ ′ (M) = σ1 (M)] True Prσ′ ∼µC [σ ′ (M \ {vi }) = σ2 (M \ {vi })] True = µ (σ2 )Pi (σ2 , σ1 ). Finally we turn to Item (4). By induction on R, it suffices to verify for L = R = t ∈ Z that 1σin Pit = Pit (σin , ·) is the distribution of σout ← SystematicScan(Φ, M, σin , t, t, rt ). Similarly as above, define assignment σ e by setting σ e(V \ {vit }) = σin (V \ {vit }) and σ e(vit ) = ⋆. Then for each σ b ∈ Λ we have   Pit (σin , σ b) = Pr σ ′ (M) = σ b(M) . C| σ e σ′ ∼µTrue Now we consider two separate cases. e(M). Thus Pit (σin , σ b) equals 1 if σ b = σin ; and equals 0 if • vit ∈ V \ M. Then σ ′ (M) ≡ σ otherwise. This agrees with σout ≡ σin from the algorithm.  • vi ∈ M. Let Φt = Vt , (Ωv |σe , Dv |σe )v∈Vt , Ct be from Line 4. Then by Item (2) of Proposition 3.3, we have ( Prσ′ ∼µCt [σ ′ (vit ) = σ b(vit )] σ b(V \ {vit }) = σ e(V \ {vit }), True Pr [σout = σ b] = 0 otherwise. By Item (4) of Proposition 3.1, Prσ′ ∼µCt [σ ′ (vit ) = σ b(vit )] = Pr True Thus Pr [σout = σ b] = = ( Pr C| σ e σ′ ∼µTrue 0 Pr C| σ e σ′ ∼µTrue  [σ ′ (vit ) = σ b(vit )] C| σ e σ′ ∼µTrue [σ ′ (vit ) = σ b(vit )]. σ b(V \ {vit }) = σ e(V \ {vit }), otherwise  σ ′ (M) = σ b(M) (since σ b ∈ Λ and σ ′ (M \ {vit }) ≡ σ e(M \ {vit })) = Pit (σin , σ b). Item (4) of Proposition 4.21 shows SystematicScan is a time inhomogeneous Markov chain. General theory regarding time inhomogeneous Markov chains can be much more complicated but luckily we can embed this one into a time homogeneous Markov chain. Lemma 4.22. Assume eα∆ ≤ 1. Let L and σin ∈ Λ be arbitrary. Define µR = 1σin PiL · · · PiR . Then limR→+∞ µR = µM . Proof. Let F = PiL · · · PiL+n−1 . Since it = t mod n, the one-step transition matrices repeatedly applied to 1σin has period n. Hence µR = 1σin Fm if R = L + m · n − 1 and m ≥ 1. Let Y0 = 1σin and Yt = Y0 Fi for t ≥ 1, then (Yt )t≥0 is a time homogeneous Markov chain with transition matrix F. Here we verify the following properties of F. • Stationary with respect to µM . This follows immediately from Item (3) of Proposition 4.21. 28 • Aperiodic. By Item (2) of Proposition 4.21, for any i ∈ {0, . . . 
, n − 1} and any σ ∈ Λ we have Pi (σ, σ) > 0. Thus F(σ, σ) > 0 which implies F is aperiodic. • Irreducible. Let σ1 , σ2 ∈ Λ be arbitrary. For each j ∈ {0, . . . , n}, define σ j ∈ Λ by ( σ2 (vi′ ) i′ ∈ {iL , . . . , iL+j−1 } , j σ (vi′ ) = σ1 (vi′ ) otherwise. Then σ1 = σ 0 and σ2 = σ n . Moreover PiL+j (σ j , σ j+1 ) > 0 for all j ∈ {0, . . . , n − 1} by Item (2) of Proposition 4.21. Thus F(σ1 , σ2 ) ≥ PiL (σ 0 , σ 1 )PiL+1 (σ 1 , σ 2 ) · · · PiL+n−1 (σ n−1 , σ n ) > 0. Therefore by Theorem 4.19, limm→+∞ µL+m·n−1 = limt→+∞ Yt = µM . Since each Pi is stationary with respect to µM by Item (3) of Proposition 4.21, for any finite integer o ≥ 0   lim µL+m·n−1+o = lim µL+m·n−1 PiL · · · PiL+o−1 = µM PiL · · · PiL+o−1 = µM . m→+∞ m→+∞ Hence limR→+∞ µR = µM . 4.2.2 Coupling from the Past and the Bounding Chain We have showed SystematicScan converges to distribution µM , but to obtain a sample distributed exactly according to µM we need to run for infinite time. The trick for making it finite is to think backwards. That is the idea of coupling from the past [PW96]; then the bounding chain [Hub98, HN99] is used to make the process more computationally efficient. Let P ∈ RΛ×Λ be some transition matrix. We say f : Λ × [0, 1] → Λ is a coupling of P if for all a, b ∈ Λ, Prr∼[0,1] [f (a, r) = b] = P(a, b). We use random function f r : Λ → Λ to denote the coupling f with randomness r, i.e., f r (a) = f (a, r) for all a ∈ Λ. Recall our definition of Pi from Definition 4.20 and it = t mod n from Algorithm 7. Lemma 4.23 (Coupling from the Past). Let ft : Λ × [0, 1] → Λ be a coupling of Pit for all t ∈ Z. Define random functions FL,R : Λ → Λ over random (rt )t∈Z for −∞ < L ≤ R < +∞ as  rR−1 · · · fLrL (a) · · · for all a ∈ Λ. FL,R (a) = fRrR fR−1 Let M ≥ 1 be the smallest integer such that F−M,−1 is a constant function. Let A = F−M,−1 (Λ) be the corresponding constant. Then F−M ′ ,−1 (Λ) ≡ A for any M ′ > M , and A is distributed as µM if M < +∞ almost surely. Proof. Since Pi = Pi+n , for any integer ℓ ≥ 1 and all a, b ∈ Λ we have ℓ ⊤ Pr [F−ℓ·n,−1 (a, b)] = 1a (P−n P−n+1 · · · P−1 )ℓ 1⊤ b = 1a (P0 P1 · · · Pn−1 ) 1b = Pr [F0,ℓ·n−1 (a, b)] . Thus for any a, b ∈ Λ, by Lemma 4.22 we have lim Pr [F−ℓ·n,−1 (a) = b] = lim Pr [F0,ℓ·n−1 (a) = b] = µM (b). ℓ→+∞ ℓ→+∞  On the other hand for M ′ > M , F−M ′ ,−1 (Λ) = F−M,−1 F−M ′ ,−M −1 (Λ) = A. If M < +∞ almost surely, then we have Pr [A = b] = lim Pr [F−ℓ·n,−1 (a) = b] = µM (b). ℓ→+∞ 29 Therefore to obtain a perfect sample from µM we only need to (1) design a coupling, then (2) sample randomness tapes (rt )t≥0 , and lastly (3) find some M ′ ≥ 1 such that F−M ′ ,−1 is a constant function and output the corresponding constant. Now we show in the following Algorithm 8 that Algorithm 5 implicitly provides a coupling for (1) and an efficient way to check (3).   Q Q V \M V \M ⋆ . We define Λ⋆ = . Then we construct Recall Λ = v∈M Ωv × {⋆} v∈M Ωv × {⋆} ⋆ ⋆ gi : Λ × [0, 1] → Λ for each i ∈ {0, . . . , n − 1} as follows. We also remark that this coupling is information-theoretic and is for analysis only. Algorithm 8: A coupling gi : Λ⋆ × [0, 1] → Λ⋆ for each i ∈ {0, . . . , n − 1} Input: an assignment σin ∈ Λ⋆ and a randomness tape r ∈ [0, 1]. Output: an assignment σout ∈ Λ⋆ .  Data: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , and an index i ∈ {0, . . . , n − 1}. Assume V = {v0 , . . . , vn−1 }. 
1 Initialize σout ← σin and divide r into two independent parts r1 , r2 2 if vi ∈ V \ M then return σout 3 Recall β = β(Φ, M) from (3) and define distribution Dv⋆i over Ω⋆ vi by setting ( max {0, 1 − β · (1 − Dvi (q))} P Dv⋆i (q) = 1 − q′ ∈Ωv Dv⋆i (q ′ ) i 4 5 6 7 8 Dv⋆i q ∈ Ωvi , q = ⋆. Sample c1 ∼ using r1 and update σout (vi ) ← c1 if c1 6= ⋆ then return σout  (Φ′ , Token) ← Component(Φ, M, σout , vi ) where Φ′ = V ′ , (Ωv |σout , Dv |σout )v∈V ′ , C ′ if Token = False then return σout Define distribution Dv†i over Ωvi by setting Dv†i (q) = Prσ′ ∼µC′ [σ ′ (vi ) = q] for q ∈ Ωvi . True 9 10 11 Define distribution Dv′ i over Ωvi by setting Dv′ i (q) = Sample c2 ∼ Dv′ i using r2 and update σout (vi ) ← c2 return σout Dv†i (q)−Dv⋆i (q) Dv⋆i (⋆) for q ∈ Ωvi We say σ1 ∈ σ2 for some σ1 , σ2 ∈ Λ⋆ if σ2 (v) ∈ {σ1 (v), ⋆} for all v ∈ V . Fact 4.24. When σ1 , σ2 ∈ Λ, σ1 ∈ σ2 iff σ1 = σ2 . Now we verify the following properties of Algorithm 8. Proposition 4.25. If eα∆ ≤ 1 then the following holds for gi : Λ⋆ × [0, 1] → Λ⋆ from Algorithm 8. (1) All the distributions in Algorithm 8 are well-defined. (2) gi (σ1 , r) ∈ gi (σ2 , r) holds for any r ∈ [0, 1] and σ1 , σ2 ∈ Λ⋆ with σ1 ∈ σ2 . (3) gi restricted on Λ is a coupling of Pi . (4) For any t ≡ i mod n, gi is the same update procedure as the t-th for iteration in Algorithm 5. Proof. First we prove Item (1). Note that Dv⋆i is the same one as in SafeSampling(Φ, M, vi, ·). Thus it is well-defined by Item (1) of Proposition 3.4. Now we assume Token = True to check Dv†i ′ and Dv′ i . By Item (2) of Proposition 3.3, Dv†i is well-defined since µCTrue is well-defined. By Item (3) of Proposition 3.4, Dv⋆i (q) ≤ Dv†i (q) for any q ∈ Ωvi . Hence Dv′ i is also well-defined. For Item (2), we have the following cases, each of which satisfies gi (σ1 , r) ∈ gi (σ2 , r). 30 • c1 6= ⋆. Then both σ1 (vi ) and σ2 (vi ) are updated to c1 . • c1 = ⋆ and Token = False for σ2 . Then σ2 (vi ) is updated to ⋆. • c1 = ⋆ and Token = True for σ2 . Since σ1 ∈ σ2 , Token also equals True for σ1 . Moreover they get the same CSP from Line 6. Thus σ1 (vi ) and σ2 (vi ) are updated to the same value c2 . Item (3) is obviously true if vi ∈ V \ M thus we assume vi ∈ M. Observe that Dv†i (q) = Dv′ i (q) · Dv⋆i (⋆) + Dv⋆i (q) for any q ∈ Ωvi . Then Item (3) follows from Item (4) of Proposition 4.21 with L = R = i. Finally we prove Item (4). Assume vi ∈ M since otherwise it is trivial. To obtain the pseudocode in Algorithm 5, we reorganize Algorithm 8 by first set σout ← ⋆ and call Component(Φ, M, σout, vi ), then based on the value of Token we either (A) sample c1 only then update σout (vi ) ← c1 , or (B) sample both c1 and c2 then update σout (vi ) ← c1 if c1 6= ⋆; and σout (vi ) ← c2 if otherwise. The former is SafeSampling, and the latter, executed jointly, is exactly RejectionSampling as we analyzed for Item (3). One more ingredient we need is the following well-known Borel-Cantelli theorem. Theorem 4.26 P (Borel-Cantelli Theorem, e.g., [GS01, Section 7.3]). Let T be a non-negative random variable. If +∞ i=0 Pr [T > i] < +∞ then T < +∞ almost surely. Now we are ready to prove Proposition 4.17. Proof of Proposition 4.17. Assume the while iterations in Algorithm 4 stop at T = TFinal or TFinal = +∞ if iterations never end. Since each iteration halts almost surely by Proposition 4.5, TFinal is well-defined almost surely. Let V = (v0 , . . . , vn−1 ) and define it = t mod n for all t ∈ Z as in Algorithm 5. 
Define random functions GL,R : Λ⋆ → Λ⋆ over random (rt )t∈Z for −∞ < L ≤ R < +∞ as  rR−1 rR GL,R (a) = gR gR−1 for all a ∈ Λ⋆ , · · · gLrL (a) · · · where gt = git : Λ⋆ × [0, 1] → Λ⋆ is from Algorithm 8 and gtr (·) = gt (·, r). Let M1 ≥ 1 be the smallest integer such that G−M1 ,−1 is a constant function on Λ, i.e., G−M1 ,−1 (Λ) ≡ A. By Item (3) of Proposition 4.25 and Lemma 4.23, G−M1′ ,−1 (Λ) ≡ A for all M1′ ≥ M1 , and A is distributed as µM if M1 < +∞ almost surely. Let M2 ≥ 1 be the smallest integer such that G−M2 ,−1 (⋆V ) ∈ Λ. Iteratively applying Item (2) of Proposition 4.25, we know G−M2 ,−1 (Λ) ∈ G−M2 ,−1 (⋆V ). Then by Fact 4.24, G−M2 ,−1 (Λ) is constant. Thus M2 ≥ M1 and G−M2 ,−1 (⋆V ) = A. By Item (4) of Proposition 4.25, BoundingChain(Φ, M, −T, r−T , . . . , r−1 ) equals G−T,−1 (⋆V ), which means TFinal ≥ M2 . Thus the final assignment σ e equals A and has distribution µM provided TFinal < +∞ almost surely. Now we only need to show TFinal < +∞ almost surely. Note that either TFinal = 1 or, by Algorithm 4, BoundingChain(Φ, M, −TFinal /2, r−TFinal /2 , . . . , r−1 ) = G−TFinal /2,−1 (⋆V ) ∈ / Λ. Thus TFinal ≤ 2 · M2 , which means it suffices to show M2 < +∞ almost surely. By Item (3) of Proposition 4.25 and the analysis above, G−i,−1 (⋆V ) = A ∈ Λ for all i ≥ M2 ; thus +∞ X i=0 Pr [M2 > i] ≤ 2n − 1 + +∞ X i=2n−1   Pr G−i,−1 (⋆V ) ∈ /Λ 31 = 2n − 1 + ≤ 2n − 1 + +∞ X i=2n−1 +∞ X i=2n−1 Pr [BoundingChain(Φ, M, −i, r−i , . . . , r−1 ) ∈ / Λ] i 4n · 2− n < +∞, (by Proposition 4.5) as desired by Theorem 4.26. 4.3 The FinalSampling Subroutine Finally we give the missing FinalSampling(Φ, M, e σ) subroutine, which simply completes the assignment on V \ M for σ e. Algorithm 9: The FinalSampling subroutine  Input: an atomic CSP Φ = V, (Ωv , Dv )v∈V , C , a marking M ⊆ V , and an assignment  Q V \M σ e∈ v∈M Ωv × {⋆} C Output: an assignment σ ∈ σTrue e, v) for all v ∈ V 1 Φv = Vv , (Ωv |σ e , Dv |σ e )v∈Vv , Cv ← Component(Φ, M, σ /* Ignore the returned Token since it is always True here */ 2 Initialize V ′ ← ∅ 3 while ∃v ∈ V \ V ′ do 4 σv ← RejectionSampling(Φv , rv ) /* rv is a fresh new randomness tape */ ′ 5 Assign σ(Vv ) ← σv (Vv ) and update V ← V ′ ∪ Vv 6 end 7 return σ We observe the following results regarding Algorithm 9. Lemma 4.27. If eα∆ ≤ 1, then the following holds for FinalSampling(Φ, M, e σ). C| σ e • It halts almost surely, and outputs σ ∼ µTrue when it halts.   P • Its expected total running time is at most O kdQ v∈V (1 − eα)−|Cv | . Proof. All the Φv can be easily constructed with one pass of C and V which takes time O (k|C| + |V |). Cv By Proposition  3.3, each iteration of Line 4 halts almost surely and generates σv ∼ µTrue in expected time O (kQ |Cv | + Q) · (1 − eα)−|Cv | . Therefore FinalSampling halts almost surely and its expected total running time is at most   X O k|C| + |V | + (kQ |Cv | + Q) · (1 − eα)−|Cv |  v needed by Line 4 ! X kQ |Cv | + Q −|Cv | · (1 − eα) = O k|C| + |V | + |Vv | v∈V ! X −|Cv | . (since |Cv | ≤ d |Vv | and |C| ≤ d|V |) ≤ O kdQ (1 − eα) v∈V 32 Moreover when FinalSampling halts, by iteratively applying Item (4) of Proposition 3.1 we have Y C|σe v σ∼ = µTrue . µCTrue v needed by Line 4 Combining Proposition 4.17, we analyze the performance of Algorithm 9 in Algorithm 4. Proposition 4.28. If eα∆ ≤ 1, e∆2 ρ ≤ 1/32, and ∆2 λ ≤ 1/16, then the following holds for the FinalSampling in Algorithm 4. • It halts almost surely, and outputs σ ∼ µCTrue when it halts.  • Its expected running time is at most O d2 kQ∆|V | . C . Define σ e by Proof. 
By Item (1) of Lemma 4.27, it halts almost surely. Fix an arbitrary σ ′ ∈ σTrue ′ V \M setting σ e(M) = σ (M) and σ e(V \ M) = ⋆ . Combining Proposition 4.17 and Definition 4.16, we have   C|σe ′   C|σe b(M) = σ ′ (M) · µTrue (σ ) Pr σ = σ ′ = µM (e σ ) · µTrue (σ ′ ) = Pr σ = σ b∼ = QPr v∈V Prσb∼ Q Dv  σ b∼µC True  C σ b(M) = σ ′ (M) σ b ∈ σTrue ·  σ ′ (M), σ b σ b(M) = ∈   Q C Prσb∼ v∈V Dv σ b ∈ σTrue v∈V Dv σ′′ ∼ C σTrue  · QPr v∈V Dv |σe h i C|σe σ ′′ = σ ′ σ ′′ ∈ σTrue Prσ′′ ∼Qv∈V Prσ′′ ∼Qv∈V Dv |σe Dv |σe h [σ ′′ = σ ′ ] C| σ e σ ′′ ∈ σTrue i. ′ By Definition v if v ∈ V \ M; and equals the point distribution of σ (v) if v ∈ M. Q 2.5, Dv |σe equals DQ Let σ1 ∼ v∈V \M Dv and σ2 ∼ v∈M Dv be independent. Then Prσ′′ ∼Qv∈V Prσ′′ ∼Qv∈V Dv |σe Dv |σe h [σ ′′ = σ ′ ] C| σ e σ ′′ ∈ σTrue i= = Prσ1 [σ1 = σ ′ (V \ M)] h i C|σe Prσ1 σ1 ◦ σ ′ (M) ∈ σTrue (◦ represents vector concatenation) Prσ1 ,σ2 [σ1 = σ ′ (V \ M), σ2 = σ ′ (M)] h i C|σe , σ2 = σ ′ (M) Prσ1 ,σ2 σ1 ◦ σ ′ (M) ∈ σTrue (since σ1 , σ2 are independent) = = = σ′ ] Pr [σ ◦ σ2 = h σ1 ,σ2 1 i C|σe , σ2 = σ ′ (M) Prσ1 ,σ2 σ1 ◦ σ2 ∈ σTrue Prσb∼Qv∈V Dv [b σ = σ′] h i C|σe Prσb∼Qv∈V Dv σ b ∈ σTrue ,σ b(M) = σ ′ (M) Prσb∼Qv∈V Dv [b σ = σ′]   C ,σ Prσb∼Qv∈V Dv σ b ∈ σTrue b(M) = σ ′ (M) C| σ e where the last equality is because when σ b(M) = σ ′ (M), we have σ b(M) = σ e(M) and thus σ b ∈ σTrue C . Hence in all, we have iff σ b ∈ σTrue  ′  Pr σ = σ = as desired. [b σ = σ′]  = QPr  C σ b∼ v∈V b ∈ σTrue Dv σ Prσb∼Qv∈V Prσb∼Qv∈V Dv 33 Dv   C σ b = σ′ σ b ∈ σTrue = µCTrue (σ ′ ) Note that µM is a stationary distribution for Pi by Item (3) Proposition 4.21. Meanwhile by Item (4) of Proposition 4.25, the t-th for iteration in BoundingChain is a coupling of Pit . Thus in FinalSampling, upon receiving σ e which has distribution µM , we can execute |V | more rounds e using fresh randomness, and the resulted assignment still has of Line 3-11 in Algorithm 5 on σ M distribution µ . In other words, we may safely assume the last |V | rounds of update in the final BoundingChain procedure are all using RejectionSampling. Thus each |Cv | in Lemma 4.27 also satisfies the concentration bound in Proposition 4.11. By a similar calculation in the proof of the efficiency part of Proposition 4.5 and noticing |C| ≤ d|V |, the expected running time here is at most !  +∞  X  1 ℓ−1 ℓ+1 2 = O d2 kQ∆|V | . ·∆·4 O d kQ|V | 32 ℓ=1 4.4 Putting Everything Together Now we put everything together to prove our main theorem. Proof of Theorem 4.1. The correctness part follows immediately from Proposition 4.28 and Proposition 4.17. Thus recall measures defined in Definition 2.1 and we focus on the efficiency part. • Let X be the total running time of AtomicCSPSampling(Φ, π). • Let A be the time for computing β(Φ, M) in (3). Then A = O(k|C|) ≤ O(dk|V |). • For integer i ≥ 1 and j ∈ [i], let Xi,j be the running for iteration in h i time of the (−j)-th  2 2 5 2 BoundingChain(Φ, M, −i, r−i, . . . , r−1 ). Then E Xi,j = O dk ∆ Q by Proposition 4.5. • Let TFinal be the T when the while iterations stop. Pr [TFinal ≥ t] ≤ 4|V | · 2−t/|V | for t ≥ 2|V | − 1. Then by Proposition 4.5, we have • Let Y be the running time of the FinalSampling in the end. Then by Proposition 4.28 we have E[Y ] = O d2 kQ∆|V | . 
Therefore we have X = A + Plog(TFinal ) P2t j=1 X2t ,j t=0 + Y 15 and t +∞ X 2 X   E X2t ,j · [TFinal ≥ 2t ] E[X] = E[A] + E[Y ] + t=0 j=1 ≤ E[A] + E[Y ] + +∞ X 2t r X t=0 j=1 t ≤ E[A] + E[Y ] + m X 2 X t=0 j=1 ≤ O d2 kQ∆|V | + p i h E X22t ,j Pr [TFinal ≥ 2t ] (by Cauchy-Schwarz inequality) r h +∞ X 2t r h i i X 2 E X2t ,j + E X22t ,j Pr [TFinal ≥ 2t ] dk2 ∆5 Q2 · t=m+1 j=1 (m ≥ ⌊log(2|V | − 1)⌋ to be determined later) !! +∞ X p 2t − |V m t 2 + |V | . 2 ·2 | t=m+1 15 Technically we also need to initialize the randomness at Line 1 of Algorithm 4, and check for Line 5 in Algorithm 4, and initialize the assignment at Line 1 of Algorithm 5. However these can be done on the fly and their cost will be minor compared with the parts we listed. 34 We pick m = ⌈log(|V |) + log log(|V |) + 10⌉ then m 2 + p |V | +∞ X t 2 t− |V | 2 t=m+1 m ≤2 + p p |V | Z +∞ x 2 x− |V | 2 2x (2x− n is decreasing when 2x ≥ dx m m |V | −2 · 2 |V | 2 ln (2) = O (|V | log(|V |)) .  Since d ≤ ∆, we have E[X] = O kQ∆3 |V | log(|V |) . = 2m + |V | · (since  −n ln2 (2) 2x · 2− n ′ n ln(2) ) 2x = 2x− n ) Remark 4.29. Computing higher moments of Xi,j , Y and using possibly stronger assumption, one can improve the dependency on k, ∆, Q in the expected running time. However we view these as constants compared with |V |. Thus we do not make the effort here. 5 Applications Here we instantiate Theorem 4.1 to special CSPs. We will use the following algorithmic Lovász local lemma for constructing the marking M.  C Theorem 5.1 ([MT10]). Let Φ = V, (Ωv , Dv )v∈V , C be a CSP. If ep∆ ≤ 1, then σTrue 6= ∅ and C there exists a randomized algorithm which outputs some σ ∈ σTrue in time O(k∆|V |) with probability at least 0.99. We define the smooth parameter of a CSP Φ by κ = κ(Φ) = max max ′ v∈V q,q ∈Ωv Dv (q) . Dv (q ′ ) (11) Note that κ(Φ) ≥ 1 always. When context is clear, we will simply write κ. 5.1 Binary Domains Let M ⊆ V be a marking. We specialize α, β, λ when all domains are of size 2. • α = α(Φ, M) = maxC∈C α(Φ, M, C) where α(Φ, M, C) = Y v∈vbl(C)\M C Dv (σFalse (v)); • When eα ≤ 1, define – β = β(Φ, M) = (1 − eα)−d ≤ (1 − eα)−∆ ; – λ = λ(Φ, M) = maxC∈C λ(Φ, M, C) where λ(Φ, M, C) = |vbl(C)|2 · β |vbl(C)∩M| · Y v∈vbl(C)∩M By Remark 4.2, we will use the following version of Theorem 4.1. 35 C Dv (σFalse (v)).  Theorem 5.2 (Theorem 4.1, Binary Domains). Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP such that |Ωv | = 2 for all v ∈ V . Let M ⊆ V be a marking. If e · α(Φ, M, C) · ∆ ≤ 1 and ∆2 · λ(Φ, M, C) ≤ 1/100 for all C ∈ C, then the following holds for AtomicCSPSampling(Φ, M). • Correctness. It halts almost surely and outputs σ ∼ µCTrue when it halts.  • Efficiency. Its expected total running time is O k∆3 |V | log(|V |) . We now construct a valid marking when the underlying distributions are arbitrary.  Lemma 5.3. Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP such that |Ωv | = 2 for all v ∈ V . For any ζ ∈ (0, 1), if pγ · ∆ ≤ 0.01 · ζ/κ where q 3 − 9ζ + ln(κ + 1) − ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1) , γ= 9 then there exists a marking M ⊆ V such that e · α(Φ, M, C) · ∆ ≤ 1 and ∆2 · λ(Φ, M, C) ≤ 1/100 for all C ∈ C. Moreover M can be constructed in time O(k∆|V |) with success probability at least 0.99. Q C (v)). Then p = p(Φ) = max Proof. For each C ∈ C, define pC = v∈vbl(C) Dv (σFalse C∈C pC . Mean|vbl(C)| |vbl(C)| while by the definition of κ, we know (1/(κ + 1)) ≤ pC ≤ (κ/(κ + 1)) . Thus ln(1/pC ) . ln(1 + 1/κ) |vbl(C)| ≤ (12) Since xζ ≥ ζ ln(x) holds for any x > 0, we also have ln(1/pC ) ≤ p−ζ C /ζ. 
(13) Let η, τ ∈ (0, 1) be parameters and M be the marking to be determined later. We will ensure 1 − η − τ ≥ 0 and η − τ − 3ζ ≥ 0. For each C ∈ C, let EC be the event (i.e., constraint)   Y C EC = “ ln  Dv (σFalse (v)) − η ln(pC ) > τ ln(1/pC ) ”. v∈vbl(C)∩M Now we check e · α(Φ, M, C) · ∆ ≤ 1 and ∆2 · λ(Φ, M, C) ≤ 1/100 assuming no EC happens. Since vbl(C) is the disjoint union of vbl(C) ∩ M and vbl(C) \ M, if EC does not happen, then α(Φ, M, C) ≤ p1−η−τ ≤ p1−η−τ . C Since (1 − x)−1/x ≤ 4 holds for any x ∈ (0, 1/2], we have β ≤ 1−e·p  1−η−τ −∆ e·p1−η−τ ·∆ ≤4 ≤  κ+1 κ ζ 36 if 2e ln(2)·p1−η−τ ·∆ ≤ ζ ·ln (1 + 1/κ) . (14) Note that 2e ln(2) · p1−η−τ · ∆ ≤ ζ · ln (1 + 1/κ) already implies e · α(Φ, M, C) · ∆ ≤ 1. Combining (12), (13), and (14), we also have ≤ λ(Φ, M, C) ≤ |vbl(C)|2 · β |vbl(C)| pη−τ C −ζ −3ζ ln2 (1/pC ) · pη−τ pη−τ pη−τ −3ζ C C ≤ ≤ . ln2 (1 + 1/κ) ζ 2 ln2 (1 + 1/κ) ζ 2 ln2 (1 + 1/κ) Therefore ∆2 · λ(Φ, M, C) ≤ 1/100 is reduced to ∆2 · pη−τ −3ζ ≤ 0.01 · ζ 2 · ln2 (1 + 1/κ). In all, it suffices to make sure 1 − η − τ ≥ 0, η − τ − 3ζ ≥ 0, and 2e ln(2) · p1−η−τ · ∆ ≤ ζ · ln(1 + 1/κ), and ∆2 · pη−τ −3ζ ≤ 0.01 · ζ 2 · ln2 (1 + 1/κ). (15) Now we show how to set η, τ and construct M to make sure no EC happens. We put each v ∈ V into M independently with probability η. For each v ∈ V , let xv ∈ {0, 1} be the indicator for whether v is in M. Then   X X   C C (v))  > τ ln(1/pC ) ”. xv ln 1/Dv (σFalse (v)) − E  xv ln 1/Dv (σFalse EC = “ v∈vbl(C) v∈vbl(C) By Hoeffding’s inequality [Hoe94, Theorem 2], we have ( ) −2τ 2 ln2 (1/pC )  Pr [EC ] ≤ 2 exp P 2 C v∈vbl(C) ln 1/Dv (σFalse (v)) ) ( −2τ 2 ln2 (1/pC )  P (by the definition of κ) ≤ 2 exp C (v)) ln(κ + 1) · v∈vbl(C) ln 1/Dv (σFalse    P −2τ 2 ln(1/pC ) C (v)) ) (since ln(pC ) = v∈vbl(C) ln Dv (σFalse = 2 exp ln(κ + 1) 2τ 2 / ln(κ+1) = 2 · pC ≤ 2 · p2τ 2 / ln(κ+1) . Since vbl(EC ) = vbl(C) and thus it correlates with ∆ many EC ′ (including itself), by Theorem 5.1 we can construct M (i.e., (xv )v∈V ) to avoid all EC in time O(k∆|V |) with probability at least 0.99 as long as 2 2e · p2τ / ln(κ+1) · ∆ ≤ 1. Now we set η = (2 − τ + 3ζ)/3 and q − ln(κ + 1) + ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1) 1 − 3ζ < τ= 6 2 so that 1 − η − τ , (η − τ − 3ζ)/2, and 2τ 2 / ln(κ + 1) all equal q − ln(κ + 1) + ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1) 1 > 0. γ = −ζ − 3 9 Then all the conditions in (15) boil down to pγ · ∆ ≤ 0.1 · ζ · ln(1 + 1/κ). Since κ ≥ 1 and ln(1 + x) ≥ 0.1x for all x ∈ [0, 1], we can safely replace ln(1 + 1/κ) with 0.1/κ as desired in the statement. 37 Note that whether M satisfies the conditions in Theorem 5.2 can be easily checked in time O(k|C|) = O(k∆|V |) by computing α(Φ, M, C) and λ(Φ, M, C) for each C ∈ C. Thus we can keep performing Lemma 5.3 until the marking M is valid and then we run AtomicCSPSampling(Φ, M). This provides a Las Vegas algorithm as below. Corollary 5.4. There exists a Las Vegas algorithm which takes as input an atomic CSP Φ =  V, (Ωv , Dv )v∈V , C and a parameter ζ ∈ (0, 1) such that the following holds. If |Ωv | = 2 holds for all v ∈ V and pγ · ∆ ≤ 0.01 · ζ/κ where q 3 − 9ζ + ln(κ + 1) − ln2 (κ + 1) + 6 · (1 − 3ζ) ln(κ + 1) , γ= 9 then the algorithm outputs a random solution of Φ distributed perfectly as µCTrue in expected time O k∆3 |V | log(|V |) . Remark 5.5. One natural choice of the underlying distributions is the uniform distribution. In this case κ = 1. By setting ζ → 0, we have q 3 + ln(2) − ln2 (2) + 6 ln(2) > 0.171. 
γ→ 9 For example we can set ζ = 10−10 and the local lemma condition is simply p0.171 · ∆ ≤ 10−12 /κ. In Lemma 5.15 we will optimize it to 0.175 by a tighter concentration bound. 5.2 Large Domains: State Tensorization Here we formally introduce the state tensorization technique, generalizing state compression from [FHY20]. This, as emphasized in Subsection 1.3, allows us to transform a large domain into a product of binary domains. Let Ω be a finite domain of size at least 2 and D be a distribution supported on Ω. A state tensorization for (Ω, D) (See Figure 2 for a concrete example) is a rooted tree T where • T has |Ω| leaves and each internal node of T has at least two child nodes; • the leaves of T have a one-to-one correspondence with elements in Ω. For each node z ∈ T , let leafs(z) be the set of leaves in the sub-tree of z. Then leafs(rt) = Ω for root rt. For each internal node z, we use childs(z) to denote its child nodes. For any z ′ ∈ childs(z), we use z → z ′ to denote the edge from z to z ′ . Moreover, we define the weight of z → z ′ as P q∈leafs(z ′ ) D(q) ′ W (z → z ) = P . (16) q∈leafs(z) D(q) It is easy to see the total weight of outgoing edges of any internal node is 1. If |Ω| = 1 and thus D is the point distribution, then the state tensorization T for (Ω, D) has two nodes z and z ′ where z is the root and z ′ is the only leaf and W (z → z ′ ) = 1. For each q ∈ Ω, we use path(q, T ) to denote the set of internal nodes in T on the path from the root to the leaf z that corresponds to q. For example in Figure 2, path(b, T ) = {z0 , z1 , z3 }. We first observe the following fact regarding edge weights in T . Fact 5.6. Let q ∈ Ω be arbitrary. Let path(q, T ) = {z0 , . . . , zℓ } and zℓ+1 beQthe leaf node corresponding to q. Assume z0 , . . . , zℓ+1 is in the root-to-leaf order. Then D(q) = ℓi=0 W (zi → zi+1 ). 38 z0 3 5 2 5 z1 5 6 State tensorization T for (Ω, D) z3 1 3 a 1 3 1 3 z2 1 6 1 4 d e 3 4 f c b Figure 2: One example of T for (Ω, D) where Ω = {a, b, c, d, e, f } and D(a) = D(b) = D(c) = 1/6, D(d) = D(e) = 1/10, D(f ) = 3/10. We omit the leaf nodes. Proof. By (16), we have ℓ Y i=0 W (zi → zi+1 ) = ℓ Y P ′ q ′ ∈leafs(zi+1 ) D(q ) P ′ q ′ ∈leafs(zi ) D(q ) i=0 P ′ q ′ ∈leafs(zℓ+1 ) D(q ) ′ q ′ ∈leafs(z0 ) D(q ) = P = D(q) Now we move to atomic CSPs and show formally how state tensorization helps reduce domain sizes.  Definition 5.7 (Tensorized Atomic Constraint Satisfaction Problem). Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP. Let (Tv )v∈V be state tensorizations  where each Tv is a state tensorization for ⊗ ⊗ 16 as the tensorized atomic constraint satisfac(Ωv , Dv ). We construct Φ = Z, (Ωz , Dz )z∈Z , C tion problem:17 • Z is the set of internal nodes of all Tv . • For each z ∈ Z, Ωz = childs(z) and Dz is a distribution supported on Ωz by setting Dz (z ′ ) = W (z → z ′ ) for all z ′ ∈ childs(z). • For each C ∈ C, we construct C ⊗ ∈ C ⊗ by setting [ C vbl(C ⊗ ) = path(σFalse (v), Tv ) v∈vbl(C) C C (v)) where and C ⊗ (σ) = False iff σ(z) = z(σFalse (v)) for all v ∈ vbl(C) and z ∈ path(σFalse 18 z(q) ∈ childs(z) is the child node of z such that q ∈ leafs(z(q)). For example in Figure 2, we have Ωz0 = {z1 , z2 } and Dz0 (z1 ) = 3/5, Dz0 (z2 ) = 2/5. Assume constraint C is false iff d is assigned. Then C ⊗ will have vbl(C ⊗ ) = path(d, T ) = {z0 , z1 } and, for σ ∈ Ωz0 × Ωz1 × Ωz2 × Ωz3 , C ⊗ (σ) iff σ(z0 ) = z1 and σ(z1 ) equals the leaf node corresponding to d. Recall measures defined in Definition 2.1. Here we note some basic facts for Definition 5.7. 
16 We require each Tv uses different set of tree nodes. So there will be no confusion when using z without explicitly providing Tv ∋ z. 17 Technically Φ⊗ depends on (Tv )v∈V which we omit here for simplicity. In addition we remark Φ⊗ may have redundant variables that do not appear in any constraint; this is indeed consistent with our definition of CSPs as we never require them to shave redundant variables. 18 Assume path(q, Tv ) = {z0 , . . . , zℓ } where z0 , . . . , zℓ is in the top-down order of Tv . Let zℓ+1 be the leaf node corresponding to q. Then zi (q) = zi+1 for each i ∈ {0, . . . , ℓ}. 39 Fact 5.8. The following holds for Φ⊗ . (1) Φ⊗ is an atomic CSP where |C ⊗ | = |C| and |Z| = (2) Q(Φ⊗ ) = maxz∈Z |childs(z)| and k(Φ⊗ ) P v∈V | {internal nodes in Tv } | ≤ Q(Φ)|V |. C (v), T ) ≤ k(Φ) · maxC∈C,v∈vbl(C) path(σFalse v (3) ∆(Φ⊗ ) = ∆(Φ), d(Φ⊗ ) = d(Φ), and p(Φ⊗ ) = p(Φ). Moreover for all C ∈ C we have  ⊗ ′  [C(σ) = False] = C (σ ) = False . Q Pr Q Pr σ∼ v∈vbl(C) σ′ ∼ Dv z∈vbl(C ⊗ ) Dz Proof. Item (1) (2) are obvious. Now we focus on Item (3). Note that for any C ∈ C and v ∈ V , v ∈ vbl(C) iff the root of Tv is in vbl(C ⊗ ). On the other hand if some internal node z is in vbl(C ⊗ ) then all the ancestors of z are also in vbl(C ⊗ ). Thus ∆(Φ⊗ ) = ∆(Φ) and d(Φ⊗ ) = d(Φ). The “moreover” part and p(Φ⊗ ) = p(Φ) follow from Fact 5.6. Now Q we formally describe how to Q translate an assignment for Φ⊗ into an assignment for Φ. For Trans any σ ∈ z∈Z Ωz , we define σ ∈ v∈V Ωv by the following process for each v ∈ V : • We start from the root of Tv . • If we are at an internal node z of Tv , then proceed to its child node σ(z) and repeat. • Otherwise we are at a leaf node, then set σ Trans (v) by its corresponding value in Ωv . Finally we prove the following simple but powerful reduction result. ⊗ Proposition 5.9. If σ ∼ µCTrue , then σ Trans ∼ µCTrue . Therefore to obtain a random solution of Φ distributed perfectly as µCTrue , it suffices to have a ⊗ random solution of Φ⊗ distributed perfectly as µCTrue and then perform Trans operation. Proof. Recall RejectionSampling(Φ⊗, ·) in Algorithm 2: Q (1) Sample σ ∼ z∈Z Dz . (2) If C ⊗ (σ) = True for all C ⊗ ∈ C ⊗ , then accept σ; otherwise resample σ. ⊗ Obviously σ ∼ µCTrue . By the definition of C ⊗ in Definition 5.7 and the definition of Trans above, we have C ⊗ (σ) = True iff C(σ Trans ) = True. Thus we can safely replace Step (2) by (2a) If C(σ Trans ) = True for all C ∈ C, then accept σ; otherwise resample σ. On the other hand for Step (1), we can first sample the σ Trans part and then complete it to σ: (1a) For each v ∈ V , we start from the root of Tv . – If we are at an internal node z of Tv , then sample σ(z) ∼ Dz and proceed to σ(z). – Otherwise we are at a leaf node, then move to the next variable in V . (1b) For each z ∈ Z that σ(z) is not sampled in Step (1a), we complete it by σ(z) ∼ Dz . Q By Fact 5.6 and the definition of Trans, Step (1a) is equivalent to sample σ Trans ∼ v∈V Dv and σ Trans does not depend on the values sampled in Step (1b). More intuitively, we express this hybrid argument on rejection sampling through the following equivalence of flow charts: equals equals equals Step (1) ⇆ Step (2) → σ Trans Step (1a) (1b) ⇆ Step (2a) → σ Trans Step (1a) ⇆ Step (2a) → Step (1b) → σ Trans Step (1a) ⇆ Step (2a) → σ Trans , where the last line is exactly RejectionSampling(Φ, ·). Thus σ Trans ∼ µCTrue . 
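Definition 5.7, Fact 5.6, and the Trans operation are mechanical enough to be checked by a few lines of code. The following Python sketch (illustration only; the tree and the distribution below are an arbitrary small example, not the ones from Figure 2) computes the edge weights of (16) and verifies that they multiply to D(q) along each root-to-leaf path, which is exactly the property behind Proposition 5.9.

from fractions import Fraction

# a tensorization tree is either a leaf (an element of Omega) or a tuple of subtrees
def leaves(tree):
    return [tree] if not isinstance(tree, tuple) else [q for sub in tree for q in leaves(sub)]

def weight(z, child, D):
    # edge weight W(z -> z') from (16): mass under the child divided by mass under z
    return sum(D[q] for q in leaves(child)) / sum(D[q] for q in leaves(z))

def path_product(tree, q, D):
    # product of edge weights on the root-to-leaf path of q; equals D(q) by Fact 5.6
    if not isinstance(tree, tuple):
        return Fraction(1)
    child = next(c for c in tree if q in leaves(c))
    return weight(tree, child, D) * path_product(child, q, D)

D = {'a': Fraction(1, 6), 'b': Fraction(1, 6), 'c': Fraction(1, 6),
     'd': Fraction(1, 10), 'e': Fraction(1, 10), 'f': Fraction(3, 10)}
tree = ((('a', 'b'), 'c'), (('d', 'e'), 'f'))  # one possible binary state tensorization
assert all(path_product(tree, q, D) == D[q] for q in D)

The Trans operation then simply descends this tree: at each internal node z it follows the child chosen by σ(z) and outputs the element of Ω at the leaf it reaches.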
40 5.3 General Atomic Constraint Satisfaction Problem: Arbitrary Distribution Now we deal with general atomic CSPs where domains may be large. Firstly we prove the following lemma which describes a balanced way to construct state tensorization. Lemma 5.10. There exists a deterministic algorithm such that the following holds. Let Ω be a finite domain of size at least 2 and D be a distribution supported on Ω. Let κ = maxq,q′ ∈Ω D(q)/D(q ′ ). The algorithm constructs a state tensorization T for (Ω, D) where (1) the algorithm runs in time O(|Ω| log(|Ω|)) and T is a binary tree; (2) for any internal node z ∈ T and {z1 , z2 } = childs(z), we have W (z→z1 ) W (z→z2 ) ≤ max {κ, 2}. Proof. T is constructed like a Huffman tree as follows: • For each q ∈ Ω create a node zq with value val(zq ) = D(q). Initialize set S = {zq | q ∈ Ω}. • While |S| ≥ 2, – select two distinct nodes z1 , z2 ∈ S with minimum value, i.e., val(z1 ), val(z2 ) are the smallest among all nodes in S, – create a parent node z of z1 and z2 ; then set val(z) = val(z1 ) + val(z2 ) and update val(z2 ) 1) S ← (S ∪ {z}) \ {z1 , z2 }. Note that W (z → z1 ) = val(z val(z) and W (z → z2 ) = val(z) . • The final node in S when |S| = 1 is the root of T . Then Item (1) is obvious if S is implemented as a balanced binary search tree or a heap. Now we turn to Item (2). Define κ(S) = maxz,z ′ ∈S val(z)/val(z ′ ) when |S| ≥ 2. Then each time W (z→z1 ) we select z1 , z2 ∈ S and link them to z, we have W (z→z2 ) ≤ κ(S). Therefore it suffices to show κ(S) ≤ max {κ, 2} throughout the construction. • Initialization. Then we simply have κ(S) = κ. • Afterwards. Assume S = {z1 , z2 , . . . , zℓ } for ℓ ≥ 2 where val(z1 ) ≤ val(z2 ) ≤ · · · ≤ val(zℓ ). Let S ′ = {z, z3 , . . . , zℓ } be S after the update where val(z) = val(z1 ) + val(z2 ). Now we have two possible cases. – If val(z) ≤ val(zℓ ), then ′ κ(S ) = max  val(zℓ ) val(zℓ ) , val(z) val(z3 )  ≤ val(zℓ ) = κ(S) ≤ max {κ, 2} . val(z1 ) – If val(z) > val(zℓ ), then κ(S ′ ) = val(z) val(z1 ) + val(z2 ) = ≤ 2. val(z3 ) val(z3 ) By Lemma 5.10, Proposition 5.9, and Corollary 5.4, we have the following theorem. Theorem 5.11. There exists a Las Vegas algorithm which takes as input an atomic CSP Φ =  V, (Ωv , Dv )v∈V , C and a parameter ζ ∈ (0, 1) such that the following holds. κ where Let κ e = max {κ, 2} (Recall κ = κ(Φ) from (11)). If pγ · ∆ ≤ 0.01 · ζ/e q 3 − 9ζ + ln(e κ + 1) − ln2 (e κ + 1) + 6 · (1 − 3ζ) ln(e κ + 1) , γ= 9 then the algorithm outputs a random solution of Φ distributed perfectly as µCTrue in expected time  O k∆3 Q2 |V | log(Q|V |) . 41 Proof. By fixing the variable which has domain size 1, we may safely assume |Ωv | ≥ 2 for all v ∈ V . We use Lemma 5.10 to construct Tv for each (Ωv , Dv ). This takes time O(Q(Φ) log(Q(Φ))) · |V |. Then let Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗ be the tensorized atomic CSP (Defined in Definition 5.7). By Item (2) of Lemma 5.10, we have κ(Φ⊗ ) ≤ max {κ(Φ), 2}. Also by Fact 5.8, we have ∆(Φ⊗ ) = ∆(Φ), p(Φ⊗ ) = p(Φ), k(Φ⊗ ) ≤ k(Φ) · Q(Φ)19 , and |Z| ≤ Q(Φ)|V |. By Proposition 5.9, it suffices ⊗ to obtain a random solution of Φ⊗ distributed perfectly as µCTrue , which, by Corollary 5.4, gives the claimed bounds. Remark 5.12. Applying Theorem 5.11 to the uniform distributions, i.e., κ = 1, and setting ζ → 0, we have q 3 + ln(3) − ln2 (3) + 6 ln(3) > 0.145. γ→ 9 This already beats 1/7 from [JPV20] and 0.142 from [JPV21]. In Corollary 5.18 we will further optimize it to 0.175 with a more refined construction. 
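The construction in the proof of Lemma 5.10 is just Huffman coding on the probability masses; the following Python sketch (illustration only, assuming D is given as a dictionary of masses) mirrors it with a binary heap, and by Item (2) the two edge weights at every internal node of the resulting tree differ by a factor of at most max{κ, 2}.

import heapq
from itertools import count

def balanced_tensorization(D):
    # repeatedly merge the two nodes of smallest total mass (proof of Lemma 5.10);
    # a node is a leaf (an element of Omega) or a pair of previously merged nodes
    tiebreak = count()  # breaks ties so that heapq never compares trees directly
    heap = [(mass, next(tiebreak), q) for q, mass in D.items()]
    heapq.heapify(heap)
    while len(heap) >= 2:
        m1, _, t1 = heapq.heappop(heap)
        m2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (m1 + m2, next(tiebreak), (t1, t2)))
    return heap[0][2]  # the root; runs in O(|Omega| log |Omega|) time as in Item (1)

The nested tuples returned here use the same representation as the weight and path-product checks sketched at the end of Subsection 5.2, so Item (2) can also be verified numerically on small examples.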
5.4 Hypergraph Coloring The previous bounds can be improved if the domains are large enough and the underlying distributions are smooth. Here we take the hypergraph coloring problem as an example. Definition 5.13 (Hypergraph Coloring). Let Q and k be positive integers. Let H = (V, E) be a k-uniform hypergraph, i.e., each edge e ∈ E contains exactly k distinct variables. We associate it with an atomic CSP Φ = Φ(H, Q) = V, ([Q], U )V , C where U is the uniform distribution over [Q]  and C = Ce,i : [Q]V → {True, False} e ∈ E, i ∈ [Q] and Ce,i (σ) = False iff σ(v) = i for all v ∈ e. A solution to Φ is called a proper coloring for H. To avoid confusion, k, d, ∆ will only be referred to k(H), d(H), ∆(H). It is easy to see k(Φ) = k, d(Φ) = Q · d, ∆(Φ) = Q · ∆, Q(Φ) = Q, and p(Φ) = Q−k . Theorem 5.14. There exists a Las Vegas algorithm which takes as input a k-uniform hypergraph H = (V, E) and an integer Q. If ∆ = ∆(H) ≤ Q(1/3−oQ,k (1))k , then the algorithm outputs  a perfect uniform random proper coloring for H in expected time O k∆3 Q4 log(Q)|V | log(Q|V |) . Proof. For each v ∈ V , we construct the state tensorization Tv for (Ωv , Dv ) = ([Q], U ) as a complete binary tree, i.e., Tv has depth D = ⌈log(Q)⌉ and Tv has 2i nodes at level i for all 0 ≤ i < ⌈log(Q)⌉. Given the state tensorizations, we obtain the tensorized atomic CSP Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗ ⊗ for Φ = Φ(H, Q). By Definition 5.13 and Proposition 5.9 it suffices to obtain σ ∼ µCTrue . Define R = ⌊ 23 log(Q)⌋. We construct the marking M for Φ⊗ by putting all internal nodes in Tv of level at least R into M for each v ∈ V . To apply Theorem 5.2 to (Φ⊗ , M), we compute the constants α, β, λ (See in Subsection 5.1) for (Φ⊗ , M). C (v), T ) = {z , . . . , z } where z , . . . , z is Fix any C ⊗ ∈ C ⊗ and v ∈ vbl(C). Assume path(σFalse v 0 0 ℓ ℓ in the top-down order of Tv . Then ℓ ∈ {D − 2, D − 1} and zR , . . . , zℓ ∈ M. Since Tv is a complete binary tree, by (16) and the definition of Dz in Definition 5.7, if ℓ ≥ R then we have ℓ Y i=R ⊗ C Dzi (σFalse (zi )) = 1 ≤ 2−(ℓ−R) ≤ 2−(D−2−R) ≤ 4/Q1/3 . |leafs(zR )| ) by analyzing the depth It is possible to get a better bound on k(Φ⊗ ) (for example k(Φ⊗ ) ≤ k(Φ) · ln(κ(Φ)Q(Φ)+1) ln(1+1/κ(Φ)) of each Tv . This is because the construction of Tv is “balanced” and the support of Dv has size |Ωv | ≤ Q(Φ) only. However this only slightly improves the bound on the running time which is not our main focus. 19 42 Thus k  α(Φ⊗ , M, C ⊗ ) ≤ 4/Q1/3 . (17) By Item (3) of Fact 5.8, ∆(Φ⊗ ) = ∆(Φ) = Q · ∆. Since (1 − x)−1/x ≤ 4 holds for any x ∈ (0, 1/2], we have k  1 1 1/3 k . (18) β ≤ 4e·(4/Q ) ·Q∆ ≤ 4 kD if e · 4/Q1/3 · Q∆ ≤ kD Meanwhile R−1 Y 2ℓ−R+2 2D−R+1 |leafs(zR )| C⊗ ≤ ≤ ≤ 8/Q2/3 . Dzi (σFalse (zi )) = Q Q Q i=0 By Item (2) of Fact 5.8, k(Φ⊗ ) ≤ k(Φ) · D = kD. Thus combining (18), we have k  λ(Φ⊗ , M, C ⊗ ) ≤ k2 D 2 · 4 · 8/Q2/3 . (19) Therefore, it suffices to make sure D − 2 ≥ R, k  and k2 D 2 · 4 · 8/Q2/3 · (Q∆)2 ≤ 1/100. k  1 , e · 4/Q1/3 · Q∆ ≤ kD Since D = ⌈log(Q)⌉ and R = ⌊ 23 log(Q)⌋, it suffices to ensure Q≥5 and ∆ ≤ Q1/3 4 !k · 1 = Q(1/3−oQ,k (1))k . 40kQ log(Q) Now we compute the running time. Note that the reduction to Φ⊗ and the construction of ⊗ M only take O(Q|V |) time. Since |Z| ≤ Q|V |, k(Φ⊗ ) = O(k log(Q)),  and ∆(Φ ) = Q∆, by 3 4 Theorem 5.2 the algorithm runs in time O k∆ Q log(Q)|V | log(Q|V |) . 
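As a quick numerical sanity check of the sufficient condition reached at the end of this proof (a sketch only; logarithms are base 2, matching the depth of the complete binary tensorization trees):

from math import log2

def coloring_condition_ok(Q, k, Delta):
    # Q >= 5 and Delta <= (Q^(1/3)/4)^k / (40 k Q log Q), as in the proof of Theorem 5.14
    return Q >= 5 and Delta <= (Q ** (1 / 3) / 4) ** k / (40 * k * Q * log2(Q))

# e.g. coloring_condition_ok(1000, 50, 10**12) is True while
#      coloring_condition_ok(1000, 50, 10**13) is False,
# reflecting the Delta <= Q^((1/3 - o(1))k) regime for large Q and k.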
5.5 General Atomic Constraint Satisfaction Problem: Uniform Distribution The analysis in Subsection 5.3 is purely a black-box reduction from large domains to binary domains: the construction of the marking does not use the fact that the CSP after reduction is actually a tensorized atomic CSP. Here we provide a unified construction for the state tensorizations and the marking when the underlying  distributions are the uniform distribution. Let Φ = V, (Ωv , Dv )v∈V , C be an atomic CSP such that |Ωv | ≥ 2 and Dv is the uniform distribution for all v ∈ V . Our strategy is similar as before:  (1) Construct state tensorizations (Tv )v∈V to obtain Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗ as the tensorized atomic CSP for Φ. We will make sure each Ωz is a binary domain, i.e., Q(Φ⊗ ) = 2, to apply Theorem 5.2. (2) Construct a marking M ⊆ Z to satisfy e · α(Φ⊗ , M, C ⊗ ) · ∆(Φ⊗ ) ≤ 1 and ∆(Φ⊗ )2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 for all C ⊗ ∈ C ⊗ , where we recall α, β, λ from Subsection 5.1. (3) Apply Theorem 5.2 to Φ⊗ and M. By Proposition 5.9, this provides a perfect uniform solution to Φ after performing Trans operation. 43 Q Q −1 ⊗ ⊗ C (v)) = For each C ∈ C, define pC = v∈vbl(C) Dv (σFalse v∈vbl(C) |Ωv | . For each C ∈ C , define Q C ⊗ (z)). To avoid confusion and by Item (3) of Fact 5.8, we will use ∆ for pC ⊗ = z∈vbl(C ⊗ ) Dz (σFalse both ∆(Φ⊗ ) and ∆(Φ); use p for both p(Φ⊗ ) and p(Φ), and use pC for both pC and pC ⊗ . We construct the state tensorization Tv for each (Ωv , Dv ) in a random and independent way. The marking M ⊆ Z will be constructed along with each Tv . This will be similar as Lemma 5.3: (Tv )v∈V and M are constructed with high success probability using Theorem 5.1. Recall path(·, ·) from Subsection 5.2 and z(·) from Definition 5.7. For each v ∈ V and q ∈ Ωv , define   Y X(v, q) = log  Dz (z(q)) . z∈path(q,Tv )∩M By Fact 5.6 and noticing Dv is uniform, we know   X(v, q) = log |Ωv |−1 − log  Y z∈path(q,Tv )\M  Dz (z(q)) . Recall the definition of C ⊗ from Definition 5.7, then we have X  C log α(Φ⊗ , M, C ⊗ ) = log(pC ) − X(v, σFalse (v)) (20) v∈vbl(C) −∆ and β = β(Φ⊗ , M) ≤ (1 − e · maxC ⊗ ∈C ⊗ α (Φ⊗ , M, C ⊗ )) . Then we also have   X  ⊗ 2 C (v)). log λ(Φ⊗ , M, C ⊗ ) ≤ log vbl(C ⊗ ) · β |vbl(C )| + X(v, σFalse v∈vbl(C) Let η, τ1 , τ2 , ζ ∈ (0, 1) be parameters to be determined later. We will ensure 1 − η − τ1 ≥ 0 and (1) η − τ2 − 3ζ ≥ 0. For each C ∈ C, let EC be the event (i.e., constraint)   X (1) C (v)) − η log(pC ) < −τ1 log(1/pC ) ” X(v, σFalse EC = “  v∈vbl(C) (2) and let EC be  (2) EC = “  X v∈vbl(C)  C X(v, σFalse (v)) − η log(pC ) > τ2 log(1/pC ) ”. (1) (2) Now we check e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1 and ∆2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 assuming no EC or EC 1−η−τ1 happens. Then firstly α(Φ⊗ , M, C ⊗ ) ≤ pC ≤ p1−η−τ1 . Since (1 − x)−1/x ≤ 4 holds for any x ∈ (0, 1/2], we have β ≤ 4e·p 1−η−τ1 ·∆ ≤ (3/2)ζ if 2e ln(2) · p1−η−τ1 · ∆ ≤ ζ · ln(3/2). Note that 2e ln(2) · p1−η−τ1 · ∆ ≤ ζ · ln(3/2) already implies e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1. Now we turn to λ. Since xζ ≥ ζ ln(x) holds for any x > 0, we have ln(1/pC ) ≤ p−ζ C /ζ. Our ⊗ )| |vbl(C ⊗ and state tensorizations will have κ(Φ ) ≤ 2 (Recall κ(·) from (11)), thus pC ≤ (2/3) 2 λ(Φ⊗ , M, C ⊗ ) ≤ vbl(C ⊗ ) · β |vbl(C ⊗ )| η−τ2 pC ≤ η−τ2 −3ζ η−τ2 −ζ pC ln2 (1/pC ) · pC pη−τ2 −3ζ ≤ ≤ . ζ 2 ln2 (3/2) ζ 2 ln2 (3/2) ζ 2 ln2 (3/2) 44 Therefore ∆2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 is reduced to ∆2 · pη−τ2 −3ζ ≤ 0.01 · ζ 2 · ln2 (3/2). 
In all, it suffices to make sure we can construct the state tensorizations and the marking to (2) (1) avoid all EC and EC with the following additional conditions: 2e ln(2) · p 1−η−τ1 1 − η − τ1 ≥ 0, · ∆ ≤ ζ · ln(3/2), η − τ2 − 3ζ ≥ 0, and ∆2 · pη−τ2 −3ζ ≤ 0.01 · ζ 2 · ln2 (3/2). (21) We first consider the simple case where all domains are binary to verify Remark 5.5. Lemma 5.15. Assume |Ωv | = 2 for all v ∈ V . If p0.175 · ∆ ≤ 10−7 , then Φ⊗ and M can be constructed in time O(k(Φ) · ∆|V |) with success probability at least 0.99 such that Q(Φ⊗ ) = 2, κ(Φ⊗ ) = 1, and e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1 and ∆2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 for all C ⊗ ∈ C ⊗ . Proof. Here we simply let each Tv have one root with two child nodes. Then Φ⊗ is essentially Φ itself. We put each variable into M with probability η independently. Then for each v ∈ V and q ∈ Ωv , we have (   −1 with probability η, 1 X(v, q) = log · [v ∈ M] = 2 0 with probability 1 − η.   Hence for any t ∈ R, we have E et·X(v,q) = 1 − η + η · e−t . Note that pC = 2−|vbl(C)| . Let t1 , t2 > 0 be some parameters to be optimized soon. Then we have   i h X (1) C Pr EC = Pr −t1 · X(v, σFalse (v)) > (η + τ1 ) · t1 · |vbl(C)| v∈vbl(C)      X C = Pr exp −t1 · (v)) > exp {(η + τ1 ) · t1 · |vbl(C)|} X(v, σFalse   v∈vbl(C)      X C ≤ E exp −t1 · X(v, σFalse (v))  · e−(η+τ1 )·t1 ·|vbl(C)| (by Markov’s inequality)    v∈vbl(C) =  = |vbl(C)|  1 − η + η · et1 · e−(η+τ1 )·t1 (since X(v, q)’s are independent for different v) ! 1−η−τ1  η+τ1 |vbl(C)|  η 1−η 1) (setting et1 = (1−η)(η+τ η·(1−η−τ1 ) ) 1 − η − τ1 η + τ1 = 2−|vbl(C)|·KL(η+τ1 kη) ≤ pKL(η+τ1 kη) , where KL(akb) = a log h (2) Pr EC i a b   + (1 − a) log = Pr t2 · ≤  X v∈vbl(C)  1−a 1−b  (since 2−|vbl(C)| = pC ≤ p) is the Kullback-Leibler divergence. Similarly  C (v)) > −(η − τ2 ) · t2 · |vbl(C)| X(v, σFalse |vbl(C)|  1 − η + η · e−t2 · e(η−τ2 )·t2 45 =  1−η 1 − η + τ2 1−η+τ2  ≤ pKL(η−τ2 kη) . (1) η η − τ2 η−τ2 !|vbl(C)| (setting e−t2 = (1−η)(η−τ2 ) η·(1−η+τ2 ) ) (2) Let EC = EC ∨ EC . Since each variable is put into M independently, EC depends only on the constructions over vbl(C) and thus correlates with at most ∆ many EC ′ (including itself) and Pr [EC ] ≤ 2 · pmin{KL(η+τ1 kη),KL(η−τ2 kη)} . To apply Theorem 5.1, it suffices to make sure 2e · pmin{KL(η+τ1 kη),KL(η−τ2 kη)} · ∆ ≤ 1. (22) Combining (21) and (22), we can pick η, τ1 , τ2 , ζ satisfying 1 − η − τ1 ≥ 0, η − τ2 − 3ζ ≥ 0; then let   η − τ2 − 3ζ , KL(η + τ1 kη), KL(η − τ2 kη) γ = min 1 − η − τ1 , 2 and all the conditions boil down to pγ · ∆ ≤ 0.01 · ζ. Numerically maximizing γ, we have γ = 0.175, together with η = 0.595, τ1 = 0.23, τ2 = 0.245 − 3 · 10−5 , and ζ = 10−5 . Now we proceed to the general case. Restricted by the binary case, we cannot hope for a better bound than p0.175 · ∆ . 1. Therefore our goal is to show this bound is obtainable using the numerical constants η, τ1 , τ2 , ζ determined above. Lemma 5.16. Assume |Ωv | ≥ 2 for all v ∈ V . If p0.175 · ∆ ≤ 10−7 , then Φ⊗ and M can be constructed in time O(k(Φ)Q(Φ) · ∆|V |) with success probability at least 0.99 such that Q(Φ⊗ ) = 2, κ(Φ⊗ ) ≤ 2, and e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1 and ∆2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 for all C ⊗ ∈ C ⊗ . Proof. Note that the choice of η, τ1 , τ2 , ζ above and the assumption p0.175 · ∆ ≤ 10−7 already ensure (21), which guarantees e · α(Φ⊗ , M, C ⊗ ) · ∆ ≤ 1 and ∆2 · λ(Φ⊗ , M, C ⊗ ) ≤ 1/100 for all C ⊗ ∈ C ⊗ . 
Thus in the following we only need to show how to construct the state tensorizations (and thus (2) (1) Φ⊗ ) and the marking M so that Q(Φ⊗ ) = 2, κ(Φ⊗ ) ≤ 2, and no EC or EC happens. For each C ∈ C and integer m ≥ 2, define S(C, m) = {v ∈ vbl(C) | |Ωv | = m}. Then pC = Q+∞ −|S(C,m)| . Let t , t > 0 be the constants used in the proof of Lemma 5.15, i.e., 1 2 m=2 m     (1 − η)(η + τ1 ) η · (1 − η + τ2 ) t1 = ln ∈ [1.1659, 1.1660] and t2 = ln ∈ [1.0035, 1.0036] . η · (1 − η − τ1 ) (1 − η)(η − τ2 ) Our construction will satisfy the following proposition. We will provide its detail and proof soon. The key part is Item (3) which intuitively says the simple binary case is actually the worst case. Proposition 5.17. For each v ∈ V , the construction of the state tensorization Tv and the marking on internal nodes of Tv is randomized and satisfies the following properties. (1) For each v ∈ V , the construction is independent and takes O(|Ωv |) = O(Q(Φ)) time. 46 (z→z1 ) (2) Each possible Tv is a binary tree where W W (z→z2 ) ≤ 2 (Recall W (·) from (16)) holds for any internal node z ∈ Tv and {z1 , z2 } = childs(z). (3) For each v ∈ V and q ∈ Ωv , we have i h E e−t1 ·X(v,q)−(η+τ1 )·t1 ·log(|Ωv |) ≤ |Ωv |−KL(η+τ1 kη) and i h E et2 ·X(v,q)+(η−τ2 )·t2 ·log(|Ωv |) ≤ |Ωv |−KL(η−τ2 kη) . We first finish the proof assuming Proposition 5.17. Note that both Q(Φ⊗ ) = 2 and κ(Φ⊗ ) ≤ 2 (2) (1) follow from Item (2) of Proposition 5.17. Thus we focus on how to make sure no EC or EC happens. Similarly as in the proof of Lemma 5.16, we have   +∞ +∞ i h X X X (1) C |S(C, m)| · log(m) Pr EC = Pr −t1 · X(v, σFalse (v)) > (η + τ1 ) · t1 ·   +∞ Y Y ≤ E exp =  −t1 · m=2 v∈S(C,m) ≤ +∞ Y m=2 m=2 v∈S(C,m)  Y +∞ X X m=2 v∈S(C,m) C (v)) − (η + τ1 ) · t1 · X(v, σFalse +∞ X m=2 i h C E e−t1 ·X(v,σFalse (v))−(η+τ1 )·t1 ·log(m) (by Item (1) of Proposition 5.17) m−KL(η+τ1 kη) (by Item (3) of Proposition 5.17) m=2 v∈S(C,m) KL(η+τ1 kη) = pC ≤ pKL(η+τ1 kη) . (since pC = We also have h (2) Pr EC   |S(C, m)| · log(m)   i  = Pr t2 · ≤ +∞ Y +∞ X X m=2 v∈S(C,m) Y m=2 v∈S(C,m) C X(v, σFalse (v)) > −(η − τ2 ) · t2 · i h C E et2 ·X(v,σFalse (v))+(η−τ2 )·t2 ·log(m) +∞ X m=2 Q+∞ m=2 m −|S(C,m)| )  |S(C, m)| · log(m) ≤ pKL(η−τ2 kη) . (1) (2) Let EC = EC ∨ EC . By Item (1) of Proposition 5.17, the constructions are independent for each variable. Therefore EC depends only on the constructions over vbl(C) and thus correlates with at most ∆ many EC ′ (including itself) and Pr [EC ] ≤ 2 · pmin{KL(η+τ1 keta),KL(η−τ2 kη)} ≤ 2 · p0.175 . By our assumption p0.175 · ∆ ≤ 10−7 , we apply Theorem 5.1 to find a construction of the state (2) (1) tensorizations (and thus Φ⊗ ) and the marking M to avoid all EC = EC ∨ EC . The running time is O(k(Φ)∆|V |) · O(Q(Φ)) where O(Q(Φ)) is from Item (1) of Proposition 5.17. Putting everything together, we have the following corollary which justifies Remark 5.12. 47 Corollary 5.18. There exists a Las Vegas algorithm which takes as input an atomic CSP φ = V, (Ωv , Dv )v∈V , C such that the following holds. If each Dv is the uniform distribution and p0.175 · ∆ ≤ 10−7 , then the algorithm outputs a perfect uniform random solution of Φ in expected time O k∆3 Q2 |V | log(Q|V |) . Proof. By fixing the variable which has domain size 1, we may safely assume |Ωv | ≥ 2 for all v ∈ V .  We keep using Lemma 5.16 to construct Φ⊗ = Z, (Ωz , Dz )z∈Z , C ⊗ and M until they satisfy the conditions in Theorem 5.2. Then we run the algorithm in Theorem 5.2 to obtain a random ⊗ solution of Φ⊗ distributed perfectly as µCTrue . 
Then by Proposition 5.9, we perform the Trans operation to obtain σ_Trans ∼ µ^C_True, which is just a perfect uniform random solution of Φ. To check the running time, by Fact 5.8 we have ∆(Φ⊗) = ∆(Φ), p(Φ⊗) = p(Φ), k(Φ⊗) ≤ k(Φ)·Q(Φ) (see Footnote 20), and |Z| ≤ Q(Φ)·|V|. Therefore by Lemma 5.16 and Theorem 5.2, it takes O(kQ∆|V|) time in expectation to construct Φ⊗ and M, and O(k∆^3·Q^2·|V|·log(Q|V|)) time in expectation to do the sampling.

Footnote 20: Actually we can upper bound k(Φ⊗) by k(Φ)·O(log(Q(Φ))). This is because the state tensorizations here are all "balanced" binary trees of depth O(log(Q(Φ))). However this only slightly improves the bound on the running time, which is not our main focus.

Before proving Proposition 5.17, we set up some technical lemmas.

Fact 5.19 (e.g., [Hoe94, Lemma 1 and Equation (4.16)]). Let X be an arbitrary random variable with a ≤ X ≤ b almost surely. Let c = E[X]. Then for any t ∈ R, we have

E[e^{t·X}] ≤ ((b − c)/(b − a))·e^{a·t} + ((c − a)/(b − a))·e^{b·t} ≤ e^{(b−a)^2·t^2/8}·e^{c·t}.

Fact 5.20. Our choice of η, τ1, τ2, t1, t2 satisfies

log^2(3)·t1^2/8 ≤ (τ1·t1 − KL(η+τ1‖η)/log(e))·log(x)  and  log^2(3)·t2^2/8 ≤ (τ2·t2 − KL(η−τ2‖η)/log(e))·log(x)  for all x ≥ 8,

t1^2/8 ≤ (τ1·t1 − KL(η+τ1‖η)/log(e))·log(x)  and  t2^2/8 ≤ (τ2·t2 − KL(η−τ2‖η)/log(e))·log(x)  for all x ∈ {3, 4, 6, 7},

log^2(5/2)·t1^2/8 ≤ (τ1·t1 − KL(η+τ1‖η)/log(e))·log(x)  and  log^2(5/2)·t2^2/8 ≤ (τ2·t2 − KL(η−τ2‖η)/log(e))·log(x)  for x = 5.

Proof. Since log(x) is an increasing function for x > 0, it suffices to verify the first two inequalities at x = 8 and the middle two at x = 3. Then all of them can be verified numerically.

Now we give the proof of Proposition 5.17.

Proof of Proposition 5.17. Our construction will be independent for each v ∈ V and depend only on |Ωv|. Items (1) and (2) will be evident as we describe the construction. Let N = |Ωv| ≥ 2.

The simplest case N = 2 is essentially Lemma 5.15: Tv has one root with two leaf nodes, and we put the root into M with probability η. By the choice of t1, t2 (see the calculation in Lemma 5.15), the two inequalities in Item (3) actually hold with equality.

With foresight, our construction for N ≥ 3 will satisfy the following conditions.

(A) E[X(v, q)] = η·log(1/N) for each q ∈ Ωv.
(B) If N = 5, then X(v, q) ∈ [a_N, b_N] always holds for all q ∈ Ωv where b_N − a_N ≤ log(5/2).
(C) If N ∈ {3, 4, 6, 7}, then X(v, q) ∈ [a_N, b_N] always holds for all q ∈ Ωv where b_N − a_N ≤ 1.
(D) If N ≥ 8, then X(v, q) ∈ [a_N, b_N] always holds for all q ∈ Ωv where b_N − a_N ≤ log(3).

Then we verify Item (3) given Items (A), (B), (C), (D):

E[e^{−t1·X(v,q) − (η+τ1)·t1·log(N)}] = E[e^{−t1·(X(v,q) − η·log(1/N)) − τ1·t1·log(N)}]
≤ exp{ (b_N − a_N)^2·t1^2/8 − τ1·t1·log(N) }   (by Fact 5.19 and Item (A))
≤ exp{ −KL(η+τ1‖η)·log(N)/log(e) } = N^{−KL(η+τ1‖η)}   (by Items (B), (C), (D) and Fact 5.20)

and similarly

E[e^{t2·X(v,q) + (η−τ2)·t2·log(N)}] = E[e^{t2·(X(v,q) − η·log(1/N)) − τ2·t2·log(N)}]
≤ exp{ (b_N − a_N)^2·t2^2/8 − τ2·t2·log(N) }   (by Fact 5.19 and Item (A))
≤ N^{−KL(η−τ2‖η)}.   (by Items (B), (C), (D) and Fact 5.20)

Now we give the construction for N ≥ 3 and check Items (A), (B), (C), (D).

N = 5. Let p5 ∈ [0, 1] be a constant to be determined. The construction for N = 5 is as follows.
• In Figure 3, we select the left tree with probability p5 and the right tree with probability 1 − p5.
• After fixing the tree, we assign the elements of Ωv to the leaf nodes uniformly to obtain Tv.
• Finally we put the internal nodes boxed by dashed squares into the marking M.
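For concreteness, the mixing probability p5 can be computed directly from the two conditional expectations stated right after Figure 3 below. The following short script (our own illustration; it relies only on those two values, not on the exact tree shapes) confirms that p5 ∈ [0, 1] and that the mixture satisfies Item (A):

```python
from math import log2

eta = 0.595
target = eta * log2(1 / 5)                         # Item (A): desired E[X(v, q)]

# Conditional expectations for the two candidate trees of Figure 3
# (the values computed in the text below).
E_left = (4 * log2(2 / 5) + log2(1 / 5)) / 5       # < target
E_right = (3 * log2(1 / 3) + 2 * log2(1 / 2)) / 5  # > target

# Mix the two trees so that the overall expectation hits the target exactly.
p5 = (E_right - target) / (E_right - E_left)
print(p5)                                          # ~0.179, indeed in [0, 1]
print(p5 * E_left + (1 - p5) * E_right - target)   # ~0 up to floating-point error
```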
Figure 3: The construction for N = 5. Internal nodes boxed by dashed squares are put in M.

Recall X(v, q) = Σ_{z ∈ path(q,Tv)∩M} log(D_z(z(q))), where z(q) ∈ childs(z) is the child node of z such that q ∈ leafs(z(q)). Since the leaf nodes are uniform, for any fixed q ∈ Ωv we have

E[X(v, q) | the left tree] = (4·log(2/5) + log(1/5))/5 < η·log(1/5)

and

E[X(v, q) | the right tree] = (3·log(1/3) + 2·log(1/2))/5 > η·log(1/5).

Thus we can easily set p5 to make sure

E[X(v, q)] = p5·E[X(v, q) | the left tree] + (1 − p5)·E[X(v, q) | the right tree] = η·log(1/5),

which proves Item (A). As for Item (B), it suffices to observe

X(v, q) ∈ {log(1/5), log(1/3), log(2/5), log(1/2)} ⊆ [log(1/5), log(1/2)].

N ∈ {3, 4, 6, 7}. The construction for N ∈ {3, 4, 6, 7} is the same as for N = 5 above, except that we now use Figure 4: we mix the left and right trees to make sure E[X(v, q)] = η·log(1/N) as demanded by Item (A). To do so it suffices to check that η·log(1/N) is sandwiched between E[X(v, q) | the left tree] and E[X(v, q) | the right tree].

Figure 4: The construction for N ∈ {3, 4, 6, 7}. (a) The construction for N = 3. (b) The construction for N = 6. (c) The construction for N = 4. (d) The construction for N = 7.

For N = 3 we use Figure 4a. Then X(v, q) ∈ {log(1/3), log(2/3)}, which proves Item (C). Also

E[X(v, q) | the left tree] = log(1/3) < η·log(1/3)  and  E[X(v, q) | the right tree] = (log(1/3) + 2·log(2/3))/3 > η·log(1/3).

For N = 4 we use Figure 4c. Then X(v, q) ∈ {log(1/2), log(1/4)}, which proves Item (C). Also

E[X(v, q) | the left tree] = log(1/2) > η·log(1/4)  and  E[X(v, q) | the right tree] = log(1/4) < η·log(1/4).

For N = 6 we use Figure 4b. Then X(v, q) ∈ {log(1/3), log(1/2)}, which proves Item (C). Also

E[X(v, q) | the left tree] = log(1/3) < η·log(1/6)  and  E[X(v, q) | the right tree] = log(1/2) > η·log(1/6).

For N = 7 we use Figure 4d. Then X(v, q) ∈ {log(2/7), log(3/7), log(1/4), log(1/3)}, which proves Item (C). Also

E[X(v, q) | the left tree] = (4·log(2/7) + 3·log(3/7))/7 > η·log(1/7)  and  E[X(v, q) | the right tree] = (4·log(1/4) + 3·log(1/3))/7 < η·log(1/7).

N ≥ 8. For each x ∈ [N] define A(N, x) = ⌊N/x⌋·(x + 1) − N and B(N, x) = N − ⌊N/x⌋·x. Let R = ⌊N^{1−η}⌋. The proof of the following technical result is deferred to Appendix A.

Fact 5.21. If x ∈ {R − 1, R, R + 1}, then 1 ≤ x ≤ N and A(N, x), B(N, x) ≥ 0.

For each x ∈ {R − 1, R, R + 1} we have the following construction (see Figure 5 for an intuition), the correctness of which is guaranteed by Fact 5.21.
• Let T1, …, T_{A(N,x)} (resp., T_{A(N,x)+1}, …, T_{A(N,x)+B(N,x)}) be balanced binary trees, each of which has x (resp., x + 1) leaf nodes (see Footnote 21).
• Define the distribution Dx supported on {T1, …, T_{A(N,x)+B(N,x)}} by setting Dx(Ti) = x/N for i ≤ A(N, x) and Dx(Ti) = (x + 1)/N for i > A(N, x). Then construct a binary tree on top of {T1, …, T_{A(N,x)+B(N,x)}} using Lemma 5.10 and Dx (see Footnote 22).
• Assign the elements of Ωv uniformly to the A(N, x)·x + B(N, x)·(x + 1) = N leaf nodes to get Tv.
• Finally we put all the nodes on top of {T1, …, T_{A(N,x)+B(N,x)}} into the marking M.

Footnote 21: Here "balanced" simply means the sub-trees of sibling nodes have size difference at most 1. In particular, this guarantees Item (2) of Proposition 5.17.
Footnote 22: Note that κ(Dx) ≤ (x + 1)/x ≤ 2. By Item (2) of Lemma 5.10, Item (2) of Proposition 5.17 is still preserved. Meanwhile, since A(N, x) + B(N, x) ≤ N/x = O(N^η), this step takes O(N^η·log(N)) = O(N) time.
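The following Python sketch (ours, for intuition only; the function names are not from the paper) carries out this construction at the level of subtree sizes: it computes R, A(N, x), B(N, x), the conditional expectation E[X(v, q) | x], and the mixing weight between two adjacent values of x used in the two cases described after Figure 5 below. It also checks that the support of X(v, q) spans at most log(3), as required by Item (D).

```python
from math import floor, log2

eta = 0.595

def A(N, x):
    return (N // x) * (x + 1) - N

def B(N, x):
    return N - (N // x) * x

def EX(N, x):
    """E[X(v, q) | x] for the N >= 8 construction: A(N, x) subtrees with x
    leaves and B(N, x) subtrees with x + 1 leaves, each leaf being uniform."""
    return (A(N, x) * x / N) * log2(x / N) + (B(N, x) * (x + 1) / N) * log2((x + 1) / N)

def choose_mixture(N):
    """Return (x_low, x_high, weight on x_low) so that the mixed construction
    satisfies Item (A), following the two cases described in the text."""
    R = floor(N ** (1 - eta))
    target = eta * log2(1 / N)
    lo, hi = (R - 1, R) if EX(N, R) >= target else (R, R + 1)
    w = (EX(N, hi) - target) / (EX(N, hi) - EX(N, lo))
    return lo, hi, w

for N in [8, 20, 100, 10**4]:
    lo, hi, w = choose_mixture(N)
    # The support of X(v, q) spans at most log((hi + 1)/lo) <= log(3) (Item (D)).
    print(N, lo, hi, round(w, 3), log2((hi + 1) / lo) <= log2(3) + 1e-12)
```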
Figure 5: The construction for N ≥ 8 and x ∈ {R − 1, R, R + 1}. Nodes inside the dashed square are put in M.

Observe that for any fixed x ∈ {R − 1, R, R + 1} and any q ∈ Ωv, the construction gives X(v, q) ∈ {log(x/N), log((x + 1)/N)} and

E[X(v, q) | x] = (A(N, x)·x/N)·log(x/N) + (B(N, x)·(x + 1)/N)·log((x + 1)/N) ∈ [log(x/N), log((x + 1)/N)].

Therefore

E[X(v, q) | x = R + 1] ≥ log((R + 1)/N) = log((⌊N^{1−η}⌋ + 1)/N) ≥ η·log(1/N)

and

E[X(v, q) | x = R − 1] ≤ log(R/N) = log(⌊N^{1−η}⌋/N) ≤ η·log(1/N).

Now we have two cases:
• If E[X(v, q) | x = R] ≥ η·log(1/N) (see Footnote 23), we mix the constructions of x = R − 1 and x = R to make sure E[X(v, q)] = η·log(1/N) for Item (A). Then X(v, q) ∈ {log((R−1)/N), log(R/N), log((R+1)/N)}, which verifies Item (D) since R ≥ 2.
• Otherwise, we mix the constructions of x = R and x = R + 1 in a similar way. Then X(v, q) ∈ {log(R/N), log((R+1)/N), log((R+2)/N)}, which also verifies Item (D).

Footnote 23: We emphasize that E[X(v, q) | x] has the same value for all q ∈ Ωv.

Acknowledgement

KH wants to thank Weiming Feng for helpful discussion. KW wants to thank Chao Liao, Pinyan Lu, Jiaheng Wang, Kuan Yang, Yitong Yin, and Chihao Zhang for helpful discussion on related topics in summer 2020. KW also wants to thank Kuan Yang for the help with Mathematica. Finally we thank Weiming Feng, Chunyang Wang, and Kuan Yang for helpful comments on an earlier version of the paper.

References

[ACG12] Ittai Abraham, Shiri Chechik, and Cyril Gavoille. Fully dynamic approximate distance oracles for planar graphs via forbidden-set distance labels. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1199–1218. ACM, 2012. 4

[Alo91] Noga Alon. A parallel algorithmic version of the local lemma. Random Struct. Algorithms, 2(4):367–378, 1991. (Conference version in FOCS '91). doi:10.1002/rsa.3240020403. 2

[BC20] Siddharth Bhandari and Sayantan Chakraborty. Improved bounds for perfect sampling of k-colorings in graphs. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 631–642. ACM, 2020. doi:10.1145/3357713.3384244. 4

[Bec91] József Beck. An algorithmic approach to the Lovász local lemma. Random Struct. Algorithms, 2(4):343–365, 1991. doi:10.1002/rsa.3240020402. 2

[BGG+19] Ivona Bezáková, Andreas Galanis, Leslie A. Goldberg, Heng Guo, and Daniel Štefankovič. Approximation via correlation decay when strong spatial mixing fails. SIAM J. Comput., 48(2):279–349, 2019. doi:10.1137/16M1083906. 2

[BSVV08] Ivona Bezáková, Daniel Štefankovič, Vijay V. Vazirani, and Eric Vigoda. Accelerating simulated annealing for the permanent and combinatorial counting problems. SIAM J. Comput., 37(5):1429–1454, 2008. doi:10.1137/050644033. 5

[CS00] Artur Czumaj and Christian Scheideler. Coloring nonuniform hypergraphs: a new algorithmic approach to the general Lovász local lemma. Random Struct. Algorithms, 17(3-4):213–237, 2000. doi:10.1002/1098-2418(200010/12)17:3/4<213::AID-RSA3>3.0.CO;2-Y. 2

[EL75] Paul Erdős and László Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. Infinite and finite sets, 11:609–627, 1975. 2, 4, 11

[FGYZ20] Weiming Feng, Heng Guo, Yitong Yin, and Chihao Zhang. Fast sampling and counting k-SAT solutions in the local lemma regime. In STOC, pages 854–867. ACM, 2020. doi:10.1145/3357713.3384255.
2, 3, 4, 5, 6, 19 [FHY20] Weiming Feng, Kun He, and Yitong Yin. Sampling constraint satisfaction solutions in the local lemma regime. arXiv preprint arXiv:2011.03915, 2020. 2, 3, 4, 5, 6, 9, 38 [Fil97] James Allen Fill. An interruptible algorithm for perfect sampling via Markov chains. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC), pages 688–695, 1997. 4 [FMMR00] James Allen Fill, Motoya Machida, Duncan J Murdoch, and Jeffrey S Rosenthal. Extension of Fill’s perfect rejection sampling algorithm to general chains. Random Structures & Algorithms, 17(3-4):290–316, 2000. 4 [FVY19] Weiming Feng, Nisheeth K. Vishnoi, and Yitong Yin. Dynamic sampling from graphical models. In STOC, pages 1070–1081. ACM, 2019. 4 [GGGY20] Andreas Galanis, Leslie Ann Goldberg, Heng Guo, and Kuan Yang. Counting solutions to random CNF formulas. In ICALP, volume 168 of LIPIcs, pages 53:1–53:14, 2020. doi:10.4230/LIPIcs.ICALP.2020.53. 2, 3 [GH20] Heng Guo and Kun He. Tight bounds for popping algorithms. Random Struct. Algorithms, 57(2):371–392, 2020. doi:10.1002/rsa.20928. 2 [GJ19] Heng Guo and Mark Jerrum. A polynomial-time approximation algorithm for allterminal network reliability. SIAM J. Comput., 48(3):964–978, 2019. 2 [GJL19] Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the Lovász local lemma. J. ACM, 66(3):18:1–18:31, 2019. (Conference version in STOC ’17). doi:10.1145/3310131. 2, 3 [GLLZ19] Heng Guo, Chao Liao, Pinyan Lu, and Chihao Zhang. Counting hypergraph colorings in the local lemma regime. SIAM J. Comput., 48(4):1397–1424, 2019. (Conference version in STOC ’18). doi:10.1137/18M1202955. 2, 3 [GS01] Geoffrey R. Grimmett and David R. Stirzaker. Probability and random processes. Oxford University Press, third edition, 2001. 31 [HDSMR16] Bryan He, Christopher De Sa, Ioannis Mitliagkas, and Christopher Ré. Scan order in gibbs sampling: Models in which it matters and bounds on how much. Advances in neural information processing systems, 29, 2016. 7 [HN99] Olle Haggstrom and Karin Nelander. On exact simulation of markov random fields using coupling from the past. Scandinavian Journal of Statistics, 26(3):395–411, 1999. 4, 7, 29 [Hoe94] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The collected works of Wassily Hoeffding, pages 409–426. Springer, 1994. 37, 48 53 [HS17] David G. Harris and Aravind Srinivasan. A constructive Lovász local lemma for permutations. Theory Comput., 13:Paper No. 17, 41, 2017. (Conference version in SODA’14). doi:10.4086/toc.2017.v013a017. 2 [HS19] David G. Harris and Aravind Srinivasan. The Moser-Tardos framework with partial resampling. J. ACM, 66(5):Art. 36, 45, 2019. (Conference version in FOCS ’13). doi:10.1145/3342222. 2 [HSS11] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan. New constructive aspects of the Lovász local lemma. J. ACM, 58(6):28, 2011. (Conference version in FOCS ’10). doi:10.1145/2049697.2049702. 2, 11 [HSZ19] Jonathan Hermon, Allan Sly, and Yumeng Zhang. Rapid mixing of hypergraph independent sets. Random Struct. Algorithms, 54(4):730–767, 2019. 8 [Hub98] Mark Huber. Exact sampling and approximate counting techniques. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 31–40. ACM, 1998. 4, 7, 29 [Hub04] Mark Huber. Perfect sampling using bounding chains. Ann. Appl. Probab., 14(2):734– 753, 2004. 4 [JPV20] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. Towards the sampling lovász local lemma. CoRR, abs/2011.12196, 2020. 
URL: https://arxiv.org/abs/2011.12196, arXiv:2011.12196. 2, 3, 5, 42 [JPV21] Vishesh Jain, Huy Tuan Pham, and Thuy Duong Vuong. On the sampling lovász local lemma for atomic constraint satisfaction problems. CoRR, abs/2102.08342, 2021. URL: https://arxiv.org/abs/2102.08342, arXiv:2102.08342. 2, 3, 4, 5, 6, 8, 19, 42 [JSS20] Vishesh Jain, Ashwin Sah, and Mehtaab Sawhney. Perfectly sampling k≥ (8/3 +o(1))∆-colorings in graphs. CoRR, abs/2007.06360, 2020. URL: https://arxiv.org/abs/2007.06360, arXiv:2007.06360. 4 [JVV86a] Mark Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169– 188, 1986. 2, 5 [JVV86b] Mark R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci., 43:169– 188, 1986. doi:10.1016/0304-3975(86)90174-X. 3, 4 [KM11] Kolipaka Kashyap, Babu Rao and Szegedy Mario. Moser and Tardos meet Lovász. In STOC, pages 235–244, 2011. doi:10.1145/1993636.1993669. 2 [LP17] David A Levin and Yuval Peres. Markov chains and mixing times. American Mathematical Soc., 2017. doi:10.1090/mbk/107. 4, 27 [LS16] Eyal Lubetzky and Allan Sly. Information percolation and cutoff for the stochastic ising model. Journal of the American Mathematical Society, 29(3):729–774, 2016. 8 54 [Moi19] Ankur Moitra. Approximate counting, the Lovász local lemma, and inference in graphical models. J. ACM, 66(2):10:1–10:25, 2019. (Conference version in STOC ’17). doi:10.1145/3268930. 2, 3 [Mos09] Robin A. Moser. A constructive proof of the Lovász local lemma. In STOC, pages 343–350, 2009. doi:10.1145/1536414.1536462. 2 [MR98] Michael Molloy and Bruce Reed. Further algorithmic aspects of the local lemma. In STOC, pages 524–529, 1998. doi:10.1145/276698.276866. 2 [MT10] Robin A. Moser and Gábor Tardos. A constructive proof of the general Lovász local lemma. J. ACM, 57(2):11, 2010. doi:10.1145/1667053.1667060. 2, 4, 35 [PW96] James G. Propp and David B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms, 9(1-2):223– 252, 1996. 4, 7, 29 [She85] James B. Shearer. On a problem of Spencer. Combinatorica, 5(3):241–245, 1985. doi:10.1007/BF02579368. 2 [ŠVV09a] Daniel Štefankovič, Santosh Vempala, and Eric Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. J. ACM, 56(3):18, 2009. doi:10.1145/1516512.1516520. 3 [SVV09b] Daniel Stefankovic, Santosh S. Vempala, and Eric Vigoda. Adaptive simulated annealing: A near-optimal connection between sampling and counting. J. ACM, 56(3):18:1– 18:36, 2009. doi:10.1145/1516512.1516520. 5 [Wig19] Avi Wigderson. Mathematics and computation. Princeton University Press, 2019. 6 A Proof of Fact 5.21 Recall our setting: N ≥ 8 is an integer, R = ⌊N 1−η ⌋ where η = 0.595, A(N, x) = ⌊N/x⌋·(x+1)−N , and B(N, x) = N − ⌊N/x⌋ · x. Fact (Fact 5.21 restated). If x ∈ {R − 1, R, R + 1}, then 1 ≤ x ≤ N and A(N, x), B(N, x) ≥ 0. Proof. Since 1 ≤ x ≤ N is equivalent to 2 ≤ R ≤ N − 1, we only need to check √ 2 = ⌊81−η ⌋ ≤ ⌊N 1−η ⌋ = R ≤ N 1−η ≤ N ≤ N − 1. Also B(N, x) is always non-negative as ⌊N/x⌋ ≤ N/x. Hence we focus on the A(N, x) ≥ 0 part. Since ⌊t⌋ > t − 1, we have   N N − 1 · (x + 1) − N = − x − 1. A(N, x) > x x Since n > t is equivalent to n ≥ ⌊t + 1⌋ for all integer n, we have   N −x . A(N, x) ≥ x 55 √ √ Therefore if x ≤ N , then A(N, x) ≥ 0. 
Since R = ⌊N^{1−η}⌋ ≤ √N, this shows A(N, R − 1) and A(N, R) are both non-negative. Now we deal with A(N, R + 1). Note that ⌊N^{1−η}⌋ ≤ N^{1−η} ≤ √N − 1 holds for all N ≥ 18, thus R + 1 ≤ √N and A(N, R + 1) ≥ 0 for all N ≥ 18. Finally, for 8 ≤ N ≤ 17, A(N, R + 1) ≥ 0 can be verified numerically.
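For completeness, the finite case 8 ≤ N ≤ 17 can be checked with the following short script (ours, for the reader's convenience); as an extra sanity check it also re-verifies the full statement of Fact 5.21 over a larger range of N:

```python
from math import floor

eta = 0.595

def A(N, x):
    return (N // x) * (x + 1) - N

def B(N, x):
    return N - (N // x) * x

# Covers the cases 8 <= N <= 17 left to a numerical check above,
# plus a larger range as an additional sanity check of Fact 5.21.
for N in range(8, 10001):
    R = floor(N ** (1 - eta))
    for x in (R - 1, R, R + 1):
        assert 1 <= x <= N and A(N, x) >= 0 and B(N, x) >= 0, (N, x)
print("Fact 5.21 verified for 8 <= N <= 10000")
```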