A SAT Attack On The Erd Os Discrepancy Conjecture: N D 2d 3d KD K I 1 Id
A SAT Attack On The Erd Os Discrepancy Conjecture: N D 2d 3d KD K I 1 Id
A SAT Attack On The Erd Os Discrepancy Conjecture: N D 2d 3d KD K I 1 Id
Abstract In 1930s Paul Erdos conjectured that for any positive integer C in any
infinite 1 sequence (xn ) there exists a subsequence
xd , x2d , x3d , . . . , xkd , for
P
some positive integers k and d, such that | ki=1 xid |> C. The conjecture has
been referred to as one of the major open problems in combinatorial number theory and discrepancy theory. For the particular case of C = 1 a human proof of
the conjecture exists; for C = 2 a bespoke computer program had generated sequences of length 1124 of discrepancy 2, but the status of the conjecture remained
open even for such a small bound. We show that by encoding the problem into
Boolean satisfiability and applying the state of the art SAT solver, one can obtain
a discrepancy 2 sequence of length 1160 and a proof of the Erdos discrepancy
conjecture for C = 2, claiming that no discrepancy 2 sequence of length 1161,
or more, exists. We also present our partial results for the case of C = 3.
Introduction
Discrepancy theory is a branch of mathematics dealing with irregularities of distributions of points in some space in combinatorial, measure-theoretic and geometric settings [5,9,19,4]. The paradigmatic combinatorial discrepancy theory setting can be described in terms of a hypergraph H = (U, S), that is, a set U and a family of its subsets
S 2U . Consider a colouring c : U {+1, 1} of the elements of U in blue (+1) and
red (1) colours. Then one may ask whether there exists a colouring of the elements of
U such that in every element of S colours are distributed uniformly or a discrepancy of
colours is always inevitable. Formally, the discrepancy (deviation
from a uniform disP
tribution) of a hypergraph H is defined as minc (maxsS | es c(e)| ). Discrepancy
theory also has practical applications in computational complexity [9], complexity of
communication [1] and differential privacy [20].
One of the oldest problems of discrepancy theory is the discrepancy of hypergraphs
over the set of natural numbers with the subsets (hyperedges) forming arithmetical progressions over this set [18]. Roths theorem [22], one of the main results in the area,
states that for the hypergraph formed by the arithmetic progressions in {1, . . . , l}, that
is Hl = (Ul , Sl ), where Ul = {1, 2, . . . , l} and elements of Sl being of the form (ai+b)
1 1/4
for arbitrary a, b, the discrepancy grows at least as 20
l .
Surprisingly, for the more restricted case of homogeneous arithmetic progressions
of the form (ai), the question of the discrepancy bounds is open for more than eighty
years. In 1930s Paul Erdos conjectured [10] that the discrepancy is unbounded. This
conjecture became known as the Erdos discrepancy problem (EDP) and its proving or
disproving has been referred to as one of the major open problems in combinatorial
number theory and discrepancy theory [5,4,21].
SAT Encoding
start
+1
+1
+1
...
sC
s0
1
+1
...
sC
1
sB
1
+1
accept the subsequence xd , x2d , . . . , xkd , where k = b dl c, then the discrepancy of the
sequence x
does not exceed C.
The trace of the automaton AC on the subsequence xd , x2d , . . . , xkd can be encoded by a Boolean formula in the obvious way. To explain representation details, first
consider
(1,d)
(l,C,d) = s0
b dl c
i=1
Cj<C
sj
sj
(i,d)
(i,d)
pid
pid B
(i,d)
sC pid B ,
C<jC
(i,d)
sC
(i+1,d)
pid sj+1
(i+1,d)
sj1
(1)
(i,d)
where frame(l,C) is a Boolean formula encoding that the automaton state is correctly
(i,d)
defined, that is, exactly one proposition from each of the sets {sj
| C j C},
l
c and 1 i b dl c, is true in every model of (l,C) .
for d = 1, . . . , b C+1
The following statement can be easily proved by an investigation of models of (l,C)
and the traces of AC . Notice that although (l,C) encodes the traces of AC on all subsequences of x
they all share the same proposition Bas soon as the automaton accepts
any of these subsequences, the entire sequence should be rejected.
Proposition 1. The formula (l,C) is satisfiable if, and only if, there exists a 1 sequence x
= x1 , . . . , xl of length l of discrepancy C. Moreover, if (l,C) is satisfiable,
the sequence x
= x1 , . . . , xl of discrepancy C is uniquely identified by the assignment
of truth values to propositions p1 , . . . pl .
The encoding described above, albeit very natural, is quite wasteful: the size of
formula frame(l,C) is quadratic in the number of states. To reduce the size, in our implementation we use a slightly different encoding of the traces of AC . Namely, we
(i,d)
replace in (1) every occurrence of sj
with a conjunction of propositions representing
the numerical value of j in binary, where the most significant bit encodes the sign of j
and the other bits encode an unsigned number 0 . . . C in the usual way. We denote the
resulting formula b(l,C,d) .
For example, for C = 2 the values C . . . C can be represented in binary by 3 bits.
Then b(l,C,d) contains, for example,
(i,d)
(b2
(i,d)
b1
(i,d)
b0
(i+1,d)
) pid b2
(i+1,d)
b1
(i+1,d)
b0
l
c b dl c+1
b C+1
d=1
i=1
(i,d)
(b2
(i,d)
b1
(i,d)
(b2
(i,d)
b1
(i,d)
b0
(i,d)
b0 )
(i,d)
(i,d)
(i,d)
(b2 b1 b0 ) .
The first conjunct disallows the binary value 100, a negated zero, the other two encode
that AC , for C = 2, does not have neither s3 nor s3 . The following statement is a
direct consequence of Proposition 1.
Proposition 2. The formula
b(l,C)
l
b C+1
c
= B
b(l,C,d)
d=1
= x1 , . . . , x l
V
frameb(l,C) is satisfiable
of length l of discrepancy
C. Moreover, if (l,C) is satisfiable, the sequence x
= x1 , . . . , xl of discrepancy C is
uniquely identified by the assignment of truth values to propositions p1 , . . . pl .
Results
In our experiments we used the Lingeling SAT solver [7] version ats, the winner of
the SAT-UNSAT category of the SAT13 competition [3] and the Glucose solver [2]
version 3.0, the winner of the certified UNSAT category of the SAT13 competition [3].
All experiments were conducted on PCs equipped with an Intel Core i5-2500K CPU
running at 3.30GHz and 16GB of RAM.
By iteratively increasing the length of the sequence, we establish precisely that the
maximal length of a 1 sequence of discrepancy 2 is 1160. On our system it took
Plingeling, the parallel version of the Lingeling solver, about 800 seconds1 to find a
satisfying assignment. One of the sequences of length 1160 of discrepancy 2 can be
found in Appendix A for readers amusement.
1
The time taken by the solver varies significantly from experiment to experiment; in one rerun
it took the solver just 166.8 seconds to find a satisfying assignment.
Discussion
We have demonstrated that SAT-based methods can be used to tackle the longstanding
mathematical question on the discrepancy of 1 sequences. For EDP with C = 2
we have identified the exact boundary between satisfiability and unsatisfiability, that
is, we found the longest discrepancy 2 sequence and proved that no larger sequence of
discrepancy 2 exists. There is, however, a noticeable asymmetry between these findings.
The fact that a sequence of length 1160 has discrepancy 2 can be easily checked either
by a straightforward computer program or even manually. The negative witness, that is,
References
1. Alon, N.: Transmitting in the n-dimensional cube. Discrete Applied Mathematics 37/38, 9
11 (1992)
2. Audemard, G., Simon, L.: Glucose 2.3 in the SAT 2013 Competition. In: Proceedings of SAT
Competition 2013. pp. 4243. University of Helsinki (2013)
3. Balint, A., Belov, A., Heule, M.J.H., Jarvisalo, M. (eds.): Proceedings of SAT competition
2013. University of Helsinki (2013)
4. Beck, J., Sos, V.T.: Discrepancy theory. In: Graham, R.L., Grotschel, M., Lovasz, L. (eds.)
Handbook of combinatorics, vol. 2, pp. 14051446. Elsivier (1995)
5. Beck, J., Chen, W.W.L.: Irregularities of Distribution. Cambridge University Press (1987)
6. Biere, A.: Bounded model checking. In: Handbook of Satisfiability, Frontiers in Artificial
Intelligence and Applications, vol. 185, pp. 457481. IOS Press (2009)
7. Biere, A.: Lingeling, Plingeling and Treengeling entering the SAT Competition 2013. In:
Proceedings of SAT Competition 2013. pp. 5152. University of Helsinki (2013)
8. Borwein, P., Choi, S.K.K., Coons, M.: Completely multiplicative functions taking values in
{1, 1}. Transactions of the American Mathematical Society 362(12), 62796291 (2010)
9. Chazelle, B.: The Discrepancy Method: Randomness and Complexity. New York: Cambridge
University Press (2000)
10. Erdos, P.: Some unsolved problems. The Michigan Mathematical Journal 4(3), 291300
(1957)
11. Erdos discrepancy problem: Polymath wiki. http://michaelnielsen.org/
polymath1/index.php?title=The_Erd%C5%91s_discrepancy_problem,
accessed 29 January 2014
12. Goldberg, E.I., Novikov, Y.: Verification of proofs of unsatisfiability for CNF formulas. In:
Proceedings of Design, Automation and Test in Europe Conference and Exposition (DATE
2003), 3-7 March 2003, Munich, Germany. pp. 1088610891 (2003)
13. Gowers,
T.:
Is
massively
collaborative
mathematics
possible?
http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/,
accessed 29 January 2014
14. Gowers, T.: Erdos and arithmetic progressoins. In: Erdos Centennial conference
(2013), http://www.renyi.hu/conferences/erdos100/program.html, accessed 29 January 2014
15. Heule, M.J.H.: DRUP checker. http://www.cs.utexas.edu/marijn/drup/,
accessed 29 January 2014
16. Konev, B., Lisitsa, A.: Addendum to: A SAT attack on the Erdos discrepancy conjecture.
http://www.csc.liv.ac.uk/konev/SAT14
17. Mathias, A.R.D.: On a conjecture of Erdos and Cudakov.
Combinatorics, geometry and probability (1993)
18. Matousek, J., Spencer, J.: Discrepancy in arithmetic progressions. Journal of the American
Mathematical Society 9(1), 195204 (1996)
19. Matousek, J.: Geometric Discrepancy: An Illustrated Guide, Algorithms and combinatorics,
vol. 18. Springer (1999)
20. Muthukrishnan, S., Nikolov, A.: Optimal private halfspace counting via discrepancy. In: Proceedings of the 44th Symposium on Theory of Computing. pp. 12851292. STOC 12, ACM,
New York, NY, USA (2012)
21. Nikolov, A., Talwar, K.: On the hereditary discrepancy of homogeneous arithmetic progressions. CoRR abs/1309.6034v1 (2013)
22. Roth, K.F.: Remark concerning integer sequence. Acta Arithmetica 9, 257260 (1964)
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
-
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
+
+
+
+
+
+
+
+
+
+
+
+
+